This article describes how to transfer the data of different character sets, primarily Unicode, from a text file to an IBM Lotus Notes database, using the IBM Lotus Enterprise Integrator for Domino (LEI) Text Connector.
It is assumed you are familiar with previous versions of LEI. If not, refer to the Lotus Enterprise Integrator product page
for more information.
Overview of the LEI Text Connector
LEI is primarily used to integrate the transfer of data across different databases, for example, transferring data from DB2 to Lotus Notes or from Lotus Notes to DB2. In addition to relational databases, LEI is also used to integrate Notes databases with text files; for example, transferring the data from a text file to a Notes database, and from the Notes database to the text file.
The Text Connector can be used with both LEI and the Lotus Connectors LotusScript Extensions (LCLSX). The Lotus Text Connector allows data transfer between text files and Lotus Connectors, in which you define formats for the input (source) and output (destination) data in a ZID file.
The ZID file is referenced by file name from the Text Connection form, or you can specify the actual ZID file contents entirely within the Text Connection form. You can also select various options for processing the source and destination data, and you can select internationalization of days of the week and months of the year.
The LEI Text Connector reads data mainly in two different formats, fixed-field format or floating-field format:
Fixed-field format lets you manually specify the start and end character position for each field in the text file, as well as letting you specify the width for each field. Here’s an example of how fixed field format can be used:
Suppose you need to transfer data from a text file to a Notes database in two different fields, Empno and Empname. If the maximum length of data in both these fields are known, then you can specify the following in the ZID file:
EmpNo type Integer Start 01 End 10 OR EmpNo Integer Start 01 Width 10
Ename type Text Start 11 End 20 OR Ename Text Start 11 Width 10,
Both the above specifications will read the first 10 characters from the text file and transfer it into the EMPNO field in the Notes database, and will consider the next 10 characters from the text file to be transferred to the EMPNAME field of the Notes database.
To transfer the data correctly (without getting truncated), you must use the fixed-length characters for each field when using this specification.
Floating- or variable-field positions format.
You can use floating or variable-field positions format when you’re not sure about the length of characters to be transferred in each field, that is, when the data to be transferred from the text file does not have a fixed length for each field.
If you want to use this format, you must use the field delimiter between each set of characters and mention the same in the ZID file. LEI Text Connector will read and recognize this field delimiter and LEI will place those sets of characters in different fields as specified in the ZID file.
Here’s an example:
Suppose you want to transfer the following data from the text file to two fields in a Notes database:
00481920; bitte nicht;
As shown above, field 1 of the first row has 10 characters, the same field for the second row has 8 characters, and both fields are separated by a semicolon (“;”). If we use the following specification, then all the characters will be transferred correctly in their respective field at the destination Notes database.
EMPNO TYPE text Until “;”
Empname Type text Until “;”
By mentioning the above specification, LEI will read the single field until it comes across the
Working with Text Connector for Unicode
To transfer Unicode characters successfully, you must verify that the following operating system settings are in place (we refer to the Windows operating system in the entire article):
1. The complex scripts pack must be installed on the machine from which data is to be transferred, that is, the machine where the text file with Unicode characters resides. To verify the complex scripts pack:
a. Open the Control Panel and open Regional and Language Options.
b. Click the Languages tab (see figure 1) and verify that both the options listed under the Supplemental language support section are checked.
c. If either or both of them is unchecked, you must install them, using the Windows operating system disk.
Figure 1. Supplemental language support options
2. If the complex scripts as well as East Asian Languages packs are installed, open the Text file on that machine and verify all the characters in the text file that are to be transferred are displayed as expected, and that there is no data loss in viewing the characters.
3. Save the text file as UTF-8 (see figure 2). Note that, if the above-mentioned packs are not installed, you might not see some or all of the Unicode characters correctly.
The figure displays the sample records of different customers’ data containing Unicode characters, in Chinese, French, Russian, Japanese, English, and Korean languages. The text file containing these characters is in the References section.
Figure 2. Sample data records containing Unicode characters
Direct Transfer activity using Text Connector
The first step in creating the Direct Transfer activity is to create the source and the destination Connection documents. Since we are planning to transfer data from a Text file to the Notes database, the Text Connection document will act as a source Connection document, and the Notes Connection document will be the target.
1. Start the LEI task on the Domino Server Console and open the LEI Administrator Document, located in the Domino Data directory.
2. Open the Text Connection document (see figure 3), by selecting the Add Connection drop-down in the LEI Administrator and follow these steps:
a. In the Name field, provide the name of the Text Connection document.
b. In the File Name field, enter the full path of the source file name (the one with Unicode and/or non-Unicode characters; for example, C:\cust_data.txt).
c. In the “Specify field options or ZID file name…” field, enter the full path of the ZID file in the format @; for example, @c:\uni.zid.Details for creating the ZID file are in Step 2 below).
d. In the File Text Character Set field, enter UTF8. If you leave this field blank, LEI will default to the ASCII mode and transfer in the data, assuming ASCII format at the source end (text connector end).
Figure 3. Typical Text Connection document
3. The ZID file describes the columnar format of the data, which includes the column name, data type of the data to be transferred, the length of each data column, as well as the character set of the data. The character set option may or may not be included, depending on the way in which the ZID file is created. For more information on creating ZID files, refer to Chapter 11, Lotus Connector for Text, in the Lotus Connectors and Connectivity Guide.
4. Since we have used COLON (‘:’) as a delimiter, the ZID file will look like that shown in figure 4. In this file, Cust_Name, Cust_Address etc… determine the field name used in the Notes form and represents the order in which the data is presented in the Text file.
In our example the order in the text file will be:
CUST_NAME : CUST_ADDRESS : COUNTRY : CUST_NUMBER :
NOTE: For more details on the settings of the ZID file, refer to Chapter 11 of the Lotus Connectors and Connectivity Guide.
Figure 4. ZID file example
5. After creating the Text Connection document on the LEI Administrator, create a Notes Connection Document (see figure 5) from the "Add Connection" drop-down list of LEI Administrator and enter the following informaiton:
Name: You can provide any name for this document
Domino Server: Provide the Domino server name on which the LEI is running as a task.
Notes Database: Select the appropriate Notes database that is expected to store the UTF8 characters.
Note that the field in the Notes database that will store the Unicode characters may or may not be a Rich Text field.
Figure 5. Notes Connection document
6. Now you can create the Direct Transfer Activity document (see figure 6) from the LEI Administrator’s Add Activity drop-down list and provide an appropriate name to that document in the Name field.
7. Click the drop-down button in the Source section and select the Source Connection. This displays a list of the Connection Documents existing in the LEI Administrator, from which you select the Text Connection document as the source.
8. Follow same procedure for selecting the Target Connection document, and from the list select the Notes Connection document you just created. Enable the Automatic check box to ensure that field mapping will be automatic. In the Select Statement field provide a select @all statement to select all the rows from the text file.
Figure 6. Direct Transfer activity
This section discusses the limitations of the LEI Text Connector when transferring the UTF8 characters. As mentioned in the Overview section, the data from the text file can be transferred or read in two different ways, that is, fixed-length field position and variable-length field position. However, when the Text Connector is used to transfer the UTF8 characters, LEI supports only the option of transferring characters as variable-length field positions.
This is the limitation of LEI: you will not be able to transfer the UTF8 characters correctly. If you use the option of fixed-length field position, the data in all the fields may get distorted and, in some cases, the data gets truncated. This behavior is mainly due to two reasons, as follows:
- The first reason for the UTF8 characters getting disordered is due to byte-order-mark (BOM) characters. A byte-order mark is a “zero-width, no-break space” character containing the following set of characters: “ï»¿”. These characters are appended at the beginning of the text file, for example, as the first three characters of first line.
LEI reads the text file byte-by-byte, and hence these three characters are appended in the first field of the first row. The characters are not seen in the destination database, but they occupy the space of three characters, and so the actual last three characters are truncated. For example:
Suppose you want to transfer “LOTUSINDIA” from a text file to a Notes database in two different fields, specifying the first five characters as part of first field of Notes and the last five as part of the second field of Notes. As a result of the above limitation, they will be seen as follows at the Notes end:
LO TUSIN DIA
As we can see, LO will be a part of first field, and TUSIN will be a part of the second field at the Notes end; however, DIA (last three underlined characters) will be truncated and will not be a part of the destination database.
- The second reason for the data truncation is due to multibyte characters like Ä or Ü that occupy more than one byte. As explained above, since LEI reads the data byte-by-byte, the multibyte characters are also read byte-wise, and the length of these characters is also calculated byte-wise. This leads to the incorrect number of characters being picked up by LEI, as demonstrated in this example:
Suppose you need to transfer the characters LŐTUS INDIA from a text file to a Notes database. The Ő will occupy two bytes, and you want your first five characters to be a part of the first field, and the next five characters to be part of the second field.
However, LEI will read Ő as two bytes and hence two characters, leading to character distortion and the characters being transferred to the Notes database as follows:
LŐTU SINDI A
As you can see, only the first four characters, LŐTU,
are transferred to the first field of the Notes database because of the two-byte character present in this field. The second field contains the next five characters, SINDI
, because there is no special character, and hence the last character is truncated and will not be part of the Notes database.
This article has outlined the steps for transferring data of different character sets—primarily Unicode—from a text file to a Notes database, using the LEI Text Connector. It has also explained the limitations involved in this process.
Appendix A: Byte Order Mark
The following is excerpted from the Microsoft Developer Network site
Always prefix a Unicode plain text file with a byte order mark, which informs an application receiving the file that the file is byte-ordered. Because Unicode plain text is a sequence of 16-bit code values, it is sensitive to the byte ordering used when the text is written.
A byte order mark is not a control character that selects the byte order of the text.
The following table lists the available byte order marks.
Byte order mark
EF BB BF
UTF-16, little endian
UTF-16, big endian
FF FE 00 00
UTF-32, little endian
00 00 FE FF
Appendix B: ZID File
The following is excerpted from the Lotus Connectors and Connectivity Guide
Zid file is the field attributes specifications file. This file contains all the information on the following:
a. Total number of fields in the destination database
b. The datatype of each of the fields
c. How many number of characters will be a part of each field
d. Where condition similar to the one used in SQL.
Example of the Zid file:
Suppose the user needs to transfer the data to 7 different fields in the notes database and the field delimiter is a single new-line character (\n). The following example depicts a sample ZID field definition for output:
(Idn): TYPE TEXT VALUE "IDN:"
Idno: TYPE TEXT START 5 UNTIL "\nNm:"
Name: TYPE TEXT UNTIL "\nTel:"
Tel: TYPE TEXT UNTIL "\nBal:"
Balance: TYPE NUMBER.2 UNTIL "\nPmt:"
Payment: TYPE NUMBER.2 UNTIL "\nDate:"
Date: TYPE DATE FORMAT MM.DD.yy
As shown in the first four lines of the example, when you output document data you use the record delimiter to define a character to mark the end of each record. In this example, the escape sequence is a paragraph symbol indicated by the hexadecimal notation \0x14.
The UNTIL specifications are multiple character strings with a special escape sequence (\n). These allow the definition of the line feed characters that delimit the input field values and, ultimately, create our multiple line output file.
Appendix C: Sample Data
This appendix contains the Sample Customer Data text file used in the Customer data example used in this article.
The text file contains the data in the following format, with each field is separated by a colon (‘:’)
Customer Name : Customer Address : Country : Customer Number : Customer Phone:
· For more information about LEI, including demos, customer stories, and white papers, refer to the Lotus Enterprise Integrator for Domino product page.
· The LEI documentation is a good learning resource for detailed technical information.
· Get involved in the developerWorks community by participating in developerWorks blogs.
About the author
Hardik Brahmbhatt is a systems software developer working on the Lotus Enterprise Integration team of IBM Software Group in Pune, India. He joined IBM in 2007, since which time he has focused on developing Lotus Enterprise Integration products.