Release Notes
Abstract
The documentation is updated to support the Multi-Value and Choice List options that were added to the Content Engine Bulk Import Tool in the 5.2.0.4 release.
Content
The following steps describe how to use the new features that were added to support the Multi-Value and Choice List options.
Preparing to use Bulk Import Tool
- Open the CEBI.cfg file and enter the following login information in the LogonAttribute section:
UserName
The user name for logging in to Content Platform Engine.
Password
The password for logging in to Content Platform Engine.
CeUri
The Content Platform Engine URI. Example: http://server12:9080/wsi/FNCEWS40MT
ObjStore
The name of the object store that contains the document classes and documents.
For example: bkmOS8nomove
To specify multiple object stores, create a CEBI.cfg file for each object store. Start Bulk Import tool with the -h<file list of home directories> flag. Each home directory that you specify requires a CEBI.cfg that uses a specific object store.
- Enter the document class and index information in the DocClassAttribute section. You can enter multiple DocClassAttributes sections. Each DocClassAttributes section must describe only one document class. However, you can enter duplicate DocClassAttributes sections for the same class if the sections have different class codes. For example, you can add documents that have specific sets of properties that are configured for each DocClassAttributes section to the same document class.
To generate files that contain your document class information for the specified object store, enter the following command:
java -cp classpath;BulkImport.jar bulkImport.BI_Start -h /home path -G
Enter your class path information and the Bulk Import tool directory for home path. The files DocClassAttributes.txt and DocClassAttributes.xml are created in the home directory that you specified in the command. Use the information in either of these files to update your CEBI.cfg file to specify the following parameters:
ClassName
The symbolic name of the class in the object store where you want to import your documents.
ClassCode
The class code. The class code can be a number from 1 to 999. This number is generated by the -G option in the Bulk Import Tool. It does not come from the Content Platform Engine server. You can manually set this code to any valid number that is unique among the class codes that are listed in the CEBI.cfg file.
IndexName
The symbolic name of the Content Platform Engine class property that is associated with that Content Platform Engine class. You can add as many IndexName parameter entries as you need. However, you must include the properties that are set as required on Content Platform Engine. These required properties must have a valid entry in the transact.dat file for the batch to process correctly.
Multi-valued properties must be contained in braces and separated by the multi-value delimiter. For example, {multi_value1|multi_value2|multi_value3}
Property fields that are of the type Date must provide a valid date mask that follows the property name.
The date mask must follow this format:
CEBI.cfg format example: IndexName=CEBIMVdatetime:"MM/dd/yyyy'T'HH:mm:ss.SSS'z'"
transact.dat format example that uses the same date/time formatting for a multi-valued property with two values: 02;{03/07/2012T13:28:49.123z|09/17/2011T02:01:00.888z},CEBIMVDateTimeFile;;test.txt
Restriction: If milliseconds are specified, they are rounded to the nearest second when the document is imported to Content Platform Engine.
You can modify the date format, for example yyyy/MM/dd or dd/MM/yyyy. But your format must match the date format that is used in the .dat file. 'T' must be uppercase and 'z' can be either case if it matches the format in the .dat file.
The order of the IndexName entries is the expected order of the values that are found in the transact.dat file.
You can use multi-valued properties and choice lists.
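As an illustration of the date mask and the multi-value syntax described above, the following Python sketch renders date values in the layout used by the transact.dat example (MM/dd/yyyy, a literal T, HH:mm:ss.SSS, a literal z) and wraps them in braces. The function names and the hard-coded | delimiter are assumptions made for this example; they are not part of the Bulk Import Tool.

```python
from datetime import datetime

MULTI_DELIMITER = "|"  # assumed: the documented default MultiDelimiter


def format_cebi_datetime(value: datetime) -> str:
    """Render a datetime in the layout 03/07/2012T13:28:49.123z."""
    millis = value.microsecond // 1000  # keep three digits of milliseconds
    return value.strftime("%m/%d/%YT%H:%M:%S") + f".{millis:03d}z"


def format_multi_value(values, delimiter=MULTI_DELIMITER) -> str:
    """Wrap already-formatted values in braces, separated by the delimiter."""
    return "{" + delimiter.join(values) + "}"


dates = [
    datetime(2012, 3, 7, 13, 28, 49, 123000),
    datetime(2011, 9, 17, 2, 1, 0, 888000),
]
entry = format_multi_value([format_cebi_datetime(d) for d in dates])
print(entry)  # {03/07/2012T13:28:49.123z|09/17/2011T02:01:00.888z}
```

If the mask in the CEBI.cfg file is changed, for example to yyyy/MM/dd, the strftime pattern in this sketch would have to change to match.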
- Enter the following batch information:
MaxSubBatchSize
The maximum size of a sub-batch in 1 K units. The MaxSubBatchSize setting controls the size of network packets. The default is 1024 bytes.
MaxDocPerSubBatch
The maximum number of documents in a sub-batch. The default is 999.
WorkingDirectory
The working directory where Bulk Import tool searches for the batches of documents that you want to import. The directory can be on a local drive or a shared network drive. This setting is required.
JournalDirectory
The directory where the Bulk Import tool log files are written. If not specified, the journals directory is created in the WorkingDirectory.
FieldDelimiter
Sets the major delimiter in the transact.dat file. The default is colon. Special characters can be used: ASCII 0x01 to 0xFF, except for 0x20 (space). To be able to use the extended ASCII codes from 0x80 to 0xFF, the CEBI.cfg and transact.dat files must be created with ANSI encoding.
ItemDelimiter
Sets the minor delimiter in the transact.dat file. The default is comma. Special characters can be used: ASCII 0x01 to 0xFF, except for 0x20 (space). To use the extended ASCII codes from 0x80 to 0xFF, you must create the CEBI.cfg and transact.dat files with ANSI encoding.
MultiDelimiter
Sets the multi-value delimiter in the transact.dat file to separate multi-value property lists. The default is a vertical bar (|). Special characters can be used: ASCII 0x01 to 0xFF, except for 0x20 (space). To use the extended ASCII codes from 0x80 to 0xFF, you must create the CEBI.cfg and transact.dat files with ANSI encoding.
SleepInterval
The total number of seconds that Bulk Import tool waits from the initial search for available batchname.eob files to the next search for available batchname.eob files. This interval is not affected by how long it takes Bulk Import tool to process its current work unless the current work exceeds the sleep interval time setting. If the work exceeds the sleep interval, then another sleep interval is appended to the end of the last sleep interval before Bulk Import tool does another search for an available batchname.eob file. The default and suggested setting is 60 seconds. The maximum setting is 3600 seconds, 1 hour.
DelayProcess
Creates a delay in processing between the physical time stamp on a batchname.eob file and when Bulk Import tool recognizes the batchname.eob file for processing. Setting this keyword to 300 causes a 5-minute delay between the time that the batchname.eob file is created and when Bulk Import tool reads the file for processing. This setting is often referred to as the "batchname.eob age". Set this parameter to zero seconds to disable the delay. The maximum setting is 3600 seconds, 1 hour. The default is 30 seconds.
OSNice
Sets Bulk Import tool to “sleep” mode between batch processing. Using this setting slows down Bulk Import tool and allows other processes to get processing time. This parameter is useful when you run Bulk Import tool in normal production time. The default is zero seconds and the maximum is 21600 seconds, 6 hours.
Timing
Log output that is used in performance analysis. The default value is FALSE. A setting of TRUE provides extra logging details. You can use the timer.awk program with the added log information for performance analysis.
UNIX systems and Windows systems with UNIX tools for DOS loaded: Some newer UNIX systems might require the use of nawk instead of awk. Example:
awk -f timer.awk journals/imp20020328
nawk -f timer.awk journals/imp2004012
Solaris 5.8 requires the use of nawk, as awk gives poor results. You can also use the nawk program instead of awk on other UNIX operating systems when the programs are present.
ExternalPassDirectory
The directory where a copy of the batchname.pass file is written. This directory is usually an externally mounted directory that is visible to third-party programs.
ExternalErrorDirectory
The directory where a copy of the batchname.err file is written. This directory is usually an externally mounted directory that is visible to third-party programs.
ExternalRptDirectory
The directory where a copy of the batchname.rpt file is written. This directory is usually an externally mounted directory that is visible to third-party programs.
PassCopyDirectory
The directory where a copy of the batchname.pass file is written. This directory is usually a local directory.
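The limits stated for the batch settings above can be checked before the tool is started. The following Python sketch is one possible pre-check, not something that the Bulk Import Tool provides. The function names, the dictionary layout, the lower bound of 0 for the timing settings, and the /data/batches path are assumptions made for this example.

```python
def is_valid_delimiter(ch: str) -> bool:
    """True for a single character in 0x01-0xFF other than space (0x20)."""
    return len(ch) == 1 and 0x01 <= ord(ch) <= 0xFF and ord(ch) != 0x20


def needs_ansi_encoding(ch: str) -> bool:
    """Extended codes 0x80-0xFF need ANSI-encoded CEBI.cfg and transact.dat."""
    return len(ch) == 1 and 0x80 <= ord(ch) <= 0xFF


# Upper limits in seconds, taken from the parameter descriptions above.
MAX_SECONDS = {"SleepInterval": 3600, "DelayProcess": 3600, "OSNice": 21600}


def check_settings(settings: dict) -> list:
    """Return a list of problems found in a dict of setting name -> value."""
    problems = []
    for name in ("FieldDelimiter", "ItemDelimiter", "MultiDelimiter"):
        if name in settings and not is_valid_delimiter(settings[name]):
            problems.append(f"{name}: invalid delimiter {settings[name]!r}")
    for name, limit in MAX_SECONDS.items():
        # Assumption: negative values are not meaningful for these settings.
        if name in settings and not 0 <= int(settings[name]) <= limit:
            problems.append(f"{name}: must be between 0 and {limit} seconds")
    if not settings.get("WorkingDirectory"):
        problems.append("WorkingDirectory: required")
    return problems


problems = check_settings({
    "FieldDelimiter": ";",
    "MultiDelimiter": " ",               # space is not allowed
    "SleepInterval": 7200,               # above the 3600-second maximum
    "WorkingDirectory": "/data/batches"  # hypothetical path
})
for problem in problems:
    print(problem)
```

In this example the sketch reports the space delimiter and the SleepInterval value; the other settings pass.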
Create and modify the transact.dat file
The syntax for the transact.dat file is as follows:
class_code:document_properties:external_index:files|+file_name
where:
class_code
Is one of the codes that is specified in the CEBI.cfg file for a document class that is involved in the import.
document_properties
Includes values for each of the properties that are specified in the IndexName parameters in the CEBI.cfg file. The values must be listed in the same order that they are listed in the DocClassAttribute section in the CEBI.cfg file. If a property is set to Required on the Content Platform Engine server, the Bulk Import Tool fails a batch if that property is not set in the transact.dat file. Spaces are considered valid characters, so a required property that contains only spaces can be processed by Bulk Import Tool. Additionally, if there are other characters in the field, the leading and trailing spaces are not removed before they are processed by Bulk Import Tool. Setting a space or a non-integer value in a property field of type integer returns the error message "Value = Bad number conversion" and causes the batch to fail.
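To make the field layout concrete, the following Python sketch assembles one transact.dat line from its parts by using the default colon and comma delimiters. It is an illustration only. The function name is hypothetical, the two-digit zero padding of the class code is an assumption, and the last field is treated as a single file name.

```python
def build_transact_line(class_code, properties, file_name, external_index="",
                        field_delimiter=":", item_delimiter=","):
    """Join class code, property values, external index, and file name."""
    if not 1 <= class_code <= 999:
        raise ValueError("class code must be in the range 1-999")
    fields = [
        f"{class_code:02d}",              # assumption: pad to two digits
        item_delimiter.join(properties),  # values in IndexName order
        external_index,                   # empty when no index is supplied
        file_name,
    ]
    return field_delimiter.join(fields)


line = build_transact_line(19, ["{3|1088|45}", "DocNameMVint"], "test.txt")
print(line)  # 19:{3|1088|45},DocNameMVint::test.txt
```

The property values are passed through unchanged, including any leading or trailing spaces, because the tool treats spaces as valid characters.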
Multi-valued entries with examples for the transact.dat file that use | as a delimiter:
Integer (integer): 19:{3|1088|45},DocNameMVint::test.txt
String (string): 03:{This|is|test},DocNameMVstr::test.txt
Binary (Base64): 10:{9876|54321},DocNameMVbin::test.txt
Date/Time (date or date and time): 02:{03/07/2012T13:28:49.123z|09/17/2011T02:01:00.888z},CEBIMVDateTimeFile::test.txt
Float64 (exponential): 06:{12345678901234567890|12345678901234567890|3.21E5},CEBIMVFloat64::test.txt
ID (GUID): 07:{562F02F1-E172-41AB-B041-E02A6C0DA9EC|562F02F1-E172-41AB-B041-E02A6C0DA9ED},CEBIMVid::test.txt
Boolean (zero/one): 05:{no|yes|true},CEBIMVboo::test.txt
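The structure of the entries above can be checked by splitting a line back into its parts. The following Python sketch is a simplified reader, not the parser that the Bulk Import Tool uses. It assumes the default delimiters and assumes that no value contains the field or item delimiter, so it does not handle the Date/Time example when the field delimiter is a colon. The function names are hypothetical.

```python
def parse_property(raw, multi_delimiter="|"):
    """Return a list for a {a|b|c} value, or the raw string otherwise."""
    if raw.startswith("{") and raw.endswith("}"):
        return raw[1:-1].split(multi_delimiter)
    return raw


def parse_transact_line(line, field_delimiter=":", item_delimiter=",",
                        multi_delimiter="|"):
    """Split one line into class code, properties, external index, files."""
    # Raises ValueError when the line does not have exactly four fields.
    class_code, props, external_index, files = line.split(field_delimiter)
    return {
        "class_code": class_code,
        "properties": [parse_property(p, multi_delimiter)
                       for p in props.split(item_delimiter)],
        "external_index": external_index,
        "files": files,
    }


record = parse_transact_line("19:{3|1088|45},DocNameMVint::test.txt")
print(record["properties"])  # [['3', '1088', '45'], 'DocNameMVint']
```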
Choice list entries in the transact.dat file where Green, Blue, and Red are valid options from the choice list.
Single-valued entry: 17:Green,CEBIStr::test.text
Multi-valued entry: 17:{Green|Blue|Red},CEBIStr::test.text
Important: To process choice list entries by using the Bulk Import Tool, all of the entries in the choice list must be the specified data type. For example, if the data type is String, all of the entries in the choice list must be strings. The entries cannot be a combination of string, integer, or any other data type.
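A batch can be screened against a choice list before it is submitted. The following Python sketch shows one way to do this on the client side; it does not describe how the Bulk Import Tool or Content Platform Engine performs the check. The function names are hypothetical, and the choice list is assumed to be available as a Python list.

```python
def has_single_data_type(choices) -> bool:
    """True when every entry in the choice list has the same Python type."""
    return len({type(choice) for choice in choices}) <= 1


def invalid_choices(values, choices) -> list:
    """Return the values that are not valid options in the choice list."""
    return [value for value in values if value not in choices]


colors = ["Green", "Blue", "Red"]
print(has_single_data_type(colors))                 # True
print(has_single_data_type(["Green", 7]))           # False
print(invalid_choices(["Green", "Purple"], colors)) # ['Purple']
```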
Document Information
Modified date:
17 June 2018
UID
swg27046399