IBM Support

How to configure and run the Content Engine Bulk Import tool

Technical Blog Post



The IBM Content Engine Bulk Import (CEBI) tool is a command-line tool that imports large volumes of documents into a Content Platform Engine object store. The FileNet P8 Platform knowledge center (http://www-01.ibm.com/support/knowledgecenter/SSNW2F_5.2.0/com.ibm.p8.c…) has detailed instructions on using the CEBI tool. This post presents a walkthrough of installing, configuring, and running it.

 

Installation

1. In the CE Server installer, select Tools. This installs all the tools, including Enterprise Manager, Configuration Manager and CE Bulk Import.

2. Install the Java API client (or just copy over the Jace.jar, stax-api.jar, xlxpScanner.jar, xlxpScannerUtils.jar and log4j.jar files).

On Windows, the CEBI tool is installed into C:\Program Files\IBM\FileNet\ContentEngine\tools\CEBI. On UNIX, it is installed into /opt/IBM/FileNet/ContentEngine/tools/CEBI.
 

Configuration

1. Copy the CEBI_cfg.sample file in the installation directory to create a new file called CEBI.cfg.

2. Edit the LOGON ATTRIBUTE INFORMATION section at the top of the file to provide the username, password, CE_URI and object store information.

3. Generate the DocClassAttributes.txt file, which contains your document class information, by running the tool with the -G option.

For example (on one line):

 

java -cp "C:\P8CPEJava52\Client\lib\Jace.jar;C:\Program Files\IBM\FileNet\ContentEngine\tools\CEBI\BulkImport.jar" bulkImport.BI_Start -h "C:\Program Files\IBM\FileNet\ContentEngine\tools\CEBI" -G

4. Use the information in the DocClassAttributes.txt file to update the DOCUMENT CLASS AND INDEXING INFORMATION section of the CEBI.cfg.

You can enter multiple DocClassAttributes sections. Each DocClassAttributes section must describe only one document class.

For each document class that will be involved in the import, search DocClassAttributes.txt for the class name and identify the ClassCode and IndexName values to include in CEBI.cfg. The ClassCode is required. Include only the IndexName entries for which you will specify values during import.

For example:

 

DocClassAttribute {

        ClassName=test_CEBI

        ClassCode=23

        IndexName=test_string

        IndexName=test_integer

}

Note: multi-value properties, binary properties and choice lists are not supported.
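
When several document classes are involved, typing these sections by hand invites mistakes. The following Python sketch renders one section in the layout shown in the example above. The helper name render_doc_class and the eight-space indent are illustrative choices, not part of the CEBI tool, so compare the output with CEBI_cfg.sample before relying on it.

```python
def render_doc_class(class_name, class_code, index_names):
    """Return one DocClassAttribute section as text (hypothetical helper).

    index_names keeps its order, because the property values in
    transact.dat must later be listed in the same order.
    """
    lines = ["DocClassAttribute {"]
    lines.append(f"        ClassName={class_name}")
    lines.append(f"        ClassCode={class_code}")
    for name in index_names:
        lines.append(f"        IndexName={name}")
    lines.append("}")
    return "\n".join(lines)


# Reproduces the test_CEBI example from this post.
print(render_doc_class("test_CEBI", 23, ["test_string", "test_integer"]))
```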

5. In the BATCH INFORMATION section, specify a value for WorkingDirectory.

For example:

 

        WorkingDirectory=data

Other values in the BATCH INFORMATION section are optional and are documented in the Knowledge Center.
 

Setting Up the Batch

1. In the WorkingDirectory, create a <batchname>.eob file with the following syntax:

<path> <# docs> <# objects> [<transact.dat name> optional]

where:
<path> = location of the transact.dat file
<# docs> = number of documents to be imported
<# objects> = total number of content elements to be imported
<transact.dat name> = optionally specifies a different name for the transact.dat file

For example:
In the data directory, the batch1.eob file contains the following line:
batch1_data 2 4

which means:

  • the transact.dat file is located in C:\Program Files\IBM\FileNet\ContentEngine\tools\CEBI\data\batch1_data
  • there will be 2 documents imported
  • 4 content element files will be imported in total (for instance, one content element for the first document and three for the second, or two for each document)
  • the default file name will be used for the transact.dat file
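
Because the two counts in the eob file have to agree with what is actually in the batch, it can help to compute them rather than count by hand. The Python sketch below is one way to do that. The function name eob_line is hypothetical, and the single-space separator is assumed from the example line above.

```python
def eob_line(path, documents, transact_name=None):
    """Build the single line of a <batchname>.eob file.

    documents holds one list of content element file names per document.
    Fields are joined with single spaces, as in the example in this post
    (an assumption; paths containing spaces are not handled here).
    """
    doc_count = len(documents)
    element_count = sum(len(files) for files in documents)
    parts = [path, str(doc_count), str(element_count)]
    if transact_name:
        # Optional fourth field: a non-default name for transact.dat.
        parts.append(transact_name)
    return " ".join(parts)


# Two documents: one content element for the first, three for the second.
batch = [["test1.gif"], ["test2.gif", "test3.gif", "test4.gif"]]
print(eob_line("batch1_data", batch))  # prints: batch1_data 2 4
```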

2. In the directory specified in the eob file, create a transact.dat file with the following syntax:

<class code> : <document properties> : <external index> : <files>

where:
<class code> = one of the codes specified in the CEBI.cfg file for a document class involved in the import
<document properties> = values for each of the properties specified as IndexName entries in CEBI.cfg. They must be listed in the same order as in the DocClassAttribute section in CEBI.cfg
<external index> = value that is recorded in the report when the batch is completed but is not imported as part of the document. This can be left blank, but keep the right number of delimiters for the four major fields.
<files> = name(s) of the content element file(s) associated with the document. Alternatively, this can be the name of a file that contains a list of the content element files; the example below marks such a file with a leading + (+filelist.txt)

For example, the transact.dat file in the batch1_data directory may contain the following:
23:Bulk import test 1,42:CEBI 1:test1.gif
23:Bulk import test 2,24:CEBI 2:+filelist.txt

And filelist.txt contains:
test2.gif
test3.gif
test4.gif

In this example the content element files are in the same directory as transact.dat, but you could specify a different path with the filename.

The second line of the transact.dat file above could also be written as:
23:Bulk import test 2,24:CEBI 2:test2.gif,test3.gif,test4.gif

Notes:
i) You can use a different name for the transact.dat file if you specified one in the eob file.
ii) The delimiters used in the transact.dat file are configurable in CEBI.cfg. By default, the major fields are delimited by a colon, and the items in the document properties and files fields are separated by a comma.

From the CEBI.cfg file:

; FieldDelimiter=":"
; ItemDelimiter=","
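
The line format above can be sketched in Python as follows. The function name transact_line is hypothetical. This post does not say how the tool treats a delimiter character that appears inside a value, so the sketch simply refuses such values; that check is an assumption, not documented tool behaviour.

```python
def transact_line(class_code, properties, external_index, files,
                  field_delim=":", item_delim=","):
    """Format one transact.dat line from its four major fields.

    properties must follow the IndexName order used in CEBI.cfg.
    """
    values = [str(value) for value in properties]
    for text in values + [external_index] + list(files):
        # Assumption: no escaping mechanism, so reject embedded delimiters.
        if field_delim in text or item_delim in text:
            raise ValueError(f"value contains a delimiter: {text!r}")
    return field_delim.join([
        str(class_code),
        item_delim.join(values),
        external_index,  # may be an empty string; the delimiters remain
        item_delim.join(files),
    ])


# Reproduces the two example lines from this post.
print(transact_line(23, ["Bulk import test 1", 42], "CEBI 1", ["test1.gif"]))
print(transact_line(23, ["Bulk import test 2", 24], "CEBI 2", ["+filelist.txt"]))
```

Passing an empty string as external_index still produces all three field delimiters, which keeps the four major fields aligned.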

 

Running the Batch

To start the bulk import tool:

java -cp "C:\P8CPEJava52\Client\lib\Jace.jar;C:\P8CPEJava52\Client\lib\log4j.jar;C:\Program Files\IBM\FileNet\ContentEngine\tools\CEBI\BulkImport.jar" bulkImport.BI_Start

Test mode (validates but does not import):

java -cp "C:\P8CPEJava52\Client\lib\Jace.jar;C:\P8CPEJava52\Client\lib\log4j.jar;C:\Program Files\IBM\FileNet\ContentEngine\tools\CEBI\BulkImport.jar" bulkImport.BI_Start -T

Tip: if you are testing and may want to rerun the same batch, make a copy of the <batchname>.eob file before running the import, because the import process removes that file when the batch executes.
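
For repeated test runs, the backup and the launch can be scripted. The Python sketch below is a minimal illustration: backup_eob and build_command are hypothetical helper names, the backup goes to a separate directory on the assumption that stray files in the WorkingDirectory are best avoided, and the classpath separator defaults to the Windows ";" used in the commands above.

```python
import shutil
from pathlib import Path


def backup_eob(eob_path, backup_dir):
    """Copy a <batchname>.eob file into backup_dir and return the copy's path."""
    source = Path(eob_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)
    return target


def build_command(classpath_entries, test_mode=False, separator=";"):
    """Return the java command as an argument list for subprocess.run.

    Only the class name and the -T option shown in this post are used.
    """
    command = ["java", "-cp", separator.join(classpath_entries),
               "bulkImport.BI_Start"]
    if test_mode:
        command.append("-T")  # validate without importing
    return command
```

A run would then look like subprocess.run(build_command(jars, test_mode=True), check=True), where jars lists the Jace.jar, log4j.jar and BulkImport.jar paths for your installation.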

 

Stopping the Batch

The tool continues to run in a loop, waiting for more work. To stop it, create a .CEBI.stop file.

On Windows this must be done from the command line:

echo > .CEBI.stop
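
The same step can be scripted on any platform. In the Python sketch below, request_stop is a hypothetical helper; this post does not state which directory the tool watches, so the directory is left as a parameter. The sketch creates an empty file, on the assumption that only the file's existence matters.

```python
from pathlib import Path


def request_stop(directory):
    """Create an empty .CEBI.stop file in the given directory.

    Which directory the tool checks is not stated in this post, so the
    caller must supply it (an assumption of this sketch).
    """
    stop_file = Path(directory) / ".CEBI.stop"
    stop_file.touch()
    return stop_file
```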

 
