Bulk load parameter tuning

You can customize the behavior of the Bulk Loader by specifying particular parameters at run time or configuring the bulkload.properties file.

There are three distinct phases for loading data using the Bulk Loader:
  1. Analyze the objects and relationships to determine the graphs in the data.

    Typically, 1 - 5% of execution time

  2. Construct model objects and build graphs.

    Typically, 2 - 5% of execution time

  3. Pass the data to the application programming interface (API) server.

    Typically, 90 - 99% of execution time
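
To make the proportions above concrete, the following sketch converts the typical percentage ranges into seconds for a load of a given length. The phase_range helper and the one-hour total are hypothetical and chosen only for illustration. The percentages are the typical ranges quoted above, not measurements.

```shell
#!/bin/sh
# Hypothetical helper: print the time range, in seconds, of a phase that
# takes LOW to HIGH percent of a load lasting TOTAL seconds.
phase_range() {
    pr_total=$1
    pr_low=$2
    pr_high=$3
    echo "$(( pr_total * pr_low / 100 ))-$(( pr_total * pr_high / 100 ))"
}

total=3600   # assumed example: a one-hour load
echo "1. Analyze:        $(phase_range "$total" 1 5) s"
echo "2. Construct:      $(phase_range "$total" 2 5) s"
echo "3. Pass to server: $(phase_range "$total" 90 99) s"
```

Because the third phase dominates the run, settings that change how data is written to the API server are likely to have the largest effect on total load time.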

There are two options for loading data:
  • Data can be loaded one record at a time. This is the default mode. You must load records one at a time for the following files:
    • Files with errors.
    • Files with extended attributes.
  • Data can be loaded in bulk. This is called graph writing because an entire graph is loaded, rather than just one record.
    Bulk loading with the graph write option is faster than loading records one at a time. (See the Bulk Load measurements for details.) The following example uses the graph write option (-g), which buffers the data and passes it to the API server in blocks:
    ./loadidml.sh -g -f /home/confignia/testfiles/sample.xml
    You can use the following parameters in the bulkload.properties file to improve performance when loading data in bulk:
    com.ibm.cdb.bulk.cachesize=2000
    The cachesize parameter controls the number of objects processed in a single write operation when bulk loading with the graph write option. Increasing the cache size improves performance, at the risk of running out of memory on either the client or the server. Change the value only when you have specific evidence that processing a file with a larger cache improves performance. The default cache size is 2000, and the maximum is 40000.
    com.ibm.cdb.bulk.allocpoolsize=1024
    This value specifies the maximum amount of memory that can be allocated to the Bulk Loader process. It is an Xmx value that is passed to the main Java™ class of the Bulk Loader. Specify the value in megabytes.

    Make sure that the Java virtual machine does not run out of memory. To check, collect thread dumps of the TADDM processes and review them. If necessary, increase the memory size.

    Tip: Tests that were run on the ITNMIP book indicate that the performance is optimal when you set the bulk load process properties and parameters to the following values:
    com.ibm.cdb.bulk.cachesize=4000
    com.ibm.cdb.bulk.allocpoolsize=4096
    value: -Xms768M|-Xmx1512M|-DTaddm.xmx64=6g|
    It is also important that you run the RUNSTATS command frequently during the bulk loading process.
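
As a rough aid when adjusting the two bulkload.properties settings described above, the following sketch reads them from a properties file and estimates how many write operations a load needs. It is an illustration only. The helper names get_prop and write_ops, the temporary stand-in file, and the count of 100000 objects are all hypothetical. The estimate assumes that each write carries at most cachesize objects, which follows from the description of the parameter but is not a documented formula.

```shell
#!/bin/sh
# Sketch only: helper names, sample file, and object count are hypothetical.

MAX_CACHESIZE=40000   # maximum cache size stated in the text

# Print the value of KEY from a Java-style properties FILE.
# Handles simple key=value lines only; the last definition wins.
get_prop() {
    gp_file=$1
    gp_key=$2
    # Escape dots so that they match literally in the sed pattern.
    gp_pattern=$(printf '%s' "$gp_key" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${gp_pattern}[[:space:]]*=[[:space:]]*//p" "$gp_file" |
        sed 's/[[:space:]]*$//' | tail -n 1
}

# Estimate the number of write operations for OBJECTS objects,
# assuming each write carries at most CACHESIZE objects.
write_ops() {
    wo_objects=$1
    wo_cachesize=$2
    case $wo_cachesize in
        ''|*[!0-9]*) echo "cachesize must be a whole number" >&2; return 1 ;;
    esac
    if [ "$wo_cachesize" -lt 1 ] || [ "$wo_cachesize" -gt "$MAX_CACHESIZE" ]; then
        echo "cachesize must be between 1 and $MAX_CACHESIZE" >&2
        return 1
    fi
    # Integer ceiling division.
    echo $(( (wo_objects + wo_cachesize - 1) / wo_cachesize ))
}

# Usage with a temporary stand-in for bulkload.properties.
props=$(mktemp)
printf '%s\n' \
    'com.ibm.cdb.bulk.cachesize=4000' \
    'com.ibm.cdb.bulk.allocpoolsize=4096' > "$props"

cachesize=$(get_prop "$props" com.ibm.cdb.bulk.cachesize)
echo "cachesize=$cachesize"
echo "allocpoolsize=$(get_prop "$props" com.ibm.cdb.bulk.allocpoolsize) MB"
echo "writes for 100000 objects: $(write_ops 100000 "$cachesize")"
rm -f "$props"
```

Doubling the cache size from the default of 2000 to 4000 halves the estimated number of writes. Each write then holds twice as many objects in memory, which is why the two settings are usually raised together.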