idsbulkload, bulkload

Use the bulkload command to load directory data from an LDIF file to a directory server instance.

Description

The idsbulkload command loads directory data from an LDIF file into a directory server instance. It loads LDIF data faster than the idsldif2db command and is intended for bulk-loading large amounts of data.

Attention: To import LDIF data from another instance, you must cryptographically synchronize with the instance that is importing the LDIF file. Otherwise, any AES encrypted values in the LDIF file do not get imported. For information about synchronizing directory server instances, see Synchronizing two-way cryptography between server instances.

You must consider the following points before you use the bulkload command.

Note:
  • Stop the directory server instance before you run the server import utilities.
  • Ensure that no applications are attached to the database associated with the directory server instance. If there are applications that are attached, the server import utilities might not run.
  • Environment variables that are associated with idsbulkload are no longer supported in IBM® Security Directory Server, version 6.0 and later. The ACLCHECK, ACTION, LDAPIMPORT, SCHEMACHECK, and STRING_DELIMITER environment variables are replaced with the -A, -a, -L, -S, -s command-line parameters. The command-line switches are case-sensitive.

    The ACL processing enhancement in idsbulkload with the -A parameter is deprecated in IBM Security Directory Server, version 6.0 and later. The following parameters are also deprecated.

    • -c
    • -C
    • -e
  • You must run the idsbulkload command with dbadm or sysadm privilege. On a Windows system, you must run the idsbulkload command within the DB2® command-line interpreter (CLI). To start the DB2 CLI, click Start > Run, type db2cmd, and click OK.
  • If archival logging is set in DB2, the idsbulkload command might fail. Make sure that the archival log is disabled before you run the idsbulkload command. To disable archival logging, run the following command.
    update database configuration for ldapdb2 using LOGRETAIN OFF USEREXIT OFF
  • When you load the data that contains unique attributes, the DB2 unique constraints for the modified attributes are dropped. After you load the data, the DB2 unique constraints are established for the following attributes:
    • Attributes with unique constraints dropped.
    • Unique attributes that are listed in the unique attribute entry in the file.
    If duplicate values are loaded for attributes that are specified as unique attributes, the DB2 unique constraint is not created for that attribute. This condition is recorded in the idsbulkload.log file.
  • If you are loading data to an instance already containing data, make sure that you take a backup before you run idsbulkload to add entries.
  • By default, the action of bulkload is unrecoverable. If data loading fails for any reason, all data in the database is lost. Therefore, it is better to take a backup before and after a large bulkload activity.

Synopsis

idsbulkload | bulkload -i ldiffile [-I instancename
            [-a <parse_and_load|parseonly|loadonly>] [-A <yes|no>] 
            [-b] [-c | -C <yes|no>] [-d <debuglevel>] [-e drop_index]
            [-E <number>] [-f configfile] [-g] [-G] [-k <number>] 
            [-L <path>] [-n | -N] [-o <filename>] 
            [-s <character>] [-R <yes|no>] [-S <yes|no|only>]
            [-t <filename>] [-v] 
            [-W outputfile] [-x|-X <yes|no>]] | [-?]

Options

The idsbulkload command takes the following parameters.
-a <parse_and_load|parseonly|loadonly>
Specifies the load action mode.
-A <yes|no>
Specifies whether to process the ACL information that is contained in the LDIF file. The default is yes. The no parameter loads the default ACL.
Note: This parameter is deprecated.
-b
Specifies to suppress the progress indicator.
-c | -C <yes|no>
Skips index recreation.
If you are running successive bulkload operations and you want to skip index recreation between loads, you can postpone index creation until the last bulkload. Issue the last idsbulkload command with -c yes.
-d debuglevel
Specifies the debug level to assign and sets the debug mode. Use this parameter to identify the data records that might have a problem and are causing parsing errors. For more information about debug levels, see Debugging levels.
Note: Ensure that the ldtrc command is run before you use the -d parameter with the command. Otherwise, no messages are shown. To run tracing, issue the ldtrc on command.
-e drop_index
Specifies whether to drop indexes before load.
-E number

Specifies a limit on the number of parsing errors that are reported. When the limit is reached, the idsbulkload command exits.

By default, when -E is not provided, idsbulkload exits on the first error. If 0 is specified as the value, an unlimited number of errors is ignored.

Note: When the -E option is provided, the -S option with the value yes or only is mandatory.

-f configfile
Specifies the directory server instance configuration file.
-g
Specifies not to strip the trailing spaces in attribute values.
-G
Specifies to add members to existing static groups. This parameter must not be specified when the -k parameter is specified.
-i ldiffile
Specifies the name, including the path, of the LDIF file that contains the data to load into the directory server instance. The IDS_LDAP_HOME/examples/sample.ldif file contains sample data in the LDIF format. The IDS_LDAP_HOME variable contains the path of the IBM Security Directory Server installation location. The value of IDS_LDAP_HOME varies depending on the operating system. The default paths are as follows.
  • AIX® operating systems: /opt/IBM/ldap/V6.4
  • Linux® operating systems: /opt/ibm/ldap/V6.4
  • Solaris operating systems: /opt/IBM/ldap/V6.4
  • Windows operating systems: C:\Program Files\IBM\ldap\V6.4
    Note: The C:\Program Files\IBM\ldap\V6.4 path is the default installation location. The actual IDS_LDAP_HOME is determined during the installation.
-I instancename
Specifies the name of the directory server instance.
-k number
Specifies the number of entries to process in one parse-load cycle. The -a parameter must be set to parse_and_load. This parameter must not be specified when the -G parameter is specified.
-L path
Specifies the directory for storing temporary data. The default path for the temporary storage location varies depending on the operating system.
On AIX, Linux, and Solaris systems
The default location is instance_home_directory/idsslapd-instance_name/tmp/ldapimport.
Note: If you log in as root, the idsbulkload command fails when you specify the location of the temporary directory by using the -L parameter. You must log in as an instance owner to create a temporary directory, and then run the idsbulkload command with root privileges. To log in as an instance owner, issue the following command.
su instance_name
On Windows systems
The default location is instance_home_directory\idsslapd-instance_name\tmp\ldapimport.
-n | -N
Specifies that the load is unrecoverable. With this parameter, idsbulkload uses less disk space and runs faster. If data loading fails for any reason, all the data in the database is lost.
-o filename
Specifies to generate an output file to preserve the IBM-ENTRYUUID entry and the timestamp values created during the parsing phase of idsbulkload.
-R <yes|no>
Specifies whether to remove the directory that was used for storing temporary data. The directory to remove is the default directory or the one specified by using the -L parameter. The default value is yes.
Note: Although the default is yes, there are two exceptions. If idsbulkload ends in an error condition, the temporary files are not deleted because they are required for recovery. If you specify the -a parseonly parameter, the temporary files are not deleted because they are needed for the load phase.
-s character
Specifies the string delimiting character that is used for importing.
Note: The idsbulkload command might fail to load LDIF files that contain certain UTF-8 characters. The failure occurs when the DB2 LOAD tool parses the default idsbulkload string delimiter, which is a vertical bar (|), in multi-byte character sets. In such scenarios, reassign the string delimiter to any of the supported delimiters except the vertical bar (|).

For example, the following symbols are supported: "%&'()*,./:;<>?.

To assign a delimiting character, see the following example. Quote the character so that the command shell does not interpret it:

idsbulkload -i ldiffile -I instancename -s "<"

idsbulkload -i ldiffile -I instancename -s <any of the supported delimiters except |>

To avoid this failure, ensure that the new delimiting character is not present in your LDIF file.
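
The check described above can be sketched in Python. This is a minimal sketch, not part of the product: the function name is hypothetical, the candidate list is copied from the supported symbols listed above, and the scan looks only at the raw text of the file, so values that are stored in encoded form are not decoded.

```python
# Delimiters listed as supported in the text above; the default "|" is excluded.
SUPPORTED_DELIMITERS = "\"%&'()*,./:;<>?"


def unused_delimiters(ldif_text, candidates=SUPPORTED_DELIMITERS):
    """Return the candidate delimiters that never occur in ldif_text."""
    present = set(ldif_text)
    return [ch for ch in candidates if ch not in present]


# Hypothetical sample data; a real file would be read with open(path, encoding="utf-8").
sample = "dn: cn=member1, o=companyA, c=us\ncn: member1\n"
print(unused_delimiters(sample))
```

Because every LDIF file contains a colon and most contain commas, those two characters are rarely usable as a delimiter under this rule.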

-S <yes|no|only>
Verifies whether the directory entries are valid based on the object class definitions and attribute type definitions in the configuration files.
Schema checking verifies that all object classes and attributes are defined. It also checks whether the attributes that are specified for each entry comply with the list of required and allowed attributes in the object class definition, and whether binary attribute values are in the correct base64-encoded form.
yes
Specifies to run schema check on the data before the command adds it to the directory server instance.
no
Specifies not to run schema check on the data before the command adds it to the directory server instance. It is the default option. This option improves the operational performance. This option assumes that the data in the file is valid.
only
Specifies to run schema check on the data only and not to add data to the directory server instance. This option provides the most feedback and reports errors.
It is advisable to use the -S only parameter to validate the data first, and then to use -S no to load the data.
-t filename
Specifies to use the IBM-ENTRYUUID entry and the timestamp values from the file instead of generating them during the parsing. If the values are present in the LDIF file in the form of controls, the controls are ignored.
-v
Specifies the verbose mode for the command.
-W outputfile
Specifies the full path of a file to redirect output.
-x | -X <yes|no>
Specifies whether to translate entry data to database code page. The default value is no.
Note: This parameter is necessary only when you use a database with a code page other than UTF-8.
-?
Specifies to show the syntax help.
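
Several of the parameters above constrain one another: -E requires -S yes or -S only, -G and -k must not be combined, and -k requires -a parse_and_load. The following Python sketch checks these three rules before a command line is run. The function name and the dictionary representation are hypothetical. The sketch also assumes that a missing -a counts as a violation when -k is used, because the text above does not state the default load action.

```python
def check_option_rules(opts):
    """Check the documented parameter combinations.

    opts maps each flag to its value (None for switches without a value).
    Returns a list of human-readable problems; an empty list means no rule
    described above is violated.
    """
    problems = []
    if "-E" in opts and opts.get("-S") not in ("yes", "only"):
        problems.append("-E requires -S yes or -S only")
    if "-G" in opts and "-k" in opts:
        problems.append("-G and -k must not be specified together")
    # Assumption: an absent -a is treated as a problem when -k is given.
    if "-k" in opts and opts.get("-a") != "parse_and_load":
        problems.append("-k requires -a parse_and_load")
    return problems


print(check_option_rules({"-i": "data.ldif", "-E": "10"}))
```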

Usage

To load a large amount of data into a directory server instance, use the idsbulkload command. To improve the operational performance of the idsbulkload command, you can skip the schema check of the data in the file. During parsing and loading, the idsbulkload command runs some basic checks on the data.

Before you run the idsbulkload command, you must stop the directory server instance (the idsslapd process).

The idsbulkload command requires disk space for storing temporary data during the parse and load stage. The idsbulkload command also requires temporary storage for data manipulation before it loads the data into the database. The default path of the temporary storage location varies depending on the operating system. See the -L parameter description for the path names. You can change the path by using the -L parameter. For example:

idsbulkload -i ldiffile -I instancename -L /newpath

Before you run the command, ensure that you set write permission to the directory specified by using the -L parameter. You must also ensure that a minimum temporary storage size of 2.5 times the size of the LDIF file is available in the directory. More temporary storage might be required depending on your data. For example, you might receive the following error:

SQL3508N Error in accessing a file of type "SORTDIRECTORY" during load
or load query. Reason code: "2". Path: "/u/ldapdb2/sqllib/tmp/".

If you receive this error, set the DB2SORTTMP environment variable to point to a directory with more space for use during the bulkload operation. You can also specify multiple directories that are separated by a comma (,). For example:

export DB2SORTTMP=/sortdir1,/sortdir2
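
The rule of 2.5 times the LDIF file size can be checked before a load is started. The following Python sketch is one way to do that. The function names are hypothetical, and the factor is only the minimum stated above, so a passing check does not guarantee that the load has enough space.

```python
import math
import os
import shutil


def required_temp_bytes(ldif_size, factor=2.5):
    """Return the minimum temporary storage, in bytes, for an LDIF file of ldif_size bytes."""
    return math.ceil(ldif_size * factor)


def has_enough_temp_space(ldif_path, temp_dir, factor=2.5):
    """Return True if temp_dir has at least factor times the LDIF file size free."""
    needed = required_temp_bytes(os.path.getsize(ldif_path), factor)
    return shutil.disk_usage(temp_dir).free >= needed
```

For example, has_enough_temp_space("data.ldif", "/newpath") compares the free space in the directory that is passed to -L with the size of a hypothetical data.ldif file.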

The -o and -t parameters are useful when you add large amounts of data into existing replication environments. If servers A and B are peer servers and you want to add entries under the replication context of an instance, do the following steps.

  1. Generate the LDIF file.
  2. Run idsbulkload with the -o parameter on server A to load the data and to create a file with all operational attributes during bulkload.
  3. Copy the operational attributes output file to server B.
  4. Run idsbulkload on server B with the -i and -t parameters to import the LDIF file with the same operational attributes. This step ensures that the operational attribute values are preserved across the replicating servers under the same replication context.
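
The steps above can be scripted. The following Python sketch only builds the argument lists for the two servers from the -i, -I, -o, and -t parameters that are documented above. The function name, file names, and instance name are hypothetical, and copying the file between the servers is left out.

```python
def bulkload_args(ldif_file, instance, uuid_out=None, uuid_in=None):
    """Build an idsbulkload argument list for one server.

    uuid_out is passed to -o (write operational attributes to a file);
    uuid_in is passed to -t (reuse operational attributes from a file).
    """
    args = ["idsbulkload", "-i", ldif_file, "-I", instance]
    if uuid_out is not None:
        args += ["-o", uuid_out]
    if uuid_in is not None:
        args += ["-t", uuid_in]
    return args


# Server A writes the operational attributes; server B reuses them.
server_a = bulkload_args("data.ldif", "inst1", uuid_out="opattrs.out")
server_b = bulkload_args("data.ldif", "inst1", uuid_in="opattrs.out")
print(" ".join(server_a))
print(" ".join(server_b))
```

Each list can be passed to subprocess.run on the corresponding server after the directory server instance is stopped.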

The -G parameter is useful when you expand an existing static group with many members. The existing entry must have an object class that accepts member or uniquemember as its attribute. For example, if you wanted to add 5 million members from the static group, ou=static group 1, o=company1, to another group, ou=static group A, o=companyA, do the following steps.

  1. Create an LDIF file from the source server. Use an editor to remove any attributes other than member or uniquemember from the file. For example:
    dn: ou=static group 1, o=company1, c=us
    member: cn=member1, o=company1, c=us
    member: cn=member2, o=company1, c=us
    member: cn=member3, o=company1, c=us
    ...
    member: cn=member5000000, o=company1, c=us
  2. Modify the DN of the group in the file to match the DN of the existing group entry on the target server. For example:
    dn: ou=static group A, o=companyA, c=us
    member: cn=member1, o=company1, c=us
    member: cn=member2, o=company1, c=us
    member: cn=member3, o=company1, c=us
    ...
    member: cn=member5000000, o=company1, c=us
  3. Make the necessary global changes to the file. In this case, the company name must be changed for each member attribute.
    dn: ou=static group A, o=companyA, c=us
    member: cn=member1, o=companyA, c=us
    member: cn=member2, o=companyA, c=us
    member: cn=member3, o=companyA, c=us
    ...
    member: cn=member5000000, o=companyA, c=us
  4. To avoid memory issues, divide the file into multiple files of manageable size. In this example, the source file is divided into five files of 1 million member attributes each. Then copy the DN as the first line in each file.
    For example, file1:
    dn: ou=static group A, o=companyA, c=us
    member: cn=member1, o=companyA, c=us
    member: cn=member2, o=companyA, c=us
    member: cn=member3, o=companyA, c=us
    ...
    member: cn=member1000000, o=companyA, c=us
    file2:
    dn: ou=static group A, o=companyA, c=us
    member: cn=member1000001, o=companyA, c=us
    member: cn=member1000002, o=companyA, c=us
    member: cn=member1000003, o=companyA, c=us
    ...
    member: cn=member2000000, o=companyA, c=us
    file3:
    dn: ou=static group A, o=companyA, c=us
    member: cn=member2000001, o=companyA, c=us
    member: cn=member2000002, o=companyA, c=us
    member: cn=member2000003, o=companyA, c=us
    ...
    member: cn=member3000000, o=companyA, c=us
    ...
  5. Run the idsbulkload command with the -G parameter to load the files to the target server.
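
Step 4 can be automated. The following Python sketch splits a single-entry group file into chunks and repeats the dn line at the top of each chunk. It assumes that the first non-blank line is the dn line and that every following line is one complete member or uniquemember attribute, which means folded (continued) LDIF lines are not handled. The function name is hypothetical.

```python
def split_group_ldif(lines, chunk_size):
    """Split one group entry into chunks that each start with the dn line.

    lines is an iterable of text lines; blank lines are ignored.
    Returns a list of chunks, where each chunk is a list of lines.
    """
    cleaned = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if not cleaned or not cleaned[0].lower().startswith("dn:"):
        raise ValueError("the first line must be the dn line")
    dn_line, members = cleaned[0], cleaned[1:]
    return [
        [dn_line] + members[start:start + chunk_size]
        for start in range(0, len(members), chunk_size)
    ]


# Small demonstration with five members and two members per file.
demo = ["dn: ou=static group A, o=companyA, c=us"] + [
    f"member: cn=member{i}, o=companyA, c=us" for i in range(1, 6)
]
for number, chunk in enumerate(split_group_ldif(demo, 2), start=1):
    print(f"file{number}: {len(chunk) - 1} members")
```

With the data in this example, a chunk_size of 1000000 yields the five files that are described in step 4. Each chunk can then be written to its own file and loaded with the -G parameter.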

The idsbulkload command verifies that the DN exists and that its object class and attributes are valid before it loads the file.

Note: The idsbulkload command does not check for duplicate attributes.

You must inspect the output messages from the idsbulkload command carefully. If errors occur during the operation, the instance might not get populated. You might need to drop all the LDAP tables, or drop the database and re-create an empty database, and start over. If no data is added to the instance, the bulkload process must be attempted again. If you drop all the LDAP tables, you might lose any existing data in the instance.

The IDS_LDAP_HOME/examples/sample.ldif file includes sample data. You can use data in this file to experiment with populating a directory by using the idsbulkload command, or you can use the idsldif2db command. The idsldif2db command is considerably slower than the idsbulkload command for large amounts of data.

For performance reasons, the idsbulkload command does not check for duplicate entries. Ensure that your LDIF file does not contain duplicate entries. If any duplicates exist, remove the duplicate entries.
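
Because idsbulkload does not check for duplicate entries, the LDIF file can be screened beforehand. The following Python sketch reports DNs that appear more than once. It is a simplified check, not a full LDIF parser: it joins folded dn lines, compares DNs case-insensitively as plain strings, and does not normalize spacing or decode base64 DNs. The function name is hypothetical.

```python
from collections import Counter


def find_duplicate_dns(ldif_text):
    """Return the lowercased DNs that occur more than once in ldif_text, sorted."""
    dns = []
    current = None  # dn value being collected, including folded continuation lines
    for line in ldif_text.splitlines():
        if current is not None and line.startswith(" "):
            current += line[1:]  # a leading space marks a folded continuation
            continue
        if current is not None:
            dns.append(current)
            current = None
        if line.lower().startswith("dn:"):
            current = line[3:].strip()
    if current is not None:
        dns.append(current)
    counts = Counter(dn.lower() for dn in dns)
    return sorted(dn for dn, count in counts.items() if count > 1)


sample_entries = (
    "dn: cn=a, o=x\ncn: a\n\n"
    "dn: cn=b, o=x\ncn: b\n\n"
    "dn: CN=A, o=x\ncn: a\n"
)
print(find_duplicate_dns(sample_entries))
```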

If idsbulkload fails at the DB2 LOAD phase, see the db2load.log file to determine the cause. The location of the log file varies depending on the operating system.

  • On Windows systems, the log file is in the instance_home_directory\idsslapd-instance_name\tmp\ldapimport directory.
    Note: You can change the default path on Windows systems.
  • On AIX, Linux, and Solaris systems, the log file is in the instance_home_directory/idsslapd-instance_name/tmp/ldapimport directory.

If the -L parameter is specified, the file is in the directory that is defined by the -L parameter. Correct the problem and rerun idsbulkload. The idsbulkload command loads the files from the last successful load consistency point.

If idsbulkload fails, the recovery information is stored in the following file. This file is not removed until all of the data is successfully loaded, which ensures the data integrity of the directory server instance. If you configure the database again and start over, the bulkload_status file must be removed manually. Otherwise, idsbulkload tries to recover from the last successful load point.

  • On Windows systems, the file is instance_home_directory\idsslapd-instance_name\logs\bulkload_status.
  • On AIX, Linux, and Solaris systems, the file is instance_home_directory/idsslapd-instance_name/logs/bulkload_status.