Basic Initial Load installation
IBM® InfoSphere® Information Server must be configured correctly in order for the BIL assets to compile and run. The following steps must be performed to ensure that BIL can be successfully installed.
Step 1. Prepare the file systems
File systems should be allocated following established InfoSphere Information Server best practices (source files, target files, data sets, and scratch/sort work areas).
Move the contents of the Java/lib directory to $DSHOME/java/lib, where $DSHOME is the home directory of your DataStage installation.
The Java/lib directory holds specific Java™ library (JAR) assets used in the generation of phonetic values. An example of the $DSHOME/java/lib directory is /opt/IBM/InformationServer/Server/DSEngine/java/lib. These libraries are used by the following BIL jobs:
- BL_021_Prep_Address
- BL_021_Prep_OrgName
- BL_021_Prep_PersonName
- BL_020_VS_Address
- BL_020_VS_OrgName
- BL_020_VS_PersonName
Create the following directories specifically for use with BIL. Place them in the directory structure /home/dsadm/<your BIL DataStage project name>/.
Directory Name | Usage |
---|---|
ARCHIVE | Stores archived intermediate data sets from BIL runs as part of the cleanup process |
DATA | Typically used for storing temporary data sets, inter-job data sets, and load-ready data sets |
ERROR | Holds BIL error records |
LOG | Holds BIL log files and Error Report XML files |
REJECT | Used for storing rejected records |
SIF_IN | Input SIF files |
SK | Surrogate Key “state files” |
TEMP | Used to hold temporary items |
Job parameter values will be set later to point to these directories.

Ensure that the directory $APT_ORCHHOME/user_lib exists. If it does not, create it with appropriate permissions.
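A minimal shell sketch for creating the BIL directories above and the user_lib directory follows; the base path uses the project name suggested in Step 2, and the owner and group shown are assumptions, so adjust all values to your site standards.

# Assumed base path; substitute your BIL DataStage project name
BIL_BASE=/home/dsadm/MDMIS_BIL_R10
mkdir -p $BIL_BASE/ARCHIVE $BIL_BASE/DATA $BIL_BASE/ERROR $BIL_BASE/LOG
mkdir -p $BIL_BASE/REJECT $BIL_BASE/SIF_IN $BIL_BASE/SK $BIL_BASE/TEMP
chown -R dsadm:dstage $BIL_BASE   # assumed owner and group
chmod -R 755 $BIL_BASE

# Create the parallel engine user_lib directory if it does not already exist
# ($APT_ORCHHOME must be set in the environment, for example by sourcing dsenv)
mkdir -p $APT_ORCHHOME/user_lib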
Step 2. Create the InfoSphere DataStage project for BIL
Create an InfoSphere DataStage® project using DataStage Administrator, following any preferred or required naming standard. In the absence of a standard, a suggested name is MDMIS_BIL_R10.
Step 3. Verify path requirements
Ensure that the project LIBPATH or LD_LIBRARY_PATH includes the following paths:
- $APT_ORCHHOME/lib: required to run the orchadmin utility from a Job Sequence
- $APT_ORCHHOME/user_lib: required to run the custom operators to be installed
Ensure that the project PATH includes the following path:
- $APT_ORCHHOME/bin: required to run the orchadmin utility from a Job Sequence
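If the project inherits these variables from the engine dsenv file (rather than from project-level settings in the Administrator client), a quick hedged check from the shell might look like the following; the dsenv path shown is the common default.

# Inspect dsenv for the parallel engine lib, user_lib, and bin entries
grep -E "PXEngine/(lib|user_lib|bin)" /opt/IBM/InformationServer/Server/DSEngine/dsenv

# Alternatively, after sourcing dsenv, inspect the resolved paths directly
. /opt/IBM/InformationServer/Server/DSEngine/dsenv
echo $PATH $LD_LIBRARY_PATH $LIBPATH | tr ': ' '\n' | grep PXEngine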
Step 4. Install the BIL custom routine and Java libraries
Operators are located in the asset tarball in the following directories:
- AIX®: Information Server 8.5 – MDMIS-aix/CAB/PXEngine/user_lib64/IIS8.5
This release contains one routine:
- UTCCvt.o
The routine should be installed in the following location: $APT_ORCHHOME/user_lib (for example, /opt/IBM/InformationServer/Server/PXEngine/user_lib).
Append the contents of CAB/PXEngine/etc/append_to_operator.apt to $APT_ORCHHOME/etc/operator.apt (make a backup and then edit operator.apt).
Verify that LIBPATH or LD_LIBRARY_PATH contain $APT_ORCHHOME/user_lib.
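A hedged sketch of these operations, run from the extracted asset tarball (the source paths follow the AIX manifest entry shown above; adjust them for your platform):

# Install the custom routine into the parallel engine user_lib directory
cp MDMIS-aix/CAB/PXEngine/user_lib64/IIS8.5/UTCCvt.o $APT_ORCHHOME/user_lib/

# Back up operator.apt, then append the supplied entries to it
cp $APT_ORCHHOME/etc/operator.apt $APT_ORCHHOME/etc/operator.apt.bak
cat CAB/PXEngine/etc/append_to_operator.apt >> $APT_ORCHHOME/etc/operator.apt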
The contents of the Java/lib folder (see the BIL Manifest description) should be copied to $DSHOME/java/lib, as described in Step 1. The resulting folder will consist of 5 JAR files and 2 property files: common-codec-1.3, icu4j, Linguistic, Logging, PhoneticGen, JDKLog.properties, and Log4J.properties. These libraries are used by the following BIL jobs:
- BL_021_Prep_Address
- BL_021_Prep_OrgName
- BL_021_Prep_PersonName
- BL_020_VS_Address
- BL_020_VS_OrgName
- BL_020_VS_PersonName
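A hedged sketch of the copy, run from the extracted tarball directory (the Java/lib source path is taken from the manifest description; the destination matches Step 1):

# Copy the phonetic libraries and property files into the engine's java/lib directory
mkdir -p $DSHOME/java/lib
cp Java/lib/* $DSHOME/java/lib/

# The target should now contain the 5 JAR files and 2 property files listed above
ls $DSHOME/java/lib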
Step 5. Install the cleanup script into the ARCHIVE directory
The cleanup script (DSClean.sh) is located in the SCRIPT directory of the asset tarball. Copy the DSClean.sh script into the ARCHIVE directory and verify that its execute permissions are set. Verify that the system supports bash shell scripting. If the system uses korn shell scripting (for example, Solaris), change the shell declaration from #!/bin/sh to #!/bin/ksh. Once the scripting has been verified, ensure that the Cleanup_Env execute command activity stage in BL_000_BASIC_LOAD refers correctly to DSClean.sh.
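A hedged sketch of the copy and the optional shell declaration change, assuming the ARCHIVE directory created in Step 1 and the suggested project name:

# Copy the cleanup script into the ARCHIVE directory and make it executable
cp SCRIPT/DSClean.sh /home/dsadm/MDMIS_BIL_R10/ARCHIVE/
chmod 755 /home/dsadm/MDMIS_BIL_R10/ARCHIVE/DSClean.sh

# On korn-shell systems (for example, Solaris), change the first line from #!/bin/sh
# to #!/bin/ksh, either in an editor or with a one-liner such as:
# sed '1s|#!/bin/sh|#!/bin/ksh|' DSClean.sh > DSClean.tmp && mv DSClean.tmp DSClean.sh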
Step 6. Import the BIL InfoSphere DataStage jobs to the new project
Using the InfoSphere DataStage client, import the BIL jobs into the newly created InfoSphere DataStage project. The BIL jobs are contained in ZIP files within the BIL tarball, which unzip into *.dsx (InfoSphere DataStage eXport) files.
The MDMIS-BIL.zip file contains the main *.dsx file that should be installed.
BIL supports four options regarding database loading that depend on the values set for the following job parameters:
- MDMIS_DB_DATABASE_TYPE
- DB_DATABASE_VARIANT
Depending on customer requirements, any one of the following database options can be chosen:
- ODBC – default
- DB2® – can be configured for bulk loading
- Oracle
Step 7. Set up the ODBC/DB2/Oracle configuration for connection to the database
Set up and configure the appropriate connection type for the InfoSphere MDM database on the InfoSphere Information Server.
For ODBC connections, the files $DSHOME/uvodbc.config and $DSHOME/.odbc.ini must be modified to include definitions for the appropriate databases. The ODBCINI environment variable should point to $DSHOME/.odbc.ini.
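As an illustration only, the entries take roughly the following shape; the data source name MDMDB, the driver path, and the connection keywords are placeholders that must match your database and the ODBC drivers installed with InfoSphere Information Server.

# Entry appended to $DSHOME/uvodbc.config
<MDMDB>
DBMSTYPE = ODBC

# Entry appended to $DSHOME/.odbc.ini (keyword names vary by driver)
[MDMDB]
Driver=<path to the ODBC driver library>
Description=InfoSphere MDM database
Database=<database name>
HostName=<database host name>
PortNumber=<port>

# In dsenv, point ODBCINI at the file
ODBCINI=$DSHOME/.odbc.ini; export ODBCINI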
For Oracle, appropriate Oracle client software must be installed on the InfoSphere Information Server machine, and the ORACLE_HOME environment variable must point to the Oracle installation directory. PATH should include $ORACLE_HOME/bin, and LIBPATH or LD_LIBRARY_PATH should point to the appropriate 32-bit or 64-bit Oracle libraries. A typical ORACLE_HOME value might be /u01/app/oracle/product/11.2.0. If you are using TNSNAMES for connection information, TNS_ADMIN should point to the directory containing the file tnsnames.ora. The ORACLE_HOME, PATH, and load library path variables must be configured in the dsenv file.
For DB2, appropriate DB2 client software must be installed on the InfoSphere Information Server machine. The DB2DIR environment variable should point to the DB2 installation directory, DB2INSTANCE should contain the appropriate instance name, and PATH and LIBPATH/LD_LIBRARY_PATH should point to the appropriate DB2 bin and lib directories.
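A hedged dsenv excerpt showing where these variables are typically exported; all values are placeholders, and the library path variable is LIBPATH on AIX or LD_LIBRARY_PATH on Linux and Solaris.

# Oracle client settings (placeholder values)
ORACLE_HOME=/u01/app/oracle/product/11.2.0; export ORACLE_HOME
TNS_ADMIN=$ORACLE_HOME/network/admin; export TNS_ADMIN

# DB2 client settings (placeholder values)
DB2DIR=/opt/IBM/db2/V9.7; export DB2DIR
DB2INSTANCE=db2inst1; export DB2INSTANCE

# Extend the search paths for the database clients
PATH=$PATH:$ORACLE_HOME/bin:$DB2DIR/bin; export PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib:$DB2DIR/lib64; export LD_LIBRARY_PATH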
If the database is a DB2 for z/OS® instance, you must configure the DB2 Connector stage in the BLDBINHISTORY and BLDBINTABLE database shared containers so that the Partitioning tab Partition Type setting is set to "(AUTO)".
If you wish to use the bulk loading settings for either DB2 or Oracle, you must make the corresponding changes to the DRS stage in the related database shared containers BLDBINTABLE and BLDBINHISTORY.
Step 8. Test the database connection
A temporary InfoSphere DataStage job can be created to confirm that the server can connect to the InfoSphere MDM database, or other methods such as importing table definitions can be employed. Command-line tools such as sqlplus for Oracle or db2 for DB2 can also be used to verify connectivity from the server.
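For example, hedged command-line checks might look like the following; the aliases, user IDs, and passwords are placeholders.

# Oracle: connect through a TNS alias defined in tnsnames.ora (prompts for the password)
sqlplus mdm_user@MDMTNS

# DB2: connect to the cataloged database alias, then release the connection
db2 connect to MDMDB user mdm_user
db2 terminate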
Step 9. Create the InfoSphere MDM default configuration files
MDM_1X1.apt and MDM_Default.apt are provided in the MDMIS tarball (see the earlier manifest for their exact locations by operating system). Copy these files to the $APT_ORCHHOME/../Configurations directory on the InfoSphere Information Server machine and modify as required for the specific server installation.
The distributed configuration files must be edited to reflect both the existing environment and the availability of system resources. The default configuration file presents a typical two-node configuration, but it is not appropriate for all environments; you must take the available system resources into account.
The following sample configuration files include placeholder values that must be changed. Additional configuration files can be created as desired, and additional configuration options can be added to support a particular system configuration, such as separate sort and scratch space allocations (a sketch follows the examples below).
MDM_1X1.apt:
{
node "node1"
{
fastname "<servername>"
pools ""
resource disk "<resource disk path>" {pools ""}
resource scratchdisk "<resource scratchdisk path>" {pools ""}
}
}
MDM_Default.apt:
{
node "node1"
{
fastname "[servername]"
pools ""
resource disk "<resource disk path>" {pools ""}
resource scratchdisk "<resource scratchdisk path>" {pools ""}
}
node "node2"
{
fastname "[servername]"
pools ""
resource disk "<resource disk path>" {pools ""}
resource scratchdisk "<resource scratchdisk path>" {pools ""}
}
}
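As one example of the additional options mentioned above, a dedicated sort area can be declared by adding a scratchdisk resource to the "sort" pool; the node name and paths remain placeholders:

{
node "node1"
{
fastname "<servername>"
pools ""
resource disk "<resource disk path>" {pools ""}
resource scratchdisk "<sort work path>" {pools "" "sort"}
resource scratchdisk "<resource scratchdisk path>" {pools ""}
}
}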
Step 10. Update the MDMIS parameter set
Set the default values for the parameters in the MDMIS parameter set. The parameters are grouped into several categories, identified by the parameter name prefix:
- DB – Database-related parameters, such as connection information
- DS – DataStage-related parameters, such as job control and date format
- FS – File system-related parameters, such as SIF file location
- SK – Surrogate Key-related parameters, such as file location and format
MDMIS Parameter Name | Description |
---|---|
BATCH_ID | Used to uniquely identify intermediate data sets, error files, and other files pertaining to a particular BIL batch run. All of these file names contain the value of the BATCH_ID parameter. |
DS_PROCESSING_DATE | The date specified here is what is used as the current date during processing. It must be set before running the batch. |
DB_CONNECT_STRING | The name of the cataloged database. |
DB_DATABASE_TYPE | The database type. Available types are ODBC (the default), DB2, and Oracle. |
DB_DATABASE_VARIANT | The database variant. Variants can differ by database type. |
DB_USERID | The database user ID. |
DB_PASSWORD | The database user password. |
DB_SCHEMA | The database schema. The value entered in this field should end in a period (‘.’). |
DB_CLIENT_INSTANCE | The client instance name. This value is dependent on the database installation setup. |
DB_SERVER_INSTANCE | The server instance name. This value is dependent on the database installation setup. |
DB_ALIAS | For DB2, this alias name is determined during configuration using the IBM DB2 Configuration Assistant. For Oracle, this is a name created using the Oracle Configuration Assistant. |
DS_EMAIL_ERROR_CHECK_DISTRIBUTION | Space-separated list of email addresses to receive reports |
DS_EMAIL_ERROR_CHECK_REPORT | Flag to indicate whether the error report should be emailed |
DS_LANGUAGE_TYPE_CODE | The InfoSphere MDM language ID. |
DS_PERFORM_VALIDATION | Used to decide whether or not to run validation. |
DS_PARALLEL_APT_CONFIG_FILE | InfoSphere DataStage Configuration file for parallel jobs |
DS_SEQUENTIAL_APT_CONFIG_FILE | InfoSphere DataStage Configuration file for sequential jobs |
DS_SOURCE_DATE_FORMAT | Timestamp format in the SIF Files. The default value is %yyyy-%mm-%dd. |
DS_USE_NATIVE_KEY | Used to determine whether the NATIVEKEY table is used for Contract Identifier resolution. |
FS_SIF_FILE_PATTERN | Directory location of the Party SIF Files with naming pattern used to identify them |
FS_CONTRACT_SIF_FILE_PATTERN | Directory location of the Contract SIF Files with naming pattern used to identify them |
FS_DATA_SET_HEADER_DIR | Directory location for intermediate data sets to be written to during processing. |
FS_ERROR_DIR | Directory location for error message text files to be written to during processing. |
FS_LOG_DIR | Directory location for log files |
FS_REJECT_DIR | Directory location for rejected record data sets |
FS_ARCHIVE_DIR | Directory location for archival of populated intermediate data sets and error message files during clean up |
FS_SK_FILE_DIR | Directory location for surrogate key files |
FS_TMP_DIR | Directory location for temporary files |
LOAD_HISTORY_FLAG | Used to decide whether or not to create history records. |
LOAD_METHOD | Used to set which method is used to load data into the InfoSphere MDM database. |
MTZ_ENABLED | Used to determine whether the Multi Time Zone feature is enabled in InfoSphere MDM. |
PHONETIC_CLASS | Used to identify which Phonetic Class to use in the generation of phonetic values. It requires the fully qualified class name. Possible values are com.ibm.imc.phonetics.impl.NysiisExtension, com.ibm.imc.phonetics.impl.SoundexExtension, or blank if no phonetic values are to be generated. |
ADDRESS_PHONETIC_KEY_MAX_LENGTH | Used to set the maximum length of the phonetic key values generated for Address fields. |
ORGNAME_PHONETIC_KEY_MAX_LENGTH | Used to set the maximum length of the phonetic key values generated for OrgName fields. |
PERSON_NAME_PHONETIC_KEY_MAX_LENGTH | Used to set the maximum length of the phonetic key values generated for Person Name fields. |
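Default values are normally set in the DataStage Administrator or Designer client. As a hedged illustration, individual values can also be overridden when a job is started from the command line with dsjob; the project name, job name, and values below are placeholders, and the ParameterSet.Parameter override syntax should be verified against your Information Server release.

# Override selected MDMIS parameters for a single run (placeholder values)
dsjob -run \
  -param MDMIS.BATCH_ID=BATCH001 \
  -param MDMIS.DS_PROCESSING_DATE=2011-01-01 \
  -jobstatus MDMIS_BIL_R10 BL_000_BASIC_LOAD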
The following parameters should be reviewed for advanced configuration:
MDMIS Advanced Default Parameter Name | Description |
---|---|
$APT_DB2INSTANCE_HOME | DB2 installation home directory. Necessary when the db2instance is not accessible via the conductor node and a client interface is used. |
$APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL | When set, allows zero length null_field value with fixed length fields. This should be used with care as poorly formatted data will cause incorrect results. By default a zero length null_field value will cause an error. |
$APT_IMPORT_PATTERN_USES_FILESET_MOUNTED | When set, sequential read (import) will turn any file pattern into a file set before processing the files. This allows the files to be processed in parallel as opposed to sequentially. By default, file pattern processing will concatenate the files to be used as the input. |
$APT_IMPORT_PATTERN_USES_FILESET | When set, sequential read (import) will turn any file pattern into a file set before processing the files. This allows the files to be processed in parallel as opposed to sequentially. By default, file pattern processing will concatenate the files to be used as the input. |
$APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS | Reject records that have string or ustring fields that are longer than the defined maximum length. |
$APT_SORT_INSERTION_OPTIMIZATION | |
$APT_NO_PART_INSERTION | Turns off automatic insertion of partitions based on requirements of stage operators. |
$APT_NO_SORT_INSERTION | Turns off automatic insertion of sorting based on requirements of stage operators. |
$APT_OLD_BOUNDED_LENGTH | Ensures that variable field lengths are handled consistently with InfoSphere Information Server versions earlier than version 8. |
DS_DETECTED_DUPLICATES_ACTION | The action to take if duplicates are detected. |
DS_FAILED_COLUMNIZATION_ACTION | Not used |
DS_FAILED_RECORDIZATION_ACTION | Not used |
DS_PARTY_DROP_SEVERITY_LEVEL | Party records are dropped if they have errors with a severity greater than or equal to the value set for this parameter. |
DS_DROP_MAX_ITERATIONS | The number of times that the logic to remove records in error from the incoming data is run. The job aborts if errors remain after this limit is reached; the incoming data must then be corrected. |
DS_MD5_CRITICAL_ADDRESS_COLUMNS | The columns used to construct an MD5 identifier for Address records, used to identify potential duplicate address records. If not set correctly, uniquely identified address records can be mistaken for duplicate addresses. |
SK_LOAD_SUFFIX | Constant value that is appended to each surrogate key. This avoids possible key collisions with InfoSphere MDM generated IDs. |
SK_MASK | Mask describing the format of surrogate keys, for example PPPMMMMMMMMMMMSS. The default value is PPPMMMMMMMMMMMSS. |
SK_MID_ADDRESS_ID_NEXT_VAL | Next surrogate key value |
SK_MID_ADDRESS_ID_SF | Surrogate key state file name |
SK_MID_ALERT_ID_NEXT_VAL | Next surrogate key value |
SK_MID_ALERT_ID_SF | Surrogate key state file name |
SK_MID_CONTACT_METHOD_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONTACT_METHOD_ID_SF | Surrogate key state file name |
SK_MID_CONT_REL_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONT_REL_ID_SF | Surrogate key state file name |
SK_MID_CONT_EQUIV_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONT_EQUIV_ID_SF | Surrogate key state file name |
SK_MID_CONT_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONT_ID_SF | Surrogate key state file name |
SK_MID_CONTRACT_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONTRACT_ID_SF | Surrogate key state file name |
SK_MID_CONTR_COMP_VAL_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONTR_COMP_VAL_ID_SF | Surrogate key state file name |
SK_MID_CONTR_COMPONENT_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONTR_COMPONENT_ID_SF | Surrogate key state file name |
SK_MID_CONTRACT_ROLE_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONTRACT_ROLE_ID_SF | Surrogate key state file name |
SK_MID_IDENTIFIER_ID_NEXT_VAL | Next surrogate key value |
SK_MID_IDENTIFIER_ID_SF | Surrogate key state file name |
SK_MID_HIERARCHY_ID_NEXT_VAL | Next surrogate key value |
SK_MID_HIERARCHY_ID_SF | Surrogate key state file name |
SK_MID_HIERARCHY_NODE_ID_NEXT_VAL | Next surrogate key value |
SK_MID_HIERARCHY_NODE_ID_SF | Surrogate key state file name |
SK_MID_HIERARCHY_REL_ID_NEXT_VAL | Next surrogate key value |
SK_MID_HIERARCHY_REL_ID_SF | Surrogate key state file name |
SK_MID_HIER_ULT_PAR_ID_NEXT_VAL | Next surrogate key value |
SK_MID_HIER_ULT_PAR_ID_SF | Surrogate key state file name |
SK_MID_LOCATION_GROUP_ID_NEXT_VAL | Next surrogate key value |
SK_MID_LOCATION_GROUP_ID_SF | Surrogate key state file name |
SK_MID_LOB_REL_ID_NEXT_VAL | Next surrogate key value |
SK_MID_LOB_REL_ID_SF | Surrogate key state file name |
SK_MID_MISCVALUE_ID_NEXT_VAL | Next surrogate key value |
SK_MID_MISCVALUE_ID_SF | Surrogate key state file name |
SK_MID_NATIVE_KEY_ID_NEXT_VAL | Next surrogate key value |
SK_MID_NATIVE_KEY_ID_SF | Surrogate key state file name |
SK_MID_ORG_NAME_ID_NEXT_VAL | Next surrogate key value |
SK_MID_ORG_NAME_ID_SF | Surrogate key state file name |
SK_MID_PERSON_NAME_ID_NEXT_VAL | Next surrogate key value |
SK_MID_PERSON_NAME_ID_SF | Surrogate key state file name |
SK_MID_PERSON_SEARCH_ID_NEXT_VAL | Next surrogate key value |
SK_MID_PERSON_SEARCH_ID_SF | Surrogate key state file name |
SK_MID_PPREF_ID_NEXT_VAL | Next surrogate key value |
SK_MID_PPREF_ID_SF | Surrogate key state file name |
SK_MID_ROLE_LOCATION_ID_NEXT_VAL | Next surrogate key value |
SK_MID_ROLE_LOCATION_ID_SF | Surrogate key state file name |
SK_MID_SUSPECT_ID_NEXT_VAL | Next surrogate key value |
SK_MID_SUSPECT_ID_SF | Surrogate key state file name |
SK_PREFIX_CONTRACT_ID_NEXT_VAL | Next surrogate key value |
SK_PREFIX_CONTRACT_ID_SF | Surrogate key state file name |
SK_PREFIX_CONT_ID_NEXT_VAL | Next surrogate key value |
SK_PREFIX_CONT_ID_SF | Surrogate key state file name |
SK_PREFIX_HIERARCHY_ID_NEXT_VAL | Next surrogate key value |
SK_PREFIX_HIERARCHY_ID_SF | Surrogate key state file name |
Step 11. (OPTIONAL) Enable BIL for NLS usage
By default, BIL does not support the use of NLS data (the ustring data type), so multi-byte character sets are not recognized as valid character data during processing. To support NLS data, the BIL jobs must be modified, patches must be applied to InfoSphere Information Server, and InfoSphere MDM table sizes must be increased.
NLSTool for BIL, available through the secure FTP site where BIL assets are accessible, can be used to "NLS-Enable" the BIL InfoSphere DataStage assets. For details about how to configure BIL for supporting NLS data, please consult the NLS documentation that comes with the NLSTool.
For information on how to obtain the NLSTool, please contact the Services District Practice Manager.
Step 12. Compile the Parallel and Sequencer jobs
After the configuration is complete, the BIL parallel and sequencer jobs can be compiled. Although jobs can be compiled individually, because of the number of jobs contained in BIL it is recommended that you use the Multiple Job Compile tool, which is available from the Tools menu of the DataStage Designer.
Step 13. Create the Surrogate Key state file
The BIL jobs make use of the InfoSphere DataStage Surrogate Key stage to generate both temporary and persistent keys utilized within BIL and InfoSphere MDM processing. The Surrogate Key stage uses a “state” file to regulate the assignment of these keys both during a job and from one job run to another.
You must create the required surrogate key state files at least once before running BIL for the first time.
To do this, run the Job Sequencer BL_061__AI_SK_State_File_Control. The default parameter values can be used at submission time, provided that they have been configured correctly.
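For example, the sequencer can be started from the engine command line with dsjob; the project name is the one suggested in Step 2 and is a placeholder.

# Run the surrogate key state file creation sequencer once, reporting the final status
dsjob -run -jobstatus MDMIS_BIL_R10 BL_061__AI_SK_State_File_Control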
Once the imports from Step 6 have completed, a folder named MDMIS BIL R1.0 appears in the InfoSphere DataStage Repository. This folder contains all BIL assets and can be exported in its entirety for backup or replication purposes, such as deployments from Quality Assurance to Production.