Basic Initial Load installation
IBM® InfoSphere® Information Server must be configured correctly in order for the BIL assets to compile and run. The following steps must be performed to ensure that BIL can be successfully installed.
Step 1. Prepare the file systems
File systems should be allocated following established InfoSphere Information Server best practices (source files, target files, data sets, and scratch/sort work areas).
Move the contents of the Java/lib directory to $DSHOME/java/lib, where $DSHOME is the home directory of your DataStage installation.
The Java/lib directory holds specific Java™ library (JAR) assets used in the generation of phonetic values. An example of the $DSHOME/java/lib directory is /opt/IBM/InformationServer/Server/DSEngine/java/lib. These libraries are used by the following BIL jobs:
- BL_021_Prep_Address
- BL_021_Prep_OrgName
- BL_021_Prep_PersonName
- BL_020_VS_Address
- BL_020_VS_OrgName
- BL_020_VS_PersonName
Create the following directories specifically for use with BIL. Place them in the directory structure /home/dsadm/<your BIL DataStage project name>/.
Directory Name | Usage |
---|---|
ARCHIVE | Stores archived intermediate data sets from BIL runs as part of the cleanup process |
DATA | Typically used for storing temporary data sets, inter-job data sets, and load-ready data sets |
ERROR | Holds BIL error records |
LOG | Holds BIL log files and Error Report XML files |
REJECT | Used for storing rejected records |
SIF_IN | Input SIF files |
SK | Surrogate Key “state files” |
TEMP | Used to hold temporary items |
Job parameter values will be set later to point to these directories.

Ensure that the directory $APT_ORCHHOME/user_lib exists. If it does not, create it with appropriate permissions.
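A minimal shell sketch for creating the BIL directories above and the user_lib directory follows; the base path uses the project name suggested in Step 2, and the owner and group shown are assumptions, so adjust all values to your site standards.

# Assumed base path; substitute your BIL DataStage project name
BIL_BASE=/home/dsadm/MDMIS_BIL_R10
mkdir -p $BIL_BASE/ARCHIVE $BIL_BASE/DATA $BIL_BASE/ERROR $BIL_BASE/LOG
mkdir -p $BIL_BASE/REJECT $BIL_BASE/SIF_IN $BIL_BASE/SK $BIL_BASE/TEMP
chown -R dsadm:dstage $BIL_BASE   # assumed owner and group
chmod -R 755 $BIL_BASE

# Create the parallel engine user_lib directory if it does not already exist
# ($APT_ORCHHOME must be set in the environment, for example by sourcing dsenv)
mkdir -p $APT_ORCHHOME/user_lib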
Step 2. Create the InfoSphere DataStage project for BIL
Create an InfoSphere DataStage® project using DataStage Administrator, following any preferred or required naming standard. In the absence of a standard, a suggested name is MDMIS_BIL_R10.
Step 3. Verify path requirements
Ensure that the project LIBPATH or LD_LIBRARY_PATH includes the following paths:
- $APT_ORCHHOME/lib: required to run the orchadmin utility from a Job Sequence
- $APT_ORCHHOME/user_lib: required to run the custom operators to be installed
Ensure that the project PATH includes the following path:
- $APT_ORCHHOME/bin: required to run the orchadmin utility from a Job Sequence
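If the project inherits these variables from the engine dsenv file (rather than from project-level settings in the Administrator client), a quick hedged check from the shell might look like the following; the dsenv path shown is the common default.

# Inspect dsenv for the parallel engine lib, user_lib, and bin entries
grep -E "PXEngine/(lib|user_lib|bin)" /opt/IBM/InformationServer/Server/DSEngine/dsenv

# Alternatively, after sourcing dsenv, inspect the resolved paths directly
. /opt/IBM/InformationServer/Server/DSEngine/dsenv
echo $PATH $LD_LIBRARY_PATH $LIBPATH | tr ': ' '\n' | grep PXEngine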
Step 4. Install the BIL custom routine and Java libraries
Operators are located in the asset tarball in the following directories:
- AIX®: Information Server 8.5 – MDMIS-aix/CAB/PXEngine/user_lib64/IIS8.5
This release contains one routine:
- UTCCvt.o
The routine should be installed in the following location: $APT_ORCHHOME/user_lib (for example, /opt/IBM/InformationServer/Server/PXEngine/user_lib).
Append the contents of CAB/PXEngine/etc/append_to_operator.apt to $APT_ORCHHOME/etc/operator.apt (make a backup and then edit operator.apt).
Verify that LIBPATH or LD_LIBRARY_PATH contain $APT_ORCHHOME/user_lib.
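A hedged sketch of these operations, run from the extracted asset tarball (the source paths follow the AIX manifest entry shown above; adjust them for your platform):

# Install the custom routine into the parallel engine user_lib directory
cp MDMIS-aix/CAB/PXEngine/user_lib64/IIS8.5/UTCCvt.o $APT_ORCHHOME/user_lib/

# Back up operator.apt, then append the supplied entries to it
cp $APT_ORCHHOME/etc/operator.apt $APT_ORCHHOME/etc/operator.apt.bak
cat CAB/PXEngine/etc/append_to_operator.apt >> $APT_ORCHHOME/etc/operator.apt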
The contents of the Java/lib folder (see the BIL Manifest description) should be copied to $DSHOME/java/lib, as described in Step 1. The resulting folder will consist of 5 JAR files and 2 property files: common-codec-1.3, icu4j, Linguistic, Logging, PhoneticGen, JDKLog.properties, and Log4J.properties. These libraries are used by the following BIL jobs:
- BL_021_Prep_Address
- BL_021_Prep_OrgName
- BL_021_Prep_PersonName
- BL_020_VS_Address
- BL_020_VS_OrgName
- BL_020_VS_PersonName
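A hedged sketch of the copy, run from the extracted tarball directory (the Java/lib source path is taken from the manifest description; the destination matches Step 1):

# Copy the phonetic libraries and property files into the engine's java/lib directory
mkdir -p $DSHOME/java/lib
cp Java/lib/* $DSHOME/java/lib/

# The target should now contain the 5 JAR files and 2 property files listed above
ls $DSHOME/java/lib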
Step 5. Install the cleanup script into the ARCHIVE directory
The cleanup script (DSClean.sh) is located in the SCRIPT directory of the asset tarball. Copy the DSClean.sh script into the ARCHIVE directory and verify that its execute permissions are set. Verify that the system supports bash shell scripting. If the system uses korn shell scripting (for example, Solaris), change the shell declaration from #!/bin/sh to #!/bin/ksh. Once the scripting has been verified, ensure that the Cleanup_Env execute command activity stage in BL_000_BASIC_LOAD refers correctly to DSClean.sh.
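A hedged sketch of the copy and the optional shell declaration change, assuming the ARCHIVE directory created in Step 1 and the suggested project name:

# Copy the cleanup script into the ARCHIVE directory and make it executable
cp SCRIPT/DSClean.sh /home/dsadm/MDMIS_BIL_R10/ARCHIVE/
chmod 755 /home/dsadm/MDMIS_BIL_R10/ARCHIVE/DSClean.sh

# On korn-shell systems (for example, Solaris), change the first line from #!/bin/sh
# to #!/bin/ksh, either in an editor or with a one-liner such as:
# sed '1s|#!/bin/sh|#!/bin/ksh|' DSClean.sh > DSClean.tmp && mv DSClean.tmp DSClean.sh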
Step 6. Import the BIL InfoSphere DataStage jobs to the new project
Using the InfoSphere DataStage client, import the BIL jobs into the newly created InfoSphere DataStage project. The BIL jobs are contained in ZIP files within the BIL tarball, which unzip into *.dsx (InfoSphere DataStage eXport) files.
The MDMIS-BIL.zip file contains the main *.dsx file that should be installed.
BIL supports four options regarding database loading that depend on the values set for the following job parameters:
- MDMIS_DB_DATABASE_TYPE
- DB_DATABASE_VARIANT
Depending on customer requirements, any one of the following database options can be chosen:
- ODBC – default
- DB2® – can be configured for bulk loading
- Oracle
Step 7. Set up the ODBC/DB2/Oracle configuration for connection to the database
Set up and configure the appropriate connection type for the InfoSphere MDM database on the InfoSphere Information Server.
For ODBC connections, the files $DSHOME/uvodbc.config and $DSHOME/.odbc.ini must be modified to include definitions for the appropriate databases. The ODBCINI environment variable should point to $DSHOME/.odbc.ini.
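As an illustration only, the entries take roughly the following shape; the data source name MDMDB, the driver path, and the connection keywords are placeholders that must match your database and the ODBC drivers installed with InfoSphere Information Server.

# Entry appended to $DSHOME/uvodbc.config
<MDMDB>
DBMSTYPE = ODBC

# Entry appended to $DSHOME/.odbc.ini (keyword names vary by driver)
[MDMDB]
Driver=<path to the ODBC driver library>
Description=InfoSphere MDM database
Database=<database name>
HostName=<database host name>
PortNumber=<port>

# In dsenv, point ODBCINI at the file
ODBCINI=$DSHOME/.odbc.ini; export ODBCINI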
For Oracle, appropriate Oracle client software must be installed on the InfoSphere Information Server machine, and the ORACLE_HOME environment variable must point to the Oracle installation directory. PATH should include $ORACLE_HOME/bin, and LIBPATH or LD_LIBRARY_PATH should point to the appropriate 32-bit or 64-bit Oracle libraries. A typical ORACLE_HOME value might be /u01/app/oracle/product/11.2.0. If you are using TNSNAMES for connection information, TNS_ADMIN should point to the directory containing the file tnsnames.ora. The ORACLE_HOME, PATH, and load library path variables must be configured in the dsenv file.
For DB2, appropriate DB2 client software must be installed on the InfoSphere Information Server machine. The DB2DIR environment variable should point to the DB2 installation directory, DB2INSTANCE should contain the appropriate instance name, and PATH and LIBPATH/LD_LIBRARY_PATH should point to the appropriate DB2 bin and lib directories.
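A hedged dsenv excerpt showing where these variables are typically exported; all values are placeholders, and the library path variable is LIBPATH on AIX or LD_LIBRARY_PATH on Linux and Solaris.

# Oracle client settings (placeholder values)
ORACLE_HOME=/u01/app/oracle/product/11.2.0; export ORACLE_HOME
TNS_ADMIN=$ORACLE_HOME/network/admin; export TNS_ADMIN

# DB2 client settings (placeholder values)
DB2DIR=/opt/IBM/db2/V9.7; export DB2DIR
DB2INSTANCE=db2inst1; export DB2INSTANCE

# Extend the search paths for the database clients
PATH=$PATH:$ORACLE_HOME/bin:$DB2DIR/bin; export PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib:$DB2DIR/lib64; export LD_LIBRARY_PATH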
If the database is a DB2 for z/OS® instance, you must configure the DB2 Connector stage in the BLDBINHISTORY and BLDBINTABLE database shared containers so that the Partitioning tab Partition Type setting is set to "(AUTO)".
If you wish to use the bulk loading settings for either DB2 or Oracle, you must make the corresponding changes to the DRS stage in the related database shared containers BLDBINTABLE and BLDBINHISTORY.
Step 8. Test the database connection
A temporary InfoSphere DataStage job can be created to confirm that the server can connect to the InfoSphere MDM database, or other methods such as importing table definitions can be employed. Command-line tools such as sqlplus for Oracle or db2 for DB2 can also be used to verify connectivity from the server.
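For example, hedged command-line checks might look like the following; the aliases, user IDs, and passwords are placeholders.

# Oracle: connect through a TNS alias defined in tnsnames.ora (prompts for the password)
sqlplus mdm_user@MDMTNS

# DB2: connect to the cataloged database alias, then release the connection
db2 connect to MDMDB user mdm_user
db2 terminate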
Step 9. Create the InfoSphere MDM default configuration files
MDM_1X1.apt and MDM_Default.apt are provided in the MDMIS tarball (see the earlier manifest for their exact locations by operating system). Copy these files to the $APT_ORCHHOME/../Configurations directory on the InfoSphere Information Server machine and modify as required for the specific server installation.
The distributed configuration files must be edited to reflect both the existing environment and the availability of system resources. The default configuration file presents a typical two-node configuration, but it is not appropriate for all environments; you must take the available system resources into account.
The following sample configuration files include placeholder values that must be changed. Additional configuration files can be created as desired, and additional configuration options can be added to support a particular system configuration, such as separate sort and scratch space allocations (a sketch follows the examples below).
MDM_1X1.apt:
{
node "node1"
{
fastname "<servername>"
pools ""
resource disk "<resource disk path>" {pools ""}
resource scratchdisk "<resource scratchdisk path>" {pools ""}
}
}
MDM_Default.apt:
{
node "node1"
{
fastname "[servername]"
pools ""
resource disk "<resource disk path>" {pools ""}
resource scratchdisk "<resource scratchdisk path>" {pools ""}
}
node "node2"
{
fastname "[servername]"
pools ""
resource disk "<resource disk path>" {pools ""}
resource scratchdisk "<resource scratchdisk path>" {pools ""}
}
}
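As one example of the additional options mentioned above, a dedicated sort area can be declared by adding a scratchdisk resource to the "sort" pool; the node name and paths remain placeholders:

{
node "node1"
{
fastname "<servername>"
pools ""
resource disk "<resource disk path>" {pools ""}
resource scratchdisk "<sort work path>" {pools "" "sort"}
resource scratchdisk "<resource scratchdisk path>" {pools ""}
}
}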
Step 10. Update the MDMIS parameter set
Set the default values for the parameters in the MDMIS parameter set. The parameters are grouped into several categories, identified by the parameter name prefix:
- DB – Database-related parameters, such as connection information
- DS – DataStage-related parameters, such as job control and date format
- FS – File system-related parameters, such as SIF file location
- SK – Surrogate Key-related parameters, such as file location and format
MDMIS Parameter Name | Description |
---|---|
BATCH_ID | Used to uniquely identify intermediate data sets, error files, and other files pertaining to a particular BIL batch run. All of these file names contain the value of the BATCH_ID parameter. |
DS_PROCESSING_DATE | The date specified here is what is used as the current date during processing. It must be set before running the batch. |
DB_CONNECT_STRING | The name of the cataloged database. |
DB_DATABASE_TYPE | The database type. Available types are ODBC (the default), DB2, and Oracle. |
DB_DATABASE_VARIANT | The database variant. Variants can differ by database type. |
DB_USERID | The database user ID. |
DB_PASSWORD | The database user password. |
DB_SCHEMA | The database schema. The value entered in this field should end in a period (‘.’). |
DB_CLIENT_INSTANCE | The client instance name. This value is dependent on the database installation setup. |
DB_SERVER_INSTANCE | The server instance name. This value is dependent on the database installation setup. |
DB_ALIAS | For DB2, this alias name is determined during configuration using the IBM DB2 Configuration Assistant. For Oracle, this is a name created using the Oracle Configuration Assistant. |
DS_EMAIL_ERROR_CHECK_DISTRIBUTION | Space-separated list of email addresses to receive reports |
DS_EMAIL_ERROR_CHECK_REPORT | Flag to indicate whether the error report should be emailed |
DS_LANGUAGE_TYPE_CODE | The InfoSphere MDM language ID. |
DS_PERFORM_VALIDATION | Used to decide whether or not to run validation. |
DS_PARALLEL_APT_CONFIG_FILE | InfoSphere DataStage Configuration file for parallel jobs |
DS_SEQUENTIAL_APT_CONFIG_FILE | InfoSphere DataStage Configuration file for sequential jobs |
DS_SOURCE_DATE_FORMAT | Timestamp format in the SIF Files. The default value is %yyyy-%mm-%dd. |
DS_USE_NATIVE_KEY | Used to determine whether the NATIVEKEY table is used for Contract Identifier resolution. |
FS_SIF_FILE_PATTERN | Directory location of the Party SIF Files with naming pattern used to identify them |
FS_CONTRACT_SIF_FILE_PATTERN | Directory location of the Contract SIF Files with naming pattern used to identify them |
FS_DATA_SET_HEADER_DIR | Directory location for intermediate data sets to be written to during processing. |
FS_ERROR_DIR | Directory location for error message text files to be written to during processing. |
FS_LOG_DIR | Directory location for log files |
FS_REJECT_DIR | Directory location for rejected record data sets |
FS_ARCHIVE_DIR | Directory location for archival of populated intermediate data sets and error message files during clean up |
FS_SK_FILE_DIR | Directory location for surrogate key files |
FS_TMP_DIR | Directory location for temporary files |
LOAD_HISTORY_FLAG | Used to decide whether or not to create history records. |
LOAD_METHOD | Used to set which method is used to load data into the InfoSphere MDM database. |
MTZ_ENABLED | Used to determine whether the Multi Time Zone feature is enabled in InfoSphere MDM. |
PHONETIC_CLASS | Used to identify which Phonetic Class to use in the generation of phonetic values. It requires the fully qualified class name. Possible values are com.ibm.imc.phonetics.impl.NysiisExtension, com.ibm.imc.phonetics.impl.SoundexExtension, or blank if no phonetic values are to be generated. |
ADDRESS_PHONETIC_KEY_MAX_LENGTH | Used to set the maximum length of the phonetic key values generated for Address fields. |
ORGNAME_PHONETIC_KEY_MAX_LENGTH | Used to set the maximum length of the phonetic key values generated for OrgName fields. |
PERSON_NAME_PHONETIC_KEY_MAX_LENGTH | Used to set the maximum length of the phonetic key values generated for Person Name fields. |
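Default values are normally set in the DataStage Administrator or Designer client. As a hedged illustration, individual values can also be overridden when a job is started from the command line with dsjob; the project name, job name, and values below are placeholders, and the ParameterSet.Parameter override syntax should be verified against your Information Server release.

# Override selected MDMIS parameters for a single run (placeholder values)
dsjob -run \
  -param MDMIS.BATCH_ID=BATCH001 \
  -param MDMIS.DS_PROCESSING_DATE=2011-01-01 \
  -jobstatus MDMIS_BIL_R10 BL_000_BASIC_LOAD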
The following parameters should be reviewed for advanced configuration:
MDMIS Advanced Default Parameter Name | Description |
---|---|
$APT_DB2INSTANCE_HOME | DB2 installation home directory. Necessary when the db2instance is not accessible via the conductor node and a client interface is used. |
$APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL | When set, allows zero length null_field value with fixed length fields. This should be used with care as poorly formatted data will cause incorrect results. By default a zero length null_field value will cause an error. |
$APT_IMPORT_PATTERN_USES_FILESET_MOUNTED | When set, sequential read (import) will turn any file pattern into a file set before processing the files. This allows the files to be processed in parallel as opposed to sequentially. By default, file pattern processing will concatenate the files to be used as the input. |
$APT_IMPORT_PATTERN_USES_FILESET | When set, sequential read (import) will turn any file pattern into a file set before processing the files. This allows the files to be processed in parallel as opposed to sequentially. By default, file pattern processing will concatenate the files to be used as the input. |
$APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS | Reject records that have string or ustring fields that are longer than the defined maximum length. |
$APT_SORT_INSERTION_OPTIMIZATION | |
$APT_NO_PART_INSERTION | Turns off automatic insertion of partitions based on requirements of stage operators. |
$APT_NO_SORT_INSERTION | Turns off automatic insertion of sorting based on requirements of stage operators. |
$APT_OLD_BOUNDED_LENGTH | Ensures that variable field lengths are handled consistently with InfoSphere Information Server versions earlier than version 8. |
DS_DETECTED_DUPLICATES_ACTION | The action to take if duplicates are detected. |
DS_FAILED_COLUMNIZATION_ACTION | Not used |
DS_FAILED_RECORDIZATION_ACTION | Not used |
DS_PARTY_DROP_SEVERITY_LEVEL | Party records are dropped if they have errors with a severity greater than or equal to the value set for this parameter. |
DS_DROP_MAX_ITERATIONS | The number of times that the logic to remove records in error from the incoming data is run. The job aborts if errors remain after this limit is reached; the incoming data must then be corrected. |
DS_MD5_CRITICAL_ADDRESS_COLUMNS | The columns used to construct an MD5 identifier for Address records, used to identify potential duplicate address records. If not set correctly, uniquely identified address records can be mistaken for duplicate addresses. |
SK_LOAD_SUFFIX | Constant value that is appended to each surrogate key. This avoids possible key collisions with InfoSphere MDM generated IDs. |
SK_MASK | Mask describing the format of surrogate keys, for example PPPMMMMMMMMMMMSS. The default value is PPPMMMMMMMMMMMSS. |
SK_MID_ADDRESS_ID_NEXT_VAL | Next surrogate key value |
SK_MID_ADDRESS_ID_SF | Surrogate key state file name |
SK_MID_ALERT_ID_NEXT_VAL | Next surrogate key value |
SK_MID_ALERT_ID_SF | Surrogate key state file name |
SK_MID_CONTACT_METHOD_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONTACT_METHOD_ID_SF | Surrogate key state file name |
SK_MID_CONT_REL_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONT_REL_ID_SF | Surrogate key state file name |
SK_MID_CONT_EQUIV_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONT_EQUIV_ID_SF | Surrogate key state file name |
SK_MID_CONT_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONT_ID_SF | Surrogate key state file name |
SK_MID_CONTRACT_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONTRACT_ID_SF | Surrogate key state file name |
SK_MID_CONTR_COMP_VAL_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONTR_COMP_VAL_ID_SF | Surrogate key state file name |
SK_MID_CONTR_COMPONENT_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONTR_COMPONENT_ID_SF | Surrogate key state file name |
SK_MID_CONTRACT_ROLE_ID_NEXT_VAL | Next surrogate key value |
SK_MID_CONTRACT_ROLE_ID_SF | Surrogate key state file name |
SK_MID_IDENTIFIER_ID_NEXT_VAL | Next surrogate key value |
SK_MID_IDENTIFIER_ID_SF | Surrogate key state file name |
SK_MID_HIERARCHY_ID_NEXT_VAL | Next surrogate key value |
SK_MID_HIERARCHY_ID_SF | Surrogate key state file name |
SK_MID_HIERARCHY_NODE_ID_NEXT_VAL | Next surrogate key value |
SK_MID_HIERARCHY_NODE_ID_SF | Surrogate key state file name |
SK_MID_HIERARCHY_REL_ID_NEXT_VAL | Next surrogate key value |
SK_MID_HIERARCHY_REL_ID_SF | Surrogate key state file name |
SK_MID_HIER_ULT_PAR_ID_NEXT_VAL | Next surrogate key value |
SK_MID_HIER_ULT_PAR_ID_SF | Surrogate key state file name |
SK_MID_LOCATION_GROUP_ID_NEXT_VAL | Next surrogate key value |
SK_MID_LOCATION_GROUP_ID_SF | Surrogate key state file name |
SK_MID_LOB_REL_ID_NEXT_VAL | Next surrogate key value |
SK_MID_LOB_REL_ID_SF | Surrogate key state file name |
SK_MID_MISCVALUE_ID_NEXT_VAL | Next surrogate key value |
SK_MID_MISCVALUE_ID_SF | Surrogate key state file name |
SK_MID_NATIVE_KEY_ID_NEXT_VAL | Next surrogate key value |
SK_MID_NATIVE_KEY_ID_SF | Surrogate key state file name |
SK_MID_ORG_NAME_ID_NEXT_VAL | Next surrogate key value |
SK_MID_ORG_NAME_ID_SF | Surrogate key state file name |
SK_MID_PERSON_NAME_ID_NEXT_VAL | Next surrogate key value |
SK_MID_PERSON_NAME_ID_SF | Surrogate key state file name |
SK_MID_PERSON_SEARCH_ID_NEXT_VAL | Next surrogate key value |
SK_MID_PERSON_SEARCH_ID_SF | Surrogate key state file name |
SK_MID_PPREF_ID_NEXT_VAL | Next surrogate key value |
SK_MID_PPREF_ID_SF | Surrogate key state file name |
SK_MID_ROLE_LOCATION_ID_NEXT_VAL | Next surrogate key value |
SK_MID_ROLE_LOCATION_ID_SF | Surrogate key state file name |
SK_MID_SUSPECT_ID_NEXT_VAL | Next surrogate key value |
SK_MID_SUSPECT_ID_SF | Surrogate key state file name |
SK_PREFIX_CONTRACT_ID_NEXT_VAL | Next surrogate key value |
SK_PREFIX_CONTRACT_ID_SF | Surrogate key state file name |
SK_PREFIX_CONT_ID_NEXT_VAL | Next surrogate key value |
SK_PREFIX_CONT_ID_SF | Surrogate key state file name |
SK_PREFIX_HIERARCHY_ID_NEXT_VAL | Next surrogate key value |
SK_PREFIX_HIERARCHY_ID_SF | Surrogate key state file name |
Step 11. (OPTIONAL) Enable BIL for NLS usage
By default, BIL does not support the use of NLS data (the ustring data type), so multi-byte character sets are not recognized as valid character data during processing. To support NLS data, the BIL jobs must be modified, patches must be applied to InfoSphere Information Server, and InfoSphere MDM table sizes must be increased.
NLSTool for BIL, available through the secure FTP site where BIL assets are accessible, can be used to "NLS-Enable" the BIL InfoSphere DataStage assets. For details about how to configure BIL for supporting NLS data, please consult the NLS documentation that comes with the NLSTool.
For information on how to obtain the NLSTool, please contact the Services District Practice Manager.
Step 12. Compile the Parallel and Sequencer jobs
After the configuration is complete, the BIL parallel and sequencer jobs can be compiled. Although jobs can be compiled individually, because of the number of jobs contained in BIL it is recommended that you use the Multiple Job Compile tool, which is available from the Tools menu of the DataStage Designer.
Step 13. Create the Surrogate Key state file
The BIL jobs make use of the InfoSphere DataStage Surrogate Key stage to generate both temporary and persistent keys utilized within BIL and InfoSphere MDM processing. The Surrogate Key stage uses a “state” file to regulate the assignment of these keys both during a job and from one job run to another.
You must create the required surrogate key state files at least once before running BIL for the first time.
To do this, run the Job Sequencer BL_061__AI_SK_State_File_Control. The default parameter values can be used at submission time, provided that they have been configured correctly.
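For example, the sequencer can be started from the engine command line with dsjob; the project name is the one suggested in Step 2 and is a placeholder.

# Run the surrogate key state file creation sequencer once, reporting the final status
dsjob -run -jobstatus MDMIS_BIL_R10 BL_061__AI_SK_State_File_Control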
Once the imports from Step 6 have completed, a folder named MDMIS BIL R1.0 appears in the InfoSphere DataStage Repository. This folder contains all BIL assets and can be exported in its entirety for backup or replication purposes, such as deployments from Quality Assurance to Production.