Designing InfoSphere DataStage jobs that use the MpxData interaction

Use the IBM® InfoSphere® DataStage® and QualityStage® Designer client to create jobs that use the MDM Java Integration stage with the MpxData interaction. MpxData runs several steps: parsing data into UNL files, deriving data, organizing member records into buckets, and, if run in MEMPUT mode, loading the data into the MDM database.

About this task

You can run MpxData in one of two modes. In MEMPUT mode, the data is loaded directly into the MDM database. In MEMCOMPUTE mode, MpxData derives the data and creates UNL files that can be loaded with the MDM bulk load utilities.

Designing a job that uses MpxData involves the following steps.
  • Creating the MpxData job template XML file in MDM Workbench, which contains the operation parameters for the InfoSphere DataStage MpxData job
  • Adding Java Integration and Sequential File stages to your job in the Designer client and adding links between the stages
  • Configuring each stage
  • Compiling and running your InfoSphere DataStage job
Note: Run this job with a single-node configuration and set the execution mode of the job to Sequential. The job does not split the work across nodes. Each node repeats the entire process, so running the MpxData job with multiple nodes multiplies the processing time by the number of nodes.
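
The single-node requirement in the note can be checked before the job is run. The following is a minimal sketch, not part of the product: it assumes that the parallel engine configuration file uses the usual `node "name" { ... }` syntax with C-style comments, and that the APT_CONFIG_FILE environment variable points to the file in use. The function names are hypothetical.

```python
import os
import re


def count_nodes(config_text: str) -> int:
    """Count the node definitions in parallel engine configuration text."""
    # Remove C-style comments so that commented-out nodes are not counted.
    text = re.sub(r"/\*.*?\*/", "", config_text, flags=re.DOTALL)
    # Each node definition is assumed to start with: node "name"
    return len(re.findall(r'\bnode\s+"[^"]*"', text))


def nodes_in_config_file(path=None) -> int:
    """Count nodes in the given file, or in the file named by APT_CONFIG_FILE."""
    # APT_CONFIG_FILE is an assumption about how the environment is set up.
    path = path or os.environ.get("APT_CONFIG_FILE")
    if not path:
        raise ValueError("no configuration file given and APT_CONFIG_FILE is not set")
    with open(path, encoding="utf-8") as handle:
        return count_nodes(handle.read())
```

If `nodes_in_config_file()` returns a value greater than 1, switch to a single-node configuration before you run the MpxData job.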

Procedure

Create the MpxData job template XML file:

  1. From MDM Workbench, run the Derive Data and Create UNLs (mpxdata) job. Set the operational parameters to match what you want the InfoSphere DataStage job to process.
  2. When the job finishes, select the job template XML file that was created by the job and copy it to the machine where InfoSphere Information Server is installed.
    Tip: After the template file is created, you can manually edit the properties in the file if necessary.
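
Before you edit the template file by hand, it can help to see which elements and attributes it contains. The following sketch makes no assumption about the template schema, which is not described here. It only walks whatever XML it is given and lists each element path with its attributes and text. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def list_template_properties(xml_text: str):
    """Return (path, attributes, text) for every element in the XML text."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        text = (element.text or "").strip()
        rows.append((path, dict(element.attrib), text))
        for child in element:
            walk(child, path)

    walk(root, "")
    return rows


def list_template_file(path: str):
    """Read a template file from disk and list its elements."""
    with open(path, encoding="utf-8") as handle:
        return list_template_properties(handle.read())
```

For example, calling `list_template_file` with the path of the copied template file returns one row per element, which shows the properties that are available for editing.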

Add stages to your job:

  1. Open the Designer client.
  2. Go to File > New and select Parallel Job.
  3. From the Palette, select Realtime. Then, select the Java Integration stage and drag it on to the job editor.
  4. Select File from the Palette. Drag the Sequential File stage on to the job editor.
  5. Join the Sequential File and Java Integration stages with a DSLink. Right-click on Sequential_File and drag a link to Java_Integration.
    Screen capture of job editor with Sequential File and Java Integration stages that are connected with a DSLink

Configure the Sequential File stage:

  1. Double-click the Sequential File stage to open the configuration window.
    1. Select the Properties tab.
    2. Highlight Source > File in the first column. In the File field, type the path and name of the file from which data is read, or use the browse option to locate the file.
      Screen capture of Sequential File configuration dialog with the Properties tab selected.
    3. Highlight Options > First Line is Column Name. In the First Line is Column Name field, select True or False. In most cases, this selection is False.
    4. Click OK.

Define runtime parameters for the job:

  1. On the Designer client toolbar, click the Job Properties button to define the runtime parameters for the job. Two parameters must be present to use the MpxData functionality: the JAR file location and the JVM arguments.
    1. On the Job Properties dialog, select the Parameters tab.
    2. Click Add Environment Variable and select User Defined > Java Class Path from the list.
      Screen capture of the Choose environment variable dialog.
    3. On the Job Properties window, add the location of the JAR files in the Default Values field. Separate the file paths with a colon. For example:
      /home/mdmds/files/madapi.jar:/home/mdmds/files/com.ibm.mdm.iis.datastage.mdmapi.jar:/home/mdmds/files/com.ibm.mdm.mds.jmx.jar:/home/mdmds/files/com.ibm.mdm.mds.job.client.jar:/home/mdmds/files/com.ibm.mdm.mds.messages.jar:/home/mdmds/files/com.ibm.ws.admin.client_8.0.0.jar:/home/mdmds/files/was_dependencies.jar:/home/mdmds/files/was_public.jar
    4. To include JVM arguments that identify the location of truststore and keystore files, click Create Parameter Set. On the Parameter Set dialog, type the parameter name. Add the information for the truststore and keystore files. For example:
      -Djavax.net.ssl.trustStore=/home/mdmds/files/ClientTrustFile.jks
      -Djavax.net.ssl.trustStorePassword=WebAS
      -Djavax.net.ssl.keyStore=/home/mdmds/files/ClientKeyFile.jks
      -Djavax.net.ssl.keyStorePassword=WebAS
      -Djavax.net.ssl.trustStoreType=JKS
      -Djavax.net.ssl.keyStoreType=JKS
      Screen capture of the Parameter Set dialog.
    5. Click OK.
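
A mistyped JAR path or keystore location typically shows up only when the job runs, so it can be worth checking the two parameter values first. The following is a minimal sketch under stated assumptions: it runs on the machine where the files are stored, the class path uses colon separators as in the example above, and the JVM arguments are written as whitespace-separated `-Dname=value` tokens. The function names are hypothetical and are not part of any IBM tool.

```python
import os


def missing_classpath_entries(classpath: str, exists=os.path.isfile):
    """Return the class path entries that do not point to an existing file."""
    entries = [entry for entry in classpath.split(":") if entry]
    return [entry for entry in entries if not exists(entry)]


def parse_jvm_properties(options: str):
    """Collect -Dname=value tokens from a whitespace-separated option string."""
    properties = {}
    for token in options.split():
        if token.startswith("-D") and "=" in token:
            name, value = token[2:].split("=", 1)
            properties[name] = value
    return properties


def check_ssl_options(options: str, exists=os.path.isfile):
    """Return a list of problems found in the truststore and keystore options."""
    properties = parse_jvm_properties(options)
    problems = []
    for kind in ("trustStore", "keyStore"):
        path = properties.get(f"javax.net.ssl.{kind}")
        if path is None:
            problems.append(f"javax.net.ssl.{kind} is not set")
        elif not exists(path):
            problems.append(f"{kind} file not found: {path}")
        if f"javax.net.ssl.{kind}Password" not in properties:
            problems.append(f"javax.net.ssl.{kind}Password is not set")
    return problems
```

Pass the value of the Java Class Path variable to `missing_classpath_entries` and the JVM argument string to `check_ssl_options`. An empty list from each call means that no problems were found by these checks.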

Configure the Java Integration stage:

  1. On the job editor, double-click the Java Integration stage to open the Java_Integration configuration window.
  2. Set the execution mode from Parallel to Sequential.
    1. Select Stage > Advanced tab.
    2. Change the execution mode to Sequential.
  3. Select the Stage > Properties tab.
    Screen capture of the Java Integration stage configuration Properties tab.
    1. Classpath - Specify the location of the com.ibm.mdm.iis.datastage.mpxdata.jar file.
    2. To use Java arguments at run time, add a parameter in the Optional JVM options field. The parameter can be used in the stage by referring to it as #parameter_name# or #JVM_ARGS# in the Optional JVM options field. If you did not previously create the parameter, follow the instructions for creating a parameter set under "Define runtime parameters for the job".
  4. On the Java_Integration configuration window, click Configure to open the Configuration Window for MDM Stage.
  5. On the Resolve Job Parameters dialog, click OK. This dialog is displayed if you set any optional parameters, such as JVM options.
  6. On the Configuration Window for MDM Stage, complete these fields.
    Screen capture of the MDM stage configuration dialog.
    1. Host Name - Specify the name of the machine that is hosting the MDM database.
    2. Host Port - Type the machine port number.
    3. Soap Port - Type the SOAP port number.
    4. User Name - Type the user ID that you use to access the MDM database.
    5. Password - Type the password for the user ID.
    6. Template File - Enter the full path and name of the MpxData job template XML file.
    7. Context Id - Enter any unique string that identifies the job. This property is typically set to the MDM project/imm name.
    8. Click OK.
  7. Click OK on the Java_Integration configuration window.
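
The connection values in the Configuration Window for MDM Stage can be sanity-checked from the machine where InfoSphere Information Server is installed. The following sketch only tests that the template file exists locally and that each port accepts a TCP connection. It does not log in or verify the user ID and password. The function name, the 5-second timeout, and the port labels are assumptions made for illustration.

```python
import os
import socket


def check_stage_settings(host, ports, template_file,
                         connect=socket.create_connection,
                         exists=os.path.isfile):
    """Return a list of problems with the host, ports, and template file.

    ports maps a label, such as "Host Port" or "Soap Port", to a port number.
    """
    problems = []
    if not exists(template_file):
        problems.append(f"template file not found: {template_file}")
    for label, port in ports.items():
        try:
            # Open and immediately close a TCP connection to test reachability.
            connection = connect((host, port), timeout=5)
            connection.close()
        except OSError as error:
            problems.append(f"{label} {host}:{port} is not reachable: {error}")
    return problems
```

Call the function with the same Host Name, Host Port, Soap Port, and Template File values that you typed in the dialog. An empty list means that the file was found and both ports accepted a connection.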

Compile and run the job:

  1. From the Designer client toolbar, click the Compile button to compile your job. If you encounter errors, correct them and recompile.
  2. From the toolbar, click the Run button to run the job. The first time that you run the job, the following processes occur.
    • A directory for the specified context ID is created in the work directory. Files that are related to the current project and context_id are stored in the work directory and are accessed on the computer where MDM is running. The work directory can be used to provide a second level of separation or partitioning of the job results associated with a particular project and context_id. By default, this path is relative to the operational server instance directory. For example: %Profile_Path%\installedApps\%Cell_Name%\MDM-native-E001.ear\native.war\\work\ContextID_Name
    • An mpxdata.cfg file is created from the XML file and placed in the %Profile_Path%\installedApps\%Cell_Name%\MDM-native-E001.ear\native.war\\work\ContextID_Name directory.

    Make sure that you create the work directory and copy the mpxdata.cfg input file to the directory before you run the job. If the directory does not exist before the first time you run the job, the job creates the directory. However, without the input file, the job can fail.

  3. Copy the input file to the work directory and run the job again.
  4. To view the job log, select View > Job Log.
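
The work directory location described above can be assembled and checked with a short script on the computer where MDM is running. The following is a sketch under stated assumptions: the profile path and cell name are supplied by the caller in place of %Profile_Path% and %Cell_Name%, the directory layout follows the default path shown above with single separators, and the function names are hypothetical.

```python
from pathlib import Path


def work_directory(profile_path, cell_name, context_id, path_cls=Path):
    """Build the default work directory path for a context ID."""
    return path_cls(
        profile_path,
        "installedApps",
        cell_name,
        "MDM-native-E001.ear",
        "native.war",
        "work",
        context_id,
    )


def config_present(work_dir) -> bool:
    """Report whether mpxdata.cfg exists in the given work directory."""
    return Path(work_dir, "mpxdata.cfg").is_file()
```

If `config_present(work_directory(profile_path, cell_name, context_id))` returns False, copy the input file to the work directory before you run the job again.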


Last updated: 10 Jan 2018