Creating a custom data format for IBM DataStage

You can customize the format of the data that CDC Replication generates and sends to IBM® DataStage® by specifying a Java™ class.

About this task

A custom data format is useful if you have existing IBM DataStage jobs that expect a particular data format. If your custom data format changes the default format of the flat file that CDC Replication sends to IBM DataStage, the .dsx file described in Generating an IBM DataStage definition file for a subscription is no longer useful, because it contains only default data formatting. If you have already created and imported a .dsx file into IBM DataStage, you must verify in IBM DataStage Designer that it is still relevant.

For example, you may have an existing IBM DataStage file-based job that will not read the default data format generated by Management Console. In this case, it may be easier for you to specify a Java class to customize the data format rather than modifying your existing IBM DataStage job.

A Java class that creates a custom data format must implement the DataStageDataFormatIF interface.
Requirement: If you have an existing custom data format, and you are upgrading from a CDC Replication Engine for InfoSphere® DataStage version 6.3 to version 6.5 or later, you must modify the existing custom data format because of changes to the DataStageDataFormatIF interface in version 6.5 and later.
Important: If you are using the direct connect connection method, the custom data format is not supported for version 6.5 or later.

For more information on the DataStageDataFormatIF interface, see the Javadocs that are installed with your installation of CDC Replication Engine for InfoSphere DataStage.

For more information on these requirements, refer to the IBM DataStage product technical documentation.
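
As a rough illustration of what a custom data format class does, the following minimal sketch turns one row of column values into a pipe-delimited record. It is an assumption-laden example, not the product API: RowFormatIF is a hypothetical stand-in for DataStageDataFormatIF, and formatRow, PipeDelimitedFormat, and the quoting rules are invented for illustration. The real interface declares its own methods, which are described in the Javadocs.

```java
import java.util.StringJoiner;

// Hypothetical stand-in for DataStageDataFormatIF. The real interface declares
// different methods; see the Javadocs installed with the engine.
interface RowFormatIF {
    String formatRow(String[] columnValues);
}

// Illustrative custom format: pipe-delimited fields, each wrapped in double quotes.
class PipeDelimitedFormat implements RowFormatIF {
    @Override
    public String formatRow(String[] columnValues) {
        StringJoiner row = new StringJoiner("|");
        for (String value : columnValues) {
            if (value == null) {
                // Assumption: a null column is written as an empty, unquoted field.
                row.add("");
            } else {
                // Double any embedded quotes so that the field stays parseable.
                row.add("\"" + value.replace("\"", "\"\"") + "\"");
            }
        }
        return row.toString();
    }
}

class PipeDelimitedFormatDemo {
    public static void main(String[] args) {
        RowFormatIF format = new PipeDelimitedFormat();
        // Prints: "42"||"O""Brien"
        System.out.println(format.formatRow(new String[] {"42", null, "O\"Brien"}));
    }
}
```

The point of the sketch is only that the formatting logic lives in your class, so an existing IBM DataStage job can keep its current input layout.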

Procedure

  1. Click Configuration > Subscriptions.
  2. Select the subscription.
  3. Click the Table Mappings view and select the table mapping from the Source Table column.
  4. Right-click and select Open Details....
  5. Click the Flat File or Direct Connect tab depending on your mapping type.
    Note: Starting with Version 11.4 of the CDC Replication Engine for InfoSphere DataStage, the Direct Connect method is no longer supported. You can still use the Management Console in V11.4 to configure this method for the V11.3.3 engine.
  6. Enter the name of the Java class that implements the DataStageDataFormatIF interface in the Class Name box.
    For example, suppose that your source file imports the DataStageDataFormatIF interface and declares the class that implements it as follows:
    public class CustomFormat1 implements DataStageDataFormatIF

    In the Class Name box in the Custom Data Format area, type one of the following:

    • CustomFormat1: Identifies a stand-alone class.
    • <Java package>.CustomFormat1: Identifies a class that is included in a Java package (for example, com.ibm.interface.CustomFormat1).
    The class files that you generate by compiling the class must be located in a library or folder that is referenced by the CLASSPATH environment variable.
  7. Click Save.
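
Before you enter a value in the Class Name box, it can help to confirm that the name resolves on the classpath and that the class implements the expected interface. The following sketch shows one way to do that with standard Java reflection. FormatIF, the empty CustomFormat1 class, and ClassNameCheck are hypothetical stand-ins so that the example compiles without the product libraries; in practice you would check against the real DataStageDataFormatIF type.

```java
// Hypothetical stand-in so that the sketch compiles on its own; in practice
// the check would use the real DataStageDataFormatIF interface.
interface FormatIF {}

// Placeholder for a compiled custom data format class.
class CustomFormat1 implements FormatIF {}

class ClassNameCheck {
    // Returns a short diagnostic for the class name you plan to enter.
    static String check(String className, Class<?> requiredInterface) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> candidate =
                Class.forName(className, false, ClassNameCheck.class.getClassLoader());
            if (!requiredInterface.isAssignableFrom(candidate)) {
                return "FOUND, but does not implement " + requiredInterface.getName();
            }
            return "OK: " + candidate.getName();
        } catch (ClassNotFoundException e) {
            return "NOT FOUND on the classpath: " + className;
        }
    }

    public static void main(String[] args) {
        // getName() returns the fully qualified name, including any package prefix,
        // which is the form that identifies a class inside a Java package.
        System.out.println(check(CustomFormat1.class.getName(), FormatIF.class));
        System.out.println(check("no.such.pkg.Missing", FormatIF.class));
    }
}
```

If the check reports that the class is not found, the usual cause is that the folder or library that contains the compiled class is not on the classpath.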