Getting Started with Java Integration Stage API Samples

This document describes end-to-end scenarios that use the Java Integration stage, with sample jobs and Java code, on IBM® Information Server DataStage®.

The Java Integration stage API samples package consists of the following directory layout:

  samples
   +--- LICENSE, build.xml, Sample.dsx
   +--- lib  // contains Java Integration stage API jar
   +--- docs // contains documentation - SampleGettingStarted.html
   |      |
   |      +--- images  // images for html documentation
   |      
   +--- javadoc // sample code javadoc (will be created when building samples)
   +--- src  // contains sample Java code
   +--- data // contains input data file used in sample jobs

Preparing Java Sample Code

Before you begin

Prepare the environment with the following installations:

If you are compiling the Java sample code on the domain tier, a Java 6 JDK is already installed with IBM® WebSphere® Application Server, for example:
/opt/IBM/WebSphere/AppServer/java

For Apache Ant installation, please refer to Installing Apache Ant.

Compile Java Sample Code

  1. Copy JavaIntegration_Samples.zip from the <ISDIR>/Clients/Samples/Connectors/ directory.
  2. Unzip JavaIntegration_Samples.zip.
  3. Compile the Java sample code and create the jar file to deploy.

Deploying Java Sample Code

Copy samples/jars/Samples.jar to a location on the engine tier.

Copying input data files used in the sample jobs

Copy the samples/data directory to a location on the engine tier.

Importing Java Integration Stage Samples

  1. Copy samples/Sample.dsx to the client tier.
  2. From the client tier, launch DataStage Designer.
  3. Import the jobs from the Designer.
  4. From the Designer menu, select Import -> DataStage Components.


    Select the copied Sample.dsx and import the jobs.


    You should see 6 jobs and 1 parameter set in the repository.


    You cannot import the Sample.dsx file from a non-English client tier as it is. To import the Sample.dsx file from a non-English client tier, copy the file, open the copy in a text editor, and edit it.
    For example, to import the Sample.dsx file on a Japanese Windows machine, use the following steps:

    1. Copy the Sample.dsx file and open the file in a text editor.
    2. In the CharacterSet="CP1252" line, replace "CP1252" with your client code page, for example "CP932".
      (Note: The Japanese client code page is CP932.)
    3. Save and close the file.

Job Description

Sample Jobs

Job name | Description | Java code name
JavaPackTransformer JavaPackTransformer is a sample job for JavaPack compatibility.
It reads a row having empno(Integer), firstname(VarChar), lastname(VarChar), hireDate(Date), edLevel(SmallInt), salary(Double), bonus(Double) and lastUpdate(Time) columns, converts the firstname and lastname value to uppercase.
If firstname column of the input row contains the character '*', the row is rejected.
com.ibm.is.cc.javastage.samples.JavaPackTransformer
Transformer Transformer is a sample job for column-based transformer stage.
It reads a row having empno(Integer), firstname(VarChar), lastname(VarChar), hireDate(Date), edLevel(SmallInt), salary(Double), bonus(Double) and lastUpdate(Time) columns, converts the firstname and lastname value to uppercase.
If the firstname column of the input row contains the character '*', the row is rejected. In rejected records, "ERRORTEXT" and "ERRORCODE" fields are added to show the reason for rejection.
com.ibm.is.cc.javastage.samples.Transformer
JavaBeansTransformer JavaBeansTransformer is a sample job for bean-based transformer stage.
It reads a row having empno(Integer), firstname(VarChar), lastname(VarChar), hireDate(Date), edLevel(SmallInt), salary(Double), bonus(Double) and lastUpdate(Time) columns, converts the firstname and lastname value to uppercase.
If the firstname column of the input row contains the character '*', the row is rejected. In rejected records, "ERRORTEXT" and "ERRORCODE" fields are added to show the reason for rejection.
com.ibm.is.cc.javastage.samples.JavaBeansTransformer
IntValueGenerator IntValueGenerator is a sample job for column-based source stage.
The job has 1 Integer column. It writes rows to an output link. The number of rows to generate is specified as the user's custom stage property "NumOfRecords" value in the Java Integration stage and is fetched using the Configuration.getUserProperties() method. Default value is 10.
End-of-wave marker is written based on the user's custom output link property "WaveCount". Based on the specified value, end-of-wave marker is written after specified number of records. Default value is 5.
Generated values for the Integer column are incremented integers starting from 0. If the job is running in a multi-node environment, generation is distributed across the player nodes in a round-robin manner. For example, if the job is running on 3 nodes and "NumOfRecords" is set to 10, records are generated as follows:
  • node 0 - 0, 3, 6, 9
  • node 1 - 1, 4, 7
  • node 2 - 2, 5, 8
com.ibm.is.cc.javastage.samples.IntValueGenerator
RCP RCP is a sample job for column-based transformer stage using Runtime Column Propagation (RCP).
It reads a row having empno(Integer), firstname(VarChar), lastname(VarChar), hireDate(Date), edLevel(SmallInt), salary(Double), bonus(Double) and lastUpdate(Time[microseconds]) columns, and converts the firstname and lastname values to uppercase.
Only the "firstname" and "lastname" columns are defined in the output link (the Runtime column propagation check box is checked).
If RCP is on, all columns in the input link are propagated to the output link. Columns other than "firstname" and "lastname" are sent to the output link without any changes.
If RCP is off, only "firstname" and "lastname" are sent to the output link.
com.ibm.is.cc.javastage.samples.RCP
UserDefinedFunction UserDefinedFunction is a sample job to invoke User-Defined Function (UDF).
The UDF takes a double and an input bean as arguments and returns an output bean.
com.ibm.is.cc.javastage.samples.UserDefinedFunction

Parameter Set

Parameter set name | Description | Parameters
JavaIntegrationSamples JavaIntegrationSamples is a parameter set that contains the parameters used in the sample jobs described in the table above. usercp - Parameter for the user classpath
inputfile - Parameter for input data file path

Modifying Parameter Set

You will have to modify the parameters used in the jobs to match your environment.

  1. Double click on JavaIntegrationSamples parameter set in the repository.

  2. It has two parameters, usercp and inputfile.


    usercp is the parameter used in the Classpath property of the Java Integration stage.


    inputfile is the parameter used in the File property of the Sequential File Stage.


  3. Change the parameter set values to the locations where you placed Samples.jar and the samples/data directory on the engine tier.

Validating Modified Parameter Set Values

To check that the parameter values are valid, you can verify them from an imported job.
Here is the example:

  1. Open JavaPackTransformer job by double-clicking the job in the repository.
  2. Double-click the Sequential File stage icon named employee.
  3. The File property value is the inputfile parameter defined in the JavaIntegrationSamples parameter set configured in the previous step. To check whether the parameter value is a valid file path, click the View Data... button and check the contents of the input data file.


    The Resolve Job Parameter dialog is launched and shows the default value for the inputfile parameter defined in the JavaIntegrationSamples parameter set. If the default value does not need to be changed, click "OK". Otherwise, change the value and click "OK".


    Click the "OK" button in the Data Browser dialog.


    You should see the contents of the input data file (data/employee.txt) if the file path is configured correctly.


  4. Double-click the Java Integration stage icon named JavaPackTransformer.
  5. The Java Integration stage has the following properties defined:


    The Classpath property value is the usercp parameter defined in the JavaIntegrationSamples parameter set configured in the previous step.
    To check whether the parameter value is a valid classpath, click the Select helper button to list the user classes included in the jar file specified in the Classpath property.


    The Resolve Job Parameter dialog is launched and shows the default value for the usercp parameter defined in the JavaIntegrationSamples parameter set. If the default value does not need to be changed, click "OK". Otherwise, change the value and click "OK".


    You should see the list of user classes included in the jar file if the classpath value is configured correctly.
    Select JavaPackTransformer (it should be selected by default) and click "OK", or click "Cancel" to close the dialog.


Executing Sample Jobs

JavaPackTransformer

JavaPackTransformer is a sample job for JavaPack compatibility.
The job has 1 input link and 2 output links (including 1 reject link).
It reads a row having empno(Integer), firstname(VarChar), lastname(VarChar), hireDate(Date), edLevel(SmallInt), salary(Double), bonus(Double) and lastUpdate(Time) columns, converts the firstname and lastname value to uppercase.
If firstname column of the input row contains the character '*', the row is rejected.
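The transformation rule above can be sketched in plain Java, independent of the Java Integration stage API. Class and method names here are illustrative, not taken from the shipped sample code:

```java
public class NameTransformSketch {
    // Returns true when the row must be rejected: firstname contains '*'.
    public static boolean isRejected(String firstname) {
        return firstname != null && firstname.indexOf('*') >= 0;
    }

    // Uppercases a name value, as the sample does for firstname and lastname.
    public static String transform(String name) {
        return name == null ? null : name.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(isRejected("Jo*n"));  // true: row is rejected
        System.out.println(transform("smith"));  // SMITH
    }
}
```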


The input data file (data/employee.txt) is read by the Sequential File stage and sent to the input link of the Java Integration stage. The Java Integration stage transforms the input data and sends it to the output link and the reject link.

The parameter set should have been configured successfully in the previous step, so no additional steps are required for this job.
You can compile and execute the job.

The job should run successfully without any warning messages.

Transformer

Transformer is a sample job for column-based transformer stage.
The job has 1 input link, 1 output link and 1 reject link.

It reads a row having empno(Integer), firstname(VarChar), lastname(VarChar), hireDate(Date), edLevel(SmallInt), salary(Double), bonus(Double) and lastUpdate(Time) columns, converts the firstname and lastname value to uppercase.
If the firstname column of the input row contains the character '*', the row is rejected. In rejected records, "ERRORTEXT" and "ERRORCODE" fields are added to show the reason for rejection.


The input data file (data/employee.txt) is read by the Sequential File stage and sent to the input link of the Java Integration stage. The Java Integration stage transforms the input data and sends it to the output link and the reject link.

The parameter set should have been configured successfully in the previous step, so no additional steps are required for this job.
You can compile and execute the job.

The job should run successfully without any warning messages.

JavaBeansTransformer

JavaBeansTransformer is a sample job for bean-based transformer stage.
The job has 1 input link, 1 output link and 1 reject link.
It reads a row having empno(Integer), firstname(VarChar), lastname(VarChar), hireDate(Date), edLevel(SmallInt), salary(Double), bonus(Double) and lastUpdate(Time) columns, converts the firstname and lastname value to uppercase.
If the firstname column of the input row contains the character '*', the row is rejected. In rejected records, "ERRORTEXT" and "ERRORCODE" fields are added to show the reason for rejection.


Additional configuration is required because the Java code uses a JavaBean to store the record.
You must specify the mapping between the JavaBeans property names and the DataStage column names.
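As a rough illustration, a JavaBean that the Column Mapping Editor could map to these jobs' columns might look like the following. The field names are chosen to match the column names used in the sample jobs; the actual InputBean and OutputBean shipped in samples/src may differ:

```java
// Illustrative bean: shows the getter/setter naming convention that
// JavaBeans property mapping relies on, not the shipped sample code.
public class EmployeeBean {
    private Integer empno;
    private String firstname;
    private String lastname;

    public Integer getEmpno()                   { return empno; }
    public void setEmpno(Integer empno)         { this.empno = empno; }
    public String getFirstname()                { return firstname; }
    public void setFirstname(String firstname)  { this.firstname = firstname; }
    public String getLastname()                 { return lastname; }
    public void setLastname(String lastname)    { this.lastname = lastname; }
}
```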

  1. Click on Configure button on stage editor.

  2. The Resolve Job Parameter dialog is launched and shows the default value for the usercp parameter defined in the JavaIntegrationSamples parameter set. If the default value does not need to be changed, click "OK". Otherwise, change the value and click "OK".


    The Column Mapping Editor is displayed because the JavaBeansTransformer code uses JavaBeans (InputBean and OutputBean) to store the record data.

  3. Click on Browse Objects button on Column Mapping Editor.

  4. The Import Java Beans Properties dialog is launched.


  5. Select all the objects in the beans and click "OK".

  6. You can see the column mapping is defined.


    Using the Link combo box, you can switch between the input link and output link column mapping panels.


  7. Click "Finish" button on Column Mapping Editor.
  8. Select the Input tab and select "InputLink(employee)" from the Input name(upstream stage) drop down list. Verify that the Column Mapping and JavaBeans class properties are now defined.


You are now ready to compile and execute the job.

The job should run successfully without any warning messages.

IntValueGenerator

IntValueGenerator is a sample job for column-based source stage.
The job has 1 output link, with 1 Integer column defined in the output link. It writes rows to the output link.

The number of rows to generate is specified as the user's custom stage property "NumOfRecords" value in the Java Integration stage and is fetched using the Configuration.getUserProperties() method. Default value is 10.
End-of-wave marker is written based on the user's custom output link property "WaveCount". Based on the specified value, end-of-wave marker is written after specified number of records. Default value is 5.

Generated values for the Integer column are incremented integers starting from 0. If the job is running in a multi-node environment, generation is distributed across the player nodes in a round-robin manner. For example, if the job is running on 3 nodes and "NumOfRecords" is set to 10, records are generated as follows:
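The round-robin distribution described above can be sketched as follows; the method name is illustrative and not part of the stage API:

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinSketch {
    // Values generated on one player node: node, node + nodeCount, ...
    public static List<Integer> valuesForNode(int node, int nodeCount, int numOfRecords) {
        List<Integer> values = new ArrayList<>();
        for (int v = node; v < numOfRecords; v += nodeCount) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        // The 3-node, NumOfRecords=10 case: node 0 gets 0,3,6,9 and so on.
        for (int node = 0; node < 3; node++) {
            System.out.println("node " + node + " - " + valuesForNode(node, 3, 10));
        }
    }
}
```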


Additional configuration is required because this job uses custom properties.

  1. Click on Configure button on stage editor.

  2. The Custom Property Editor is displayed because the "IntValueGenerator.java" code uses custom properties.
  3. Click on Property value field of Number of Records property.
  4. Change the value to the number of records you want to generate, for example, 20.


  5. Switch to the output link properties by selecting "OutputLink(Link)" from the Scope combo box.

  6. Click on Property value field of Wave Count property.
  7. Change the value to the number of records you want to process before writing end-of-wave marker, for example, 10.

  8. Click "Finish".
  9. You will see the NumOfRecords property in the Custom property field for the stage.


    You will see the WaveCount property in the Custom property field for the output link.


You are now ready to compile and execute the job.

The job should run successfully without any warning messages.

RCP

RCP is a sample job for column-based transformer stage using Runtime Column Propagation (RCP).
It reads a row having empno(Integer), firstname(VarChar), lastname(VarChar), hireDate(Date), edLevel(SmallInt), salary(Double), bonus(Double) and lastUpdate(Time[microseconds]) columns, and converts the firstname and lastname values to uppercase.

There are only "firstname" and "lastname" columns defined in the output link (Runtime column propagation check box is checked).

If RCP is on, all columns in the input link are propagated to the output link. Columns other than "firstname" and "lastname" are sent to the output link without any changes.
If RCP is off, only "firstname" and "lastname" are sent to the output link.
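The propagation rule can be sketched by modeling a row as a column-name-to-value map. This illustrates the rule only; it is not the stage API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RcpSketch {
    // row maps column names to values; rcpOn models the RCP setting.
    public static Map<String, Object> process(Map<String, Object> row, boolean rcpOn) {
        Map<String, Object> out = new LinkedHashMap<>();
        if (rcpOn) {
            out.putAll(row); // all input columns propagate unchanged
        } else {
            out.put("firstname", row.get("firstname"));
            out.put("lastname", row.get("lastname"));
        }
        // firstname and lastname are always converted to uppercase
        out.put("firstname", ((String) out.get("firstname")).toUpperCase());
        out.put("lastname", ((String) out.get("lastname")).toUpperCase());
        return out;
    }
}
```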

The parameter set should have been configured successfully in the previous step, so no additional steps are required for this job.
You can compile and execute the job.

The job should run successfully without any warning messages.

Note:

If Runtime Column Propagation is disabled in your project, the Runtime column propagation check box is grayed out in the RCP job. It should be checked by default when you import this sample job.
If you want to enable the Runtime column propagation check box, you can enable this function from the DataStage Administrator as follows:
Launch the DataStage Administrator, log in to your server, and click the "Properties" button.

The Enable Runtime Column Propagation on Parallel Jobs check box is unchecked by default.

Check the Enable Runtime Column Propagation on Parallel Jobs check box.
You can change the default setting for new parallel jobs with the Enable Runtime Column Propagation for new links check box.

UserDefinedFunction

UserDefinedFunction is a sample job to invoke User-Defined Function (UDF).
The job has 2 input links and 1 output link.
The UDF takes a double and an input bean as arguments and returns an output bean.
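The general shape of such a UDF can be sketched as follows. The bean fields and the computation (treating the double as a salary multiplier) are assumptions for illustration only; see samples/src for the real UserDefinedFunction code:

```java
public class UdfSketch {
    // Hypothetical beans; the real InputBean/UDFOutputBean differ.
    public static class InBean     { public double salary; }
    public static class UdfOutBean { public double adjustedSalary; }

    // Hypothetical UDF body: the double argument is applied as a multiplier.
    public static UdfOutBean apply(double factor, InBean in) {
        UdfOutBean out = new UdfOutBean();
        out.adjustedSalary = in.salary * factor;
        return out;
    }
}
```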


Additional configuration is required because the User-Defined Function code uses JavaBeans to store the record.
You must specify the mapping between the JavaBeans property names and the DataStage column names.

  1. Click on Configure button on stage editor.

  2. The Column Mapping Editor is displayed because the UserDefinedFunction code uses JavaBeans (InputBean and UDFOutputBean) to store the record data.


  3. Click on Browse Objects button on Column Mapping Editor.

  4. The Import Java Beans Properties dialog is launched.

  5. Select all the objects in the beans and click "OK".

  6. You can see the column mapping is defined.


    Using the Link combo box, you can switch among the InputLink, InputLink2, and OutputLink column mapping panels.



  7. Click "Finish" button on Column Mapping Editor.
  8. You can verify that the Column Mapping and JavaBeans class properties are now defined.

    OutputLink example:

You are now ready to compile and execute the job.

The job should run successfully without any warning messages.

Last updated: 2012-04
©Copyright IBM Corporation 2011, 2012