Beef up SOA with real-time data integration

Service-Oriented Architecture (SOA) is a popular approach to designing enterprise applications because it provides benefits such as reusable components and platform-independent communication. When you consider an SOA, it's essential to factor in data integration: a great deal of legacy data is derived from daily transactions and must be maintained as part of new applications. Combining SOA with data-integration technologies gives you reusability, better communication with other enterprise applications, and access through Web services. This article explains how IBM® WebSphere® DataStage helps make this possible. DataStage provides a solution for real-time data integration (RTI) that can be exposed as a Web service. You'll use DataStage to develop a sample RTI job, publish it as a Web service, and invoke the Web service with a Java™ client.

Deng Peng Zhou (zhoudp@cn.ibm.com), Software Engineer, IBM

Deng Peng Zhou is an IBM software engineer who focuses on data integration and Java technology.



06 September 2007

Also available in Chinese

Introduction to real-time data integration and WebSphere DataStage

RTI, a component of IBM WebSphere DataStage (hereafter referred to as DataStage), lets you create sharable standard services, including Web services. You can invoke these services, which represent the data-integration functionality of the DataStage jobs, without having to be fully aware of the data-integration logic. Deploying DataStage jobs as sharable services yields a number of benefits, including:

  • Providing single-point standard access to disparate data sources, internal and external.
  • Reusing the logic from DataStage jobs in real time.
  • Developing applications more quickly by providing unified services for every application, which reduces redundant code.

Figure 1 shows the architecture of RTI.

Figure 1. Architecture of RTI

This article explains how to deploy RTI jobs as Web services. Let's break that down into the following sections:

  • An introduction to RTI job topologies
  • Developing a sample data-integration component step by step with DataStage
  • Publishing the data-integration component as a Web service
  • Developing a Java client to call the Web service that you publish

RTI job topologies

The RTI server supports three job topologies:

  • Topology I: Batch jobs
  • Topology II: Batch jobs with an RTI output stage
  • Topology III: Fully RTI-compliant jobs

Topology I: Batch jobs

Topology I uses new or existing batch jobs that are exposed as RTI services. (Note that this topology doesn't contain any RTI stages, such as an RTI input stage or an RTI output stage, which are explained in the following sections.) An RTI service that's based on a batch job can accept job parameters as input arguments. This type of service returns no output. When you configure the deployment, you can set values for job parameters. Figure 2 shows an example of this topology.

Figure 2. Batch job

Topology II: Batch jobs with an RTI output stage

The only difference between topology I and topology II is that there's an RTI output stage in topology II. The RTI output stage is the exit point from the job, and it returns one or more rows to the client application as a service response. The RTI output stage supports one input link. Its table definition maps to the output arguments of the RTI service. See an example of this topology in Figure 3.

Figure 3. Batch jobs with an RTI output stage

Topology III: Fully RTI-compliant jobs

In topology III, jobs use both an RTI input stage and an RTI output stage. The RTI input stage is the entry point to a job, accepting one or more rows during a service request. The RTI input stage supports one output link. Its table definition maps to the input arguments of the RTI service, such as the input arguments of a Web service operation. A job that conforms to topology III is always on. After you deploy this job as a Web service, you find an instance of this job running in the DataStage Director. Figure 4 shows an example of this topology.

Figure 4. Fully RTI-compliant jobs
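To keep the three topologies straight, it can help to write down what each one accepts and returns. The following is a minimal sketch only: the class, enum, and method names are invented for illustration and are not part of any DataStage API. It simply encodes the descriptions above (topologies I and II take job parameters, topologies II and III return rows).

```java
public class RtiTopologies {

    /** The three job topologies from the text; the constant names are illustrative. */
    enum Topology {
        BATCH(false, false),
        BATCH_WITH_RTI_OUTPUT(false, true),
        FULLY_RTI_COMPLIANT(true, true);

        final boolean hasRtiInputStage;
        final boolean hasRtiOutputStage;

        Topology(boolean hasRtiInputStage, boolean hasRtiOutputStage) {
            this.hasRtiInputStage = hasRtiInputStage;
            this.hasRtiOutputStage = hasRtiOutputStage;
        }

        /** Where the input arguments of the service come from. */
        String inputSource() {
            return hasRtiInputStage ? "rows through the RTI input stage" : "job parameters";
        }

        /** Only jobs with an RTI output stage return rows to the caller. */
        boolean returnsRows() {
            return hasRtiOutputStage;
        }
    }

    static String describe(Topology t) {
        return t.name() + ": input = " + t.inputSource() + ", returns rows = " + t.returnsRows();
    }

    public static void main(String[] args) {
        for (Topology t : Topology.values()) {
            System.out.println(describe(t));
        }
    }
}
```

The sample job developed in the rest of this article has both an RTI input stage and an RTI output stage, so it corresponds to FULLY_RTI_COMPLIANT in this sketch.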

Develop a sample RTI job

Now let's create a sample RTI job that extracts location information from a database table. The job takes one input argument, employeeid, which is structured as an array so that you can pass in several employee IDs in a single request. The job then returns the location information for the employees that the input argument specifies. This RTI job uses a table named RFIDLOCATION (see Table 1 for the table definition), which is stored in an IBM DB2® database called GBPMDB. Table definitions of the RTI input stage and RTI output stage are shown in Table 2 and Table 3, respectively. Notice that all the columns in Tables 2 and 3 are included in Table 1. So when you import the table definition of Table 1 using DataStage Designer, you can derive the table definitions of Tables 2 and 3 from it.

Table 1. Table definition of table RFIDLOCATION

Column name             Key   Type        Length   Nullable
RFIDRecordLocationID    Yes   Integer     10       No
EmployeeID              No    Varchar     60       No
LocationID              No    Integer     10       No
RecordTime              No    Timestamp   26       No

Table 2. Table definition of RTI input stage

Column name             Key   Type        Length   Nullable
EmployeeID              No    Varchar     60       No

Table 3. Table definition of RTI output stage LocationInfo

Column name             Key   Type        Length   Nullable
EmployeeID              No    Varchar     60       No
LocationID              No    Integer     10       No
RecordTime              No    Timestamp   26       No
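
The relationship between the three table definitions can be checked mechanically. The sketch below is illustrative only (the class and method names are invented here); it lists the column names from Tables 1 through 3 and confirms that the input-stage and output-stage columns are subsets of the RFIDLOCATION columns, which is why a single import is enough.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class TableDefinitions {

    // Column names copied from Tables 1, 2, and 3.
    static final Set<String> RFIDLOCATION =
            columns("RFIDRecordLocationID", "EmployeeID", "LocationID", "RecordTime");
    static final Set<String> RTI_INPUT = columns("EmployeeID");
    static final Set<String> RTI_OUTPUT = columns("EmployeeID", "LocationID", "RecordTime");

    /** Builds an ordered set of column names. */
    static Set<String> columns(String... names) {
        return new LinkedHashSet<String>(Arrays.asList(names));
    }

    /** True if every column of the stage definition also exists in the table definition. */
    static boolean isDerivableFrom(Set<String> stage, Set<String> table) {
        return table.containsAll(stage);
    }

    public static void main(String[] args) {
        System.out.println("Input stage derivable:  " + isDerivableFrom(RTI_INPUT, RFIDLOCATION));
        System.out.println("Output stage derivable: " + isDerivableFrom(RTI_OUTPUT, RFIDLOCATION));
    }
}
```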

Import the table definition using DataStage Designer

  1. In the repository of DataStage Designer, right-click Table Definitions, then select Import > Plug-in Meta Data Definitions.
  2. Select DSDB2 in the Name column, and click OK.
  3. In the next window (see Figure 5), select the server name from the drop-down list (in this case, GBPMDB). The server name is the name of the database that contains the table you want to import.
  4. Type the user name, db2inst1, and password, passw0rd, to connect to the server.
  5. Select the Tables check box, and click Next to continue.
    Figure 5. Select database
  6. Now select the RFIDLOCATION table from the Select Table(s) list, then click Import to import the RFIDLOCATION table definition.

You should now see the table definitions you just imported in the DSDB2 subcategory in the repository of DataStage Designer, as shown in Figure 6.

Figure 6. Table definition

Create a parallel job

  1. Now create a new parallel job in DataStage Designer (see the layout of this job in Figure 7). This job contains one DB2/UDB API stage, one RTI input stage, one RTI output stage, and one join stage, all of which are connected by links.
    Figure 7. Job layout
  2. Save the new job as sampleRTI.

Configure the job properties

  1. Click the Job Properties icon in the DataStage Designer to open the Job Properties window (see Figure 8).
  2. On the General tab, select Allow Multiple Instance and RTI Service Enabled.
  3. Click OK, then save the configuration.
    Figure 8. Configuration of job properties

Configure the DB2/UDB stage

  1. Double-click the DB2/UDB API stage RFIDLOCATION to open the window shown in Figure 9.
    Figure 9. Configure database connection information
  2. In this window, specify the server name (the database name you want to connect to), in this case, GBPMDB.
  3. Enter the user ID, db2inst1, and password, passw0rd. Keep the other configurations as default.
  4. Click the Output tab at the top of the window.
  5. On the General tab of the Output page (see Figure 10), enter RFIDLOCATION in the Table names field, and select Generated SQL query from the Query type drop-down list.
  6. Leave the remaining default settings, and click the Columns tab.
    Figure 10. Configure table information
  7. On the Columns tab, click the Load button to load the table definition. A window pops up from which you can select the table definition in the repository.
  8. Select RFIDLOCATION in the DSDB2 subcategory, and then click OK.
  9. Now select the columns shown in Figure 11, then click OK.
    Figure 11. Select columns
  10. You'll see the results as shown in Figure 12. Click OK, and save the job.
    Figure 12. Import result

Configure the RTI input stage Employeeid

  1. Double-click the RTI input stage Employeeid, and select the Columns tab on the Output page.
  2. From here, load the table definition. (The steps for loading the table definition for the Employeeid RTI input stage are the same as those described in the Configure the DB2/UDB stage section.)
  3. After loading the table definition, click OK, and save the job.

Configure the RTI output stage LocationInfo

  1. Double-click the RTI output stage LocationInfo, and go to the Columns tab on the Input page.
  2. Load the table definition. (The steps for loading the table definition for the LocationInfo RTI output stage are the same as those described in the Configure the DB2/UDB stage section.)
  3. After loading the table definition, click OK, and save the job.

Configure the join stage JoinByEmployeeid

  1. Double-click JoinByEmployeeid to set it up.
  2. Go to the Properties tab on the Stage page, then select EMPLOYEEID as the join key and Inner as the join type, as shown in Figure 13.
    Figure 13. Configure join stage
  3. Now go to the Output tab. You can see that DataStage has generated a mapping relationship between the source and the target. Leave these default settings, and click OK.
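
To make the effect of the inner join concrete, here is a small in-memory sketch in Java. It is not how DataStage implements the join stage; the class names and the sample rows are hypothetical. It shows only the result you can expect from an inner join on EMPLOYEEID: a row is returned for every match between a requested employee ID and a row of RFIDLOCATION, and IDs without a match produce nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InnerJoinSketch {

    /** One row shaped like the RTI output stage LocationInfo (Table 3). */
    static final class LocationInfo {
        final String employeeId;
        final int locationId;
        final String recordTime;

        LocationInfo(String employeeId, int locationId, String recordTime) {
            this.employeeId = employeeId;
            this.locationId = locationId;
            this.recordTime = recordTime;
        }

        @Override
        public String toString() {
            return employeeId + " @ location " + locationId + " (" + recordTime + ")";
        }
    }

    /** Nested-loop inner join of the requested IDs with the table rows on the employee ID. */
    static List<LocationInfo> joinByEmployeeId(List<String> requestedIds, List<LocationInfo> tableRows) {
        List<LocationInfo> result = new ArrayList<LocationInfo>();
        for (String id : requestedIds) {
            for (LocationInfo row : tableRows) {
                if (id.equals(row.employeeId)) {
                    result.add(row);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical contents of RFIDLOCATION, for illustration only.
        List<LocationInfo> table = Arrays.asList(
                new LocationInfo("001", 10, "2007-09-01 08:30:00"),
                new LocationInfo("002", 20, "2007-09-01 08:45:00"),
                new LocationInfo("001", 11, "2007-09-01 09:10:00"));

        // "003" has no rows in the table, so the inner join drops it.
        for (LocationInfo row : joinByEmployeeId(Arrays.asList("001", "003"), table)) {
            System.out.println(row);
        }
    }
}
```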

Compile the job

Click the Compile icon in the DataStage Designer. The Compile Job window opens with a Job successfully compiled message, assuming no errors have occurred during the process. The job is now ready for deployment.


Deploy the RTI job sampleRTI as a Web service

This section describes how to deploy the RTI job as a Web service using the RTI console.

  1. Open the RTI console, and in the Current Tasks pane, click Register an RTI Server. This opens the RTI Server Wizard, as shown in Figure 14.
  2. In the RTI Server Name field, enter the machine name of the RTI server. The port number may be different, depending on which application server your RTI server is running on. For example, if the application server is IBM WebSphere Application Server, the port number is 9080.
  3. Keep the default settings of all the other fields, and click Finish.
    Figure 14. Set RTI server
  4. The RTI server you just registered now appears as an icon in the right pane. Double-click the icon.
  5. In the Current Tasks pane, click Register a DataStage Machine to open the DataStage Machine Registration Wizard (see Figure 15).
  6. In the Machine name field, enter the name of the machine that's running a DataStage server or DataStage TX host.
  7. For DataStage server machines, enter your valid credentials in the User and Password fields. The default listening port for the RTI Agent is 2000. If your DataStage administrator has changed the port, do the following:
    1. Select the User-defined port button.
    2. Enter the new port number.
    3. Click Finish.
    Figure 15. Set DataStage machine
  8. In the Current Tasks pane, click Add a New Service to the RTI Server to open the RTI Service Wizard.
  9. In the Service name field, enter the name of the new service (in this case, sampleRTI), then click Finish.
  10. The service you just created now appears as an icon in the right pane. Double-click this sampleRTI icon.
  11. In the Current Tasks pane, click Add Support for Service Bindings to open the Add Support for Service Bindings Wizard.
  12. Select Soap over HTTP in the right-pane list, then click Next.
  13. In the Additional binding-specific description field, optionally enter a description of the binding, which is added to the Web Services Description Language (WSDL).
  14. From the Style list, select an encoding style for SOAP messages. Your choice depends on what client applications accept.
  15. Click Finish. The binding icon appears in the Results pane.
  16. In the Current Tasks pane, select Add an Operation to open the New Operation Wizard, as shown in Figure 16. You'll see the list of registered DataStage Server and DataStage TX machines represented as nodes.
  17. Select the job sampleRTI, which you just created, then click Next.
    Figure 16. Select RTI job
  18. In the Operation Name field, change the name if necessary. The default is the name of the job or map.
  19. In the Queue Size field, specify the size of the operation queue in terms of service requests. If the queue size is exceeded, the request is rejected. The default is three requests.
  20. In the Wait delay field, specify the maximum wait time in milliseconds. If the wait time is exceeded, the request is rejected. The default value is 100 milliseconds.
  21. Click Next.
  22. Pull down the Options list and select Array (see Figure 17). The jobs that accept or require multiple rows in a single request should have their input arguments organized as arrays. In this case, you deploy the job as a Web service that can accept multiple rows in a single request, so you have to choose Array from the Options list.
    Figure 17. Create new operation
  23. The RTI job also returns multiple rows in a single response, so select Array again in the Options list for the output arguments.
  24. Click Next.
  25. You should now be in the New Operation Wizard - Messages Summary window, as shown in Figure 18. In the Service Request and Response Messages field, review the input and output arguments for the operation, then click Finish.
    Figure 18. Review input and output arguments
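
The Queue Size and Wait delay settings in the steps above both lead to rejected requests, but for different reasons. The following sketch models one possible reading of those two rules with the default values of three requests and 100 milliseconds. It is an analogy written for this explanation, not the RTI server's actual implementation: the class, its methods, and the explicit clock argument are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OperationQueueSketch {

    static final int DEFAULT_QUEUE_SIZE = 3;       // default stated in the steps above
    static final long DEFAULT_WAIT_DELAY_MS = 100; // default stated in the steps above

    /** A queued request together with the time at which it was accepted. */
    static final class Pending {
        final String request;
        final long enqueuedAtMs;

        Pending(String request, long enqueuedAtMs) {
            this.request = request;
            this.enqueuedAtMs = enqueuedAtMs;
        }
    }

    private final Deque<Pending> queue = new ArrayDeque<Pending>();
    private final int queueSize;
    private final long waitDelayMs;

    OperationQueueSketch(int queueSize, long waitDelayMs) {
        this.queueSize = queueSize;
        this.waitDelayMs = waitDelayMs;
    }

    /** Rule 1: a request is rejected (false) if the queue is already full. */
    boolean submit(String request, long nowMs) {
        if (queue.size() >= queueSize) {
            return false;
        }
        queue.addLast(new Pending(request, nowMs));
        return true;
    }

    /**
     * Rule 2: a request that waited longer than the wait delay is rejected.
     * Returns the oldest request that is still within the delay, or null if none is left.
     */
    String takeNext(long nowMs) {
        Pending p;
        while ((p = queue.pollFirst()) != null) {
            if (nowMs - p.enqueuedAtMs <= waitDelayMs) {
                return p.request;
            }
            // Otherwise the request waited too long and is dropped.
        }
        return null;
    }

    public static void main(String[] args) {
        OperationQueueSketch q = new OperationQueueSketch(DEFAULT_QUEUE_SIZE, DEFAULT_WAIT_DELAY_MS);
        for (int i = 1; i <= 4; i++) {
            System.out.println("request " + i + " accepted: " + q.submit("request " + i, 0));
        }
        System.out.println("served at 50 ms:  " + q.takeNext(50));
        System.out.println("served at 150 ms: " + q.takeNext(150));
    }
}
```

In this model, raising the queue size lets more requests wait during a burst, while raising the wait delay lets each of them wait longer before it is given up.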

Now you set the runtime parameters.

  1. In the Minimum field, enter the minimum number of concurrent job instances that can run at any given time.
  2. In the Maximum field, enter the maximum number of concurrent job and map instances that can run at any given time. The default is five, and the maximum is 500.
  3. Keep the default settings in the other fields, and click Next.
  4. In the next window, replace the default user credentials if needed, then click Finish.
  5. When the Operation Created window pops up, click OK.
  6. Right-click the binding icon in the right pane, and select Activate, as shown in Figure 19.
    Figure 19. Activate the binding
  7. Double-click the operation you created. In the right pane, you'll find the job you just attached to the operation.
  8. In the Global Tasks list, click Browse the RTI Registry to open the RTI Registry Web page. You'll find the RTI service you just registered, sampleRTI, in the list (see Figure 20).
    Figure 20. RTI service list
  9. Click sampleRTI to display the registry information for it.
  10. To display the WSDL for the service in a Web browser, click the WSDL link. You can invoke the RTI job you developed through this WSDL file.

Develop a Java client to call the Web service

This section explains how to call the Web service that you just deployed from a Java client. Developing this client is your main task. Before you start, prepare the following environment:

  • Eclipse IDE Version 3.0 or later: You use Eclipse to develop the Java project, and the steps in this article assume it.
  • JDK 1.4 or 1.5: You need a JDK to compile and run the Java project.
  • Apache Axis: You use Axis to generate local Java stub classes from the WSDL of a Web service, which makes it easy to invoke the Web service.

Now you can begin developing your Java client.

  1. Create a Java project, and name it TestRTIJob.
  2. Right-click the project, and select Properties. A window like the one shown in Figure 21 opens.
  3. Click the Libraries tab, and add the Axis .jar files shown in Figure 21 to this project.
    Figure 21. Add .jar files
  4. Select Run > Run from the Eclipse IDE to open a window like the one shown in Figure 22.
    Figure 22. Generate Java stub
  5. Create a new Java application instance in the left part of this window, then click the Search button on the right.
  6. From the Choose Main Type window that opens, select the WSDL2Java class. This class is provided by Axis to generate a local Java stub from a WSDL file.
  7. Click OK.
  8. Copy the URL of the WSDL file of the Web service you just published to the Program arguments field, as shown in Figure 23.
    Figure 23. Copy URL of the WSDL file
  9. Click Run. When the program finishes, you'll notice that some stub classes have been generated in your project, as shown in Figure 24. You use these classes to invoke the Web service.
    Figure 24. Generated Java stub classes
  10. Now you need to create a class named TestRTIJob to invoke the Web service via the stub classes that were just generated. The source code of this class is provided in Listing 1.
    Listing 1. Call Web service
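    package com.ascential.rti.sample;

    import java.rmi.RemoteException;

    import javax.xml.rpc.ServiceException;

    /**
     * Calls the published Web service through the stub classes that
     * WSDL2Java generated (SampleLocator, SampleDOCLIT). The generated
     * names depend on your WSDL, so adjust them to match your project.
     */
    public class TestRTIJob {

        public static void main(String[] args) {
            // The locator creates a stub that is bound to the service endpoint.
            SampleLocator locator = new SampleLocator();
            try {
                SampleDOCLIT service = locator.getsampleSoap();
                // Invoke the operation for employee ID "001".
                String response = service.RTIJob("001");
                System.out.println("Service response: " + response);
            } catch (ServiceException e) {
                // The stub could not be created.
                e.printStackTrace();
            } catch (RemoteException e) {
                // The remote call failed.
                e.printStackTrace();
            }
        }
    }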
  11. Run this class. The Java console prints the user's location information.
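
If you prefer to generate the stub classes outside Eclipse, the run configuration from steps 4 through 9 corresponds to a plain java command. The sketch below only assembles and prints that command; it doesn't run it. The main class org.apache.axis.wsdl.WSDL2Java and its -o (output directory) option come from Axis 1.x, but the .jar file names, the src directory, and the WSDL URL are placeholders, so substitute the values from your own installation and from the RTI Registry page.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class Wsdl2JavaCommand {

    /** Main class that Axis 1.x provides for generating Java stubs from a WSDL file. */
    static final String MAIN_CLASS = "org.apache.axis.wsdl.WSDL2Java";

    /** Builds the command line: java -cp <classpath> WSDL2Java -o <outputDir> <wsdlUrl>. */
    static List<String> build(String classpath, String outputDir, String wsdlUrl) {
        List<String> command = new ArrayList<String>();
        command.add("java");
        command.add("-cp");
        command.add(classpath);
        command.add(MAIN_CLASS);
        command.add("-o");        // directory that receives the generated sources
        command.add(outputDir);
        command.add(wsdlUrl);     // same value as the Program arguments field in step 8
        return command;
    }

    public static void main(String[] args) {
        // Placeholder .jar names; the actual file names depend on your Axis version.
        String classpath = String.join(File.pathSeparator,
                "lib/axis.jar", "lib/jaxrpc.jar", "lib/saaj.jar",
                "lib/commons-logging.jar", "lib/commons-discovery.jar", "lib/wsdl4j.jar");
        // Placeholder URL; copy the real one from the WSDL link in the RTI Registry.
        String wsdlUrl = "http://example.com/sampleRTI.wsdl";

        System.out.println(String.join(" ", build(classpath, "src", wsdlUrl)));
    }
}
```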

You're done! You've finished the whole process from developing an RTI job to publishing it as a Web service to invoking the service from a Java client.


Conclusion

IBM WebSphere DataStage provides a convenient approach to deploying DataStage jobs as Web services. In this article, you learned about RTI and its three job topologies, then you developed a sample RTI job and published it as a Web service. You wrapped up by invoking the Web service with a Java client. Hopefully you've become more familiar with DataStage and how it fits into an SOA.

