Use IBM InfoSphere Information Server to transform legacy data into information services

An SOA approach to information sharing using GJXDM schema

Learn how to create and deploy information services that access legacy databases without writing any code. The web services are generated using IBM Information Server components, including InfoSphere DataStage, InfoSphere Federation Server, InfoSphere Information Services Director, and WebSphere Transformation Extender for DataStage. In this example, the information services are delivered using a standard government XML model, the Global Justice XML Data Model (GJXDM).

This scenario was implemented by the Federal Innovation Solution Center (FISC) as part of an information-sharing demonstration.

Lai-Fong Leung (lfleung@us.ibm.com), Senior Software Engineer, IBM

Lai-Fong Leung is a Senior Software Engineer with the Software Group Federal Innovation Solution Center in Bethesda, MD. In this role, she is responsible for building Content Management Services and Information Integration Services demos. Her areas of expertise include e-business technologies such as Enterprise Content Management solutions, WebSphere, and Java. She has over twenty years of experience in application development. She holds a master's degree in Computer Science from George Washington University.



Walt Geddes (geddes@us.ibm.com), Senior Software Engineer, IBM

Walt Geddes is a Senior Software Engineer with the IBM Software Group Strategy team. In his 37 years with IBM, Mr. Geddes has been the software architect and team lead on a number of IBM products and customer projects. He is currently working in the IBM Federal Innovation Solution Center, focused on creating customer demonstrations and proofs of concept.



12 November 2012 (First published 07 February 2008)

Before you start

Government agencies often need to share important information with one another. Unfortunately, the data for these agencies is often kept in many different repositories, and reflects many different data models. One way to meet this need is to map each of the local models to a common, shared model. The mapped data can then be exposed to the sharing organizations as information services.

Wouldn’t it be great if you could create all of these information services without writing code? Well, such tools do exist as components of the IBM Information Server (IIS).

About this tutorial

In this tutorial, learn how to create and deploy information services that access legacy databases and return the information in a standard government XML model, the Global Justice XML Data Model (GJXDM). The techniques described in the tutorial can be used with any XML model you choose.

Objectives

  1. Import legacy database schemas into IBM Information Server (IIS)
  2. Create a federated view (optional)
  3. Import the GJXDM schema
  4. Map the federated view to the GJXDM schema
  5. Create a DataStage job to access the data
  6. Publish and deploy DataStage job as a Web service

Prerequisites

To implement the IncidentReport Web service as described in this tutorial, you will need:

  • IBM Information Server V8.1.0.1, consisting of at least the following components:
    • IBM Information Server Console. This is the client for the InfoSphere Information Services Director (IISD).
    • InfoSphere DataStage Server
    • InfoSphere DataStage and QualityStage Designer
    • InfoSphere DataStage and QualityStage Director
    • InfoSphere DataStage and QualityStage Administrator
    • InfoSphere Federation Server
  • WebSphere Transformation Extender for DataStage Version 8.1.0.1.
  • If the latest available WebSphere Transformation Extender for DataStage is V8.1.0.1, you will also need WebSphere Transformation Extender Designer Studio V8.1.0.2.
  • A repository that stores the legacy data.
  • A Web service testing tool to test the IncidentReport Web service created by IISD. You can test using the Web Services Explorer in one of the following IBM development tools:
    • InfoSphere Data Architect V7.0
    • Rational Application Developer for WebSphere Software V7.0

Trial versions of some of the products mentioned above are available. Refer to the Resources section for the link to Download IBM product evaluation versions.

Figure 1 shows the development environment that we used to develop the IncidentReport Web service. It consists of an IBM Information Server (IIS), an IBM DB2 Server and a client workstation.

Figure 1. IncidentReport development environment
IncidentReport development environment

IBM Information Server

We installed all the IBM Information Server components onto one server. The operating system was Windows XP. The IIS server contained both the server and client for developing the DataStage job.

IBM DB2 Server

We set up an IBM DB2 V9.0 server to simulate where the legacy data is stored. You can use any repository that IIS supports.

Client Workstation

We set up a client workstation on an IBM ThinkPad T40 with 2 GB of RAM. We installed IBM Rational Application Developer (RAD) V7.0 so we could test the IncidentReport Web service using the Web Services Explorer in RAD. We also installed WebSphere Transformation Extender Designer Studio V8.1.0.2 on the client workstation and used it to develop the IncidentReport transformation map.

System requirements for IIS

Please refer to the IBM Information Server Information Center for the complete system requirements and installation instructions on the IBM Information Server.

In our development environment, the IIS server was actually installed on a VMware image running Windows XP. The VMware memory was set to 2184 MB, and the virtual disk was set to 40 GB. The IIS VMware image was hosted on a Lenovo ThinkPad T60p with 3 GB of RAM.


The information-sharing environment

Business problem

Suppose, for instance, that you are presented with this scenario:

The federal government is building a criminal justice information-sharing system to provide nationwide connectivity to disparate local, state, tribal, and federal systems for the exchange of information. To share this disparate information effectively, standardized information exchanges are defined using a government XML standard, the National Information Exchange Model (NIEM). This standard is based on an extension of the Global Justice XML Data Model (GJXDM).

The sharing organizations need effective tools that enable them to share their legacy data using the new information-sharing standard without a large development effort or a rewrite of their existing systems.

Solution

The IBM Information Server (IIS) product suite provides the tools for generating and deploying information services that can be used to access the legacy databases without writing code. Instead, you can perform codeless visual design of data flows using hundreds of built-in transformation functions.

InfoSphere DataStage and WebSphere Transformation Extender are used to generate a DataStage job that reads the database and transforms the information into the GJXDM XML model. Then, the InfoSphere Information Services Director (IISD) component of IIS is used to generate and deploy the Web service, again without any coding.


The IncidentReport Web service

Introduction

The purpose of the IncidentReport Web service is to demonstrate how the IBM Information Server (IIS) technology can participate in an information-sharing environment. The IncidentReport Web service shows that IIS has the tools to enable sharing organizations to create information services that map their database definitions to the Global Justice XML Data Model (GJXDM) and to surface the information service as a Web service.

Figure 2 shows a high-level overview of the IncidentReport Web service:

Figure 2. IncidentReport high-level overview
IncidentReport high-level overview

The IncidentReport Web service is created without writing any code. Instead, the following IIS components are used to implement the IncidentReport Web service:

  • InfoSphere Federation Server
  • InfoSphere DataStage
  • WebSphere Transformation Extender for DataStage
  • InfoSphere Information Services Director (IISD)

In order to create this IncidentReport Web service, you should follow these high-level steps:

  1. Create the data sources required to satisfy the IncidentReport (see section "Create the data sources").
  2. Set up the InfoSphere Federation Server to access the various data sources required to produce the IncidentReport (see section "Federate the data").
  3. Create the transformation map for data from the legacy databases to the target IncidentReport schema (see section "Create transformation map").
  4. Create a DataStage job to retrieve the data and transform it into an XML document based on the IncidentReport.xsd schema (see section "Create DataStage job").
  5. Publish and deploy the DataStage job from Step 4 as a Web service (see section "Publish and deploy DataStage job as a Web service").

Step 1. Create the data sources

To simulate legacy data, two relational databases were created on separate AIX systems. Table 1 lists the database names and the tables:

Table 1. Data sources
Database names    Table names
LAWENF
  • LAWENFC
ARREST
  • BOOKINGS
  • CHARGES

Listing 1 shows the DDL statement for creating the LAWENFC table:

Listing 1. DDL for creating the LAWENFC table
------------------------------------------------
-- DDL Statements for table "DB2INST1"."LAWENFC"
------------------------------------------------
 
 CREATE TABLE "DB2INST1"."LAWENFC"  (
		  "INCIDENTNUMBER" VARCHAR(10) NOT NULL , 
		  "INCIDENTREFERENCE" VARCHAR(50) NOT NULL , 
		  "INCIDENTLOCATION" VARCHAR(50) , 
		  "INCIDENTDESCRIPTION" VARCHAR(50) , 
		  "PERSONNAME" VARCHAR(50) , 
		  "ALIAS1" VARCHAR(50) , 
		  "ALIAS2" VARCHAR(50) , 
		  "RESIDENCE1" VARCHAR(120) , 
		  "RESIDENCE2" VARCHAR(120) , 
		  "RESIDENCE3" VARCHAR(120) , 
		  "PHONENUMBER1" VARCHAR(15) , 
		  "PHONENUMBER2" VARCHAR(15) , 
		  "PHONENUMBER3" VARCHAR(15) , 
		  "DOB" VARCHAR(10) , 
		  "INCIDENTTYPE" VARCHAR(10) , 
		  "REASONFORSTOP" VARCHAR(20) , 
		  "ITEMSEIZED" VARCHAR(20) , 
		  "REASONFORSEARCH" VARCHAR(30) , 
		  "TIME" VARCHAR(5) , 
		  "GANGAFFILIATION" VARCHAR(20) , 
		  "OFFICERNAME" VARCHAR(50) , 
		  "UNITNAME" VARCHAR(30) , 
		  "DIVISION" VARCHAR(10) , 
		  "AUTOLICENCENUMBER" VARCHAR(10) , 
		  "DRIVERSLICENSE" VARCHAR(20) , 
		  "EMAIL" VARCHAR(100) , 
		  "XSSN" VARCHAR(11) , 
		  "HAIRCOLOR" VARCHAR(12) , 
		  "EYECOLOR" VARCHAR(12) , 
		  "HEIGHT" VARCHAR(8) , 
		  "WEIGHT" VARCHAR(8) , 
		  "RACE" VARCHAR(15) , 
		  "SEX" CHAR(1) )   
		 IN "USERSPACE1" ; 

-- DDL Statements for primary key on Table "DB2INST1"."LAWENFC"

ALTER TABLE "DB2INST1"."LAWENFC" 
	ADD CONSTRAINT "CC1144439332548" PRIMARY KEY
		("INCIDENTNUMBER");

Listing 2 shows the DDL statement for creating the BOOKINGS table:

Listing 2. DDL for creating the BOOKINGS table
------------------------------------------------
-- DDL Statements for table "DB2INST1"."BOOKINGS"
------------------------------------------------
 
 CREATE TABLE "DB2INST1"."BOOKINGS"  (
		  "BOOKINGID" VARCHAR(10) NOT NULL , 
		  "INCIDENTNUMBER" VARCHAR(10) NOT NULL , 
		  "FACILITYNAME" VARCHAR(50) )   
		 IN "USERSPACE1" ; 

-- DDL Statements for primary key on Table "DB2INST1"."BOOKINGS"

ALTER TABLE "DB2INST1"."BOOKINGS" 
	ADD CONSTRAINT "BOOKING_ID" PRIMARY KEY
		("BOOKINGID");

Listing 3 shows the DDL statement for creating the CHARGES table:

Listing 3. DDL for creating the CHARGES table
------------------------------------------------
-- DDL Statements for table "DB2INST1"."CHARGES"
------------------------------------------------
 
 CREATE TABLE "DB2INST1"."CHARGES"  (
		  "CHARGEID" VARCHAR(10) NOT NULL , 
		  "INCIDENTNUMBER" VARCHAR(10) NOT NULL , 
		  "SUBJECTNAME" VARCHAR(50) , 
		  "DESCRIPTION" VARCHAR(200) , 
		  "APPLICABILITY" VARCHAR(50) , 
		  "CATAGORYCODE" VARCHAR(50) , 
		  "STATUTECODE" VARCHAR(50) , 
		  "STATUTELEVEL" VARCHAR(10) )   
		 IN "USERSPACE1" ; 

-- DDL Statements for primary key on Table "DB2INST1"."CHARGES"

ALTER TABLE "DB2INST1"."CHARGES" 
	ADD CONSTRAINT "CHARGE_ID" PRIMARY KEY
		("CHARGEID");

Step 2. Federate the data

In the real world it is very common for an enterprise to have data residing in heterogeneous data sources. The IBM InfoSphere Federation Server enables enterprises to access and integrate diverse data and content as if they were a single resource.

To simulate the real-world scenario, the InfoSphere Federation Server is used to access data from the LAWENF and ARREST databases. Table 2 summarizes the steps for creating a federated object for an IBM DB2 for Linux®, UNIX®, and Windows® database table:

Table 2. Steps to create a federated object for a DB2 database table.
Step    Description
1 Catalog the remote database. Skip this step if the database is local to the Federation Server.
2 Create DB2 wrapper.
3 Create server definition.
4 Create user mapping.
5 Create nickname.

Create federated nicknames

To access the remote LAWENFC, BOOKINGS and CHARGES tables via the Federation Server, you must create a nickname object for each of the remote database tables. Below are the steps for creating the BOOKINGS and CHARGES nickname objects. These steps assume that a Federation Database has already been created. In this tutorial, the Federation Database is called FEDSRVER.

  1. Catalog the ARREST database:
    • From the Start menu, select Programs -> IBM DB2 -> DB2COPY1 (Default) -> Set-up Tools -> Configuration Assistant.
    • Select Selected -> Add Database Using Wizard.
    • Follow the prompts from the Add Database Wizard to catalog the ARREST database.
  2. Create the DB2 wrapper:
    • From the Start menu, select Programs -> IBM DB2 -> DB2COPY1 (Default) -> General Administration Tools -> Control Center.
    • Select FEDSRVER -> Federation Database Object.
    • Right-click on Federation Database Object and select Create Wrapper.
    • In the Create Wrapper window, enter the following parameters:
      • Data source: DB2
      • Wrapper name: drda2
    • Click OK.

    Figure 3 shows the steps to create a DB2 wrapper:

    Figure 3. Creating the DB2 wrapper
    Creating the DB2 wrapper
  3. Create a server definition for the ARREST database.
    • Expand the DRDA wrapper object.
    • Right-click on Server Definitions and select Create.
    • On the Create Server Definitions window, click on Discover. The system returns a list of databases in the Server Definitions list box.
    • From the Server Definitions list box, select the Uncheck All button. Check the box next to the ARREST database to select only the ARREST database.
    • Click on Properties. Enter the following parameters on the Server Definition Properties:
      • Name: ARREST
      • Type: DB2
      • Version: (select the appropriate version)
      • Enter the appropriate User ID and Password.
    • Click OK.

    Figure 4 shows the steps to create a server definition for the ARREST database:

    Figure 4. Creating a server definition for the ARREST database
    Creating a server definition for the ARREST database
  4. Next, create the user mapping. This step maps a local database user ID to the remote database user ID, that is, the user ID that has access rights to the remote database.
    • Expand the ARREST server definition.
    • Right-click on User Mappings and select Create.
    • Select the local user ID(s) to be mapped, as Figure 5 shows:
    Figure 5. Create user mapping for local user ID(s)
    Create user mapping for local user ID(s)
    • Select the Settings tab.
    • Next to REMOTE_AUTHID, enter the remote user ID that has access rights to the remote database, as in Figure 6.
    • Next to REMOTE_PASSWORD, enter the password for the remote user ID.
    Figure 6. Enter remote database user ID and password
    Enter remote database user ID and password
  5. Create a nickname for the BOOKINGS and CHARGES tables.
    • Expand the ARREST server definition.
    • Right-click on Nicknames and select Create.
    • From the Create Nicknames window, click on Discover.
    • From the Discover window, enter a value for the Remote schema to narrow down the search criteria. If no filter is provided, the system returns all the tables it discovers for the ARREST database. Figure 7 shows this step:
    Figure 7. Enter parameters to discover remote database tables
    Enter parameters to discover remote database tables
    • From the Nicknames window, select the check boxes next to the BOOKINGS and CHARGES nicknames. If you are satisfied with the nickname values, click OK to finish creating the nicknames. Otherwise, double-click a nickname value to enter edit mode and change it. When you are done, click OK to create the nickname objects for the BOOKINGS and CHARGES tables. Figure 8 illustrates this step:
    Figure 8. Select the remote database tables to create the nickname objects
    Select the remote database tables to create the nickname objects

Follow the same steps listed above to create a nickname for the LAWENFC table in the LAWENF database.
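The nickname mechanism can feel abstract if you have never used federation. The sketch below uses sqlite3's ATTACH as a rough, local-only analogy: tables that live in a separate database file become queryable from one connection under a qualifier, much as nicknames make remote DB2 tables look local. This is an illustration only; the real setup uses the Control Center steps above (or the equivalent CREATE WRAPPER, CREATE SERVER, CREATE USER MAPPING, and CREATE NICKNAME SQL statements), and the file names and sample values here are invented.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
arrest_path = os.path.join(workdir, "arrest.db")

# Stand-in for the remote ARREST database.
remote = sqlite3.connect(arrest_path)
remote.execute("CREATE TABLE BOOKINGS (BOOKINGID TEXT PRIMARY KEY, "
               "INCIDENTNUMBER TEXT NOT NULL, FACILITYNAME TEXT)")
remote.execute("INSERT INTO BOOKINGS VALUES ('B001', 'I001', 'County Jail')")
remote.commit()
remote.close()

# Stand-in for the FEDSRVER federation database: after ATTACH, the
# "remote" table is queryable as if it were local.
local = sqlite3.connect(":memory:")
local.execute("ATTACH DATABASE ? AS arrest", (arrest_path,))
rows = local.execute(
    "SELECT BOOKINGID, FACILITYNAME FROM arrest.BOOKINGS").fetchall()
print(rows)  # [('B001', 'County Jail')]
```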

Create a federated view

Once a nickname is defined for a remote database table, you can reference the remote table by its nickname as if the table were defined in your local database. To make it easier for the DataStage job to access the data in the LAWENFC, BOOKINGS, and CHARGES tables, a federated view, IncReport, was created. The IncReport view combines selected columns from these three tables into one single view. Listing 4 shows the DDL statement that creates the IncReport federated view:

Listing 4. DDL for creating the IncReport federated view.
--
-- DDL Statements for creating a federated IncReport view
-- 
create view IncReport 
as SELECT L.INCIDENTNUMBER, L.INCIDENTLOCATION, L.INCIDENTDESCRIPTION, 
L.PERSONNAME, L.RESIDENCE1, L.PHONENUMBER1, L.DOB, L.INCIDENTTYPE, 
L.DRIVERSLICENSE, L.XSSN, L.HAIRCOLOR, L.EYECOLOR, L.HEIGHT, L.WEIGHT, 
L.RACE, L.SEX, B.BOOKINGID, B.FACILITYNAME, C.CHARGEID, C.SUBJECTNAME, 
C.DESCRIPTION, C.APPLICABILITY, C.CATAGORYCODE, C.STATUTECODE, C.STATUTELEVEL
   FROM LAWENFC AS L, BOOKINGS AS B, CHARGES AS C
   WHERE (L.INCIDENTNUMBER = B.INCIDENTNUMBER) AND 
   (L.INCIDENTNUMBER = C.INCIDENTNUMBER);
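A minimal runnable sketch of the view's join semantics, with sqlite3 standing in for DB2, local tables standing in for the nicknames, the column list trimmed, and all data values invented. Because IncReport is an inner join, an incident with one booking and two charges produces two rows, one per matching (booking, charge) pair; this row multiplication is why the DataStage job later has to fold multiple result rows into a single XML document.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LAWENFC  (INCIDENTNUMBER TEXT PRIMARY KEY, PERSONNAME TEXT);
CREATE TABLE BOOKINGS (BOOKINGID TEXT PRIMARY KEY,
                       INCIDENTNUMBER TEXT NOT NULL, FACILITYNAME TEXT);
CREATE TABLE CHARGES  (CHARGEID TEXT PRIMARY KEY,
                       INCIDENTNUMBER TEXT NOT NULL, DESCRIPTION TEXT);

INSERT INTO LAWENFC  VALUES ('I001', 'John Q. Doe');
INSERT INTO BOOKINGS VALUES ('B001', 'I001', 'County Jail');
INSERT INTO CHARGES  VALUES ('C001', 'I001', 'Speeding');
INSERT INTO CHARGES  VALUES ('C002', 'I001', 'Reckless driving');

-- Same join shape as Listing 4, reduced to a few columns.
CREATE VIEW IncReport AS
SELECT L.INCIDENTNUMBER, L.PERSONNAME, B.BOOKINGID, C.CHARGEID, C.DESCRIPTION
  FROM LAWENFC AS L, BOOKINGS AS B, CHARGES AS C
 WHERE L.INCIDENTNUMBER = B.INCIDENTNUMBER
   AND L.INCIDENTNUMBER = C.INCIDENTNUMBER;
""")

rows = con.execute(
    "SELECT * FROM IncReport WHERE INCIDENTNUMBER = 'I001'").fetchall()
print(len(rows))  # 2 -- one row per matching (booking, charge) pair
```

Note also that an incident with no booking or no charge would not appear in the view at all, a property worth keeping in mind when deciding which columns and tables belong in the join.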

Step 3. Transform legacy data to a GJXDM-type XML document

A major goal at the outset of this project was to find the right set of IBM tools to transform legacy data into the IncidentReport XML document. The XML schema used in the IncidentReport is based on the standard Global Justice XML Data Model (GJXDM) schema. We looked at the XML tooling in the following IBM products:

  • Rational Application Developer V6
  • Rational Application Developer V7.0.0.3
  • DataStage v8.0 XML Pack
  • WebSphere Transformation Extender for DataStage V 8.1.0.1
  • WebSphere Transformation Extender Designer Studio V 8.1.0.2

Only the WebSphere Transformation Extender (WTX) Designer Studio V8.1.0.2 successfully generated an XML transformation map for the IncidentReport schema. However, InfoSphere DataStage requires the WTX for DataStage product, and the problem is that WTX for DataStage V8.1.0.1 contains an older version of the WTX engine, which cannot generate the XML transformation map for the GJXDM schema successfully.

We worked around the problem by using both products. We used WTX Designer Studio V8.1.0.2 to generate the XML transformation map for the GJXDM schema, and WTX for DataStage V8.1.0.1 to execute the map as part of a DataStage job. We installed WTX for DataStage V8.1.0.1 on the workstation where the DataStage Designer Client was installed. We installed WTX Designer Studio V8.1.0.2 on a different workstation, because the installation of WTX for DataStage installs its own copy of the WTX Designer Studio along with the WTX-DataStage integration code.

WTX for DataStage

Assuming you have access to a later version of WTX for DataStage that contains the WTX engine that is on par with WTX v8.1.0.2, you only need to install WTX for DataStage. After installing WTX for DataStage, you will see the WTX Designer Studio under the Start -> All Programs menu.

WTX for DataStage enables the WebSphere TX Map Stage, which appears under the DataStage Designer Client's Processing palette, to function properly. Without WTX for DataStage installed, you will get an error message if you try to configure the WebSphere TX Map Stage.

You can create the WTX type tree and WTX map outside of the DataStage Designer Client by selecting the WTX Type Designer and WTX Map Designer directly via the Start -> All Programs menu. Alternatively, you can create the WTX type tree and WTX map by double-clicking the WebSphere WTX Map Stage after you have dropped the stage into your DataStage job.

Essentially, a WTX transformation map is created by mapping a source WTX type tree to a target WTX type tree. There are different methods of creating a WTX type tree. One method is to import an existing XML schema; this is the method described in this tutorial. Table 3 lists the steps you follow to create the transformation map:

Table 3. Steps to create the WTX transformation map
Step    Description
1 Analyze the data to determine mapping.
2 Create a simple XML schema for the legacy data. We named it myInternalIncRpt.xsd.
3 Create a WTX type tree for the IncidentReport.xsd schema.
4 Create a WTX type tree for the myInternalIncRpt.xsd schema.
5 Create a WTX Transformation Map that maps the myInternalIncRpt type tree to the IncidentReport type tree.

Data analysis

Analyze the data to determine which legacy data fields to use to fulfill the IncidentReport request, and determine how each field will be mapped to the IncidentReport schema. See Downloads for an Excel spreadsheet, DBXml2IRMap.xls, in the sample code that lists the data fields from the legacy data sources that were selected to fulfill the IncidentReport. The spreadsheet also shows the mapping from the legacy data field to the corresponding data field in the IncidentReport schema.

Create XML schema for legacy data

Multiple rows of legacy data might be required to fulfill a request for an IncidentReport document. Currently, the WebSphere TX Map Stage only accepts one input card (WTX terminology for an input data stream). Because of this, we decided to transform the legacy data into a simple XML document first. Doing so combines multiple rows of data into one XML document and feeds that XML document into the WebSphere TX Map Stage as one input card.
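The folding idea can be sketched in a few lines of plain Python: several result rows for the same incident become a single XML document with a repeating element. In the actual job, the Transformer and XML Output stages do this work; the element names below are hypothetical, and the real structure comes from myInternalIncRptNoNS.xsd (see Downloads).

```python
import xml.etree.ElementTree as ET

# Rows as they might come back from the IncReport view: one incident,
# repeated once per charge (values invented for illustration).
rows = [
    {"incident": "I001", "person": "John Q. Doe", "charge": "Speeding"},
    {"incident": "I001", "person": "John Q. Doe", "charge": "Reckless driving"},
]

root = ET.Element("IncidentReport")
ET.SubElement(root, "IncidentNumber").text = rows[0]["incident"]
ET.SubElement(root, "PersonName").text = rows[0]["person"]
charges = ET.SubElement(root, "Charges")
for row in rows:  # fold each repeating row into the single document
    ET.SubElement(charges, "Charge").text = row["charge"]

print(ET.tostring(root, encoding="unicode"))
```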

To get a copy of the XML schema file, myInternalIncRptNoNS.xsd, please go to Downloads.

Besides being used to generate the WTX type tree, the myInternalIncRptNoNS.xsd schema was also used in the DataStage job to transform the legacy data from relational format into an XML document. This is discussed further in a later section, "Create the DataStage job".

Create WTX type tree for IncidentReport schema

The IncidentReport XML schema is publicly available. It is included in the Downloads package. Creating the WTX type tree is a matter of importing the IncidentReport XML schema into the WTX Type Designer.

  1. Select Start -> All Programs -> WebSphere Transformation Extender v8.1 -> Design Studio -> Type Designer.
    Figure 9. Select WTX type designer
    Select WTX type designer
  2. From the Type Designer menu bar, select Tree -> Import. From the Import window, select XML Schema and click Next. Figure 10 shows these steps:
    Figure 10. Import XML Schema
    Import XML Schema
  3. From the Import Wizard - XML Schema window, enter the location of the XML Schema and click Next.
    Figure 11. Enter location of the XML schema file
    Enter location of the XML schema file
  4. Select the language and click Next as Figure 12 shows:
    Figure 12. Select language
    Select language
  5. Enter the type tree name. Under Validation, select Xerces. Click Next.
    Figure 13. Select validation type
    Select validation type
  6. Once the Type Designer completes generating the type tree, click Finish as Figure 14 shows:
    Figure 14. Import of XML schema completed
    Import of XML schema completed
  7. After the import step is completed, you need to analyze the newly generated type tree for structural and logical correctness. From the Type Designer menu bar, select Tree -> Analyze -> Structure Only. The analysis produces a number of warnings; they are harmless and can be ignored.
    Figure 15. Analyze the type tree structure
    Analyze the type tree structure
  8. Select Tree -> Analyze -> Logic Only. There should be no errors. If there are any, you need to resolve them before the type tree can be used to create the map. Save the newly generated type tree.
    Figure 16. Analyze the type tree logic
    Analyze the type tree logic

Create WTX type tree for the legacy data

Follow the same steps listed above to create a WTX type tree for the legacy data XML schema, myInternalIncRptNoNS.xsd.

Create WTX map

After the WTX type trees for the IncidentReport XML schema and the legacy data XML schema are created, the next step is to create the WTX transformation map.

It is beyond the scope of this tutorial to provide detailed steps on how to use the WTX Map Designer or how to map a source type tree to the target type tree. However, our source code is available for download. If you open the IRMap.mms in your Map Designer, you will see how the legacy data type tree is mapped to the IncidentReport type tree.

Refer to the Resources section for a list of suggestions on where to get additional learning modules for the WTX product.

Notes on using the Map Designer:

  • We found it helpful to place all the XML schemas, WTX type trees, and WTX maps in the same subdirectory. This keeps all the files together and makes it easy to copy the entire directory into the DataStage Server subdirectory. This is discussed in greater detail in a later section, Create the DataStage job.
  • Generate a sample XML document from the legacy data XML schema. For example, we generated the myInternalIncRptNoNS.xml from the myInternalIncRptNoNS.xsd file. Use the generated XML document to test out the WTX Map.
  • There was a known problem in the XML Pack of DataStage v8.0.1: it could not correctly process XML namespaces. Therefore, the myInternalIncRptNoNS.xsd schema had to be created without any namespace information.

Step 4: Create the DataStage job

Figure 17 shows the IncidentReport DataStage job. The job name is IncRptFedViewParms. The job flows from left to right, following the pattern of data retrieval, data transformation, data mapping, and data delivery.

Figure 17. IncidentReport DataStage job
IncidentReport DataStage job

Table 4 gives a brief description of each stage in the job.

Table 4. Description of the stages in the IncRptFedViewParms job
Stage    Description
DB2 API (incReport) Retrieves data from the IncReport federated view.
Transformer Transforms the data retrieved from the incReport Stage to the metadata defined by the myInternalIncRptNoNS.xsd schema.
XML Output (DBXml_Output) Generates an XML document based on the myInternalIncRptNoNS.xsd schema.
WebSphere TX Map This stage calls WebSphere Transformation Extender to map the XML document generated by the DBXml_Output stage to the target IncidentReport schema. The result is the IncidentReport XML document.
IISD Output Returns the IncidentReport XML document to the Web service caller.

Develop DataStage job incrementally

We found it helpful to develop the IncidentReport job in an incremental fashion. We divided the IncidentReport job development into three increments. At the end of each increment, we compiled and tested the job. We would not proceed to the next increment until the testing was successful. The three increments are:

  1. A job that consists of the DB2 API stage, the Transformer stage and the XML Output stage. This increment tests the logic of retrieving the data from the legacy databases, performing minor transformation on the data and generating an XML document for the myInternalIncRptNoNS schema.
  2. In this increment, we added the WebSphere TX Map stage to the previous increment. This increment tests the newly added logic for generating the XML document for the GJXDM schema.
  3. In the third increment, we added the IISD Output stage to the previous increment. This increment tests the capability of enabling the IncidentReport job for information services.

Steps for creating a DataStage job

The general steps for creating the DataStage job are as follows:

  1. Create table definitions for all the data that the job will process.
  2. Design the job by selecting the available stages in the Designer’s palette.
  3. Compile, run and test.
  4. Enable the job for Information Services. That is, prepare the job and make it available for InfoSphere Information Services Director (IISD) to build and deploy as a Web service.

Refer to the InfoSphere DataStage and QualityStage Designer Client Guide, SC18-9893-00, for further instructions on developing DataStage jobs (see Resources).

Step 1. Create table definitions for the metadata

The IncRptFedViewParms job uses data retrieved from the IncReport federated view and the myInternalIncRptNoNS.xsd schema. Metadata table definitions need to be created for these two entities.

Metadata definition for IncReport federated view

Create the metadata table definition for the IncReport federated view by using the Import Table Definition wizard. From the menu, select Import -> Table Definition -> Orchestrate Schema Definition or right-click on the Repository panel and select Import Table Definition -> Orchestrate Schema Definition. Figure 18 illustrates these steps:

Figure 18. Create the metadata table definition for the IncReport federated view
Create the metadata table definition for the IncReport federated view

When the Orchestrate Schema Definition wizard completes the import, it creates the table definition for the IncReport federated view.

Refer to Chapter 8, "Table Definitions," in the InfoSphere DataStage and QualityStage Designer Client Guide, SC18-9893-00, for detailed configuration instructions on defining metadata tables (see Resources).

Figure 19. The IncReport federated view metadata table definition
The IncReport federated view metadata table definition

Metadata definition for the legacy data XML schema

You need to create a metadata definition for the myInternalIncRptNoNS.xsd schema as well. To do that, from the menu, select Import -> Table Definition -> XML Table Definition, or right-click on the Repository panel and select Import Table Definition -> XML Table Definition. The Import XML Table Definition wizard starts the XML Meta Data Importer tool which is part of the DataStage XML Pack.

Refer to InfoSphere DataStage Version 8 XML Pack Guide, SC18-9947-00, for instructions on using the XML Meta Data Importer tool (see Resources).

Figure 20. The XML Meta Data Importer
The XML Meta Data Importer

From the XML Meta Data Importer menu bar, select File -> Open -> File to locate the myInternalIncRptNoNS.xsd schema file. After the XML Meta Data Importer has successfully loaded the schema, you will see a display similar to the one in Figure 21. The top left panel displays the schema in a tree-like structure. Select the check box next to each data element you want included in the table definition. When you select a check box, the data element appears in the Table Definition panel. For each data element that you select, you may also need to change the values of SQL Type, Length, and the like.

Notes on XML Meta Data Importer

We found that the XML Meta Data Importer worked well for XML schemas in the lower range of complexity. The tool provided a quick way for us to create a table definition for the myInternalIncRptNoNS.xsd schema, which was simple and straightforward.

However, the XML Meta Data Importer could not load the IncidentReport schema successfully; the tool ran out of memory. In our opinion, once a schema contains many nodes, the tool becomes very cumbersome to use. For moderately to highly complex XML schemas, WTX for DataStage is a better choice. The downside is that WTX for DataStage is a separate product that the user must purchase.

The Description column in Figure 21, under the Table Definition pane, contains the XPath definition of each data field. At the time we used the tool, there was a known problem in the DataStage XML Pack: the XML Output stage could not process XML namespaces properly. As a result, we had to modify each XPath under the Description column to remove the namespace tag (that is, the defns: tag).

Figure 21. Select the data fields to be included in the table definition
Select the data fields to be included in the table definition
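We edited each XPath by hand, but the same cleanup could be scripted. The following is a minimal Python sketch, not part of any DataStage tooling; the defns prefix is the one the importer generated in our case, and the sample XPath is illustrative:

```python
import re

def strip_ns_prefix(xpath: str, prefix: str = "defns") -> str:
    """Remove a namespace prefix (e.g. 'defns:') from every step of an XPath."""
    return re.sub(rf"(^|/){prefix}:", r"\1", xpath)

# An XPath as the XML Meta Data Importer generated it, versus the form
# the XML Output stage could actually process at the time:
generated = "/defns:IncidentReport/defns:IncidentNumber/text()"
print(strip_ns_prefix(generated))
# /IncidentReport/IncidentNumber/text()
```

Applying such a script to the Description column values would save one manual edit per data field.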

Figure 22 shows the table definition after the XML Meta Data Importer finishes importing the XML schema. Note the XPaths under the Description column. The XML_Output stage uses the XPaths to generate the XML document.

Figure 22. Metadata table definition for the legacy data XML schema
Metadata table definition for the legacy data XML schema

Step 2. Design the DataStage job

To create a DataStage job, drag the Stage from the Palette panel and drop it onto the job canvas, as Figure 23 illustrates:

Figure 23. Select the stage to be included in the DataStage job
Select the stage to be included in the DataStage job

The DB2 API stage

The DB2 API stage (incReport) retrieves data from the IncReport federated view. It needs to know the metadata definitions of the data it will be retrieving. Figure 24 shows the properties of the DB2 API stage. Go to the Columns tab and select the Load button. The wizard then prompts you through the steps of loading the IncReport federated view table definition that you previously defined. Defining a metadata table definition and loading it into a stage is typical of what you need to do to configure a DataStage stage.

Refer to InfoSphere DataStage Version 8 Connectivity Guide for DB2 Databases, SC18-9932-00, for detailed configuration instructions about the DB2 API stage (see Resources).

Figure 24. incReport Stage column definitions
incReport Stage column definitions

Figure 25 shows the SQL Select statement used to retrieve the data. The DB2 API stage has a SQL Builder wizard to help you construct the SQL statement. Note the #IncidentNumber# in the WHERE clause. That is a job parameter.

Figure 25. IncReport Stage query properties
IncReport Stage query properties

You define the job parameter via the Job Properties window, as Figure 26 illustrates. Every time the DataStage job runs, it prompts you for the parameter.

Figure 26. IncRptFedViewParms job parameters
IncRptFedViewParms job parameters
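DataStage performs the #parameter# substitution internally when the job runs; the effect can be sketched in a few lines of Python. The table and column names below are hypothetical stand-ins for the actual IncReport federated view columns:

```python
def substitute_job_params(sql: str, params: dict) -> str:
    """Mimic DataStage's #param# token substitution (illustrative only --
    DataStage does this itself when the job runs)."""
    for name, value in params.items():
        sql = sql.replace(f"#{name}#", value)
    return sql

# Hypothetical SELECT; the real statement is built with the SQL Builder wizard
query = "SELECT * FROM INCREPORT WHERE INCIDENTNUMBER = '#IncidentNumber#'"
print(substitute_job_params(query, {"IncidentNumber": "2007-000123"}))
# SELECT * FROM INCREPORT WHERE INCIDENTNUMBER = '2007-000123'
```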

XML Output stage

The function of the XML_Output stage is to generate an XML document. In the IncRptFedViewParms job, the XML_Output stage is used for generating the myInternalIncRptNoNS XML document. The XML_Output stage is configured to expect the data elements defined in the myInternalIncRptNoNS table definition as input (see Metadata definition for the legacy data XML schema). Figure 27 shows the data columns after the myInternalIncRptNoNS table definition is loaded. Note the XPaths under the Description column.

Figure 27. XML_Output stage’s input page
XML_Output stage’s input page

Figure 28 shows that the XML_Output stage is expected to output one column of data, and the data is the myInternalIncRptNoNS XML document. The "/" under the Description column means the entire XML document.

Figure 28. Xml_Output Stage's output page
Xml_Output Stage's output page

Refer to InfoSphere DataStage Version 8 XML Pack Guide, SC18-9947-00, for configuration instructions on the XML Output stage (see Resources).

The Transformer stage

The Transformer stage is used for transforming and mapping the data retrieved from the IncReport stage to the metadata definition defined in the myInternalIncRptNoNS schema. Figure 29 shows the Transformer stage properties:

Test job increment 1

At this point, the IncidentReport job contained the DB2 API stage, the Transformer stage and the XML Output stage. To facilitate debugging the logic, we configured the XML Output stage to write the XML document to a file. To do that:

  • Open the XML Output stage properties.
  • Under the Input tab, select the Options tab.
  • Select the Write output to a file check box.
  • Enter the fully qualified filename for the XML document next to the File path label.

Each time the job completed running, we checked the XML document to ensure that the result was as expected.
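We inspected the output file by eye, but a spot check can also be scripted. The sketch below is a hypothetical example, not our actual test harness; the element names are illustrative, not taken from the real myInternalIncRptNoNS schema (which, as its name says, carries no namespace, so plain tag names work):

```python
import xml.etree.ElementTree as ET

# Stand-in for the XML document the job writes to the output file
sample = """<IncidentReport>
  <IncidentNumber>2007-000123</IncidentNumber>
  <IncidentDate>2007-05-01</IncidentDate>
</IncidentReport>"""

root = ET.fromstring(sample)

def check_field(root, tag, expected):
    """Return True if the direct child <tag> exists and holds the expected text."""
    node = root.find(tag)
    return node is not None and node.text == expected

print(check_field(root, "IncidentNumber", "2007-000123"))  # True
```

In practice you would read the file named in the stage's File path option with ET.parse() instead of ET.fromstring().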

Refer to Compile, run and test for instructions on how to compile, run, and test the job.

Figure 29. Transformer stage properties
Transformer stage properties

Refer to Chapter 19 "Transformer Stage" in the InfoSphere DataStage Version 8 Server Job Developer Guide, SC18-9898-00, for detailed instructions on configuring the Transformer Stage (see Resources).

WebSphere TX Map stage

The WebSphere TX Map stage is DataStage’s integration point with the WebSphere Transformation Extender (WTX) product. It enables users of the DataStage product to call WTX and use it to do data mapping and transformation.

You can launch the WTX Type Designer and Map Designer from within the WebSphere TX Map stage to create or edit WTX Type Trees and Maps. We created the WTX map (*.mmc file) outside of the DataStage Designer, then copied the entire directory that contained the WTX Type Trees and Maps and pasted it as a subdirectory under the IIS Server WTX map directory (that is, C:\IBM\InformationServer\Server\TXMaps). Figure 30 shows the WTX Map stage properties:

Figure 30. Map Stage properties
Map Stage properties

Refer to Chapter 9 "DataStage TX Map Stages" in the InfoSphere DataStage Version 8 Server Job Developer Guide, SC18-9898-00, for detailed instructions on configuring the WTX Map Stage (see Resources).

There is one configuration step within the WebSphere TX Map stage that may not be clear. On both the Input and Output tabs, you have to configure the "Type for this Card" even though the fields are greyed out. Select the drop-down icon next to the Edit button, and fill out the "Configure Link Row and Column Processing for Type" dialog box. Since our data was an XML document, we selected Raw Data.

Figure 31. Input and output cards configuration
Input and output cards configuration

IISD Output Stage

The IISD Output stage enables a DataStage job for information services. Once an IISD Output or Input stage is added to a DataStage job, control for running the job is handled by the Information Services Director instead of the DataStage Director. To ease unit testing of the majority of the IncidentReport job logic, we deferred adding the IISD Output stage until the step described in the section "Enable job for Information Services".

Step 3. Compile, run and test

To compile the IncidentReport DataStage job:

  1. Open the DataStage job. The filename for the IncidentReport job is IncRptFedViewParm.
  2. Select File -> Compile or select the Compile job icon from the Toolbar.

At this point, the IncidentReport job has not yet been information-services enabled. Therefore, its execution is still under the control of the DataStage Director Client. The job contains the DB2 API stage, the Transformer stage, the XML Output stage and the WebSphere TX Map stage. To run the IncidentReport job, from the DataStage Designer select Tools -> Run Director. This starts the DataStage and QualityStage Director Client, as Figure 32 illustrates:

Figure 32. DataStage and QualityStage Director
DataStage and QualityStage Director

From the DataStage Director, highlight IncRptFedViewParm job and select the run icon from the Toolbar.

The job prompts you to enter an IncidentNumber. At completion, the job outputs the IncidentReport XML document at the location specified by the WTX Map. To find the location of the XML document, open the WTX Map and edit the Output Card properties. In the Output Card properties window, you will see an entry for Target File Path, which contains the filename and path of the IncidentReport XML document. Figure 33 shows the WTX Map Output card properties:

Figure 33. WTX Map Output card properties
WTX Map Output card properties

Refer to the InfoSphere DataStage Version 8 Director Client Guide, SC18-9894-00, for detailed instructions on operating the DataStage Director Client (see Resources).

Step 4. Enable job for Information Services

At this point the IncidentReport job is ready to be enabled for information services. For the IncidentReport job that means:

  • Adding the IISD Output stage and configuring it to return the IncidentReport XML document to the caller.
  • Modifying the Job Properties to select relevant parameters to enable the job for information services.

For a more in-depth discussion on the IISD Output and Input Stages, refer to the InfoSphere Information Services Director documentation.

Adding IISD Output Stage

The IISD Output stage takes the IncidentReport XML document generated by the WTX Map stage and returns it to the Web service caller. Every data element to be returned to the caller must be defined in the IISD Output stage's Input Column table, as Figure 34 shows. The IncidentReport job has only one data element, the IncidentReport XML document, which is defined as one very long character string.

Figure 34. IISD Output Stage
IISD Output Stage

Real-Time Integration (RTI) Compliant Job

The IncidentReport DataStage job that we developed is considered a batch job. A batch job is not always on: each time a caller requests an incident report, a new instance of the IncidentReport DataStage job has to be created. While batch mode is acceptable for demo purposes, the preferred practice is to create a job whose instance is always on.

The preferred job topology has both an IISD Input stage and an IISD Output stage. A DataStage job that has an IISD Input stage is always on. A DataStage job that has both IISD Input and Output stages is considered RTI-compliant.

To change the IncidentReport job into an RTI-compliant job, we would have had to modify the job to use the ODBC stage instead of the DB2 API stage. We were not able to incorporate that change into our schedule.

As mentioned above, once a DataStage job has been enabled for information services, control of its execution shifts from the DataStage Director to the Information Services Director. To test an information-services-enabled DataStage job, you need to use the Information Services Director to publish and deploy the job, then use a Web service-enabled tool to test it. This is discussed further in the next section.

Setting up the job properties

Two parameters in the Job Properties window must be selected to enable the IncidentReport job for information services:

  • Allow multiple instance
  • Enable for information services

Figure 35 shows these properties:

Figure 35. IncidentReport job properties
IncidentReport job properties

Step 5. Publish and deploy the DataStage job as a Web service

In IIS, the process of surfacing a DataStage job as a Web service is called creating SOA services. The InfoSphere Information Services Director (IISD) is the tool that you use to accomplish this task. You access IISD through the Console for IBM Information Server.

To create SOA services, follow these steps:

  1. Create connection to an Information Provider.
  2. Create a project.
  3. Create an application.
  4. Generate, deploy and test SOA services.
  5. Optionally, export the service to WebSphere Service Registry and Repository (WSRR).

To publish and deploy the IncidentReport DataStage job as a Web service, we created a project called ISE_Offerings. In the ISE_Offerings project, we created an application called ServDynIncReport. In the ServDynIncReport application, we created a service called GenIncidentReport. In the GenIncidentReport service, we created an operation called getIncRptByIncNum, which was mapped to the IncRptFedViewParm DataStage job. The IncRptFedViewParm job is the DataStage job that generates the IncidentReport XML document.

After we created the ServDynIncReport application, we deployed it. Deploying the application made it available as a Web service. We did not do the optional step of exporting the application to WSRR.

For further discussion on the IISD tool, please refer to the InfoSphere Information Services Director documentation.

Step 1. Create a connection to an Information Provider

IISD supports two Information Services connection types: the DataStage and QualityStage type, and the DB2 or Federation type. The IncidentReport Web service uses the DataStage and QualityStage type, since the DataStage Server is its information provider. To create a connection to the DataStage Server:

  1. Log on to the Console for IBM Information Server.
  2. Select Home -> Configuration -> Information Services Connections.
  3. From the Information Services Connections page, highlight the existing DataStage Server connection and select Open from the Tasks panel. You only need to enable the connection by entering the User Name and Password. Enter a user ID that has administrative access to the DataStage server; in our case, we used Administrator.
  4. Test the connection as Figure 36 shows:
    Figure 36. Test the DataStage Server connection
    Test the DataStage Server connection
  5. After the connection test completes successfully, select the Save, Enable and Close button.

Step 2. Create a project

  1. From the My Home page, select the New Project button to create a project.
  2. On the New Project window, select Information Services type.
  3. Enter a project name. We created a project called ISE_Offerings as Figure 37 shows:
    Figure 37. Project lists on My Home page
    Project lists on My Home page

Step 3. Create an application

To create an application under the ISE_Offerings project:

  1. From the My Home page, highlight the ISE_Offerings project, and select Open Project.
  2. From the project menu bar, select DEVELOP -> Information Services Application:
    Figure 38. Select Information Services application
    Select Information Services application
  3. From the Information Services Application page, select New under the Tasks panel to create a new application.
  4. From the New Application page, enter the application name in the Name field. We entered ServDynIncReport.
  5. Select Save Application. At this point, you have created a new application. Figure 39 shows the new information-service application:
    Figure 39. New information-service application
    New information-service application

Step 4. Generate, deploy and test SOA services

  1. With the application page still open, select New -> Service to create a new information service.
  2. Enter a service name. We entered GenIncidentReport as Figure 40 shows:
    Figure 40. New service
    New service
  3. Select Bindings. Click the Attach Binding drop-down. Select SOAP over HTTP:
    Figure 41. Select service binding
    Select service binding
  4. From the left navigation menu, expand the Operations twistie. Select newOperation to create a new operation. Enter a name for the new operation. We entered getIncRptByIncNum as Figure 42 shows:
    Figure 42. New operation
    New operation
  5. Click Select next to "Select an information provider".
  6. From the Select an Information Provider page, select DataStage and QualityStage in the Type drop-down list. Figure 43 shows this step:
    Figure 43. Select Information Provider type
    Select Information Provider type
  7. When the "Select a DataStage Job" panel appears, expand the relevant folders under DSServer until you see the IncidentReport DataStage job, which is called IncRptFedViewParm. Notice that only DataStage jobs that have been enabled for information services appear in this list.
    Figure 44. List of information service enabled DataStage jobs
    List of information service enabled DataStage jobs
  8. Select IncRptFedViewParm and click OK. You will be returned to the operation page as Figure 45 shows:
    Figure 45. Map DataStage job to service operation.
    Map DataStage job to service operation.
  9. Notice the incidentNumber argument under the Inputs tab. The IncRptFedViewParm job expects an incidentNumber parameter to be passed into the job. It defines the incidentNumber parameter in its Job Properties. IISD automatically detects parameters that are defined in the Job Properties and creates corresponding input arguments for the service operation. When a Web service consumer calls the getIncRptByIncNum operation, it has to provide the incidentNumber argument. Figure 46 shows the operation's input arguments:
    Figure 46. The operation's input arguments
    The operation's input arguments
  10. Notice the document argument under the Outputs tab. The IISD Output stage of the IncRptFedViewParm job is configured to return the IncidentReport XML document to the caller. IISD automatically detects the output parameters defined in the IISD Output stage and creates corresponding output arguments for the service operation. When a Web service consumer calls the getIncRptByIncNum operation, it can expect to receive a String data type back from the operation. The String happens to be the IncidentReport XML document, as Figure 47 shows:
    Figure 47. The operation's output arguments
    The operation's output arguments
  11. Select Save Application, then Close Application to complete the process of creating the ServDynIncReport application.
  12. After the application is created, the next step is to deploy the application. From the Information Services Application page, highlight the ServDynIncReport application. Select Deploy under the Tasks panel. Follow the deploy wizard to complete the deployment process.

Manage the deployed application

The IISD tool provides functions to manage the deployed information service application. From the project menu bar, select Operate -> Deployed Information Services Application. Figure 48 shows this menu selection. From the Deployed Applications page, you can:

  • Undeploy the application.
  • Disable the application.
  • Find the URL of the WSDL file.
  • Look at the WSDL file.
Figure 48. Deployed information services application menu selection
Deployed information services application menu selection

To undeploy an application:

  • Select the application (that is, ServDynIncReport)
  • Select Undeploy as Figure 49 shows:
    Figure 49. The deployed applications page
    The deployed applications page

To find information about the WSDL file for an SOA service, such as the WSDL file for the GenIncidentReport service, follow these steps:

  • Click on the twistie next to ServDynIncReport application to expand the application.
  • Select the GenIncidentReport service. On the right panel, find View Service in the Catalog.
    Figure 50. ServDynIncReport application deployed view
    ServDynIncReport application deployed view
  • Select View Service in Catalog. That brings up the Service Catalog in a separate browser window.
  • From the Information Services Catalog window, select Binding.
  • Click the SOAP over HTTP twistie as Figure 51 shows:
    Figure 51. GenIncidentReport Service binding information
    GenIncidentReport Service binding information
  • Click the Open WSDL Document link. That opens the WSDL document in a separate browser window. The WSDL URL is displayed in the browser's address bar.
    Figure 52. WSDL file for the GenIncidentReport Service
    WSDL file for the GenIncidentReport Service

Test the deployed application

There are multiple ways to test the GenIncidentReport Web service:

  • You can use the Web Explorer in the RAD, RSA or RDA tool.
  • You can use the RAD or RSA to generate a test client for the WSDL file.
  • You can write your own servlet or portlet application to consume the GenIncidentReport WSDL file.

In our case, we tested the GenIncidentReport Web service using the Web Explorer in the RDA and RAD tools. We generated a test client for the GenIncidentReport WSDL file using the RAD tool. We also developed a portlet to call the GenIncidentReport Web service, because our demo required displaying the IncidentReport XML document on the WebSphere Portal Server.
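A hand-rolled client is another option. The sketch below builds a SOAP 1.1 request for the getIncRptByIncNum operation in plain Python. The endpoint URL, namespace, and SOAPAction value are assumptions for illustration only; the real values come from the generated WSDL described above:

```python
import urllib.request

# Assumed values -- take the real ones from the service's WSDL document
ENDPOINT = "http://iisserver:9080/wisd/ServDynIncReport"  # hypothetical
NS = "http://example.com/GenIncidentReport"               # hypothetical

def build_request(incident_number: str) -> str:
    """Build a SOAP 1.1 envelope for the getIncRptByIncNum operation."""
    return f"""<soapenv:Envelope
  xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:svc="{NS}">
  <soapenv:Body>
    <svc:getIncRptByIncNum>
      <svc:incidentNumber>{incident_number}</svc:incidentNumber>
    </svc:getIncRptByIncNum>
  </soapenv:Body>
</soapenv:Envelope>"""

# Posting the envelope (requires the deployed service, so not run here):
# req = urllib.request.Request(
#     ENDPOINT,
#     data=build_request("2007-000123").encode(),
#     headers={"Content-Type": "text/xml; charset=utf-8",
#              "SOAPAction": "getIncRptByIncNum"})
# print(urllib.request.urlopen(req).read().decode())
```

The response body would carry the IncidentReport XML document as a single String, matching the output argument shown in Figure 47.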


Conclusion

IBM Information Server (IIS) provides a unified set of tools that can be used to support all types of information integration. In addition, the common services of IIS allow you to manage how services are deployed. This allows cleansing and transformation rules or federated queries to be published as shared services within an SOA, using a consistent and easy-to-use mechanism.

This article discussed how to integrate different legacy data models into a common information-sharing model by generating and deploying an information service as a Web service. The IIS suite of tools provides many other components that assist in building other types of integration solutions.

As you familiarize yourself with the other IIS functions and services, you will be able to create other, more interesting information integration scenarios.


Download

  • Sample code for this article: samples.zip (1215 KB)

Resources

Learn

Get products and technologies

  • Download IBM product evaluation versions and get your hands on application development tools and middleware products from IBM Information Management®, Lotus®, Rational®, Tivoli®, and WebSphere®.

Discuss
