Skip to main content

Data integration and composite business services, Part 1: Provide a single view of a customer

WebSphere Customer Center integration into an existing banking environment

Mary Taylor (marytaylor@us.ibm.com), Senior IT Architect, IBM
Author Photo: Mary Taylor
Mary Taylor is a senior IT architect. She works in the Strategic Technology Architecture and Incubation (STAI) team and is currently working on an SOA CBS pilot . Her interests include DB2 and DataStage.
Li (Tony) Wei (weili@cn.ibm.com), Software Engineer, IBM
Li (Tony) Wei's photo
Li Wei is a software engineer. He is working in the China Emerge Technology Institute (CETI) team.

Summary:  The ability for a company to integrate multiple repositories of customer data to provide a "single view of the customer" is critical for effective Service-Oriented Architecture (SOA) transformations. WebSphere® Customer Center (WCC), a building block for solutions, provides a central hub to integrate all customer data and acts as the single repository to all the channels and lines of business that need to access and update that data. This article is the first in a series that describes how to integrate WCC into an existing environment, using a banking business as an example scenario.

View more content in this series

Date:  28 Jun 2007
Level:  Intermediate
Also available in:   Russian

Activity:  1093 views
Comments:  

Introduction

This article is the first in a series of articles that address data integration aspects in the development of composite business services (CBS). A CBS is a collection of related and integrated business services that support a business process built on an SOA. Many enterprises have multiple repositories of customer data to support their different channels or lines of business.

Following a sample banking scenario originally described in "Building SOA composite business services, Part 1" (developerWorks, Jun 2006), you see how to perform the initial load of data into the WCC repository.

Scenario: Banking information integration

Consider the following hypothetical scenario. As part of their customer data integration (CDI) initiative, Jivaro Bank has just purchased the WCC product to store their customer name and address information. In this scenario, Jivaro is going to continue to use their own legacy repository for their other banking data, such as account and segment information, but that data could have been migrated to WCC. By consolidating their customer name and address data into WCC, Jivaro is able to eliminate the multiple channel repositories that currently contain this data, and is able to provide a common set of services to access that data from all areas within the company. Once the customer name and address data has been migrated to WCC, Jivaro will use an information service to retrieve both the legacy Jivaro credit information and the WCC customer information for display in the Customer Profile portlet. The WebSphere Information Server Directory (WISD) capabilities of IIS are used to create that information service (see Figure 1 below). The use of WISD is not covered in this article, but will be the topic of a future article in this series.


Figure 1. Integration of WCC repository into Jivaro
Integration of WCC repository into Jivaro

The integration of this CDI software is accomplished by executing a sequence of steps:

  1. Load legacy data into the WCC repository
    • The first step involves moving the customer name and address data from the existing Jivaro data stores into the WCC repository. Any ETL tool can be used to load the WCC repository, but this task is made simpler by using WebSphere DataStage, since IBM provides High Volume Load (HVL) DataStage jobs that handle the insertion of legacy data into the WCC repository. DataStage is a module in the Information Server suite of products. When using the HVL jobs, the only task left to the practitioner is the task of producing the HVL input file containing the data to be loaded. Producing that input file is one of the topics covered in this article. The Jivaro data is stored in DB2, but DataStage supports a virtually unlimited number of data sources including text files, complex data structures in XML, ERP systems such as SAP and PeopleSoft and almost any database (including partitioned databases), Web services, and business intelligence tools like SAS.
  2. Create a WCC portlet
    • The second step involves using the WCC Data Stewardship User Interface to replace the Jivaro customer profile portlet with a new portlet. The new portlet provides access to the more complete set of customer profile attributes that the WCC repository provides.
  3. Extend WCC services
    • The third step involves extending the WCC-provided services to access the Jivaro-specific information that isn't addressed by the WCC services. Jivaro bank will create a new service to allow searching on a Jivaro-specific customer ID field within the WCC repository.

Steps 2 and 3 are described in the second article in this series.

Load legacy data into the WCC repository

There are three tasks involved in loading the Jivaro data into the WCC repository.

  1. Map the legacy data to the WCC HVL record formats.
  2. Create a DataStage job to produce a file containing the legacy data in the HVL format.
  3. Running the HVL job sequences to load the WCC repository.

The "Map legacy data to the WCC HVL formats" section describes how to approach the mapping effort, using mappings from the Jivaro database as examples. The "Create the DataStage job to produce the HVL input file" section describes the Jivaro DataStage jobs that were created in order to produce the load file with the Jivaro data. The "Running the HVL job sequences" section describes how to run the HVL-provided load jobs.

1. Map legacy data to the WCC HVL formats

WCC provides a set of HVL jobs that use WebSphere's DataStage to load the WCC repository with legacy data. These jobs require a single file as input. This file should contain all of the legacy data to be loaded, in the proper record layouts defined by the HVL documentation. HVL has defined three types of data, called record types, that can be loaded into WCC: contact, organization, and contract. For each of these record types, there are associated "subtypes" that further break down the information in each of the record layouts. You must map your legacy data to the file formats defined for these record types/subtypes and provide a single file as input to the WCC DataStage load job. See Figure 2 for the types/subtypes supported.


Figure 2. HVL record types and sub types
HVL record types and sub types

The order of the records within the input file provided to the HVL job is critical. The records must be sorted by customer and then by record type/subtype. This sort sequence assures that the proper primary and foreign keys are created in the WCC database. For additional guidelines, refer to the HVL documentation.

Mapping the legacy data to the HVL file formats can be accomplished by following a sequence of steps:

  1. Map obvious relationships, such as commonly recognized data elements
    • Street and city in Jivaro map to addr_line_1 and city.
  2. Map non-obvious relationships by reviewing the WCC HVL documentation to understand the content of each input record.
    • Legacy Jivaro data contains three phone numbers. A phone_number field does not exist in the HVL layouts. HVL specifies a REF_NUM field in the LocationGroup_ContactMethodGroup_ContactMethod file that stores a contact number, phone or email, and a corresponding code value in the input file that is set based on the type of contact number being provided, such as 1 for home phone, 7 for personal email.
  3. Translate legacy data literal values to WCC integer code values.
    • WCC contains more than 90 reference tables that hold integer code values and their descriptions. Integer code values are required in many of the input file formats.
    • Marital status is a literal value in Jivaro, yet it is a code value in WCC. In order to map that properly, you must lookup marital status in the WCC reference table, find a match on the description, and retrieve the appropriate code. This code is then specified in the HVL input file.
  4. Insert new code values into the WCC reference tables, when required to represent the legacy data.
    • The WCC reference tables contain a small set of sample reference values that might not cover the needs of all legacy systems.
    • Jivaro contains the values "Regular" and "Premium" to represent the segment that a customer belongs to. In WCC, there is a code for "client importance", which is seeded with the sample values "Bronze", "Platinum", "Gold", and "Silver." Prior to loading the Jivaro data, two additional code values need to be added to the reference table, and the appropriate code value provided in the input file. In addition, the sample values could be deleted if they were never going to be used in Jivaro.
  5. Map the legacy data that does not have an obvious mapping to the WCC entities.
    • For all remaining unmapped fields, investigate the additional WCC entities that are designed to handle miscellaneous and legacy-specific data. Determine if there is a "best fit" choice for mapping the legacy element. If a fit cannot be found, a WCC entity could be "extended", meaning altered to add an additional column, to handle the legacy data. If an extension is required, that data would need to be loaded manually, not through the HVL job, and performance considerations would need to be addressed. It might be necessary to add an index on that new field if search capability is going to be provided.
    • Jivaro contains an alphanumeric customerId element that is an identifier that each customer uses to identify themselves to the bank. The Jivaro customerId cannot be mapped to the WCC contact ID, because contact ID is a randomly generated number. The Contact table could be extended to include an additional column, or another WCC entity could be used to store that value. Jivaro used the WCC entity called "Contact Equivalent" that is designed to store customer identifying information. Since this field is needed for search capabilities, the WCC search services have been extended to search on this field. This service extension is described in the second article in this series.

2. Create the DataStage job to produce the HVL input file

This section describes the Jivaro usage of the WCC-provided HVL jobs as well as the jobs that were created in order to produce the HVL load file with the Jivaro legacy data.

The first step is to import the WCC DataStage project into a new project in DataStage. To accomplish this, the HVL instructions that were provided by WCC were followed.

The next step is to create a job sequence within DataStage to produce the single HVL input file containing the legacy data to be loaded into WCC.

To create the Jivaro job sequence, three tasks must be executed (see Figure 3 below):

  1. Create a parallel job called RetrieveAllCustomerData that produces a single output file for each record type/subtype required to represent the legacy Jivaro data.
  2. Using a job sequence stage call CMDtoMergeFiles, append the files into a single file.
  3. Create a second parallel job called FormatWCCInput to sort the single file into the order required by the HVL load job.

Once the single load file containing Jivaro legacy data is created, run the HVL job sequence provided by WCC to load the WCC repository.


Figure 3. Job sequence to produce the HVL input file
Job sequence to produce the HVL input file

I. Create the parallel job

Begin by creating a parallel job to retrieve the legacy data from Jivaro, transform that data as necessary, and store that transformed data in a series of flat files, one per record type/subtype (see Figure 4 below). As part of this job, you need to add lookup stages for the data elements that have to be translated from literals to code values. Then use a transform stage to parse the data into the individual record type/subtype formats that were identified as part of the mapping exercise. Design the transform stage to create a separate output link per record type/subtype, and where necessary, create multiple instances of a record type.


Figure 4. RetrieveAllCustomerData parallel job to retrieve legacy Jivaro data
RetrieveAllCustomerData parallel job to retrieve legacy Jivaro data

II. Create a single file

The output of the parallel job is 11 flat files, each with their own record layouts. These files need to be combined into a single file, which is then sorted in the second parallel job. It was found that the simplest way to merge all of the files into a single file is to use the Execute Command stage of a job sequence. This example is running on a Windows operating system, and the TYPE command is used within the Execute Command stage to append all of the files together (see Figures 5 and 6).


Figure 5. Beginning of the TYPE command
Beginning of the TYPE command

Figure 6. Target of the TYPE command is the file allFileInput
Target of the TYPE Command

While this approach is the simplest, it has a couple of drawbacks. First, the Execute Command issues a platform specific command. This needs to be tailored if running in an AIX environment. Second, parameters are not allowed in the Execute Command stage, so the path of each of the files had to be hardcoded. You'll notice in Figures 5 and 6 that the path of c:\dsland\src needs to be provided for each of the files being appended into the target file. If this job was to move to another DataStage server, this Execute Command stage would have to be modified to specify the correct path for each of the files.

III. Sort the file

The final step is the creation of a parallel job that took the file created in the Execute Command as input, passed the data through a sort stage to sort by customer ID, record type, and record subtype, then store the data in a single flat file. Then specify this file as the input file to the HVL load job (see Figure 7).


Figure 7. FormatWCCInput
FormatWCCInput

Once the HVL input file is created, you are able to run the HVL job sequence to process that legacy data.

Run the HVL job sequences

HVL provides two job sequences. One sequence loads the data directly into the WCC repository (database mode), the other creates load files that you can process using an SQL command to load the data into WCC (file mode). Both sets of sequences execute the same set of validation jobs. See Figure 8 for a snapshot of the DBLoadJobFlow sequence, which loads the data directly into the WCC repository.


Figure 8. DBLoadJobFlow sequence that processes the Jivaro input file
DBLoadJobFlow sequence

The HVL sequences invokes a ParseInput job that reads the input file provided, and parses it into individual data sets containing the data for each record type/subtype (see Figure 9).


Figure 9. ParseInput HVL job
ParseInput HVL job

The remaining jobs in the job sequence validate and format the data for each of the data sets. There is a validation job per data set and the flow of each of those jobs is very similar. The first step involves assigning a key value for the data, then each of the reference code values are validated against the WCC reference tables. If any errors are encountered during the processing of a particular data set, the error is written to a log file, and the customer key value is stored in a file that can be used later to delete that customer's data from the WCC database so that only complete customer records are loaded into WCC. Once the customer data in error is fixed, that data can be re-run through the HVL job and subsequently loaded into WCC.

When loading the data into the WCC database, the order of loading the tables is important due to referential integrity constraints. It is necessary to properly order the loads if loading manually, or if inserting directly through DataStage there is the requirement to drop all foreign keys prior to running the DataStage job and then re-establishing them after. This is documented in the HVL guide.

A best practice identified was to first run the WCC DataStage job in file mode to ensure that all of the load files were properly created. Then create an SQL job that could clean the WCC tables by removing only the data that was inserted through the DataStage job. This enables you to run the WCC DataStage job in database mode repeatedly.

Conclusion

CDI, or the "360 degree view of the customer", are terms that refer to the ability to provide a complete view of the customer across brand, channel, and company. WCC accomplishes this by providing a repository that is capable of storing that data along with a framework for accessing and extending that data. WCC simplifies the migration of legacy data into the WCC repository by providing DataStage jobs to assist with that task. A five-step process was identified to map legacy data to the HVL file format, and a three-step DataStage job sequence was identified to produce the legacy input file. As a result of Jivaro's integration with WCC, Jivaro now has a consolidated, multi-channel customer repository that is accessible through a common set of services provided by WCC. Read the second article in this series for details on how to create a WCC customer profile portlet, how to integrate the WCC services into the Jivaro environment, and how to extend the WCC-provided services.



Downloads

DescriptionNameSizeDownload method
Demo 1WCCPart1ExplainingWCC.wmv15MBHTTP
Demo 2WCCPart2DataStageV2.wmv17MBHTTP
Demo 3WCCPart3DataStage.wmv13MBHTTP

Information about download methods


Resources

Learn

  • "Building Composition Business Services, Part 1" (developerWorks, Jun 2006): Create composite business services. Each article in this series documents how to use a CBS-enabling technology in the context of a Software-as-a-Service hosted banking environment.

  • WebSphere Customer Center: Find product information and downloads. Specifically, find a guide for DataStage WCC load jobs: "IBM IIS High Volume Load (HVL) Solution Installation Instructions For IBM WebSphere Customer Center" (WCCHVLSolutionInstallInstructions.pdf), as well as a data dictionary for WCC: (DWLCustomerDataDictionary.pdf), available when you download the product.

  • Visit the Information Integration page on developerWorks to read articles and tutorials on the IBM Information Server and connect to other resources to expand your information integration skills.

  • Browse the technology bookstore for books on these and other technical topics.

  • developerWorks Information Management zone: Learn more about DB2. Find technical documentation, how-to articles, education, downloads, product information, and more.

  • Stay current with developerWorks technical events and webcasts.

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

About the authors

Author Photo: Mary Taylor

Mary Taylor is a senior IT architect. She works in the Strategic Technology Architecture and Incubation (STAI) team and is currently working on an SOA CBS pilot . Her interests include DB2 and DataStage.

Li (Tony) Wei's photo

Li Wei is a software engineer. He is working in the China Emerge Technology Institute (CETI) team.

Comments



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, WebSphere, SOA and Web services
ArticleID=236768
ArticleTitle=Data integration and composite business services, Part 1: Provide a single view of a customer
publish-date=06282007
author1-email=marytaylor@us.ibm.com
author1-email-cc=flanders@us.ibm.com
author2-email=weili@cn.ibm.com
author2-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers