Communicate changing requirements to integration developers with IBM InfoSphere FastTrack

Using the FastTrack validation, remap and job annotation features

This article helps you understand how to deal with changing data sources and business requirements by using IBM® InfoSphere® FastTrack to define mapping specifications and IBM InfoSphere DataStage to develop ETL jobs. A worked example guides you through the relevant features of the tool and the options they offer. The objective is to establish a best practice for working with changing data sources and business requirements.

Stefan Eberl (seberl@us.ibm.com), Software Architect, IBM

Photo of author Stefan EberlStefan Eberl is a software engineer based in the IBM Silicon Valley Laboratory. He is currently working on the tooling for IBM's Information Management products. In particular he focuses on metadata mapping, and is the architect for IBM InfoSphere FastTrack. Mr. Eberl graduated in 2004 from the Fachhochschule Furtwangen in Germany.



27 January 2011

Introduction

In data integration projects, multiple people often work on different aspects of projects at the same time while passing through multiple iterations. This is unavoidable due to changing requirements, the need to perform tasks in parallel, or simply due to mistakes that are made along the way. This often leads to changing base components that other aspects of a project rely on.

In the case of data mapping, specifications often get modified by multiple people at the same time, data models change frequently along the way, and DataStage jobs get changed independently of the mapping specification.

This article will show you how to use IBM InfoSphere FastTrack, IBM InfoSphere DataStage, and IBM InfoSphere Data Architect to deal with these situations.

This article assumes that:

  • The reader has basic knowledge of FastTrack.
  • The reader has basic knowledge of physical database models.
  • The reader has a basic understanding of the development process involving multiple user roles, like a business analyst using FastTrack, a developer using IBM InfoSphere DataStage, and a data modeler using IBM InfoSphere Data Architect.

Learning objectives:

  • Import a physical data model from a modeling tool into the metadata repository.
  • Create a mapping specification that uses the imported model.
  • Switch a data model that is used in an existing mapping specification to a different data model instance.
  • Incorporate external changes to an already mapped data model within an existing mapping specification.
  • Annotate a previously generated DataStage job with changes that have been made to the mapping specification.

Dealing with changes to data models

In the first part of this article, you will learn how to update mapping specifications when columns in either the source or the target of an existing model instance have changed. The following example assumes the involvement of InfoSphere Data Architect, with the Metadata Brokers and Bridges used to transfer data models to the metadata repository. You should be able to perform a similar process with any other data modeling tool by using the export/import mechanism that applies to that tool. Alternatively, if you do not use a data modeling tool, you can use a DDL file to create a database table and import it into the metadata repository from within FastTrack.

Import metadata

Follow these steps to import metadata from a data modeling tool into the metadata repository.

  1. Within FastTrack, click Metadata > Metadata Repository.
  2. Select New Host.
  3. Give the new host the name TARGETHOST_1. This host will be used to hold the physical database model that gets imported into the metadata repository.
  4. Start IBM InfoSphere Data Architect. In project Lab, open the physical database model called Database Model.dbm. Data Architect is a file based tool, so what you see is a file that is located on the local file system that contains a physical database model. This model contains a customer entity.
  5. From the File menu, click Export. Click Data, and then select Export a Physical Model to the Metadata Server to transfer the model into the metadata repository.
  6. Click Next, and select the Database Model.dbm. For the host name, specify the newly created TARGETHOST_1, and click Finish.
  7. Click Select All to transfer all objects from Database Model.dbm into the repository. You could also specify a filter here. Specify the repository credentials and click OK to transfer all objects into the metadata repository.

Create a mapping specification

You will use the previously imported metadata within a mapping specification.

  1. Within FastTrack, click Mapping on the top left to enter the Mapping workspace.
  2. Create a new project using Projects > New Project from the toolbar.
  3. Give the new project the name BankRemap, and click Finish.
  4. Click the Mapping tab, then select Mapping Specifications under the BankRemap project, and select New.
  5. Give the mapping specification the name Bank 1 Extract - Remap, and click Save.
  6. Click the Mapping section within Bank 1 Extract - Remap and drag the CUSTOMER table within TARGETHOST_1 to the target for the mapping specification. Complete the mapping specification by supplying the following sources from the bankconnect.BANKDEMO.BANK1 schema to the now existing targets. You can provide the sources by expanding the tree on the right-hand side to find an appropriate column, and then simply drag the column over to the source cell in the mapping grid.
    Source (BANK1 schema)       Target (BANK schema)
    CHECKING.ACCOUNT_BALANCE    CUSTOMER.ACCOUNT_BALANCE
    CHECKING.ADDR1              CUSTOMER.ADDR1
    CHECKING.ADDR2              CUSTOMER.ADDR2
    CHECKING.CITY               CUSTOMER.CITY
    CHECKING.CUSTOMER_ID        CUSTOMER.CUSTOMER_ID
    CHECKING.NAME               CUSTOMER.NAME
    CHECKING.STATE              CUSTOMER.STATE
    CHECKING.ZIP                CUSTOMER.ZIP
    setNull()                   CUSTOMER.ONLINE_ACCESS
    setNull()                   CUSTOMER.YEARS_CLIENT
    setNull()                   CUSTOMER.GENDER
    setNull()                   CUSTOMER.LEVEL
    CHECKING.SS_NUM             CUSTOMER.TAX_ID

As shown in Figure 1, you have now created a mapping specification that maps a source system to a physical target model that has been imported from InfoSphere Data Architect.

Figure 1. Final mapping specification after creating mappings
Screen capture: Shows source field, target field, and transformation rule and function if present
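The pairs entered in the mapping grid can be thought of as a simple lookup from target column to source column or default function. The following is a hypothetical Python sketch of that idea; FastTrack actually persists specifications in its metadata repository, so this dictionary is only an illustration of the structure you just created:

```python
# Hypothetical sketch of the Bank 1 Extract - Remap specification as data.
# None stands in for the setNull() transformation function; FastTrack itself
# does not expose specifications in this form.
SET_NULL = None

mapping = {
    "CUSTOMER.ACCOUNT_BALANCE": "CHECKING.ACCOUNT_BALANCE",
    "CUSTOMER.ADDR1": "CHECKING.ADDR1",
    "CUSTOMER.ADDR2": "CHECKING.ADDR2",
    "CUSTOMER.CITY": "CHECKING.CITY",
    "CUSTOMER.CUSTOMER_ID": "CHECKING.CUSTOMER_ID",
    "CUSTOMER.NAME": "CHECKING.NAME",
    "CUSTOMER.STATE": "CHECKING.STATE",
    "CUSTOMER.ZIP": "CHECKING.ZIP",
    "CUSTOMER.ONLINE_ACCESS": SET_NULL,
    "CUSTOMER.YEARS_CLIENT": SET_NULL,
    "CUSTOMER.GENDER": SET_NULL,
    "CUSTOMER.LEVEL": SET_NULL,
    "CUSTOMER.TAX_ID": "CHECKING.SS_NUM",
}

# Every target column of the CUSTOMER table is covered, which is what the
# Validation tab checks before a job can be generated cleanly.
fully_mapped = all(target.startswith("CUSTOMER.") for target in mapping)
```

Keeping this mental model of "target column, plus either a source column or a function" will make the remap and validation features discussed later easier to follow.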

Generate a DataStage job

The next step is to create a DataStage job. The following instructions will help you to produce a DataStage job from the mapping specification that was created earlier.

  1. Click the Generate Job button to start the job generation wizard.
  2. Keep the default name for the new DataStage job, and select the Jobs folder from the project dstage1. Clear the Use Shared Table Definitions check box, as shown in Figure 2, for simplicity's sake.
    Figure 2. Option to use shared table definitions
    Use Shared Table Definitions
  3. Click Finish. In the case that a DataStage job already exists with the same name, simply choose the overwrite option to replace the existing job.
  4. Start IBM InfoSphere DataStage Designer to review the job. If the job does not appear immediately, then you may need to refresh the view by clicking Repository > Refresh. When opening the job, you should see the result shown in Figure 3. As expected, this is a very simple job with a single transform object.
    Figure 3. Job after generation from FastTrack
    diagram shows job going through single transform object called Transform_001 to create a customer transform object
    If you see a different result, then you should go back to the mapping specification and verify that everything is mapped correctly. You can click the Validation tab to help you identify potentially unmapped targets.
  5. Double-click on the Transform object and see the mappings that you established within FastTrack, as shown in Figure 4.
    Figure 4. Contents of the generated Transform stage
    Screen capture: Various Link_CHECKING fields are mapped to Link_CUSTOMER fields

Apply changes to the target schema

So far you imported metadata, created a mapping specification, and generated a DataStage job. This next step will focus on how you can apply a change to the data model back in the data modeling tool. You will then learn how to update the existing metadata in the metadata repository.

  1. To simulate a change in the metadata definition within TARGETHOST_1, change the CUSTOMER table's NAME column to CUSTOMER_NAME within InfoSphere Data Architect. To perform this change, select the NAME column and go to the properties in the lower right area of the screen. Change the name and press enter. As shown in Figure 5, you will see the change reflected in the tree on the right.
    Figure 5. CUSTOMER table after change
    CUSTOMER table explorer view shows changed column name
  2. Export this table back to the metadata repository as shown in the previous Import metadata section. Make sure that you specify TARGETHOST_1 as the host name when you export. This is critical since you want to update the existing instance of the metadata in the repository.
  3. Switch to FastTrack and select the Metadata > Metadata Repository tab. Make sure to click the refresh icon in the upper right corner.
  4. Expand the TARGETHOST_1 until you see the CUSTOMER entity. Within the entity you should see the new CUSTOMER_NAME field, as shown in Figure 6.
    Figure 6. Metadata repository after change
    Metadata repository after change
  5. The original mapping specification is now invalid because the NAME column has been altered. This can be displayed by closing and then reopening the Bank 1 Extract - Remap specification if it is open. If validation is not enabled, then you have to enable it by clicking the exclamation mark in the upper right corner of the mapping grid. The CUSTOMER.NAME field displays an alert and an explanatory tool tip, as shown in Figure 7.
    Figure 7. Warning indication
    Warning indication: Target field NAME of target CUSTOMER does not exist anymore in the repository
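Conceptually, the validation step compares every target column referenced in the specification against what currently exists in the repository. This hypothetical Python sketch illustrates that check; the function name and the exact warning wording are only an approximation of what FastTrack displays:

```python
# Hypothetical sketch of the validation check: any mapped target column that
# no longer exists in the repository is flagged with a warning.
def validate_targets(mapped_targets, repository_columns):
    """Return one warning string per target missing from the repository."""
    repo = set(repository_columns)
    return [
        "Target field {1} of target {0} does not exist anymore "
        "in the repository".format(*t.split("."))
        for t in mapped_targets
        if t not in repo
    ]

# After the rename, the repository holds CUSTOMER_NAME instead of NAME.
repo_after_rename = ["CUSTOMER.CUSTOMER_NAME", "CUSTOMER.ADDR1", "CUSTOMER.ADDR2"]
warnings = validate_targets(["CUSTOMER.NAME", "CUSTOMER.ADDR1"], repo_after_rename)
# CUSTOMER.NAME is flagged; CUSTOMER.ADDR1 still exists and passes validation.
```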

Updating the mapping specification

The change to the metadata requires an update to the existing mapping specifications. In this scenario it would simply be possible to remove the wrong column in the mapping spreadsheet and replace it with the new column that is visible in the metadata browser. In many cases however, many changes occur over a period of time and they are most likely made by a different person in the team. In addition, a column could be used in several places within a mapping specification and therefore would have to be changed in all of these places. A feature called remap helps to simplify this process.

  1. To update the mapping for this field, right-click the CUSTOMER.NAME field and select Remap. Then locate the CUSTOMER_NAME column in the Database Metadata window and drag it into the Matching Replacement column. The screen describes which column in the spreadsheet (on the left) will be replaced with which column from the repository (on the right).
  2. Click Finish. When you return to the mapping specification window, you will see that the alert on the NAME field has disappeared, and the mapping has been correctly updated.
  3. Figure 8 shows how the mapping specification has been updated with the changes that have been made to the schema. Save the mapping specification.
    Figure 8. Remap screen
    Remap screen shows target to be remapped on left and matching replacement on right
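The value of remap over editing cells by hand is that one replacement is applied everywhere the column is referenced. The following hypothetical Python sketch illustrates that behavior on a list of source/target pairs; it is an analogy for what the feature does, not FastTrack code:

```python
# Hypothetical sketch of remap: a single old-column -> new-column replacement
# is applied to every row of the specification, so a rename made in the data
# model only has to be specified once even if the column is mapped in
# several places.
def remap(rows, replacements):
    """rows: list of (source, target) pairs; replacements: {old: new}."""
    return [
        (replacements.get(src, src), replacements.get(tgt, tgt))
        for src, tgt in rows
    ]

rows = [
    ("CHECKING.NAME", "CUSTOMER.NAME"),
    ("CHECKING.CITY", "CUSTOMER.CITY"),
]
fixed = remap(rows, {"CUSTOMER.NAME": "CUSTOMER.CUSTOMER_NAME"})
# Only rows that reference CUSTOMER.NAME change; everything else is untouched.
```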

Updating the DataStage job

Now that the mapping specification is updated, you need to change the DataStage job as well. A simple way to perform this is to generate a new DataStage job. However, in a real-world scenario, a developer might already have made modifications to the previously generated job. It would be ideal to simply communicate the few modifications to the mapping specification instead of starting over with a new job. FastTrack offers a feature called job annotation that helps you do this.

  1. To simulate a change to the previously generated DataStage job, simply open the job in the DataStage designer and move one of the stages on the canvas.
  2. After the layout has been altered, save the job, and ensure that the job is closed in the DataStage designer.
  3. Now click the Generate Job button to invoke job generation.
  4. In the job generation wizard, ensure that you select the same job name and the same project. Also, ensure that the Use Shared Table Definitions option is again cleared, and then click OK.
  5. From the Overwrite existing job window, click Annotate, as shown in Figure 9.
    Figure 9. Job annotation dialog
    Job annotation dialog
  6. Open the DataStage Designer to review the existing job. Notice that the layout change is still visible. This means that the job flow itself has not been modified. Instead, there is an annotation on the canvas that lists the changes between the last job generation and the current state of the mapping specification. All changes are captured in a shared container in the upper area of the job canvas.
  7. You can now look at the changes and apply them to the DataStage job. In addition, any subsequent change to the specification can be annotated in the same way to the existing job. In this case there will be one shared container for each time the user performs a generation. These shared containers can also be kept as a change log in the DataStage job.
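The essence of job annotation is a diff between the specification as it stood at the last generation and its current state. This hypothetical Python sketch computes such a change log for mappings expressed as target-to-source dictionaries; the output format is invented for illustration and is not FastTrack's actual annotation text:

```python
# Hypothetical sketch of the job annotation idea: rather than regenerating
# the job, compute the differences between the specification at the last
# generation and its current state, and hand the developer a change log.
def annotation(previous, current):
    """Both arguments map target column -> source column (or None)."""
    changes = []
    for target in sorted(set(previous) | set(current)):
        old, new = previous.get(target), current.get(target)
        if target not in previous:
            changes.append(f"added {target} <- {new}")
        elif target not in current:
            changes.append(f"removed {target}")
        elif old != new:
            changes.append(f"changed {target}: {old} -> {new}")
    return changes

# The rename from the example shows up as an addition plus a removal.
log = annotation(
    {"CUSTOMER.NAME": "CHECKING.NAME"},
    {"CUSTOMER.CUSTOMER_NAME": "CHECKING.NAME"},
)
```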

Working with different versions of a data model

In many cases there are multiple instances of the same data model in the repository. Reasons for that can be as follows:

  1. Major changes to an existing data model require phasing in a new model. In this case there are two versions of a model in the repository where the new version has to be slowly adopted by jobs and mapping specifications over time.
  2. Data models are deployed on multiple systems, for example, development and production systems, potentially from different database vendors.

Usually mapping specifications or DataStage jobs refer to one of these data model instances and need to be migrated to another. FastTrack can help with this process in a very similar way to what you've seen earlier.

Create a second model version

First you will import a second instance of an existing data model into the metadata repository.

  1. Create a new metadata host to hold the second data model. Repeat the steps outlined in the Import metadata section at the beginning of this article, this time giving the new host the name TARGETHOST_2.
  2. Within InfoSphere Data Architect, update the CUSTOMER table. Change ADDR1 to ADDRESS_LINE_1, and ADDR2 to ADDRESS_LINE_2, and change the maximum length of the ADDRESS_LINE_1 column to 100 characters. Change the CUSTOMER_NAME field back to NAME. And finally, delete the LEVEL column, and add a COUNTRY column. As shown in Figure 10, this allows you to simulate how to resolve mapping issues which arise as a result of the typical data modeling operations of addition, alteration, and removal of columns.
    Figure 10. Schema after modification
    shows changes to NAME, ADDRESS_LINE_1, ADDRESS_LINE_2, and COUNTRY
  3. Make sure to save the modified model in InfoSphere Data Architect.
  4. Once again, export to the metadata repository, but this time, export the Database Model.dbm to the TARGETHOST_2 host.
  5. Within the FastTrack Metadata Repository tab, you can see that TARGETHOST_1 and TARGETHOST_2 are now populated, each with a slightly different version of the same model.
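A plain name comparison of the two versions makes clear why remap needs your help for renamed columns. In this hypothetical Python sketch the column lengths are assumptions for illustration (only the ADDRESS_LINE_1 change to 100 characters comes from the steps above), and a rename simply appears as one addition plus one removal:

```python
# Hypothetical sketch comparing the model versions held under TARGETHOST_1
# and TARGETHOST_2, as {column name: maximum length}. Lengths other than
# ADDRESS_LINE_1's 100 are illustrative assumptions.
v1 = {"CUSTOMER_NAME": 50, "ADDR1": 50, "ADDR2": 50, "LEVEL": 10, "CITY": 50}
v2 = {"NAME": 50, "ADDRESS_LINE_1": 100, "ADDRESS_LINE_2": 50,
      "COUNTRY": 50, "CITY": 50}

added = sorted(set(v2) - set(v1))
removed = sorted(set(v1) - set(v2))
# Note that a rename such as ADDR1 -> ADDRESS_LINE_1 shows up as an "added"
# plus a "removed" entry, because name matching alone cannot pair them.
# That is exactly the gap the manual drag in the remap window closes.
```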

Adopt the second model version

The next step is to adopt the second model by your existing mapping specification.

  1. Click the Mapping tab and, if necessary, reopen the BankRemap > Bank 1 Extract - Remap mapping specification. This time none of the target mappings display an alert. This is because they still exist in TARGETHOST_1. To transfer these target mappings to TARGETHOST_2, select all the columns in the target, then right-click and select Remap.
    Figure 11. Remap launch point
    Screen capture: shows source field, target field, and remap selected in popup menu
  2. As most of the fields have not changed, drag the CUSTOMER table from the TARGETHOST_2 host within the Database Metadata window to the equivalent CUSTOMER row within the Matching Replacement column. The application will try to match all columns from the new table with the existing columns by name, which populates most of the rows.

    The rows that have changed (ADDR1, ADDR2, CUSTOMER_NAME, and LEVEL) are not populated and display an alert. To rectify this, simply drag the ADDRESS_LINE_1, ADDRESS_LINE_2, and NAME fields from the TARGETHOST_2 target metadata to the relevant field within the Matching Replacement column.

    As another option, you can use the discover functionality in the lower properties area to find replacement columns by name for columns that no longer exist.

  3. Since the LEVEL column has been removed from the CUSTOMER table, to resolve this alert, it should be removed from the mapping specification. Click Finish to return to the mapping specification, then right-click the row number at the left of the LEVEL row to select the entire row, and click Delete in the context menu.
  4. Now all existing columns have been remapped to new columns of the second model version. Remember that you also added the COUNTRY column to the CUSTOMER table. In order to bring any new columns into the specification, you can drag the entire CUSTOMER table into the target field column of the last empty row. Note that the editor will only add the new columns that have not yet been mapped. This will help you to identify any additions to the schema without having to manually compare all elements of the specifications. In this example, the COUNTRY column is the only new column to be added. As shown in Figure 12, there is no equivalent COUNTRY column within the CHECKING table, so to complete the specification, you need to hard code the value "USA" within the Function column so that all rows that pass through the transform are assigned this value.
    Figure 12. Final mapping specification
    Circle around target CUSTOMER.COUNTRY, function is USA
  5. Now you can again regenerate, and either overwrite or annotate the existing DataStage job.
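The behavior in step 4, where dragging a whole table adds only the columns not yet present, can be sketched as follows. This is a hypothetical Python illustration; the function name and the defaults argument are invented, and the hard-coded "USA" default mirrors the Function column entry from the example:

```python
# Hypothetical sketch of step 4: when a whole table is dragged into the last
# empty row, only columns not yet present in the specification are added.
def add_new_columns(spec, table_columns, defaults=None):
    """spec: {target: source}; returns the targets newly added to spec."""
    defaults = defaults or {}
    added = [col for col in table_columns if col not in spec]
    for col in added:
        # None means the new row still needs a source column or a function.
        spec[col] = defaults.get(col)
    return added

spec = {"CUSTOMER.NAME": "CHECKING.NAME", "CUSTOMER.CITY": "CHECKING.CITY"}
new = add_new_columns(
    spec,
    ["CUSTOMER.NAME", "CUSTOMER.CITY", "CUSTOMER.COUNTRY"],
    defaults={"CUSTOMER.COUNTRY": '"USA"'},  # constant value for every row
)
# Only CUSTOMER.COUNTRY is added; the already-mapped columns are untouched.
```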

Conclusion

This article demonstrated how to import and update external models, how to forward changes into your existing assets, and how to work with multiple versions of a database model. It also showed how to use the more advanced remap and job annotation features of FastTrack. If you are currently using the InfoSphere toolset, this article may have given you some new ideas on how to improve your daily workflow.


Download

Description                           Name                       Size
Sample data models for this article   ft-changemanagement-data   2KB
