Develop mapping models with IBM InfoSphere Data Architect

Designing the mappings for an extract, transform, and load (ETL) process is a critical step in a data warehouse project. Mappings must be easy to modify, capable of version control, easily reported, and easily exported to other formats. This tutorial illustrates how to develop a complete source-to-target mapping model using InfoSphere™ Data Architect. You will also learn about the reporting functions that InfoSphere Data Architect provides.

Share:

Jorge Machado (jorge_machado@de.ibm.com), IT Architect, I.B.M.

Photo of Jorge MachadoJorge Machado is an IT Architect with IBM Software Group Services in Germany. Having covered various roles within his 10 year history of assignments within IBM, his main focus area now is the planning, design, and development of data warehouse solutions as well as business intelligence strategies for customers in central Europe.



Hermann Voellinger (VGR@de.ibm.com), IT Architect, IBM

Photo of Hermann VoellingerHermann Voellinger is an Executive IT Architect at IBM Software Group Services. For the last 12 years, he has been the responsible IT architect for big data warehouse (DWH) projects in Germany. The 10 years before working on DWH projects, Hermann worked in the German Development Lab as the lead developer and architect for Text and Data-Mining solutions and tools. His key skills and main focus areas are currently the architecture of data population processes (ETL), data modeling concepts, and the strategy and architecture of DWH solutions.



27 January 2011

Also available in Vietnamese

Before you start

Introduction

The goal of this tutorial is to introduce a method for developing mapping models for data warehouse (DWH) projects using the functionality available in InfoSphere Data Architect (IDA), Version 7.5.2. The design of the mappings for the extract, transform, and load (ETL) process is an absolutely critical part of each data warehouse project. This design of mappings should be stored in a format that fulfills several requirements, including the following:

  • The mapping can easily be changed.
  • Changes can be versioned.
  • It is easy to report the mappings.
  • It can be exported to other formats.

You will learn in this tutorial how these goals can be reached with IDA. The tutorial builds describes all the necessary steps to build your first mapping model and to create a report for this model. The high-level steps are:

  1. Create a project in IDA
  2. Create the source data model (using the DB2 SAMPLE database)
  3. Create the target data model
  4. Create mappings between source and target tables
  5. Analyze your model using the reporting functions

This tutorial describes each of these five steps in detail. At the end, you will have your first meaningful mapping report in PDF style by using the inherited functionality of the IDA tool.

Product overview and prerequisites

The IBM InfoSphere Data Architect (previously called Rational Data Architect) is an enterprise data modeling and integration design tool that provides a collaborative data design solution that you can use to discover, model, relate, and standardize diverse and distributed data assets.

IDA is based on an open-standard Eclipse integration.

In order to work through this tutorial you need to install the following:

  • InfoSphere Data Architect, Version 7.5.2
  • IBM DB2 9.7 Enterprise Server Edition with its sample database
  • An instance of IBM DB2 Enterprise Edition and the standard sample database installed and accessible on the same computer

See Resources for links to trial versions of both DB2 and IDA.


Creating a project in IDA

All the design work you do in IDA is grouped into projects. In this section, you will learn how to use the IDA wizard to create a new project for your mapping model. Complete the following steps to create a new project.

  1. Select File > New > Project from the InfoSphere Data Architect menu bar. This launches the New Project wizard to help you create a new project in the workbench, as shown in Figure 1.
Figure 1. Creating a new project
Screen cap: File and New and Project highlighted in dropdown
  1. Browse through the list of projects, and open the folders to find out about all types of projects you can create. To assist in locating a particular wizard, the text field can be used to show only the wizards that match the entered text.
  2. Enter data as a key word in the text field to limit the list of projects.
  3. Select Data Design Project. The Data Design wizard launches, as shown in Figure 2. The wizard guides you through the basic setup steps.
Figure 2. Data Design Project wizard
Screen cap: Data Design Project highlighted in New Project wizard
  1. Click Next.
  2. Enter a new name for the project, such as MyMappingTutorial, as shown in Figure 3, and click Finish.
Figure 3. Naming the data design project
Screen cap: MyMappingTutorial as name, c:\jorge\IBM\IDA Workspace for location
  1. Click Yes when prompted to open the data perspective.
Figure 4. Open associated perspective
Screen cap: Open the associated data perspective prompt

Your new project is now ready.


Creating the source data model

In this section, learn to build a physical source data model by reverse-engineering part of the DB2 SAMPLE database. You use only a few of the existing tables. You build a subset by using the DELETE function of IDA. Finally, add annotations to a few fields in the model to extend the description of these fields.

Select data sources

In this step, connect to the data source that contains the source tables: the DB2 SAMPLE database.

  1. Go to the Data Project Explorer, which is located on the left side of the screen. (If it is not visible, go to the menu and select Window > Show View > Data Project Explorer.)
  2. Right-click Data Models, and then click New > Physical Data Model in the popup menu, as shown in Figure 5.
Figure 5. Creation of the physical data model
Screen cap: Data Project Explorer showing Data Model, New, Physical Data Model highlighted
  1. Enter the name of the source database model, such as Source Database Model 1.
  2. Select V9.7 for the DB2 version, and select Create from reverse engineering, as shown in Figure 6.
  3. Click Next.
Figure 6. Specifying the source database model
Screen cap: New Physical Data Model window with properties filled in
  1. Leave the next screen unchanged, indicating that you want to reverse engineer from a database.
  2. Click the New button to add a new database connection to the project, as shown in Figure 7.
Figure 7. Add a new database connection to the project
Screen cap: Select connection window with New highlighted
  1. Enter the user name and password, and select the Save password check box.
  2. Click the Test Connection button to verify that a connection can successfully be established, as shown in Figure 8.
Figure 8. Specify connection parameters
Screen cap: Connection Parameters window with user ID and password highlighted
  1. Click OK when you see the information box confirming that the connection succeeded.
  2. Leave the Filter screen unchanged, and click Finish.
  3. Click on the newly created database connection SAMPLE on the Select Connection screen, as shown in Figure 9.
Figure 9. Select database connection
Screen cap: Select Connection window with SAMPLE and Next highlighted
  1. Click Next.
  2. Apply a filter on the schemas you want to have in the source database model by clicking the check box next to the schema that has the same name as the Windows® user you were when you created the DB2 SAMPLE database. In the example for this tutorial, this is the schema MACHADO, as shown in Figure 10.
Figure 10. Filter objects for source database model
Screen cap: Select Schema window with MACHADO highlighted
  1. Click Next.
  2. Select the element types within the schema that you want to have included in your source database model. For the tutorial, be sure that only Tables is selected, as shown in Figure 11.
Figure 11. Select database elements
Screen cap: Database Elements window with Tables and Next highlighted
  1. Click Next.
  2. Select the following three check boxes, as shown in Figure 12:
    Overview
    You can generate diagrams from the wizard, or you can generate diagrams after you complete the wizard by using the context menu options that are available from the Diagrams folder.
    Infer implicit primary keys
    Select this option to find implicit primary keys in the form of unique indexes. Both unique indexes and un-enforced primary key holders are created in the model when a unique index is present.
    Infer implicit relationships
    Select this option to find relationships among tables that are not explicitly modeled by using foreign keys. If you select this option, these relationships are represented by an un-enforced foreign key holder in the model.
Figure 12. Select more options for model
Screen cap: Options window with Overview and two Infer checkboxes selected
  1. Click Finish. The physical model of the source database is created.

After the creation process, you see the main window where you can review the results of the previous steps. Your newly created model is now available in the Data Project Explorer in the upper left area of the screen. You can also find the new database connection to the sample database in the Data Source Explorer in the lower left area. The main window shows general properties of the new project, as shown in Figure 13.

Figure 13. Screen after creation of source data model
Screen cap: Physical Data Model Editor window with general properties filled in

(View a larger version of Figure 13.)

Select the table subset from source data model

All elements of the schema MACHADO have been added to the model. You can view them in the data diagram that belongs to the source database model by clicking Data Project Explorer > Data Diagrams > Source Database Model 1.dbm > MACHADO. The data diagrams offer a graphical representation of the models you create, as shown in Figure 14.

Figure 14. Data diagram of Source Database Model 1
Screen cap: Flow chart of data in MACHADO

(View a larger version of Figure 14.)

You can now browse through the model and see the results of the reverse engineering process.

You can also use the outline widget to help you see the big picture of the model and quickly navigate throughout the pane.

Remove unnecessary objects

Because you need only a subset of those tables, you can remove all unnecessary objects. To remove unnecessary objects, go to the pane in the center of the screen, and remove every table except Employee and Project by right-clicking on each table and selecting Delete from Model from the context menu, as shown in Figure 15.

Figure 15. Remove tables from data diagram
Screen cap: Shows a table right-clicked with Delete from Model highlighted in menu

(View a larger version of Figure 15.)

Keep in mind that Delete from Diagram leaves the object in the data model, while Delete from Model removes the object from both the data diagram and the data model (but leaves it in the source database). The result of selecting Delete from Model should look similar to Figure 16.

Figure 16. Cleansed source table model
Screen cap: Table model flow chart with only Employee and Project showing

(View a larger version of Figure 16.)

Add annotations to every element of source data model

The source model you created in the previous section contains most of the specification used by application developers on a technical level. However a more descriptive annotation to every element of the model down to column level is required. For example, not every developer would interpret EMPNO char(6) exactly the same way. Complete the following steps to add an exemplary annotation to the EMPNO column in the table field, as shown in Figure 17.

  1. If the columns are not visible, right-click the project table and select Filters > Show/Hide compartment.
  2. Select the key and column.
  3. Repeat this for the employee table.
  4. Click EMPNO in the table employee column.
  5. Click Properties in the lower information area.
  6. Click Annotation.
  7. Click on the golden rhombus in the upper left corner of the lower information area.
  8. Rename the new annotation property to Description.
  9. Type a meaningful description in the Value field, such as The unique serial number of an employee of company xyz. This serial remains personally assigned even if the employee leaves the company and later decides to return.
Figure 17. Add annotations to every element
Screen cap: EMPNO highlighted and Description typed in below

(View a larger version of Figure 17.)


Creating the target data model

To build the physical target data model that you will use in the mapping tutorial, you first create a logical data model from scratch. Later you will transform this logical to a physical data model.

Create a logical target data model

Complete the following steps to create the target data model from a logical level. This will be the input for the physical model described later.

  1. Go to the Data Project Explorer, and right-click Data Diagrams.
  2. Click New > Logical Data Model.
  3. Click Finish.
  4. Change the name of the newly created diagram by right-clicking on the new ldm-file and naming it Logical Target Data Model. Figure 18 shows the resulting window in your Data Project Explorer:
Figure 18. Data Project Explorer with new logical target data model
Screen cap: Logical Target Data Model with Diagram1 and Package1 highlighted

Note that a corresponding data model has automatically been created.

  1. Right-click the working pane of the new diagram, which is named diagram1. This diagram is part of Logical Target Data model.ldm.
  2. Add a new entity to the diagram by clicking Add Data Object > Entity, as shown in Figure 19.
Figure 19. Adding a new logical entity
Screen cap: Right-click on the pane and choose Entity from the context menu
  1. Name this new entity Ent_proj_cost, as shown in Figure 20.
Figure 20. Enterprise project costs logical entity
Screen cap: Logical representation of the project costs entity
  1. Add the attributes by right-clicking the new entity and choosing either Add Primary Key, as shown in Figure 21, or Add Attribute for the non-key attributes.
Figure 21. Adding a primary key to the logical entity
Screen cap: Ent_proj_cost with Add Primary Key showing

This entity contains the results of the transformations from the source tables that provide information about the costs caused by the salaries of the employees assigned to the respective project.

Create physical target data model

This section describes how to derive the physical target data model from the logical model you created in the previous section. Complete the following steps.

  1. Right-click Logical Target Data Model, and click Transform to Physical Data Model, as shown in Figure 22. You see a series of screens in which you will specify physical characteristics of your model.
Figure 22. Initialization of physical model creation
Screen cap: Transform to Physical Data Model menu item highlighted
  1. Change the database to DB2 for Linux, UNIX, and Windows, change the Version to V9.7, and update the file name to Physical Target Data Model, as shown in Figure 23.
Figure 23. Specify filename and database version of physical data model
Screen cap: Physical Data Model File window with filename and version highlighted
  1. Click Next. The resulting window prompts you to specify key attributes, as shown in Figure 24 and described in the following steps.
  2. Specify the Name Case for physical names to be Use lower case.
  3. Enter the Schema name as reporting.
  4. Select Transform diagrams.
  5. Select Column ordering (key columns first).
  6. Click Next after you have finished specifying attributes.
Figure 24. Transform to physical data model
Screen cap: Lowercase, schema name, transform diagrams, and column ordering highlighted
  1. Click Next.

You have now successfully created the physical data model. Note that an entry for this new model Physical Target Data Model.dbm has been created in the Data Project Explorer, as shown in Figure 25.

Figure 25. New physical model created
Screen cap: Data Project Explorer with Physical Target Data Model.dbm highlighted

Customize your view

To customize how your model is displayed, click the Properties tab, and select all of the following options on the Properties screen, as shown in Figure 26:

  • Show key
  • Show non-key
  • Show name
  • Show label
  • Show data types
  • Show foreign key decoration
Figure 26. Display options for models
Screen cap: Package1_Diagram1 window with the options highlighted

Your physical data model is now ready to be used as a target in the transformations and mappings you will do in the next section.


Creating mappings between source and target tables

This section describes how to do the actual mapping between the source and target tables. IDA provides mapping models for this purpose. Complete the following steps to create such a model using the functionality of IDA.

  1. Go to Data Project Explorer, and right-click the Mappings folder.
  2. Click New > Mapping Model. The resulting window is shown in Figure 27.
Figure 27. Naming the mapping model
Screen cap: Create Mapping window with MyMappingTutorial highlighted
  1. Change the file name to source_to_target_mapping.msl. At this point, if you wanted to create a link to an existing mapping model, you would click Advanced, but that is not needed for this tutorial.
  2. Click Next.
  3. Click Add, and specify Source Database Model 1 for the source data model for the mapping you want to create, as shown in Figure 28.
Figure 28. Choosing the source database model for mapping
Screen cap: Specify the Mapping Source window showing Source Database Model 1.dbm
  1. Select MACHADO as the database schema for the mapping model, as shown in Figure 29.
Figure 29. Choosing the source schema for the mapping model
Screen cap: Source Database Model 1.dbm with MACHADO selected below
  1. Click Finish.
  2. Repeat steps 2-7 for the target data model Physical Target Data Model.dbm, as shown in Figure 30.
Figure 30. Choosing target data model and schema for mapping model
Screen cap: Specify the Mapping Target window with Physical Target Data Model and Reporting selected

You have now created the mapping model with the tables required to proceed with the actual mapping between the source and the target tables.

Create a mapping to fill the target table

Complete the following steps to create a mapping to fill the target table with data on the costs caused by the salaries of each project member aggregated by project.

  1. Click the column, hold down Shift, and select the following fields:
    • employee.empno
    • project.respemp
    • ent_proj_cost.cost_per_project
  2. Right-click the Mappings pane, and select Create Mapping from the context menu. IDA creates graphical links between the selected source and the target columns.
  3. Click the Properties tab in the middle lower area of the screen, and scroll down to the transformation box.
  4. Manually enter the transformation rule in Listing 1, substituting the user name that you used when creating the SAMPLE database for machado, if needed.
Listing 1. Transformation rule
SELECT projname, sum(Salary) as summe
FROM machado.employee, machado.proj
where employee.empno = proj.respemp
group by  projname

Listing 1 also contains the selection and aggregation criteria.

Figure 31 shows your model.

Figure 31. Mapping model with first transformation rule defined
Screen cap: Flow showing EMPNO, SALARY, and RESPEMP in source to target

(View a larger version of Figure 31.)

  1. Repeat steps 1-4 to create a one-to-one mapping for the following fields:
    • Project.Projectno > ent_proj_cost.projno
    • Project.projname > ent.proj_cost.proj_name
  2. Add the text 1:1 in the transformation boxes of the respective mappings.

You have now successfully mapped all columns for this tutorial.


Analyzing your model using the reporting functions

IDA provides a series of pre-configured reports that can easily be used to produce documentation. You can now produce a report that shows details of the mapping model you built in the previous section.

  1. Go the menu bar, and select Run > Report > Report Configurations. The Report Configuration screen appears, as shown in Figure 32. From this screen, you can configure, manage, or run a report:
Figure 32. Report configuration
Screen cap: Configurations window showing My BIRT Logical Model Report
  1. Double-click the Data Model Classic XSLT Report to create a new blank report configuration page.
  2. Rename the report to be MyTutorialMappingReport.
  3. Go to the Report Input box, and click Browse.
  4. Select /MyMappingTutorial/source_to_target_mapping.msl as input for the report.
  5. Click Browse for the Built-In field, and click the mapping model Classic Mapping Model Report. The resulting window is shown in Figure 33.
Figure 33. Classic Mapping Model Report
Screen cap: Select Built In Model Report with Report highlighted
  1. In the Report Output box, type the Location where you want the report to be stored, naming the report xslt_classic_mapping_model.pdf, as shown in Figure 34. Currently PDF is the only supported output format for the report.
Figure 34. Completed report configuration page
Screen cap: The MyTutorialMappingReport writing to xslt_classic_mapping_model.pdf

Create an actual report

You have two ways to create an actual report:

  • Click Report at the lower end of the configuration dialog (mandatory for the initial run).
  • Click Close at the lower end of the configuration dialog.

To continue the tutorial, click Run > Report > MyTutorialMappingReport from the menu bar. A PDF file opens containing the mapping report data. Because your mapping model is very simple, the report also displays a few mappings.

The mapping model consists of two main parts:

Header information
  • The name of the mapping model on which the report is based
  • The source and target tables
  • A table of contents based on the target objects

The header information for MyTutorialMappingReport is shown in Figure 35.

Figure 35. Header area of mapping report
Screen cap: The mapping model showing the Mapping Spec File, Associated Models, and Content
Detailed mapping information
  • Which source column is mapped to which target column and based on which transformation rule
  • The join criteria

The detailed mapping information for MyTutorialMappingReport is shown in Figure 36.

Figure 36. Detailed mapping report
Screen cap: Report showing mappings, targets, and sources for ent_proj_cost

Conclusion

This tutorial described how to get a first meaningful mapping model for the design of ETL jobs with just a few steps.

You started by creating a new project folder for mapping models in InfoSphere Data Architect (IDA). Next, you created a physical source data model (using the standard SAMPLE database in DB2). You extended the description of the fields in this model. After this you defined a new target physical data models with IDA from scratch. Then you started defining mapping models. You designed a few simple mappings from source to target, which showed the possibilities for designing ETL mappings with IDA. In the last section, you learned how to report the mappings in a first classical xslt report.

You can now use this approach for more complex scenarios with hundreds of mappings. Then you can take advantage of the automated features for easy changing, easy versioning, and better reporting functionality by using the data modeling tool InfoSphere Data Architect (IDA) from IBM.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=618702
ArticleTitle=Develop mapping models with IBM InfoSphere Data Architect
publish-date=01272011