Develop mapping models with IBM InfoSphere Data Architect
Before you start
The goal of this tutorial is to introduce a method for developing mapping models for data warehouse (DWH) projects using the functionality available in InfoSphere Data Architect (IDA), Version 7.5.2. The design of the mappings for the extract, transform, and load (ETL) process is an absolutely critical part of each data warehouse project. This design of mappings should be stored in a format that fulfills several requirements, including the following:
- The mapping can easily be changed.
- Changes can be versioned.
- It is easy to report the mappings.
- It can be exported to other formats.
You will learn in this tutorial how these goals can be reached with IDA. The tutorial builds describes all the necessary steps to build your first mapping model and to create a report for this model. The high-level steps are:
- Create a project in IDA
- Create the source data model (using the DB2 SAMPLE database)
- Create the target data model
- Create mappings between source and target tables
- Analyze your model using the reporting functions
This tutorial describes each of these five steps in detail. At the end, you will have your first meaningful mapping report in PDF style by using the inherited functionality of the IDA tool.
Product overview and prerequisites
The IBM InfoSphere Data Architect (previously called Rational Data Architect) is an enterprise data modeling and integration design tool that provides a collaborative data design solution that you can use to discover, model, relate, and standardize diverse and distributed data assets.
IDA is based on an open-standard Eclipse integration.
In order to work through this tutorial you need to install the following:
- InfoSphere Data Architect, Version 7.5.2
- IBM DB2 9.7 Enterprise Server Edition with its sample database
- An instance of IBM DB2 Enterprise Edition and the standard sample database installed and accessible on the same computer
See Related topics for links to trial versions of both DB2 and IDA.
Creating a project in IDA
All the design work you do in IDA is grouped into projects. In this section, you will learn how to use the IDA wizard to create a new project for your mapping model. Complete the following steps to create a new project.
- Select File > New > Project from the InfoSphere Data Architect menu bar. This launches the New Project wizard to help you create a new project in the workbench, as shown in Figure 1.
Figure 1. Creating a new project
- Browse through the list of projects, and open the folders to find out about all types of projects you can create. To assist in locating a particular wizard, the text field can be used to show only the wizards that match the entered text.
- Enter data as a key word in the text field to limit the list of projects.
- Select Data Design Project. The Data Design wizard launches, as shown in Figure 2. The wizard guides you through the basic setup steps.
Figure 2. Data Design Project wizard
- Click Next.
- Enter a new name for the project, such as MyMappingTutorial, as shown in Figure 3, and click Finish.
Figure 3. Naming the data design project
- Click Yes when prompted to open the data perspective.
Figure 4. Open associated perspective
Your new project is now ready.
Creating the source data model
In this section, learn to build a physical source data model by reverse-engineering part of the DB2 SAMPLE database. You use only a few of the existing tables. You build a subset by using the DELETE function of IDA. Finally, add annotations to a few fields in the model to extend the description of these fields.
Select data sources
In this step, connect to the data source that contains the source tables: the DB2 SAMPLE database.
- Go to the Data Project Explorer, which is located on the left side of the screen. (If it is not visible, go to the menu and select Window > Show View > Data Project Explorer.)
- Right-click Data Models, and then click New > Physical Data Model in the popup menu, as shown in Figure 5.
Figure 5. Creation of the physical data model
- Enter the name of the source database model, such as Source Database Model 1.
- Select V9.7 for the DB2 version, and select Create from reverse engineering, as shown in Figure 6.
- Click Next.
Figure 6. Specifying the source database model
- Leave the next screen unchanged, indicating that you want to reverse engineer from a database.
- Click the New button to add a new database connection to the project, as shown in Figure 7.
Figure 7. Add a new database connection to the project
- Enter the user name and password, and select the Save password check box.
- Click the Test Connection button to verify that a connection can successfully be established, as shown in Figure 8.
Figure 8. Specify connection parameters
- Click OK when you see the information box confirming that the connection succeeded.
- Leave the Filter screen unchanged, and click Finish.
- Click on the newly created database connection SAMPLE on the Select Connection screen, as shown in Figure 9.
Figure 9. Select database connection
- Click Next.
- Apply a filter on the schemas you want to have in the source database model by clicking the check box next to the schema that has the same name as the Windows® user you were when you created the DB2 SAMPLE database. In the example for this tutorial, this is the schema MACHADO, as shown in Figure 10.
Figure 10. Filter objects for source database model
- Click Next.
- Select the element types within the schema that you want to have included in your source database model. For the tutorial, be sure that only Tables is selected, as shown in Figure 11.
Figure 11. Select database elements
- Click Next.
- Select the following three check boxes, as shown in Figure 12:
- You can generate diagrams from the wizard, or you can generate diagrams after you complete the wizard by using the context menu options that are available from the Diagrams folder.
- Infer implicit primary keys
- Select this option to find implicit primary keys in the form of unique indexes. Both unique indexes and un-enforced primary key holders are created in the model when a unique index is present.
- Infer implicit relationships
- Select this option to find relationships among tables that are not explicitly modeled by using foreign keys. If you select this option, these relationships are represented by an un-enforced foreign key holder in the model.
Figure 12. Select more options for model
- Click Finish. The physical model of the source database is created.
After the creation process, you see the main window where you can review the results of the previous steps. Your newly created model is now available in the Data Project Explorer in the upper left area of the screen. You can also find the new database connection to the sample database in the Data Source Explorer in the lower left area. The main window shows general properties of the new project, as shown in Figure 13.
Figure 13. Screen after creation of source data model
(View a larger version of Figure 13.)
Select the table subset from source data model
All elements of the schema MACHADO have been added to the model. You can view them in the data diagram that belongs to the source database model by clicking Data Project Explorer > Data Diagrams > Source Database Model 1.dbm > MACHADO. The data diagrams offer a graphical representation of the models you create, as shown in Figure 14.
Figure 14. Data diagram of Source Database Model 1
(View a larger version of Figure 14.)
You can now browse through the model and see the results of the reverse engineering process.
You can also use the outline widget to help you see the big picture of the model and quickly navigate throughout the pane.
Remove unnecessary objects
Because you need only a subset of those tables, you can remove all
unnecessary objects. To remove unnecessary objects, go to the pane in the
center of the screen, and remove every table except
right-clicking on each table and selecting Delete from
Model from the context menu, as shown in Figure 15.
Figure 15. Remove tables from data diagram
(View a larger version of Figure 15.)
Keep in mind that Delete from Diagram leaves the object in the data model, while Delete from Model removes the object from both the data diagram and the data model (but leaves it in the source database). The result of selecting Delete from Model should look similar to Figure 16.
Figure 16. Cleansed source table model
(View a larger version of Figure 16.)
Add annotations to every element of source data model
The source model you created in the previous
section contains most of the specification used by application
developers on a technical level. However a more descriptive annotation
to every element of the model down to column level is required. For
example, not every developer would interpret
char(6) exactly the same way. Complete the following steps to add an exemplary annotation to
the EMPNO column in the table field, as shown in Figure 17.
- If the columns are not visible, right-click the project table and select Filters > Show/Hide compartment.
- Select the key and column.
- Repeat this for the employee table.
- Click EMPNO in the table employee column.
- Click Properties in the lower information area.
- Click Annotation.
- Click on the golden rhombus in the upper left corner of the lower information area.
- Rename the new annotation property to Description.
- Type a meaningful description in the Value field, such as The unique serial number of an employee of company xyz. This serial remains personally assigned even if the employee leaves the company and later decides to return.
Figure 17. Add annotations to every element
(View a larger version of Figure 17.)
Creating the target data model
To build the physical target data model that you will use in the mapping tutorial, you first create a logical data model from scratch. Later you will transform this logical to a physical data model.
Create a logical target data model
Complete the following steps to create the target data model from a logical level. This will be the input for the physical model described later.
- Go to the Data Project Explorer, and right-click Data Diagrams.
- Click New > Logical Data Model.
- Click Finish.
- Change the name of the newly created diagram by right-clicking on the new ldm-file and naming it Logical Target Data Model. Figure 18 shows the resulting window in your Data Project Explorer:
Figure 18. Data Project Explorer with new logical target data model
Note that a corresponding data model has automatically been created.
- Right-click the working pane of the new diagram, which is named
diagram1. This diagram is part of Logical Target Data model.ldm.
- Add a new entity to the diagram by clicking Add Data Object > Entity, as shown in Figure 19.
Figure 19. Adding a new logical entity
- Name this new entity Ent_proj_cost, as shown in Figure 20.
Figure 20. Enterprise project costs logical entity
- Add the attributes by right-clicking the new entity and choosing either Add Primary Key, as shown in Figure 21, or Add Attribute for the non-key attributes.
Figure 21. Adding a primary key to the logical entity
This entity contains the results of the transformations from the source tables that provide information about the costs caused by the salaries of the employees assigned to the respective project.
Create physical target data model
This section describes how to derive the physical target data model from the logical model you created in the previous section. Complete the following steps.
- Right-click Logical Target Data Model, and click Transform to Physical Data Model, as shown in Figure 22. You see a series of screens in which you will specify physical characteristics of your model.
Figure 22. Initialization of physical model creation
- Change the database to DB2 for Linux, UNIX, and Windows, change the Version to V9.7, and update the file name to Physical Target Data Model, as shown in Figure 23.
Figure 23. Specify filename and database version of physical data model
- Click Next. The resulting window prompts you to specify key attributes, as shown in Figure 24 and described in the following steps.
- Specify the Name Case for physical names to be Use lower case.
- Enter the Schema name as reporting.
- Select Transform diagrams.
- Select Column ordering (key columns first).
- Click Next after you have finished specifying attributes.
Figure 24. Transform to physical data model
- Click Next.
You have now successfully created the physical data model. Note that an entry for this new model Physical Target Data Model.dbm has been created in the Data Project Explorer, as shown in Figure 25.
Figure 25. New physical model created
Customize your view
To customize how your model is displayed, click the Properties tab, and select all of the following options on the Properties screen, as shown in Figure 26:
- Show key
- Show non-key
- Show name
- Show label
- Show data types
- Show foreign key decoration
Figure 26. Display options for models
Your physical data model is now ready to be used as a target in the transformations and mappings you will do in the next section.
Creating mappings between source and target tables
This section describes how to do the actual mapping between the source and target tables. IDA provides mapping models for this purpose. Complete the following steps to create such a model using the functionality of IDA.
- Go to Data Project Explorer, and right-click the Mappings folder.
- Click New > Mapping Model. The resulting window is shown in Figure 27.
Figure 27. Naming the mapping model
- Change the file name to source_to_target_mapping.msl. At this point, if you wanted to create a link to an existing mapping model, you would click Advanced, but that is not needed for this tutorial.
- Click Next.
- Click Add, and specify Source Database Model 1 for the source data model for the mapping you want to create, as shown in Figure 28.
Figure 28. Choosing the source database model for mapping
- Select MACHADO as the database schema for the mapping model, as shown in Figure 29.
Figure 29. Choosing the source schema for the mapping model
- Click Finish.
- Repeat steps 2-7 for the target data model Physical Target Data Model.dbm, as shown in Figure 30.
Figure 30. Choosing target data model and schema for mapping model
You have now created the mapping model with the tables required to proceed with the actual mapping between the source and the target tables.
Create a mapping to fill the target table
Complete the following steps to create a mapping to fill the target table with data on the costs caused by the salaries of each project member aggregated by project.
- Click the column, hold down Shift, and select the
- Right-click the Mappings pane, and select Create Mapping from the context menu. IDA creates graphical links between the selected source and the target columns.
- Click the Properties tab in the middle lower area of the screen, and scroll down to the transformation box.
- Manually enter the transformation rule in Listing 1, substituting the user name that you used
when creating the SAMPLE database for
machado, if needed.
Listing 1. Transformation rule
SELECT projname, sum(Salary) as summe FROM machado.employee, machado.proj where employee.empno = proj.respemp group by projname
Listing 1 also contains the selection and aggregation criteria.
Figure 31 shows your model.
Figure 31. Mapping model with first transformation rule defined
(View a larger version of Figure 31.)
- Repeat steps 1-4 to create a one-to-one mapping for the following fields:
- Project.Projectno > ent_proj_cost.projno
- Project.projname > ent.proj_cost.proj_name
- Add the text 1:1 in the transformation boxes of the respective mappings.
You have now successfully mapped all columns for this tutorial.
Analyzing your model using the reporting functions
IDA provides a series of pre-configured reports that can easily be used to produce documentation. You can now produce a report that shows details of the mapping model you built in the previous section.
- Go the menu bar, and select Run > Report > Report Configurations. The Report Configuration screen appears, as shown in Figure 32. From this screen, you can configure, manage, or run a report:
Figure 32. Report configuration
- Double-click the Data Model Classic XSLT Report to create a new blank report configuration page.
- Rename the report to be MyTutorialMappingReport.
- Go to the Report Input box, and click Browse.
- Select /MyMappingTutorial/source_to_target_mapping.msl as input for the report.
- Click Browse for the Built-In field, and click the mapping model Classic Mapping Model Report. The resulting window is shown in Figure 33.
Figure 33. Classic Mapping Model Report
- In the Report Output box, type the Location where you want the report to be stored, naming the report xslt_classic_mapping_model.pdf, as shown in Figure 34. Currently PDF is the only supported output format for the report.
Figure 34. Completed report configuration page
Create an actual report
You have two ways to create an actual report:
- Click Report at the lower end of the configuration dialog (mandatory for the initial run).
- Click Close at the lower end of the configuration dialog.
To continue the tutorial, click Run > Report > MyTutorialMappingReport from the menu bar. A PDF file opens containing the mapping report data. Because your mapping model is very simple, the report also displays a few mappings.
The mapping model consists of two main parts:
- Header information
- The name of the mapping model on which the report is based
- The source and target tables
- A table of contents based on the target objects
The header information for MyTutorialMappingReport is shown in Figure 35.
Figure 35. Header area of mapping report
- Detailed mapping information
- Which source column is mapped to which target column and based on which transformation rule
- The join criteria
The detailed mapping information for MyTutorialMappingReport is shown in Figure 36.
Figure 36. Detailed mapping report
This tutorial described how to get a first meaningful mapping model for the design of ETL jobs with just a few steps.
You started by creating a new project folder for mapping models in InfoSphere Data Architect (IDA). Next, you created a physical source data model (using the standard SAMPLE database in DB2). You extended the description of the fields in this model. After this you defined a new target physical data models with IDA from scratch. Then you started defining mapping models. You designed a few simple mappings from source to target, which showed the possibilities for designing ETL mappings with IDA. In the last section, you learned how to report the mappings in a first classical xslt report.
You can now use this approach for more complex scenarios with hundreds of mappings. Then you can take advantage of the automated features for easy changing, easy versioning, and better reporting functionality by using the data modeling tool InfoSphere Data Architect (IDA) from IBM.
- See the tutorial "Scoping the IBM Industry Model for banking using Enterprise Model Extender and InfoSphere Data Architect" (developerWorks, Mar 2010) for how the process of scoping can yield business objects that will guide you to developing a meaningful logical data model.
- View the InfoSphere Data Architect page on developerWorks for more resources for growing your IDA skills.
- Go to the book Mastering Data Warehouse Design by Claudia Imhoff, Nicholas Galemmo, and Jonathan Geiger for details on a balanced approach to data warehousing.
- Download an evaluation copy of InfoSphere Data Architect.
- Download an evaluation copy of DB2 for Linux, UNIX, and Windows.