In part 1 of this series, you read about how the retail company designed its data models, both relational and dimensional, to manage its sales transactions and sales analysis by forward engineering (creating) new models in InfoSphere Data Architect. The new dimensional-physical and cubing data models were then deployed to the production environment.
However, application developers have complained that storing the TIME and DATE tables separately is not intuitive for their needs. Instead of storing time and date information separately, they would like to store that data in a single table in order to build simpler applications for the business analysis team. Management has also requested that the data modeling team model how to track territory information for each employee.
To fulfill the requirements, the data modeling team decides to redesign the relational and dimensional data model. Follow the workflow outlined in this article to learn how to update existing data models and minimize the impact on the production environment by reverse engineering from an existing data source.
The data modeling team performs the following steps.
- The team reverse engineers a new physical data model from an existing database.
- Then the team transforms the physical data model into a new logical data model.
- They update the logical data model to address the requirements outlined by the application development and management teams.
- Then the team enables dimensional notation in the logical data model (making it a dimensional-logical model) and denormalizes the model during the process.
- To prepare the model for deployment, the team transforms the dimensional-logical data model into the existing dimensional-physical data model.
- Finally, the team transforms the dimensional-physical data model to a Cubing or Cognos model so that the Cubing team can improve upon the model for use with applications, and the report design team can update their report designs.
The retail company has already deployed the RETAIL_SALES schema to a DB2 for Linux, UNIX, and Windows V9.7 database, in the RETAIL database. In order to connect to the database and reverse engineer all or part of the database, you must create a connection profile in InfoSphere Data Architect. Follow the team as they create a physical data model by reverse engineering. The team leverages the existing RETAIL_SALES schema to create a physical data model.
- In InfoSphere Data Architect, create a database connection profile of your existing database.
- In InfoSphere Data Architect, create a Data Design Project named RETAIL_SALES_PROJECT. From the main menu, select File > New > Data Design Project. Name the project RETAIL_SALES_PROJECT.
- Right-click the RETAIL_SALES_PROJECT data design folder, then select New > Physical Data Model.
- In the New Physical Data Model wizard, specify RETAIL_SALES_PDM as the name of the model, then select the Create from reverse engineering option.
Figure 1. Creating a new physical data model from reverse engineering
- On the Source page, select Database as your source.
Note: You can reverse engineer a physical data model from two types of sources: a database or a DDL script. However, not all databases (such as MySQL or Derby) support both types of sources.
- On the Select Connection page, select the connection profile that you created to the RETAIL database.
Figure 2. Select the connection profile as the source
- On the Select Objects page, select the RETAIL_SALES schema.
Tip: On the Select Objects page, all schemas are selected by default. You should select only the schemas that you want to reverse engineer and modify.
- On the Database Elements page, do not change the default selections.
Tip: On this page, you should select only the database elements that you want to reverse engineer into the physical data model.
Figure 3. Select the database elements that you want to reverse engineer
- On the Options page, click the Finish button to reverse engineer the data source.
The physical data model is created in Data Project Explorer.
Note: In general, reverse engineering is a short process. However, the time that is required to reverse engineer depends on many factors, such as the number of objects that you want to reverse engineer and the speed of your connection to the database.
The new physical data model is shown in Figure 4.
Figure 4. Physical data model created by reverse engineering
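As the note earlier in this section mentioned, a DDL script can also serve as the source for reverse engineering. Purely as an illustration (the actual RETAIL_SALES DDL is not reproduced in this article, and the columns below are assumptions), such a script might contain statements like these:

```sql
-- Hypothetical fragment of a DDL script that could be reverse engineered
-- into a physical data model. Table names match the article; the columns
-- shown are illustrative assumptions, not the company's actual schema.
CREATE TABLE RETAIL_SALES.STORE (
    STORE_ID   INTEGER NOT NULL,
    STORE_NAME VARCHAR(128),
    PRIMARY KEY (STORE_ID)
);

CREATE TABLE RETAIL_SALES.STORE_BILLING_DETAILS (
    BILLING_ID INTEGER NOT NULL,
    STORE_ID   INTEGER,
    QUANTITY   INTEGER,
    PRIMARY KEY (BILLING_ID),
    FOREIGN KEY (STORE_ID) REFERENCES RETAIL_SALES.STORE (STORE_ID)
);
```

Reverse engineering from a script like this produces the same kind of physical data model as reverse engineering from the live database.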
Now that your model is complete, you can create a new overview diagram within the physical data model to view the data objects (and the relationships between those objects) of the model in a diagram.
- Locate the Diagrams folder of the RETAIL_SALES schema at RETAIL_SALES_PROJECT > Data Models > RETAIL_SALES_PDM.dbm > RETAIL_SALES > RETAIL_SALES > Diagrams.
- Right-click the Diagrams folder and select New Overview Diagram. The Overview Diagram Selection window opens.
- Select the RETAIL_SALES schema, then click OK. The schema and all of its objects are added to the new diagram, and the diagram is saved to your Diagrams folder.
The overview diagram is shown in Figure 5:
Figure 5. Visualizing the physical data model with a diagram
Physical data models are specific to a database vendor, but the logical data model is not. Therefore, for larger design changes, you should use a logical data model to model your changes. Follow the data modeling team as they transform the physical data model to a logical data model in order to make changes without affecting the current design.
Note: You can update an existing logical data model by transforming a physical data model into that logical data model. If you transform to an existing model, you can update the logical data model by using the Compare editor. You can also create a new logical data model during the transformation process.
- Right-click the RETAIL_SALES_PDM.dbm physical data model, then click Transform to Logical Data Model. The Transform to Logical Data Model wizard opens.
- On the Target Logical Data Model page, select the Create new model option.
Note: As specified previously in this article, you can transform to an existing logical data model. This scenario, however, describes transformation to a new data model.
- On the Logical Data Model File page, specify the RETAIL_SALES_PROJECT project as the destination folder, then specify RETAIL_SALES_LDM as the file name.
- On the Options page, keep the default options.
Note: You can customize the transformation options.
- On the Output page, click the Finish button to create the new logical data model.
The RETAIL_SALES_LDM logical data model is created in the RETAIL_SALES_PROJECT data design project folder as shown in Figure 6.
Figure 6. Verifying the new logical data model
Now that the team has transformed to a logical data model, they update the model to satisfy the requirements specified previously in this article.
First, the team needs to add a relationship between the EMPLOYEES and TERRITORIES entities. This ensures that the existing TERRITORIES entity is linked to the EMPLOYEES entity, modeling the requirements that the management team specified. When the logical data model is transformed, the relationship will persist in the physical data model, and management can more easily access the information about where each employee is located.
- Right-click the EMPLOYEES entity, then select Add Data Object > Relationship. The Select the parent entity window opens.
- Select the TERRITORIES entity as the parent entity. The relationships between the entities are added to the EMPLOYEES and TERRITORIES entities.
- Select the EMPLOYEES_TERRITORIES_FK relationship in the Data Project Explorer. The properties of the relationship display in the Properties view.
- Open the Type tab of the Properties view. Create a non-identifying, optional relationship with a Zero or More cardinality.
Next, the team deletes the relationship between the DATE and STORE BILLING DETAILS entities.
- Right-click the relationship under the DATE entity, then select Delete.
Now, the team needs to add a new relationship between the TIME and DATE entities.
- Repeat steps 6 through 10 listed above to create a non-identifying, optional relationship between the TIME and DATE entities. The DATE entity should be the parent entity.
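In physical terms, a non-identifying, optional relationship corresponds to a nullable foreign key that is not part of the child's primary key. A sketch of the kind of DDL the relationships created above might eventually produce (the column names are illustrative assumptions):

```sql
-- A non-identifying, optional relationship becomes a nullable foreign key
-- that does not participate in the child table's primary key.
-- Column names here are illustrative, not the company's actual schema.
ALTER TABLE RETAIL_SALES.EMPLOYEES
    ADD COLUMN TERRITORY_ID INTEGER;    -- nullable: the relationship is optional

ALTER TABLE RETAIL_SALES.EMPLOYEES
    ADD CONSTRAINT EMPLOYEES_TERRITORIES_FK
    FOREIGN KEY (TERRITORY_ID)
    REFERENCES RETAIL_SALES.TERRITORIES (TERRITORY_ID);
```

The Zero or More cardinality is what allows TERRITORY_ID to remain NULL for employees who are not yet assigned to a territory.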
The team has completed its first updates to the logical data model, as shown in Figure 7. Next, the team will enable dimensional notation on the logical data model objects, creating a dimensional-logical data model.
Figure 7. Verifying the updated logical data model
In order to create a dimensional-logical data model, you simply enable dimensional notation on the logical data model. Let's follow the team as they enable dimensional notation:
- Restore the default settings as follows to ensure that you can automatically discover the measures in fact tables or entities.
- Click Window > Preferences to open the Preferences window.
- Locate the dimensional preferences by expanding the Data Management node, then selecting the Dimensional page.
- Click the Restore Defaults button to restore the default settings.
Note: Each of these options affects the automatic discovery process. The options are defined as follows.
Select the Overwrite the user specifications for dimensional objects when using the automatic discovery option to overwrite the dimensional classification of objects in your dimensional model. For example, you may classify a table as a fact table. If the automatic discovery tool determines that the table is a dimension table, your original classification is overwritten.
Select the Discover measures in fact tables or entities option to automatically discover the measures in fact tables or fact entities.
- Click OK to save your preferences. The Preferences window closes.
- Right-click the RETAIL_SALES_LDM logical data model, then select Use Dimensional Notation. You can now add dimensional objects to the model, but in order to discover existing dimensional objects, you need to discover the facts and dimensions of the model.
- Discover the facts and dimensions of the model as follows.
- Right-click the model and select Discover Facts and Dimensions.
- In the message that opens, select Yes to automatically generate hierarchies for the dimensions.
Note: A hierarchy consists of levels, each of which corresponds to a dimension attribute. In other words, a hierarchy specifies the relationships between different attributes of a dimension. For example, one possible hierarchy in a date dimension is Year > Quarter > Month > Day.
The facts and dimensions of the model are discovered, as shown in Figure 8.
Figure 8. Viewing the dimensional objects in the Data Project Explorer
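The Year > Quarter > Month > Day hierarchy described in the note above can be pictured as columns of a date dimension table, with each level rolling up into the level above it (the table and column names below are illustrative, not part of the article's actual model):

```sql
-- Illustrative date dimension: each hierarchy level corresponds to an
-- attribute, and lower levels roll up into higher ones.
CREATE TABLE RETAIL_SALES.DATE_DIM (
    DATE_ID  INTEGER NOT NULL PRIMARY KEY,
    DAY_DATE DATE,        -- lowest level of the hierarchy
    MONTH    SMALLINT,    -- rolls up to ...
    QUARTER  SMALLINT,    -- ... which rolls up to ...
    YEAR     SMALLINT     -- ... the top level
);
```

A reporting query can then aggregate a measure at any level of the hierarchy, for example grouping by YEAR and QUARTER instead of by individual days.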
Now that you have created a dimensional model, you should visualize the relationships between objects by creating a dimensional diagram. Follow the team as they create this diagram.
- Locate the Diagrams folder of the logical data model. Expand the model as follows: RETAIL_SALES_LDM > Package1 > Diagrams.
- Right-click the Diagrams folder and select New Dimensional Blank Diagram. A blank diagram is created and opens in the diagram editor.
- Select all of the entities under the Package1 package, then drag them to the diagram editor. The entities and the relationships between them are shown in the diagram editor, as shown in Figure 9.
Figure 9. The snowflake schema as shown in the dimensional diagram
You followed as the team created a physical data model from the existing database, then transformed it to a logical data model. The team then enabled dimensional notation and created a basic logical data model with dimensional properties.
Now that the updates to the logical data model are complete, the team must transform the logical data model into a model that can be deployed to a data warehouse system. The workbench can even automatically denormalize the logical data model to create a typical warehouse schema.
Follow the team as they create a reusable transformation configuration file that transforms the logical data model into a dimensional-logical data model with a warehouse schema. To be consistent with the first article in this series, this article only demonstrates how to transform a relational schema to a star schema, which is a typical warehouse schema.
- Create a transformation configuration file to transform the logical data model into a dimensional-logical data model as follows.
- Select the RETAIL_SALES_LDM file.
Note: In order to transform a model, you must first open the model by double-clicking it in the Data Project Explorer. Make sure that the RETAIL_SALES_LDM file is open in the editor.
- From the main menu, click Data > Transform > New Configuration. The New Transformation Configuration window opens.
- On the Specify a Configuration Name and Transformation page, specify LDM2DLDM as the name. Then, select Logical Data Model to Dimensional-Logical Data Model from the list of transformation types.
- On the Source and Target page, select the Package1 package of the RETAIL_SALES_LDM file as the source. Then, select the RETAIL_SALES_PROJECT folder as the target.
- On the Properties page, select Star schema for the Degeneralize and, if dimensional notation exists, denormalize to property.
- Click Finish to create the transformation configuration file. The LDM2DLDM.tc file opens in the editor view, and the transformation configuration is saved to the Other Files folder of the data design project so that you can run the file whenever you make updates to the logical data model.
After you create the transformation configuration file, you must run it in order to generate the intended model (in this case, a new dimensional-logical data model).
- Select the RETAIL_SALES_LDM file.
- Run the transformation configuration file as follows.
- Validate the transformation configuration to ensure that it is valid before you run the configuration. Click the Validate the transformation configuration button as shown in Figure 10.
Figure 10. Validating the transformation configuration
The results of the validation are shown in the Console view. Because the transformation is valid, there are no errors, warnings, or informational messages.
- Generate the new model by clicking the Run button in the transformation configuration editor. The new model, named Package1_D.ldm, is generated in the Data Project Explorer.
Note: When you generate a dimensional model from an existing model, the new model is named after the top-level package or schema. For example, if you have a logical data model named SAMPLESALES.ldm, which contains a top-level package that is named MARKETING, the new dimensional-logical data model is named MARKETING_D.ldm.
- Open the Package1_D.ldm file by double-clicking it in the Data Project Explorer.
The new Package1_D.ldm model is shown in Figure 11.
Figure 11. Viewing the new dimensional-logical data model in the Data Project Explorer
- Create a diagram of the dimensional-logical data model in order to verify that this logical data model can be transformed and used in a warehouse environment, as follows.
- Right-click the Diagrams folder under the Package1 package and select New Dimensional Blank Diagram. A new blank diagram is created and opens in the diagram editor.
- Select all of the entities under the Package1 package, then drag them to the blank dimensional diagram. All of the entities and the relationships between them are displayed in the editor.
The team verifies that the package can be transformed into a warehouse schema. For example, the transformation merged the DATE and TERRITORIES outriggers into the related TIME and EMPLOYEES dimensions.
The diagram of the package is shown in Figure 12.
Figure 12. The star schema as shown in the dimensional diagram
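Conceptually, the star schema transformation folds each outrigger's attributes into its parent dimension. A hypothetical before-and-after sketch of the TERRITORIES merge (the column names are illustrative assumptions):

```sql
-- Before (snowflake): the TERRITORIES outrigger is a separate table that
-- the EMPLOYEES dimension references through a foreign key.
--   EMPLOYEES(EMPLOYEE_ID, EMPLOYEE_NAME, TERRITORY_ID)
--   TERRITORIES(TERRITORY_ID, TERRITORY_NAME)

-- After (star): denormalization merges the outrigger's attributes
-- directly into the EMPLOYEES dimension.
CREATE TABLE RETAIL_SALES.EMPLOYEES (
    EMPLOYEE_ID    INTEGER NOT NULL PRIMARY KEY,
    EMPLOYEE_NAME  VARCHAR(128),
    TERRITORY_NAME VARCHAR(128)   -- formerly held in the TERRITORIES outrigger
);
```

The same pattern applies to the DATE outrigger, whose attributes are merged into the TIME dimension, which also satisfies the application developers' request for a single table of date and time information.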
You followed as the team created a new dimensional-logical data model from the logical data model. The transformation process, however, created some objects with names that aren't descriptive enough (such as Hierarchy) or that do not contain proper descriptions. Some of these objects are shown in Figure 13.
Figure 13. Auto generated hierarchy and level
In order to fix some of these common design issues, the team should run the Analyze Model wizard. Follow along as the team uses the wizard to discover and then fix the common issues.
- Open and run the Analyze Model wizard as follows.
- Right-click the Package1 package in the Package1_D.ldm model, then select Analyze Model. The Analyze Model wizard opens.
Because you selected a logical data model object, all of the Logical Data Model rules are selected in the Analyze Model wizard.
Note: You can open the Analyze Model wizard from the root package or root database node of a logical or physical data model.
- Clear the Logical Data Model node. All of the rules are disabled.
- Enable the Dimensional Modeling rules by selecting the option under the Logical Data Model node, as shown in Figure 14.
Figure 14. Selecting rules to analyze the data model
The workbench analyzes the model, then returns any errors, warnings, or information messages to the Problems view, as shown in Figure 15.
Figure 15. Model analysis result
As you can see in the figure, some of the data objects are missing captions, and one caption attribute is required for each level.
- Double-click each error to open the data object in the Data Project Explorer. Use the Properties view to specify exactly one caption attribute for each level, as shown in Figure 16. In the example, you specify STORE NAME as the caption attribute for the STORE level.
Figure 16. Defining caption attributes for a level
- Rename the hierarchy and its level as follows.
- Select the Hierarchy hierarchy. Open the Properties view, and on the General tab, specify STORE as the name of the hierarchy.
- Select the STORE ID level under the STORE hierarchy. On the General tab of the Properties view, specify STORE as the name of the level.
- Change the measure type of the QUANTITY measure as follows.
- Expand the STORE BILLING DETAILS fact entity.
- Open the Dimensional tab, then make sure that the Is a Measure option is selected. Then, in the Type field, select Additive, and select SUM as the aggregation function, as shown in Figure 17.
Figure 17. Defining the type and aggregation function for a measure
- After you have completed all of your updates, run the Analyze Model wizard again. Because you have corrected all of the issues, no errors are shown in the Problems view, as shown in Figure 18.
Figure 18. Validating the model
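An additive measure such as QUANTITY can be summed along any dimension, which is why the SUM aggregation function is appropriate for it. A query of the kind that a reporting tool might generate from this metadata could look like the following (the join columns are illustrative assumptions):

```sql
-- Illustrative rollup of the additive QUANTITY measure by store.
-- Because QUANTITY is additive with the SUM aggregation function,
-- it can be aggregated along any dimension of the fact table.
SELECT S.STORE_NAME,
       SUM(F.QUANTITY) AS TOTAL_QUANTITY
FROM   RETAIL_SALES.STORE_BILLING_DETAILS F
JOIN   RETAIL_SALES.STORE S
       ON S.STORE_ID = F.STORE_ID
GROUP  BY S.STORE_NAME;
```

A semi-additive or non-additive measure, by contrast, could not safely be summed across every dimension, which is why classifying the measure type correctly matters here.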
Now that the team has created this denormalized dimensional-logical data model, you should transform it into a physical data model so that you can transform that model into a Cubing or Cognos model for the business analysis team to use, and deploy it to a database.
With the workbench, you can transform to an existing physical data model, which lets you see the model changes during the transformation. The team therefore decides to enable dimensional notation on the existing RETAIL_SALES_PDM.dbm data model. Then, they will transform the Package1_D.ldm logical data model into the physical data model. During the transformation process, the team compares the updates with the existing model to analyze the impact of their changes on the data model and database.
- Enable dimensional notation for the RETAIL_SALES_PDM.dbm physical data model as follows.
- Double-click the model to open it.
- Right-click the data model, then select Use Dimensional Notation. Dimensional notation is enabled for the physical data model.
- Transform the Package1_D.ldm dimensional-logical data model into the RETAIL_SALES_PDM.dbm physical data model by doing the following.
- Right-click the Package1 package, then select Transform to Physical Data Model. The Transform to Physical Data Model wizard opens.
- On the Target Physical Data Model page, select the Update existing model option, then browse for the RETAIL_SALES_PDM.dbm model as shown in Figure 19.
Figure 19. Transform logical data model to existing physical data model
- On the Options page, select the Generate traceability option, and specify RETAIL_SALES as the schema name.
Note: The Generate traceability option adds a dependency to each column in the physical data model so that you can use the Analyze impact feature. Traceability helps you to identify dependent objects and their impact on the database.
- On the Filtering Criteria page, specify that you want to include all object types in the comparison.
- On the Output page, view the results of the transformation process, then click Finish to open the Compare editor. You will use the Compare editor to view the changes that you want to make, then implement those changes and generate DDL in order to deploy your changes when the changes are complete.
Note: The Compare editor shows all of the differences between the transformed model and the target data model. You should review the differences and analyze them carefully (using the impact analysis feature) to determine how the database and resulting objects are affected.
Next, you will use the Compare editor to merge your changes into the RETAIL_SALES_PDM physical data model.
- Analyze the impact of your changes, then merge your changes as follows.
- Select the STORE_BILLING_DETAILS table in the Structural Compare compartment of the Compare editor.
- Click the Analyze Right Impact button in the Property Compare compartment of the Compare editor. The Impact Analysis report is shown in the Impacted Objects view. In this case, no objects are impacted by your updates to the STORE_BILLING_DETAILS table, as shown in Figure 20.
Figure 20. Impact analysis report on table STORE_BILLING_DETAILS in Compare editor
- Merge the remaining differences. Select the RETAIL_SALES node in the Structural Compare compartment of the Compare editor, then click the Copy From Left to Right button in the Property Compare compartment.
Note: You can analyze the impact of each difference separately to locate changes that can severely impact the performance of your database. This article does not analyze each object in detail.
Now that you have merged your changes, you must generate the delta DDL that is required to deploy your changes. The data objects are synchronized, but you need to generate DDL for the SQL objects that were affected. The delta DDL accelerates the design process by creating a script that deploys only the changes that you've made, minimizing the impact to your existing production environment.
- Generate the delta DDL that is required to deploy your changes as follows.
- Click the Generate Right Delta DDL button in the Property Compare compartment of the Compare editor as shown in Figure 21.
Figure 21. Generating DDL that captures the changes that you have made to the model
- On the Save and Run DDL page, specify the RETAIL_SALES_PROJECT folder as the folder where you will store the DDL script. Specify DeltaDDL.sql as the file name. Select the Open DDL file for editing option.
Note: When you select the Open DDL file for editing option, the file opens in the SQL and XQuery editor. You can edit the script to add or remove SQL statements before you run the script on the server.
Alternatively, you can automatically run the script on the server by selecting the Run DDL on server option.
- Click Finish to generate the DDL file.
The physical data model is updated in the Data Models folder of the data design project. An overview diagram of this model is shown in Figure 22.
Figure 22. Dimensional overview diagram in synchronized dimensional physical data model
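A delta DDL script contains only the statements needed to move the deployed schema to the new design, rather than recreating every object. A hypothetical fragment of such a script is shown below; the actual generated statements depend entirely on the changes you merged, and the column name here is an illustrative assumption:

```sql
-- Hypothetical delta DDL fragment: only the changes are scripted,
-- so unchanged tables and their data remain untouched in production.
ALTER TABLE RETAIL_SALES.EMPLOYEES
    ADD COLUMN TERRITORY_NAME VARCHAR(128);

-- A table whose attributes were merged into another dimension
-- would be dropped rather than recreated.
DROP TABLE RETAIL_SALES.TERRITORIES;
```

Because the script touches only the affected objects, it can be reviewed statement by statement before it is run against the RETAIL database.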
- Analyze the new dimensional-physical data model to make sure that it is compliant with common enterprise standards as follows.
- Right-click the RETAIL_SALES database of the RETAIL_SALES_PDM physical data model, then select Analyze Model. The Analyze Model wizard opens. By default, all of the physical data model rules are selected.
- Clear the Physical Data Model node, then select the Dimensional Modeling option under the Physical Data Model node. When you select this option, you analyze the dimensional properties of the data model instead of every rule.
Now that the team has created the physical data model, they need to create a model that can be deployed to an instance of Cognos Framework Manager so that you can report on and further modify the model.
Let's follow the team as they generate a new model that can be imported into Cognos Framework Manager. The process is nearly identical to the process outlined in the Transform the logical data model from transactional schema to warehouse schema section that was previously documented in this article.
- Create a new transformation configuration for the physical data model as follows.
- Right-click the RETAIL_SALES_PDM.dbm physical data model in the Data Project Explorer, then select New > Transformation Configuration. The New Transformation Configuration wizard opens.
- On the Specify a Configuration Name and Transformation page, specify DPDM2Cognos as the file name, then select Dimensional-Physical Data Model to Cognos/Cubing Model from the list of transformation types.
- On the Source and Target page, select the RETAIL_SALES schema from the RETAIL_SALES_PDM model as the source. Select the RETAIL_SALES_PROJECT data design project as the target.
- On the Properties page, specify the following properties.
- Name source of label and column for Logical / Dimensional View: Select Name from the list of options.
- Target dimensional model: Select Cognos Model from the list of options.
- Click Finish to create the transformation configuration file.
The DPDM2Cognos.tc file is created in the Other Files folder of the data design project. Just as you did before, you must validate the transformation configuration, then run it in order to create the new Cognos model.
- Click the Validate the transformation configuration button. There are no errors displayed in the Console view.
- Click the Run button to run the transformation configuration file and generate the Cognos model. The Cognos model is stored in the XML Schemas folder of your data design project.
You can import this file into Cognos Framework Manager.
The team will import the model into Cognos Framework Manager in order to further modify and report on it.
- Open Cognos Framework Manager.
- Create a new project as follows.
- Click File > New. The New Project window opens.
- Specify RETAIL_SALES as the project name, then specify a location as shown in Figure 23.
Figure 23. Create new project in Cognos Framework Manager
- Click OK.
- In the Select Language window, select your preferred language. The data modeling team selects English from the list of options and clicks OK.
- On the Select Metadata Source page of the Metadata Wizard, select IBM Cognos Model and click Next.
- On the Select Model page, browse to and select the transformed model in your data design project.
Note: By default, Cognos Framework Manager looks for Framework Manager projects with the .cpf file extension. In order to locate your file, you should select the All Files (*.*) file type.
- On the Select Objects page, select the RETAIL_SALES object as shown in Figure 24.
Figure 24. Select objects to be imported in Cognos Framework Manager
- Click Finish to complete the project creation and import the model into Cognos Framework Manager.
The team has imported the model into Cognos Framework Manager. They use the product to create a dimensional view of the dimensional objects in the model as shown in Figure 25.
Figure 25. The dimensional view of imported dimensional objects in Cognos Framework Manager
The team deploys the physical data models and uses Cognos Framework Manager to further improve the model to meet the needs of the enterprise.
Throughout the course of this article, you followed the data modeling team as they created dimensional models by reverse engineering from an existing data source. After the models are improved, then deployed, the company can use the updated dimensional schema to improve data applications. They can use Cognos Framework Manager to develop a model that is suitable for business intelligence operations.
You learned how to denormalize a dimensional schema and how to create reusable transformation configurations that help you transform data models from one type to another. Then, you followed the team as they generated the delta DDL that was necessary to deploy the changes to the database.
Throughout this article, you learned how the relational and dimensional data modeling capabilities of InfoSphere Data Architect accelerate iterative data design and development, speeding up the time from design to deployment.
Thanks to Erin Wilson and Zhen Xing Li for reviewing this article.
Yun Feng Bai is a staff software engineer at the China Development Lab, Beijing, China. He currently focuses on the InfoSphere Data Architect QA area. Previously, he worked on DB2 Data Warehouse (renamed InfoSphere Warehouse), focusing on OLAP modeling and SQL warehousing.