Why integrate your metadata?
Enterprises and organizations deal with huge volumes of data on a daily basis. This data drives many day-to-day activities such as business decisions and regulatory reporting. In order for you to use data effectively, your IT systems must provide visibility into your metadata. This visibility leads to increased trust in data reliability, increased agility, and improved common understanding throughout your enterprise.
Some of the common problems faced by businesses and technical users in regard to understanding data are:
- Lack of trusted information
- Where does the data come from?
- How reliable is it?
- Lack of agility
- Was the data modified recently?
- If I change the data, will other systems be affected?
- Lack of common semantics
- Do "customer" and "client" conceptually mean the same thing?
- What should I call a specific entity so that the rest of the enterprise understands what I mean?
These problems can be solved through an enterprise architecture that integrates business and technical metadata from various sources. At a high level, metadata refers to information about data, such as its definition, structure, source, etc. For a more detailed description of metadata, refer to the "Integrating heterogeneous metadata" developerWorks article that is linked to in the Resources section.
A typical data integration architecture involves data from various source systems being extracted, transformed, and loaded (ETL) into data warehouses, marts, and cubes. To satisfy the needs of business users, the data is further integrated by reporting tools and visually presented as dashboard views. Technical users, on the other hand, work with the IT aspects of the integration. Based on this architecture, metadata integration (which is the solution to the issues in context) deals with integrating the following technical and business data:
- Data about what source systems were used to populate the data warehouse, marts, and cubes
- Data about what ETL jobs were used to perform the transformations
- Data about what data mart tables were used to populate a specific report
- Data about what specific terms in a report mean (business metadata)
The key to achieving such integration is to build an enterprise-level metadata repository that acts as a single source of truth for all metadata requirements. When such a repository is in place, different applications and users can link to this repository to build lineage and traceability solutions that help answer the questions listed in the previous section.
Figure 1 shows the outline of an enterprise metadata integration solution. The labels above the dotted lines refer to the various types of metadata (report names and packages, database tables, ETL jobs, and data source names).
Figure 1. Enterprise metadata integration solution
Note: The figure above shows the technical perspective of the integration architecture. Therefore, it does not show business metadata being populated into the metadata repository.
IBM tooling for metadata integration
The solution described in this tutorial is based on IBM InfoSphere Information Server V8.1, Cognos 8 Business Intelligence (BI) V8.4, and Import Export Manager V8.1, fixpack 1.
IBM InfoSphere Information Server is a data integration software platform that helps organizations derive useful information and value to business from simple or complex data sourced from multiple systems. The platform is composed of various components that profile, cleanse, transform, and integrate data, in order to deliver useful and meaningful information. The components of InfoSphere Information Server V8.1 that this tutorial uses to demonstrate metadata integration are:
- Metadata repository
- InfoSphere Metadata Workbench (referred to in the rest of this tutorial simply as Workbench)
- Import Export Manager
- InfoSphere Business Glossary
- InfoSphere DataStage
The InfoSphere metadata repository acts as a centralized data store for all the metadata that is available across the various other components such as DataStage and Business Glossary. Workbench plays a critical role in establishing the automated link between the data from various sources and provides useful lineage reports and impact analysis details. The Import Export Manager component consists of bridges (MetaBroker is also a bridge) that help in importing and exporting metadata to and from the metadata repository. It supports various data sources such as data files, database tables, data models, business glossary, Cognos reports, etc.
Figure 2 shows the relationship between the business user and technical user perspectives of the Cognos BI and InfoSphere integration.
Figure 2. Cognos BI and InfoSphere integration
The integration means that business users can use web links to navigate to the business glossary and lineage reports from the business reports.
For technical users, the integration means that:
- The report metadata from Cognos BI, ETL job metadata from DataStage, and the warehouse metadata from Database, need to be mapped to each other to enable traceability through lineage reports.
- The glossary data needs to be populated in the metadata repository to enable the glossary integration.
- For the linkage between the Cognos BI reports and the InfoSphere tools (Glossary and Workbench) to work, Cognos BI must be configured with the right URIs.
Steps for integrating metadata
This tutorial explains how you can implement this process of data mapping and configuration by following these steps:
- Use Import Export Manager to bring metadata about data files, data tables, business terms, reports, and models into Workbench.
- Establish manual and automated links between the metadata.
- Configure Cognos BI for InfoSphere integration.
The first step to integrate your metadata is to pull the metadata into the repository. MetaBrokers and bridges play a critical role in importing metadata from various tools into the metadata repository. You can also use these tools to export metadata from the repository to InfoSphere Information Server.
There are various types of bridges available for importing and exporting metadata for database tables, reports, models, user information, etc. This tutorial shows you how to use these bridges to import metadata into Workbench.
Importing metadata into Workbench
Importing metadata into Workbench involves the following tasks:
- Importing database objects
- Importing and publishing sequential files
- Importing Cognos BI OLAP models and report definitions
- Viewing DataStage jobs
- Importing Business Glossary terms
- Creating steward
Importing database objects
Follow these steps to use ODBC 3.0 MetaBroker to import the metadata that describes the database schema and tables into Workbench:
- Open the Import Export Manager tool, select ODBC 3.0
MetaBroker for the source as shown in Figure 3, and click
Figure 3. Import Export Manager showing ODBC 3.0 metabroker
Alternatively, from DataStage Designer you can use the Via bridges import option to display the Bridge Selection dialog as shown in Figure 4. From the dialog, select ODBC 3.0 MetaBroker and click OK.
Figure 4. DataStage and QualityStage Designer showing bridge selection
Use the ODBC Data Source Administrator tool to set up the ODBC DSN name as the system
DSN as shown in Figure 5.
Figure 5. ODBC Data source administrator showing the configured system DSN
From the Parameter Selection dialog, select the Verbose option as
shown in Figure 6 and click OK.
Figure 6. ODBC 3.0 MetaBroker parameters
- From the Schema\Table selection list pop-up window, select the preferred database schema and table that need to be
imported as shown in Figure 7 and click OK.
Figure 7. Database schema\table selection screen
- After the processing completes, click Filter on the status pop-up window.
From the Metadata Selection dialog,
select specific tables or the entire
schema as shown in Figure 8 and click OK.
Figure 8. Metadata selection-database filter option
The database metadata is now imported into the metadata repository and is available
for use by all components of InfoSphere Information Server.
Repeat these steps to import other database schemas and tables.
You can view the imported database metadata in
Workbench as shown in the Figure 9.
Figure 9. Workbench showing data servers and tables
Importing and publishing sequential data files
The source data for DataStage jobs is either in the form of database tables or sequential files. Also, the lookup data for jobs can be in the form of sequential files. It is important to import the metadata about these files into the metadata repository for its use and for dependency analysis. To do this, follow these steps:
- From DataStage and QualityStage Designer, select
Table Definitions > Sequential File Definitions
as shown in Figure 10.
Figure 10. DataStage designer showing sequential file table definition option
- From the Import Meta Data (Sequential) dialog,
enter details such as source directory, file name, and project folder
as shown in Figure 11
and click Import to start the import process.
Figure 11. Import metadata — sequential file
- From the Format tab of the Define Sequential Meta Data dialog, select the required delimiter
as shown in Figure 12.
Also, select the Fixed-width columns and First line
is column names options as needed.
Figure 12 Define sequential metadata format
- From the Define tab of the Define Sequential Meta Data dialog,
select the column name to use as the key as shown in Figure 13 and click
OK to complete the import.
Figure 13. Sequential file metadata — column definition
- Repeat these steps to import all the required sequential files.
Publishing the sequential file table definition to the metadata repository
Publishing the sequential file table definitions helps you obtain data dependencies. It also helps you understand the jobs that are dependent on the selected sequential files and gives a complete end-to-end data lineage report from source sequential file to Cognos reports. To publish sequential file table definitions, follow these steps:
- From the DataStage Designer, navigate to your newly created table definition from
Right click the table definition and select Shared Table Creation Wizard
as shown in Figure 14.
Select the respective table definition and click Next.
Figure 14. Shared Table Creation Wizard selection option
From the Create or Associate Tables screen of the Shared Table Creation Wizard, select
From the Create New Table dialog, specify the host system and directory path of
the file as shown in Figure 15 and add a note to document the intended use of the file.
Click OK, confirm the settings, and create
the shared table definition.
Figure 15. Shared table creation wizard
- You can now view the published sequential files in Workbench as shown in Figure 16.
Figure 16. Workbench — Data File properties
The structure of an imported sequential file upon publication as a shared table includes the data file, data file structure, and data file field as shown in Figure 17.
Figure 17. Structure of imported sequential file
Importing Cognos OLAP models and reports into Workbench
The bridges to import, report, and model are:
- Cognos 8 BI Reporting — Content Manager for reporting the multi-models, reports, etc.
- Cognos 8 BI Reporting — Content Manager Packages for importing the models
- Cognos 8 BI Reporting — Content Manager Report Studio for importing only the reports
Follow these steps to use the Import Export Manager to import the Cognos reports and OLAP models into Workbench:
Figure 18 shows the Import Export Manager with the Cognos BI bridges highlighted in the
Source selection list.
Based on your requirements to import the model, report, or both,
select the appropriate bridge and click Start.
Figure 18. Import Export Manager showing Cognos bridge option
- On the parameter selection pop-up window, enter values for the required parameters
version, login server URL, and repository subset:
- If you are using Cognos BI 8.4, you must select Cognos 8.3 to 8.4 for the version parameter. Otherwise, it will not provide the lineage from Job (source) to Cognos report.
- Specify your Cognos dispatch URL information for the login server URL.
- Specify your Cognos content folder/package information for the repository subset.
You can find this information in the Cognos Report property section as follows:
- Select Cognos Content > Report location to go to the Cognos Report folder.
- Click More.
- Click Set properties.
- Click View the search path, ID and URL as shown in Figure
Figure 19. Cognos search path
- Copy the content of the Search path box.
- Click Close and paste the search path content into the Repository subset field. Do the same for all the reports in Cognos.
Enter optional parameters (for example, host system name, database name, schema name,
DBMS name, and DBMS server instance name) as shown in Figure 20.
Figure 20. Import export manager showing the parameters for Cognos 8 BI reporting content manager
- When you are finished entering parameters, click OK to begin the import process.
- When the import process is finished and you receive a "Completed successfully" message in the status pop-up window, select Filter > Add All > OK.
From the Parameter Selection pop-up window,
enter the Metadata Server parameters as shown in Figure 21.
The mandatory parameters are
host name, port number, username, and password.
If you want
the bridge to check for duplicate objects with the same name before import,
select the Check for Duplicates option.
If a duplicate exists, the import fails and
you would then need to execute the import activity again.
Figure 21. Import Export Manager — parameter selection for Metadata Server
- Upon successful completion,
you may have to repeat the steps to import other
reports and models.
You can import all the reports in a particular package by specifying
//reportson the repository subset parameter as shown in the example below:
Repository subset=/content/package[@name='DEMO ERS']//reports
Once you have successfully imported the Cognos report and model into the metadata repository, you can view them from Workbench as shown in the Figure 22.
Figure 22. Metadata Workbench showing the BI reports imported into repository
Viewing DataStage jobs in Workbench
All components of the InfoSphere product suite share the same metadata repository. Therefore, the DataStage jobs are present in the same metadata repository and need not be imported.
You can view the DataStage jobs from Workbench as shown in Figure 23.
Figure 23. Metadata Workbench showing the DataStage jobs and stages
Importing Business Glossary
You can import a Business Glossary into Workbench in three ways:
- Create terms and categories manually from the Information Server console.
- Import a csv files containing the glossary terms.
- Import terms and categories from the glossary model in InfoSphere Data Architect, a modeling tool.
For more details on importing a Business Glossary, refer to the "Sharing a common vocabulary across InfoSphere Foundation Tools" developerWorks article that is linked to in the Resources section.
Figure 24 shows an example of how an imported business glossary appears in Workbench.
Figure 24. Metadata Workbench showing imported business terms
Creating stewards in Workbench
Stewards are users who are responsible for a particular IT asset. When you create stewards, you assign them specific assets that they are responsible for.
By using the Glossary tab of the Information Server web console, you can also create stewards who are the owners of an IT asset in Workbench. These stewards are the individuals who should be contacted in case of emergency or if someone needs additional information about the asset.
Figure 25 shows details of a steward user.
Figure 25. Information Server web console Glossary tab, showing stewards
Figure 26 shows the assets that the steward manages.
Figure 26. Information Server web console Glossary tab showing asset managed by steward
Any user who has Information Server administration rights can add a steward, edit steward details, and add or remove assets managed by a steward.
You can view steward details in Workbench as shown in Figure 27.
Figure 27 Metadata Workbench showing the steward details
Mapping metadata in Workbench
In order to view lineage and impact analysis reports, internally map heterogeneous metadata that has been imported from various sources into Workbench to other related metadata. This section of the tutorial shows you how to perform the following types of mapping that are required to view lineage reports:
- Mapping database instances
- Mapping database alias and parameters
- Automated mapping of DataStage job stages with database tables
- Manual mapping of DataStage job stages with database tables
Mapping database instances
There may be two instances of the same database listed on the Data servers tab.
- The first instance corresponds to the manual import of database objects.
- The second instance is created when a Cognos report is imported with its database schema.
Link the DataStage jobs and Cognos reports by creating the following mappings between the databases mentioned above:
- Cognos reports to the database tables
- Database tables to the DataStage jobs
On the Advanced tab of Workbench, select Data source Identity as shown in Figure 28 and map both the databases.
Figure 28. Data Source Identity selection on Advanced tab of Metadata Workbench
Mapping database alias and parameters
On the Advanced tab of Workbench, select Database Alias as shown in Figure 29 and map all the variables to the respective database if they are not mapped automatically.
Figure 29. Database Alias selection on Advanced tab of Metadata Workbench
Automated mapping of DataStage stages to physical database tables
After completing the steps described above, select Automated Services on the Advanced tab of Workbench. Select the project for which you want the Workbench to automatically detect the linkage and click Run as shown in Figure 30. When the job is finished, the job status changes from "in progress" to "complete."
Figure 30. Automated Services selection on Advanced tab of Metadata Workbench
Automated services analyze the following types of DataStage stages:
- Most stages that connect to databases and data source files
- Passive stages that connect to sources of data
Automated services do not analyze stages from active stages that simply transform data, such as the transformer, aggregator, sort, and join stages.
Manual binding of DataStage job to physical database tables
There may be instances when automated services do not bind the DataStage stages to the data items. In these cases, select Data Item binding on the Advanced tab of Workbench as shown in Figure 31 to bind DataStage stages to physical data items.
Figure 31. Data Item Binding selection on Advanced tab of Metadata Workbench
Viewing reports in Metadata Workbench
The data lineage and job lineage reports show the movement of data within a job or across multiple jobs. Impact analysis reports show the dependencies between assets.
You can start a report from the task list of an asset information page or from the context menu of an asset in a results list. You can track the flow from source to target or from target to source. This section of the tutorial describes these three types of reports:
Data lineage reports show the flow of data to or from a selected metadata asset, through stages and stage columns, across one or more jobs, into databases and business intelligence (BI) reports. Open any report in the Workbench as shown in Figure 32 and click Data Lineage. This displays a Report Finder-Data Lineage window similar to what is shown in Figure 33.
Figure 32. Metadata Workbench showing imported Cognos reports
From the Report Finder-Data Lineage window, select the Where does data come from? radio button and click Create Report. This displays a Data Lineage report screen that shows the BI Report Collection, Database Tables, DataStage stages, and Data File element linked to the selected Cognos BI report.
Figure 33. Metadata Workbench showing data lineage for a BI report
To see a graphical view of the data lineage, click the Graph View button at the bottom of the Data Lineage report screen. Select the appropriate database column or DataStage stage to be traced back and click Graph View to see a graphical view similar to what is shown in Figure 34.
Figure 34. Metadata Workbench showing graphical view of data lineage
Figure 35 shows an example of the linkage to the actual source data file.
Figure 35. Metadata Workbench showing data lineage report with linkage to data source file
Job lineage shows the flow of data to or from a selected metadata asset across one or more jobs, database tables, database views, data file elements, business intelligence (BI) reports, and information services operations. From the BI report you have open in Workbench (Figure 32), select Job Lineage. This displays a Report Finder-Job Lineage window similar to what is shown in Figure 36. From the Report Finder-Job Lineage window, select the Where does data come from? radio box and click Create Report. This displays a Job Lineage report screen that shows information about the linked DataStage jobs along with DB tables and BI report collection.
Figure 36. Metadata Workbench showing a job lineage for BI report
To see a graphical view of the job dependency similar to what is shown in Figure 37, click the Graph View button at the bottom of the Job Lineage report screen.
Figure 37. Metadata Workbench showing job lineage graphical report
The impact analysis report displays the other assets that depend on the selected asset or the other assets that the selected asset depends on. To view the impact analysis report, select an asset, such as a physical database table, and click Impact Analysis. This displays data such as dependent jobs, tables, reports, and models in an impact analysis report similar to what is shown in Figure 38.
Figure 38. Metadata Workbench showing Impact analysis report
Figure 39 shows an example of the graphical view of the dependency between DataStage stages and the database tables.
Figure 39. Graphical view of Impact Analysis
Configuring Cognos 8 BI for lineage reports and Business Glossary
To view a lineage report, select the Lineage link that appears on the pop-up menu of the Cognos report. You can view the glossary definition of the report field by selecting the Glossary link. Figure 40 shows an example of these links.
Figure 40. Cognos report showing lineage and glossary option
To configure lineage and glossary from Cognos reports to InfoSphere Metadata Workbench, you need to set the URI value by navigating to Cognos Administrator > Configuration > Dispatcher and Services Properties as shown in Figure 41.
Figure 41. IBM Cognos Administrator showing selection of Dispatcher and Services Configuration Properties
To configure lineage, set Metadata Information Services URI to a value of:
<ip> is the Workbench
server IP address.
To configure glossary, set the Business Glossary URI to a value of:
localhost with the server IP address.
Viewing the glossary for any report field
As shown in Figure 40,
you can open any report in Cognos Connection and right click on any term
to view a link to a glossary.
Click the Glossary link, login on the InfoSphere
Business Glossary login screen, and go to the list of Business Glossary terms and categories screen.
The terms and categories screen contains a set of
extended search results that includes terms for all the values that match the
words in the term you selected.
For example, the term:
Value adjustments and provisions associated with the original
would be split into
original, etc. as shown in Figure 42
Figure 42. Glossary search result for selected report field
Viewing lineage for any report field from Cognos 8 BI
To view a data lineage report, open any report in Cognos Connection or any other Cognos 8 BI product component (for example, Framework Manager, Report Studio, etc.) and select the lineage link on the pop-up menu as shown in Figure 40. This takes you to the InfoSphere Metadata Workbench and shows a data lineage report for the selected BI report field similar to what is shown in Figure 43.
Figure 43. BI report field data lineage in Metadata Workbench
This tutorial showed you how to enhance the business intelligence system of an enterprise with lineage reports and a business glossary by using IBM Cognos and InfoSphere products to integrate the technical and business metadata of your the enterprise.
Special thanks to Marc Haber and Hayden Marchant from IBM Metadata Development at the Israel Software Lab for reviewing this tutorial and for their guidance and support on metadata products. Thanks also to Chintamani M. Chhatre who is the Industry Architect for Banking and Financial Markets at Global Business Solution Center, India for his support in publishing this tutorial.
- Learn more about IBM InfoSphere Information Server on ibm.com.
- Consult this support document for a list of supported brokers and bridges for Information Server Version 8.1 and 8.2.
- To learn more about Business Glossary integration, read "Understanding leads to trust: Sharing a common vocabulary across InfoSphere Foundation Tools" (developerWorks, February 2009).
- "Integrating heterogeneous metadata: Usage scenarios for integrating metadata from IBM Cognos Business Intelligence and InfoSphere Information Server" (developerWorks, July 2009) explains what metadata is, and explains how to use it to derive value from complex, heterogeneous information.
- To learn about extended lineage report, read " Exploring data lineage: Get a complete picture of your data flows"(developerWorks, January 2010).
- Get more information on IBM InfoSphere Information server at the IBM InfoSphere Information Server Information Center.
- Get more information on Cognos BI at IBM Cognos 8 BI Information Center.
- Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and tools as well as IT industry trends.
- Follow developerWorks on Twitter.
- Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
Get products and technologies
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.
- Participate in the discussion forum.
- Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
Dig deeper into Information management on developerWorks
Get samples, articles, product docs, and community resources to help build, deploy, and manage your cloud apps.
Keep up with the best and latest technical info to help you tackle your development challenges.
Software development in the cloud. Register today to create a project.
Evaluate IBM software and solutions, and transform challenges into opportunities.