Integrating heterogeneous metadata

Usage scenarios for integrating metadata from IBM Cognos Business Intelligence and IBM InfoSphere Information Server

This article describes the importance of using metadata to help you derive more value from the complex, heterogeneous information spread across the enterprise. You will walk through usage scenarios for integrating Cognos™ Business Intelligence reports and IBM InfoSphere™ Business Glossary. You will also learn about using data lineage, impact analysis, and customized metadata ad-hoc reports.

Share:

Werner Schuetz (werner_schuetz@de.ibm.com), IBM Certified IT Specialist, IBM  

Werner Schuetz photoWerner Schuetz is an IBM Certified IT Specialist, IBM Certified Advanced Database Administrator, and Certified Application Developer for DB2 9 for Linux, UNIX, and Windows. He works as a Cognos and Metadata Integration Solution Specialist in the European Information on Demand Technical Center of Excellence. He develops both industry and non-industry-specific showcases for metadata integration.



23 July 2009

Also available in Spanish

Introduction

Mergers and acquisitions, globalization, competitive pressures, and various other factors drive the need for companies to effectively leverage their information and gain more business value out of it. But often these business conditions offer a wide range of information-related challenges. They deal with massive volumes of data, often without much insight into the content, quality, and structure of that data. Complex business transactions from customers and partners plus operational information moving within the enterprise are often the basis upon which key business decisions are made. These decisions are often undermined by the lack of insight and understanding of the data. Common questions arise as a result of this lack of insight and understanding, including:

  • Where is my customer data?
  • How do I find the information I need?
  • What does this information mean in the current business context?
  • Can I trust this information?

Additionally, the lack of a common understanding of the relevance of information between technical and business users affects the usefulness of delivered information.

IBM InfoSphere Information Server is a data integration software platform that helps organizations derive more value from the complex, heterogeneous information spread across their systems. By providing automated data profiling and analysis capabilities necessary to unlock the mystery of source data content, quality, and structure, it builds the foundation for enabling metadata-driven integration.

This article introduces the idea of metadata and demonstrates the metadata integration capabilities of the IBM InfoSphere Information Server. Usage scenarios demonstrate integrating Cognos Business Intelligence reports and IBM InfoSphere Business Glossary as well as using data lineage, impact analysis, and customized metadata ad hoc reports.

What is metadata?

Metadata is data about data, or, in other words, descriptive information about something. Consider a food can as an example. The can itself is the container, but the information around the outside, such as the trademarks, content, composition, weight, nutrition panel, and expiration date, is the metadata, as shown in Figure 1.

Figure 1. A food can that shows metadata
A drawing of a food can label

In an IT context, metadata can describe the design, development, and implementation of information systems, and it can describe the movement of data. For example, the following are considered metadata:

  • The description of a database table
  • The relationships between database objects
  • The definitions of the columns in the table
  • The data types of the table's columns
  • The derivations that transform the columns

Metadata enables you to put context and meaning to things. It is generated and used by each organization and software product.

Types of metadata

Metadata can be categorized into three types:

  • Business metadata
    • Business metadata is critical to providing context for an integration project. It helps define terms in every-day language, without regard to technical implementation. For example, the language used to describe what a customer is and how to categorize a customer is often business-specific, and it might differ between company divisions.
    • Examples: business rules, stewardship, business definitions, auditing terminology, glossaries, algorithms, and lineage using business language
    • Audience: business users
  • Technical metadata
    • Technical metadata is often used by more technical staff, such as developers. It includes items such as table definitions and data types. These objects are used frequently during the application design and development process.
    • Examples: definition of source and target systems, their table and fields structures and attributes, documentation for auditing derivations and dependencies
    • Audience: specific tool users (BI, ETL, profiling, modeling)
  • Operational metadata
    • Operational metadata refers to the metadata generated and captured when a process executes. It allows administrators to manage the system and ensure things are running smoothly. Operational metadata also helps administrators troubleshoot issues if there is a problem with a process.
    • Examples: information about application runs, including their frequency, record counts, component-by-component analysis, and other statistics for auditing purposes
    • Audience: operations, management and business users

Unifying these types of metadata creates an end-to-end relationship, enabling users to understand not just where information is stored and what happened to it as it moved through the organization, but also the business context of that information.

Using IBM InfoSphere Information Server to integrate metadata

IBM InfoSphere Information Server combines the technologies within the IBM Information Integration Solutions portfolio into a single, unified platform that enables companies to understand, cleanse, transform, and deliver trustworthy and context-rich information. This unified platform delivers shared metadata that is leveraged across the various components that make up InfoSphere Information Server. Several of these components play a key role in metadata integration. The subset of components related to the scenarios that are shown in this article are:

  • IBM InfoSphere Business Glossary. IBM InfoSphere Business Glossary enables business analysts and subject-matter experts to create, manage, and share a common enterprise vocabulary and classification system. This enables users to link business terms to more technical artifacts that InfoSphere Information Server manages. These business terms are shared with InfoSphere Data Architect, InfoSphere Information Analyzer, and InfoSphere FastTrack, creating a common set of semantic tags for reuse by data modelers, data analysts, business analysts, and end users. IBM InfoSphere Business Glossary Packs for Industry Verticals accelerate the implementation and deployment of Business Glossary by immediately providing rich, relevant, and industry-specific glossary content to any business. They are currently available for the Financial Markets & Banking, Insurance, Healthcare, Telecommunication, and Retail industries.
  • IBM InfoSphere Metadata Workbench. IBM InfoSphere Metadata Workbench offers key metadata visualization and exploration capabilities for Information Server, modeling, and business intelligence metadata assets. Acting as a control station, the workbench provides cross-tool impact analysis and data lineage, as well as extensive reporting and administration functions for Information Server.
  • Bridges and MetaBrokers in IBM InfoSphere Import Export Manager. Bridges and MetaBrokers in IBM InfoSphere Import Export Manager convert metadata from one format to another by mapping the elements to a standard model that translates the semantics of the source tool into the semantics of the target tool. They are used to import metadata from a particular tool, file, or database into the metadata repository of InfoSphere Metadata Server. They are also used to export metadata from the metadata repository to tools from an independent software vendor. See Resources for a list of bridges that are supported by IBM InfoSphere MetaBrokers and Bridges, Version 8.1.
  • IBM InfoSphere Information Analyzer. InfoSphere Information Analyzer evaluates the content and structure of your data for consistency and quality, and it helps to improve the accuracy of your data by making inferences and identifying anomalies.

In the scenarios in this article, these components are used as follows:

  • IBM InfoSphere Business Glossary. The IBM InfoSphere Business Glossary Pack for Retail is imported into IBM InfoSphere Business Glossary in the form of a Glossary Archive (XMI) file.
  • Bridges and MetaBrokers in IBM InfoSphere Import Export Manager. The Cognos metadata have been imported using IBM InfoSphere Import Export Manager (Cognos 8 BI Reporting - Content Manager Import Bridge) into the metadata repository of IBM InfoSphere Information Server.
  • IBM InfoSphere Information Analyzer. The content and structure of the database tables for the Sales Analysis project are evaluated and published by InfoSphere Information Analyzer and can be regarded in the Table Analysis view in IBM InfoSphere Business Glossary.
  • IBM InfoSphere Metadata Workbench. IBM InfoSphere Metadata Workbench is the main tool for metadata visualization and exploration.

Integrating Cognos Business Intelligence Reports and IBM InfoSphere Business Glossary

This scenario demonstrates the direct link from a Cognos report to IBM InfoSphere Business Glossary. A business user reading his Cognos BI report directly invokes IBM InfoSphere Business Glossary in order to get more information about some terms in the report. With the help of the Business Glossary browser, he not only gains an understanding of the used terms, but he also receives more information, such as synonyms, abbreviations, related terms, and IT assets, as shown in Figure 2. He also discovers the responsible data steward and his contact details.

Figure 2. Cognos Sales Analysis Report
Screen caps: Cognos Sales Analysis Report showing sales revenue and targets

Click here for a larger view of Figure 2.

Studying his sales analysis report, a business user analyzes Revenue by Customer, Revenue by Product, and Customer Revenue Ranking. But when reading the Customer Target Attainments, he is not absolutely sure about the meaning of the column titles YTD, FYT, and TAP. Because Cognos was configured to directly access IBM InfoSphere Business Glossary or IBM InfoSphere Metadata Workbench data lineage out of a Cognos report, he immediately invokes IBM InfoSphere Business Glossary, as shown in Figure 3.

Figure 3. Invoking Business Glossary
Screen cap: Customer 2008 Targets Attainment with YTD right-clicked

He sees a short description for YTD: Year to Date

Figure 4. Business Glossary entry window
Screen cap: YTD is defined as Year To Date

Drilling down for more information, the business user opens the Business Glossary browser to obtain more detailed information, such as a long description, the responsible data steward, synonyms, abbreviation, and related IT assets. For example, he discovers that the term Year to Date (YTD) is also related to the terms Full Year Target (FYT) and Target Attainment Percentage (TAP), as shown in Figure 5. These terms are also used in his sales analysis report.

Figure 5. Business Glossary term details
Screen cap: Terms related to YTD

Click here for a larger view of Figure 5.

The business user also wants to know who the responsible Data Steward is, which assets she manages, and how to contact her. This information is in the Business Glossary user details window, as shown in Figure 6.

Figure 6. Business Glossary user details window
Screen cap: Contact information for user Jackie Steward

He decides to email the data steward by clicking on her email address, as shown in Figure 7.

Figure 7. Business Glossary email contact
Screen cap: Email window launched from user details window

He is also interested in more information about the database table on which the BI report is based and on its content and quality. First he looks at the underlying database summary table. From the Assigned Assets section in the Term Details window (Figure 5), he drills down into database table SALES_SUM to see detailed information about the database columns and their primary keys, as shown in Figure 8.

Figure 8. Database Table Details
Screen cap: Showing links for CUSTID, PRODID, and so on

From the Analysis section, the business user drills down into the Table Analysis Summary Details, which provide information about the content and structure of the SALES_SUM table relating to consistency and quality, such as number of fields, number of rows, and whether there are any primary key duplicates or foreign key violations. IBM InfoSphere Information Analyzer published this information, as shown in Figure 9.

Figure 9. Table Analysis Summary Details
Screen cap: Showing SALES_SUM details, including 5 fields and 348 rows

Creating data lineage reports using the IBM InfoSphere Metadata Workbench

Metadata workbench users can create reports that analyze the flow of data from data sources, through jobs and stages, and into databases, data files, and business intelligence reports. The data lineage report shows the movement of data within a job or across multiple jobs, and it shows the order of activities within the run of a job.

Suppose a technical IT staff member has been assigned new responsibility for the Sales Analysis project. New enhancements for this project are expected any day, and he has very limited time to learn the details. With the help of the Metadata Workbench, he can come up to speed quickly.

As described in the following sections, the staff member begins by taking a look at the metadata of the Sales Analysis report in the Metadata Workbench. He explores a graphical view of all related objects, and he looks through the detailed metadata coming from the Cognos BI report. Then he examines the data flow representing where the data came from that the Metadata Workbench data lineage report offers for the BI report. Then he studies the details about ETL jobs and operational metadata.

Metadata from the Cognos BI report

The staff member completes the following steps to look at the metadata of the Sales Analysis report in the Metadata Workbench.

  1. The staff member opens the Metadata Workbench, as shown in Figure 10.
Figure 10. Metadata Workbench welcome page
Screen cap: Metadata welcome page with BI Report for Asset Type and Sales Analysis typed in for Contains
  1. From the Sales Analysis BI Report entry, he invokes the Graph View. He sees an overview of the responsible data steward, the related terms, BI report collections, report fields, and database tables, as shown in Figure 11.
Figure 11. Graph View for analysis report
Screen cap: Vertical flow chart of Sales Analysis BI report

Click here for a larger view of Figure 11.

  1. The staff member opens the details for the Sales Analysis BI Report and sees detailed information about BI report collections and report fields, database tables, stewards, and terms, as shown in Figure 12.
Figure 12. BI report details
Screen cap: BI report details, including TAP listed as a Report Field
  1. He sees details about the report field TAP, including the definition of the term TAP in IBM InfoSphere Business Glossary, the calculation expression, and the underlying database columns, as shown in Figure 13.
Figure 13. BI report fields details
Screen cap: Details of TAP BI report, including Database Column TARGET
  1. The staff member sees more information about the database field TARGET and other details, such as the database, table, data type, length, uniqueness, nullability, related terms, and responsible data steward, as shown in Figure 14.
Figure 14. Database column details
Screen cap: Details about the TARGET database column

Data lineage reports

After having gained insight into the BI report, report fields, and database columns, the staff member invokes a data lineage report called Where does the data come from? in order to explore the movement of data across multiple jobs.

He selects to examine an end-to-end perspective from the Sales Analysis BI Report to the Load_Sales_Sum ETL job in a graphical view, as shown in Figure 15.

Figure 15. Data lineage graphical view
Screen cap: DataStage Image selected from Load_Sales_Sum

Click here for a larger view of Figure 15.

Note: Because the BI report is at the top level of this scenario's perspective, only the Where does the data come from? lineage report is invoked. However, in scenarios with an object at a lower level, the Where does the data goto? report presents the flow of the data toward the top level.

The staff member sees a complete overview of the data flow underlying the BI report to obtain information about the BI report model, the database tables, and the ETL jobs from the Where does the data come from? data lineage report, as shown in Figures 16, 17, and 18.

Figure 16. Underlying BI report model
Screen cap: Data lineage report finder for Sales Analysis BI Report selecting DWORKS information
Figure 17. Underlying database tables
Screen cap: Data lineage report finder selecting DWORKS information

Note: The view in Figure 17 shows both database table metadata from a DB2 database DWORKS and table metadata from the BI report (database DWRKS_CG).

Figure 18. Underlying ETL jobs
Screen cap: Showing parallel DWORKS jobs

The staff member examines all the processing steps underlying the summary table SALES_SUM, which is analyzed by the BI report. He calls data lineage from the SALES_SUM database table entry in the BI report details (see Figure 12) to see a list of all involved processing steps, as shown in Figure 19.

Figure 19. SALES_SUM asset list
Screen cap: Data lineage report finder for SALES_SUM BI Report showing Stage tab

To explore the end-to-end processing path from the first input file with the customer transactions up to the population of the summary table, he clicks Display Final Assets, and he selects the Sequential_Transactions stage, as shown in Figure 20.

Figure 20. Final assets for summary table SALES_SUM
Screen cap: Data lineage report finder for SALES_SUM BI Report selecting Sequential_Transactions asset

The staff member can now sustain a detailed processing chain from customer transactions input up to the population of the summary table, which explains the flow of the data by the processing stages, ETL jobs, and database tables, as shown in Figure 21.

Figure 21. Processing chain from customer transactions input up to the population of the summary table
Screen cap: Showing Sequential_Transactions to SALES_SUM in a list

The graphical view displays all end-to-end processing steps for all paths (including parallel paths) in one diagram, as shown in Figure 22.

Figure 22. Graphical view of processing chain
Screen cap: Horizontal flow chart showing Sequential_Transactions to SALES_SUM

Click here for a larger view of Figure 22.

ETL job details and operational metadata

Finally, the staff member analyzes the metadata information about the last ETL job LOAD_SALES_SUM, populating the summary table. He pays special attention to the job design and operational information, as shown in Figure 23.

Figure 23. ETL job and job design
Screen cap: Parallel Job: Load_Sales_Sum details

In the job operational information, he gains insight into the job runtime, job completion status, and job runtime parameters, as shown in Figure 24. The number of records read and written in each processing step helps the staff member trace whether the correct volume of data was processed. This operational information will be very helpful to debug problems later, such as inconsistent data or ETL job failure.

Figure 24. ETL job operational information
Screen cap: Showing details of Job Runs

Understanding impact analysis reports

Impact analysis reports show the dependencies between assets. For example, a new business application requires that the length of a database field should be enlarged. Impact analysis offers a better understanding about other data files, database fields, jobs, and BI reports that this change will affect.

In the example scenario, a company merger requires the length of the database field TOT_CUST (total revenue per customer) to be increased from 8 to 10. The IT technical staff member examines the impact this change will have on other BI report objects, database columns, and columns within ETL jobs. The metadata are shown in Figure 25.

Figure 25. Metadata for database column TOT_CUST
Screen cap: Details for database column TOT_CUST in SALES_SUM table

By invoking the What depends on this asset? impact analysis report, the staff member sees the information about dependent BI report members, report fields, and report items, as shown in Figure 26.

Figure 26. The What depends on this asset? impact analysis report
Screen cap: Report showing Asset Name TAP in BI Report Field and Asset Name Query1.TOT_CUST in BI Report Item

By invoking the What does this asset depend on? impact analysis report, the staff member sees the information about stage and database columns that the database column TOT_CUST depends on, as shown in Figure 27.

Figure 27. The What does this asset depend on? impact analysis report
Screen cap: TOT_CUST report showing PRICE as Stage Column and QTY as Database Column

As the staff member gains in-depth insight into the impact of the database field length increase, he can give a better cost estimate for this modification.

In another example, the IT Operational Manager has been notified that an ETL job for loading the fact table has been aborted when reading its input in the processing stage DB2_Input_Facts. He needs a quick overview about the impact on other application components, including ETL stages, database tables, and BI reports. The What depends on this asset? impact analysis report delivers the needed information, as shown in Figure 28.

Figure 28. The What depends on this asset? impact analysis report on job abortion
Screen cap: Aggr_Cust as Stage, SALES_SUM as Database Table, SALES_SUM as BI Report Collection, Sales Analysis as BI Report, and Query1 in BI Report Set

Using customized ad-hoc reports

Ad-hoc reporting (queries in the Metadata Workbench) enables users and administrators to define, preserve, and report across information assets. Ad-hoc reports are also used to filter query results and export them to a file. The IBM InfoSphere Metadata Workbench delivers several pre-built queries that report across varied asset types and their relationships. However users can build and manage their own queries, which they can publish and make available for all IBM InfoSphere Metadata Workbench users.

Assume that a financial expert has no appreciable skills in development tools (BI reporting or ETL), but he requires more transparency about the calculation formulas used in BI reports or ETL jobs. Figures 29 and 30 show two customized ad-hoc reports that can help him.

Figure 29. Ad-hoc report for BI reports, including their fields, expressions, and data types
Screen cap: Field details of Sales Analysis BI Report
Figure 30. Ad-hoc report for DataStage jobs, their stages, fields, and expressions
Screen cap: Output Column Name showing Load Time and Load Facts

Because query results can be saved in data format (CSV) or report format (XLS), he saves his results and enhances them in a spreadsheet, as shown in Figure 31.

Figure 31. Save report as a spreadsheet
Screen cap: Save Report As prompt

Conclusion

IBM InfoSphere Information Server is a data integration software platform that helps organizations derive more value from the complex, heterogeneous information spread across their systems. The Integration of Cognos Business Intelligence Reports and IBM InfoSphere Business Glossary, for example, enables business and IT users to search a common vocabulary for terms in the Cognos BI report. Data lineage reports show how information flows through DataStage stages or columns, through physical data sources and into BI report fields. Impact analysis reports show the dependencies between assets. Ad-hoc reporting enables users to obtain customized information out of the metadata repository.

Acknowledgments

Special thanks to Hayden Marchant from IBM Metadata Development at the Israel Software Lab for his excellent Metadata Workbench support. Thanks also to Marc Haber from IBM Software Group Israel and Eberhard Hechler from the European Information On-Demand Technical Center of Excellence for reviewing this article.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, Big data and analytics
ArticleID=415710
ArticleTitle=Integrating heterogeneous metadata
publish-date=07232009