Mergers and acquisitions, globalization, competitive pressures, and various other factors drive the need for companies to effectively leverage their information and gain more business value out of it. But often these business conditions offer a wide range of information-related challenges. They deal with massive volumes of data, often without much insight into the content, quality, and structure of that data. Complex business transactions from customers and partners plus operational information moving within the enterprise are often the basis upon which key business decisions are made. These decisions are often undermined by the lack of insight and understanding of the data. Common questions arise as a result of this lack of insight and understanding, including:
- Where is my customer data?
- How do I find the information I need?
- What does this information mean in the current business context?
- Can I trust this information?
Additionally, the lack of a common understanding of the relevance of information between technical and business users affects the usefulness of delivered information.
IBM InfoSphere Information Server is a data integration software platform that helps organizations derive more value from the complex, heterogeneous information spread across their systems. By providing automated data profiling and analysis capabilities necessary to unlock the mystery of source data content, quality, and structure, it builds the foundation for enabling metadata-driven integration.
This article introduces the idea of metadata and demonstrates the metadata integration capabilities of the IBM InfoSphere Information Server. Usage scenarios demonstrate integrating Cognos Business Intelligence reports and IBM InfoSphere Business Glossary as well as using data lineage, impact analysis, and customized metadata ad hoc reports.
What is metadata?
Figure 1. A food can that shows metadata
In an IT context, metadata can describe the design, development, and implementation of information systems, and it can describe the movement of data. For example, the following are considered metadata:
- The description of a database table
- The relationships between database objects
- The definitions of the columns in the table
- The data types of the table's columns
- The derivations that transform the columns
Metadata enables you to put context and meaning to things. It is generated and used by each organization and software product.
Types of metadata
Metadata can be categorized into three types:
- Business metadata
- Business metadata is critical to providing context for an integration project. It helps define terms in every-day language, without regard to technical implementation. For example, the language used to describe what a customer is and how to categorize a customer is often business-specific, and it might differ between company divisions.
- Examples: business rules, stewardship, business definitions, auditing terminology, glossaries, algorithms, and lineage using business language
- Audience: business users
- Technical metadata
- Technical metadata is often used by more technical staff, such as developers. It includes items such as table definitions and data types. These objects are used frequently during the application design and development process.
- Examples: definition of source and target systems, their table and fields structures and attributes, documentation for auditing derivations and dependencies
- Audience: specific tool users (BI, ETL, profiling, modeling)
- Operational metadata
- Operational metadata refers to the metadata generated and captured when a process executes. It allows administrators to manage the system and ensure things are running smoothly. Operational metadata also helps administrators troubleshoot issues if there is a problem with a process.
- Examples: information about application runs, including their frequency, record counts, component-by-component analysis, and other statistics for auditing purposes
- Audience: operations, management and business users
Unifying these types of metadata creates an end-to-end relationship, enabling users to understand not just where information is stored and what happened to it as it moved through the organization, but also the business context of that information.
Using IBM InfoSphere Information Server to integrate metadata
IBM InfoSphere Information Server combines the technologies within the IBM Information Integration Solutions portfolio into a single, unified platform that enables companies to understand, cleanse, transform, and deliver trustworthy and context-rich information. This unified platform delivers shared metadata that is leveraged across the various components that make up InfoSphere Information Server. Several of these components play a key role in metadata integration. The subset of components related to the scenarios that are shown in this article are:
- IBM InfoSphere Business Glossary. IBM InfoSphere Business Glossary enables business analysts and subject-matter experts to create, manage, and share a common enterprise vocabulary and classification system. This enables users to link business terms to more technical artifacts that InfoSphere Information Server manages. These business terms are shared with InfoSphere Data Architect, InfoSphere Information Analyzer, and InfoSphere FastTrack, creating a common set of semantic tags for reuse by data modelers, data analysts, business analysts, and end users. IBM InfoSphere Business Glossary Packs for Industry Verticals accelerate the implementation and deployment of Business Glossary by immediately providing rich, relevant, and industry-specific glossary content to any business. They are currently available for the Financial Markets & Banking, Insurance, Healthcare, Telecommunication, and Retail industries.
- IBM InfoSphere Metadata Workbench. IBM InfoSphere Metadata Workbench offers key metadata visualization and exploration capabilities for Information Server, modeling, and business intelligence metadata assets. Acting as a control station, the workbench provides cross-tool impact analysis and data lineage, as well as extensive reporting and administration functions for Information Server.
- Bridges and MetaBrokers in IBM InfoSphere Import Export Manager. Bridges and MetaBrokers in IBM InfoSphere Import Export Manager convert metadata from one format to another by mapping the elements to a standard model that translates the semantics of the source tool into the semantics of the target tool. They are used to import metadata from a particular tool, file, or database into the metadata repository of InfoSphere Metadata Server. They are also used to export metadata from the metadata repository to tools from an independent software vendor. See Resources for a list of bridges that are supported by IBM InfoSphere MetaBrokers and Bridges, Version 8.1.
- IBM InfoSphere Information Analyzer. InfoSphere Information Analyzer evaluates the content and structure of your data for consistency and quality, and it helps to improve the accuracy of your data by making inferences and identifying anomalies.
In the scenarios in this article, these components are used as follows:
- IBM InfoSphere Business Glossary. The IBM InfoSphere Business Glossary Pack for Retail is imported into IBM InfoSphere Business Glossary in the form of a Glossary Archive (XMI) file.
- Bridges and MetaBrokers in IBM InfoSphere Import Export Manager. The Cognos metadata have been imported using IBM InfoSphere Import Export Manager (Cognos 8 BI Reporting - Content Manager Import Bridge) into the metadata repository of IBM InfoSphere Information Server.
- IBM InfoSphere Information Analyzer. The content and structure of the database tables for the Sales Analysis project are evaluated and published by InfoSphere Information Analyzer and can be regarded in the Table Analysis view in IBM InfoSphere Business Glossary.
- IBM InfoSphere Metadata Workbench. IBM InfoSphere Metadata Workbench is the main tool for metadata visualization and exploration.
Integrating Cognos Business Intelligence Reports and IBM InfoSphere Business Glossary
This scenario demonstrates the direct link from a Cognos report to IBM InfoSphere Business Glossary. A business user reading his Cognos BI report directly invokes IBM InfoSphere Business Glossary in order to get more information about some terms in the report. With the help of the Business Glossary browser, he not only gains an understanding of the used terms, but he also receives more information, such as synonyms, abbreviations, related terms, and IT assets, as shown in Figure 2. He also discovers the responsible data steward and his contact details.
Figure 2. Cognos Sales Analysis Report
Click here for a larger view of Figure 2.
Studying his sales analysis report, a business user analyzes Revenue by Customer, Revenue by Product, and Customer Revenue Ranking. But when reading the Customer Target Attainments, he is not absolutely sure about the meaning of the column titles YTD, FYT, and TAP. Because Cognos was configured to directly access IBM InfoSphere Business Glossary or IBM InfoSphere Metadata Workbench data lineage out of a Cognos report, he immediately invokes IBM InfoSphere Business Glossary, as shown in Figure 3.
Figure 3. Invoking Business Glossary
He sees a short description for YTD: Year to Date
Figure 4. Business Glossary entry window
Drilling down for more information, the business user opens the Business Glossary browser to obtain more detailed information, such as a long description, the responsible data steward, synonyms, abbreviation, and related IT assets. For example, he discovers that the term Year to Date (YTD) is also related to the terms Full Year Target (FYT) and Target Attainment Percentage (TAP), as shown in Figure 5. These terms are also used in his sales analysis report.
Figure 5. Business Glossary term details
Click here for a larger view of Figure 5.
The business user also wants to know who the responsible Data Steward is, which assets she manages, and how to contact her. This information is in the Business Glossary user details window, as shown in Figure 6.
Figure 6. Business Glossary user details window
He decides to email the data steward by clicking on her email address, as shown in Figure 7.
Figure 7. Business Glossary email contact
He is also interested in more information about the database table on which the BI report is based and on its content and quality. First he looks at the underlying database summary table. From the Assigned Assets section in the Term Details window (Figure 5), he drills down into database table SALES_SUM to see detailed information about the database columns and their primary keys, as shown in Figure 8.
Figure 8. Database Table Details
From the Analysis section, the business user drills down into the Table Analysis Summary Details, which provide information about the content and structure of the SALES_SUM table relating to consistency and quality, such as number of fields, number of rows, and whether there are any primary key duplicates or foreign key violations. IBM InfoSphere Information Analyzer published this information, as shown in Figure 9.
Figure 9. Table Analysis Summary Details
Creating data lineage reports using the IBM InfoSphere Metadata Workbench
Metadata workbench users can create reports that analyze the flow of data from data sources, through jobs and stages, and into databases, data files, and business intelligence reports. The data lineage report shows the movement of data within a job or across multiple jobs, and it shows the order of activities within the run of a job.
Suppose a technical IT staff member has been assigned new responsibility for the Sales Analysis project. New enhancements for this project are expected any day, and he has very limited time to learn the details. With the help of the Metadata Workbench, he can come up to speed quickly.
As described in the following sections, the staff member begins by taking a look at the metadata of the Sales Analysis report in the Metadata Workbench. He explores a graphical view of all related objects, and he looks through the detailed metadata coming from the Cognos BI report. Then he examines the data flow representing where the data came from that the Metadata Workbench data lineage report offers for the BI report. Then he studies the details about ETL jobs and operational metadata.
Metadata from the Cognos BI report
The staff member completes the following steps to look at the metadata of the Sales Analysis report in the Metadata Workbench.
- The staff member opens the Metadata Workbench, as shown in Figure 10.
Figure 10. Metadata Workbench welcome page
- From the Sales Analysis BI Report entry, he invokes the Graph View. He sees an overview of the responsible data steward, the related terms, BI report collections, report fields, and database tables, as shown in Figure 11.
Figure 11. Graph View for analysis report
Click here for a larger view of Figure 11.
- The staff member opens the details for the Sales Analysis BI Report and sees detailed information about BI report collections and report fields, database tables, stewards, and terms, as shown in Figure 12.
Figure 12. BI report details
- He sees details about the report field TAP, including the definition of the term TAP in IBM InfoSphere Business Glossary, the calculation expression, and the underlying database columns, as shown in Figure 13.
Figure 13. BI report fields details
- The staff member sees more information about the database field TARGET and other details, such as the database, table, data type, length, uniqueness, nullability, related terms, and responsible data steward, as shown in Figure 14.
Figure 14. Database column details
Data lineage reports
After having gained insight into the BI report, report fields, and database columns, the staff member invokes a data lineage report called Where does the data come from? in order to explore the movement of data across multiple jobs.
He selects to examine an end-to-end perspective from the Sales Analysis BI Report to the Load_Sales_Sum ETL job in a graphical view, as shown in Figure 15.
Figure 15. Data lineage graphical view
Click here for a larger view of Figure 15.
Note: Because the BI report is at the top level of this scenario's perspective, only the Where does the data come from? lineage report is invoked. However, in scenarios with an object at a lower level, the Where does the data goto? report presents the flow of the data toward the top level.
The staff member sees a complete overview of the data flow underlying the BI report to obtain information about the BI report model, the database tables, and the ETL jobs from the Where does the data come from? data lineage report, as shown in Figures 16, 17, and 18.
Figure 16. Underlying BI report model
Figure 17. Underlying database tables
Note: The view in Figure 17 shows both database table metadata from a DB2 database DWORKS and table metadata from the BI report (database DWRKS_CG).
Figure 18. Underlying ETL jobs
The staff member examines all the processing steps underlying the summary table SALES_SUM, which is analyzed by the BI report. He calls data lineage from the SALES_SUM database table entry in the BI report details (see Figure 12) to see a list of all involved processing steps, as shown in Figure 19.
Figure 19. SALES_SUM asset list
To explore the end-to-end processing path from the first input file with the customer transactions up to the population of the summary table, he clicks Display Final Assets, and he selects the Sequential_Transactions stage, as shown in Figure 20.
Figure 20. Final assets for summary table SALES_SUM
The staff member can now sustain a detailed processing chain from customer transactions input up to the population of the summary table, which explains the flow of the data by the processing stages, ETL jobs, and database tables, as shown in Figure 21.
Figure 21. Processing chain from customer transactions input up to the population of the summary table
The graphical view displays all end-to-end processing steps for all paths (including parallel paths) in one diagram, as shown in Figure 22.
Figure 22. Graphical view of processing chain
Click here for a larger view of Figure 22.
ETL job details and operational metadata
Finally, the staff member analyzes the metadata information about the last ETL job LOAD_SALES_SUM, populating the summary table. He pays special attention to the job design and operational information, as shown in Figure 23.
Figure 23. ETL job and job design
In the job operational information, he gains insight into the job runtime, job completion status, and job runtime parameters, as shown in Figure 24. The number of records read and written in each processing step helps the staff member trace whether the correct volume of data was processed. This operational information will be very helpful to debug problems later, such as inconsistent data or ETL job failure.
Figure 24. ETL job operational information
Understanding impact analysis reports
Impact analysis reports show the dependencies between assets. For example, a new business application requires that the length of a database field should be enlarged. Impact analysis offers a better understanding about other data files, database fields, jobs, and BI reports that this change will affect.
In the example scenario, a company merger requires the length of the database field TOT_CUST (total revenue per customer) to be increased from 8 to 10. The IT technical staff member examines the impact this change will have on other BI report objects, database columns, and columns within ETL jobs. The metadata are shown in Figure 25.
Figure 25. Metadata for database column TOT_CUST
By invoking the What depends on this asset? impact analysis report, the staff member sees the information about dependent BI report members, report fields, and report items, as shown in Figure 26.
Figure 26. The What depends on this asset? impact analysis report
By invoking the What does this asset depend on? impact analysis report, the staff member sees the information about stage and database columns that the database column TOT_CUST depends on, as shown in Figure 27.
Figure 27. The What does this asset depend on? impact analysis report
As the staff member gains in-depth insight into the impact of the database field length increase, he can give a better cost estimate for this modification.
In another example, the IT Operational Manager has been notified that an ETL job for loading the fact table has been aborted when reading its input in the processing stage DB2_Input_Facts. He needs a quick overview about the impact on other application components, including ETL stages, database tables, and BI reports. The What depends on this asset? impact analysis report delivers the needed information, as shown in Figure 28.
Figure 28. The What depends on this asset? impact analysis report on job abortion
Using customized ad-hoc reports
Ad-hoc reporting (queries in the Metadata Workbench) enables users and administrators to define, preserve, and report across information assets. Ad-hoc reports are also used to filter query results and export them to a file. The IBM InfoSphere Metadata Workbench delivers several pre-built queries that report across varied asset types and their relationships. However users can build and manage their own queries, which they can publish and make available for all IBM InfoSphere Metadata Workbench users.
Assume that a financial expert has no appreciable skills in development tools (BI reporting or ETL), but he requires more transparency about the calculation formulas used in BI reports or ETL jobs. Figures 29 and 30 show two customized ad-hoc reports that can help him.
Figure 29. Ad-hoc report for BI reports, including their fields, expressions, and data types
Figure 30. Ad-hoc report for DataStage jobs, their stages, fields, and expressions
Because query results can be saved in data format (CSV) or report format (XLS), he saves his results and enhances them in a spreadsheet, as shown in Figure 31.
Figure 31. Save report as a spreadsheet
IBM InfoSphere Information Server is a data integration software platform that helps organizations derive more value from the complex, heterogeneous information spread across their systems. The Integration of Cognos Business Intelligence Reports and IBM InfoSphere Business Glossary, for example, enables business and IT users to search a common vocabulary for terms in the Cognos BI report. Data lineage reports show how information flows through DataStage stages or columns, through physical data sources and into BI report fields. Impact analysis reports show the dependencies between assets. Ad-hoc reporting enables users to obtain customized information out of the metadata repository.
Special thanks to Hayden Marchant from IBM Metadata Development at the Israel Software Lab for his excellent Metadata Workbench support. Thanks also to Marc Haber from IBM Software Group Israel and Eberhard Hechler from the European Information On-Demand Technical Center of Excellence for reviewing this article.
- Learn more about Cognos Business Intelligence and Financial Performance Management.
- Learn more about IBM InfoSphere Information Server.
- Refer to this list of bridges that are supported by IBM InfoSphere MetaBrokers and Bridges, Version 8.1.
- Learn more about IBM Industry Models.
- Learn more about IBM InfoSphere Business Glossary Packs.
- Browse New to Information Integration to find out more about IBM software solutions for Information Integration.
- Read Understanding leads to trust: Sharing a common vocabulary across InfoSphere Foundation Tools for detailed information about InfoSphere Business Glossary, and learn how to import metadata from the Metadata Server into InfoSphere Data Architect or export metadata from InfoSphere Data Architect into the Metadata Server.
- Read the series The information perspective of SOA design to learn more about the use of the IBM InfoSphere Business Glossary, IBM InfoSphere Information Analyzer, and IBM InfoSphere Data Architect in SOA design.
- Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
Get products and technologies
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Participate in the discussion forum.
- Check out the developerWorks blogs and get involved in the developerWorks community.