Enterprises and organizations deal with huge volumes of data on a daily basis. This data drives many day-to-day activities such as business decisions and regulatory reporting. In order for you to use data effectively, your IT systems must provide visibility into your metadata. This visibility leads to increased trust in data reliability, increased agility, and improved common understanding throughout your enterprise.
Business and technical users commonly face the following problems when trying to understand their data:
- Lack of trusted information
  - Where does the data come from?
  - How reliable is it?
- Lack of agility
  - Was the data modified recently?
  - If I change the data, will other systems be affected?
- Lack of common semantics
  - Do "customer" and "client" conceptually mean the same thing?
  - What should I call a specific entity so that the rest of the enterprise understands what I mean?
These problems can be solved through an enterprise architecture that integrates business and technical metadata from various sources. At a high level, metadata is information about data, such as its definition, structure, and source. For a more detailed description of metadata, refer to the "Integrating heterogeneous metadata" developerWorks article that is linked to in the Resources section.
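To make the "common semantics" problem concrete, here is a minimal sketch of a glossary entry that ties business metadata (a definition and its synonyms) to technical metadata (the assets that hold the data). The class names, the sample definition, and the table name `MART.CUSTOMER_DIM` are all assumptions made for illustration; this is not the data model used by InfoSphere Business Glossary.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """Hypothetical business term with its technical assets."""
    name: str
    definition: str
    synonyms: set = field(default_factory=set)  # other labels for the same concept
    assets: set = field(default_factory=set)    # tables or columns that hold the data


class Glossary:
    """Toy in-memory glossary; labels are matched case-insensitively."""

    def __init__(self):
        self._by_label = {}

    def add(self, term):
        # Index the term under its name and under every synonym.
        for label in {term.name, *term.synonyms}:
            self._by_label[label.lower()] = term

    def resolve(self, label):
        # Return the term for a label, or None if the label is unknown.
        return self._by_label.get(label.lower())


glossary = Glossary()
glossary.add(GlossaryTerm(
    name="Customer",
    definition="A party that has purchased at least one product.",  # illustrative only
    synonyms={"Client"},
    assets={"MART.CUSTOMER_DIM"},  # assumed table name
))

# "client" and "Customer" resolve to the same glossary entry.
same_concept = glossary.resolve("client") is glossary.resolve("Customer")
print(same_concept)
```

With a shared entry like this, the question of whether "customer" and "client" mean the same thing becomes a lookup rather than a debate.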
A typical data integration architecture involves data from various source systems being extracted, transformed, and loaded (ETL) into data warehouses, marts, and cubes. To satisfy the needs of business users, the data is further integrated by reporting tools and visually presented as dashboard views. Technical users, on the other hand, work with the IT aspects of the integration. Based on this architecture, metadata integration, which addresses the problems listed above, deals with integrating the following technical and business metadata:
- Data about what source systems were used to populate the data warehouse, marts, and cubes
- Data about what ETL jobs were used to perform the transformations
- Data about what data mart tables were used to populate a specific report
- Data about what specific terms in a report mean (business metadata)
The key to achieving such integration is to build an enterprise-level metadata repository that acts as a single source of truth for all metadata requirements. When such a repository is in place, different applications and users can link to this repository to build lineage and traceability solutions that help answer the questions listed in the previous section.
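The lineage and traceability idea can be sketched as a directed graph over metadata assets. The example below is a toy in-memory stand-in for a repository, not the InfoSphere metadata repository or its API; the class `LineageGraph`, its methods, and every asset name are hypothetical. Walking the graph upstream answers "Where does the data come from?", and walking it downstream answers "If I change the data, will other systems be affected?".

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory record of data flows between assets."""

    def __init__(self):
        self._downstream = defaultdict(set)  # asset -> assets it feeds
        self._upstream = defaultdict(set)    # asset -> assets it reads from

    def add_flow(self, source, target):
        """Record that data flows from source to target."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        """Breadth-first traversal: every asset reachable from start."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen

    def lineage(self, asset):
        """All assets that contribute data to the given asset."""
        return self._walk(asset, self._upstream)

    def impact(self, asset):
        """All assets affected by a change to the given asset."""
        return self._walk(asset, self._downstream)


repo = LineageGraph()
# Assumed flow: source system -> ETL job -> data mart table -> report
repo.add_flow("source:CRM.CUSTOMERS", "job:load_customer_dim")
repo.add_flow("job:load_customer_dim", "table:MART.CUSTOMER_DIM")
repo.add_flow("table:MART.CUSTOMER_DIM", "report:Customer Revenue")

print(sorted(repo.lineage("report:Customer Revenue")))
print(sorted(repo.impact("source:CRM.CUSTOMERS")))
```

Storing each flow in both directions keeps lineage and impact queries symmetric, at the cost of writing every edge twice.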
Figure 1 shows the outline of an enterprise metadata integration solution. The labels above the dotted lines refer to the various types of metadata (report names and packages, database tables, ETL jobs, and data source names).
Figure 1. Enterprise metadata integration solution
Note: The figure above shows the technical perspective of the integration architecture. Therefore, it does not show business metadata being populated into the metadata repository.
The solution described in this tutorial is based on IBM InfoSphere Information Server V8.1, Cognos 8 Business Intelligence (BI) V8.4, and Import Export Manager V8.1, Fix Pack 1.
IBM InfoSphere Information Server is a data integration software platform that helps organizations derive useful information and business value from simple or complex data sourced from multiple systems. The platform is composed of various components that profile, cleanse, transform, and integrate data to deliver useful and meaningful information. The components of InfoSphere Information Server V8.1 that this tutorial uses to demonstrate metadata integration are:
- Metadata repository
- InfoSphere Metadata Workbench (referred to in the rest of this tutorial simply as Workbench)
- Import Export Manager
- InfoSphere Business Glossary
- InfoSphere DataStage
The InfoSphere metadata repository acts as a centralized data store for all the metadata that is available across the other components, such as DataStage and Business Glossary. Workbench plays a critical role in establishing the automated link between the data from various sources and provides useful lineage reports and impact analysis details. The Import Export Manager component consists of bridges (a MetaBroker is also a bridge) that import and export metadata to and from the metadata repository. Import Export Manager supports various metadata sources, such as data files, database tables, data models, business glossaries, and Cognos reports.
Figure 2 shows the relationship between the business user and technical user perspectives of the Cognos BI and InfoSphere integration.
Figure 2. Cognos BI and InfoSphere integration
The integration means that business users can use web links to navigate to the business glossary and lineage reports from the business reports.
For technical users, the integration means that:
- The report metadata from Cognos BI, the ETL job metadata from DataStage, and the warehouse metadata from the database need to be mapped to each other to enable traceability through lineage reports.
- The glossary data needs to be populated in the metadata repository to enable the glossary integration.
- For the linkage between the Cognos BI reports and the InfoSphere tools (Glossary and Workbench) to work, Cognos BI must be configured with the right URIs.
This tutorial explains how you can implement this process of data mapping and configuration by following these steps:
- Use Import Export Manager to bring metadata about data files, data tables, business terms, reports, and models into Workbench.
- Establish manual and automated links between the metadata.
- Configure Cognos BI for InfoSphere integration.
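As a rough illustration of how manual and automated links can coexist, the sketch below matches report items to table columns by name and lets explicit manual links override the automatic match. This shows one possible linking strategy only; it does not describe how Workbench establishes links. The function `link_metadata` and all item, table, and column names are hypothetical.

```python
def link_metadata(report_items, table_columns, manual_links=None):
    """Map each report item to a table column.

    Manual links take precedence. Remaining items are matched
    automatically by case-insensitive column name. Items with no
    match map to None so that they can be reviewed by hand.
    """
    manual_links = manual_links or {}
    # Index columns by their unqualified, lowercased name.
    by_name = {col.split(".")[-1].lower(): col for col in table_columns}

    links = {}
    for item in report_items:
        if item in manual_links:
            links[item] = manual_links[item]           # manual link
        else:
            links[item] = by_name.get(item.lower())    # automated link or None
    return links


links = link_metadata(
    report_items=["CUSTOMER_NAME", "Revenue", "Region", "Margin"],
    table_columns=[
        "MART.CUSTOMER_DIM.CUSTOMER_NAME",
        "MART.SALES_FACT.REGION",
        "MART.SALES_FACT.NET_AMOUNT",
    ],
    # "Revenue" has no column of the same name, so it is linked by hand.
    manual_links={"Revenue": "MART.SALES_FACT.NET_AMOUNT"},
)

for item, column in links.items():
    print(item, "->", column)
```

Returning None for unmatched items, rather than dropping them, makes the gaps that still need a manual link easy to spot.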
The first step to integrate your metadata is to pull the metadata into the repository. MetaBrokers and bridges play a critical role in importing metadata from various tools into the metadata repository. You can also use these tools to export metadata from the repository to InfoSphere Information Server.
There are various types of bridges available for importing and exporting metadata for database tables, reports, models, user information, etc. This tutorial shows you how to use these bridges to import metadata into Workbench.