Managing metadata

The metadata repository of IBM® InfoSphere® Information Server stores metadata from suite tools, external tools, and databases, and enables these tools to share it. You can import metadata into the repository from various sources, export metadata by various methods, and transfer metadata assets between design, test, and production repositories.

The metadata repository

The single metadata repository provides users of each suite tool with a common understanding of the structure of the data that flows through the tools of the InfoSphere Information Server suite. With a shared repository, changes that are made in one suite tool are automatically and instantly visible throughout the suite.

The single repository ensures that you can use a database table that is imported from a database or design tool in the following ways, among others:
  • For analysis in IBM InfoSphere Information Analyzer
  • To create mappings in IBM InfoSphere FastTrack
  • To create table definitions in an IBM InfoSphere DataStage® and QualityStage® job
In InfoSphere Information Governance Catalog, the same table can be assigned a term and a steward. The table can also be part of a data lineage report that links it to the original database design, to the job that uses the table, and to the business intelligence (BI) report that is based on the table.
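A lineage report of this kind can be thought of as a walk over a graph of assets and the relationships between them. The following Python sketch illustrates only that general idea. It is not the data model or API of InfoSphere Information Governance Catalog. The asset labels, such as `table:STUDENT`, and the `reachable` helper are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical asset labels and relationships, for illustration only.
edges = [
    ("model:STUDENT_LDM", "table:STUDENT"),   # database design -> implemented table
    ("table:STUDENT", "job:LoadStudents"),    # table -> job that uses it
    ("table:STUDENT", "report:Retention"),    # table -> BI report based on it
]

downstream = defaultdict(list)
upstream = defaultdict(list)
for src, dst in edges:
    downstream[src].append(dst)
    upstream[dst].append(src)

def reachable(start, graph):
    """Breadth-first walk returning every asset reachable from start (excluding start)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable("table:STUDENT", downstream)))  # what uses the table
print(sorted(reachable("table:STUDENT", upstream)))    # where the table came from
```

Following the edges forward answers an impact question: what depends on this table? Following them backward answers a lineage question: where did this table come from?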
The metadata repository shares, stores, and reconciles a comprehensive spectrum of metadata:
Business metadata
Provides business context for information technology assets and adds business meaning to the artifacts that are created and managed by other IT applications. Business metadata includes glossary terms, stewardship, and examples.
Operational metadata
Describes the runs of IBM InfoSphere DataStage and QualityStage jobs, including the number of rows written and read and the database tables or data files that are affected. You can use InfoSphere Information Governance Catalog to create data lineage reports that combine design and operational information.
Technical metadata
Provides details about the following types of assets:
  • Implemented data resources, including host computers, databases and data files, and their contents. The assets can be imported from a design tool, a database, or a BI tool.
  • Profiling, quality, and ETL processes, projects, and users, including jobs and projects that are created in InfoSphere DataStage and QualityStage and analyses from IBM InfoSphere Information Analyzer.
  • BI report and model metadata that is imported by MetaBrokers and bridges from BI tools such as IBM Cognos® and BusinessObjects.
The metadata repository is an IBM WebSphere® J2EE application. The repository uses standard relational database technology (such as IBM DB2® or Oracle) for persistence. These databases provide backup, administration, scalability, transactions, and concurrent access.

Importing and exporting metadata

InfoSphere Information Server offers many methods of importing metadata assets into the metadata repository. Some of these methods can also export metadata from the repository to other tools, files, or databases. InfoSphere Metadata Asset Manager imports assets into the metadata repository by using bridges and connectors.

Connectors, operators, and plug-ins
InfoSphere DataStage and QualityStage use connectors, operators, and plug-ins to connect to various databases to extract, transform, and load data. InfoSphere Information Analyzer and InfoSphere FastTrack use connectors to access databases. In all cases, metadata about the implemented data resources, including host, database, schemas, tables, and columns, is stored in the metadata repository for use by other suite tools.
InfoSphere Metadata Integration Bridges
Bridges let you import metadata into the metadata repository from external applications, databases, and files, including design tools and BI tools. Some bridges can also export metadata. You can import many types of metadata, including the following:
  • Hosts, databases, schemas, stored procedures, database tables, database columns, and foreign keys
  • Data files, data file structures, and data file fields
  • BI reports, models, and their contained assets
  • Logical data models and physical data models from design tools such as CA ERwin and IBM InfoSphere Data Architect
  • Users and groups to designate as stewards for assets in the metadata repository
Exchange of XML and CSV files
Several suite tools provide interfaces for import and export of XML and comma-separated values (CSV) files that contain metadata of different types:
  • You can use InfoSphere Information Governance Catalog to import extension mapping documents and extension data sources that capture information about processes and data sources from tools, scripts, and other programs that do not save their metadata to the metadata repository. You can also use the catalog to import glossary content, including categories, terms, and relationships to other assets.
  • You can use InfoSphere FastTrack to import and export mapping specifications in CSV format.

Browsing, analyzing, and deleting repository metadata

Users of each suite tool can browse and select the types of metadata assets that the tool uses. For example, users of InfoSphere DataStage and QualityStage can select jobs and the table definitions and stages that are used by jobs. Several tools provide a wider view of the contents of the metadata repository:
  • Users of InfoSphere Information Governance Catalog can browse and query the full spectrum of assets in the repository and run data lineage and impact analysis reports. Users can also find and browse assets of many types to assign terms to the assets or designate stewards for the assets.
  • By using the repository management functionality of InfoSphere Metadata Asset Manager, you can browse all implemented data resources, logical data model assets, physical data model assets, and BI assets in the metadata repository. You can delete or merge duplicate assets.
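Duplicate assets often result from importing the same database through different bridges or connectors. The following Python sketch shows one simple way to group such assets by a normalized identity key and choose a survivor for each group. It does not reproduce the matching rules of InfoSphere Metadata Asset Manager. The `TableAsset` fields, the case-insensitive key, and the "keep the first asset" rule are all assumptions made for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class TableAsset:
    # Hypothetical fields; not the repository's actual schema.
    asset_id: str
    host: str
    database: str
    schema: str
    table: str

def identity_key(asset):
    """Assumption: names are compared case-insensitively along the host/database/schema/table path."""
    return (asset.host.lower(), asset.database.lower(),
            asset.schema.lower(), asset.table.lower())

def find_duplicates(assets):
    """Group assets by identity key and keep only groups with more than one member."""
    groups = defaultdict(list)
    for asset in assets:
        groups[identity_key(asset)].append(asset)
    return {key: group for key, group in groups.items() if len(group) > 1}

def merge_group(group):
    """Keep the first asset as the survivor and report the IDs that it absorbs."""
    survivor, *rest = group
    return survivor, [asset.asset_id for asset in rest]

assets = [
    TableAsset("a1", "dbhost", "SALES", "dbo", "CUSTOMER"),
    TableAsset("a2", "DBHOST", "sales", "DBO", "customer"),  # same table, different case
    TableAsset("a3", "dbhost", "SALES", "dbo", "ORDERS"),
]
duplicates = find_duplicates(assets)
for key, group in duplicates.items():
    survivor, absorbed = merge_group(group)
    print(key, "->", survivor.asset_id, "absorbs", absorbed)
```

In practice, a merge also has to reattach the absorbed assets' relationships, such as assigned terms and stewards, to the survivor. The sketch leaves that step out.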

Moving assets between metadata repositories

After you have developed and tested your jobs and processes, you can move them to a production environment. You can use the istool command line to move assets from one InfoSphere Information Server repository to another. For example, you can move assets from a development environment to a test environment, and from a test environment to a production environment.

By using the command line, you can move multiple types of assets and the relationships between them:
  • Jobs and projects from InfoSphere DataStage and QualityStage
  • Categories, terms, and stewards from InfoSphere Information Governance Catalog
  • Analysis summaries, projects, and metrics from InfoSphere Information Analyzer
  • Mapping specifications from IBM InfoSphere FastTrack
  • Implemented data resources, including metadata for databases, schemas, tables, columns, and data files
  • Logical data model assets and physical data model assets
  • BI metadata, including BI reports, BI models, and their contained assets
  • InfoSphere Information Server users, roles, and reports
The following tools also have user interfaces for moving assets between metadata repositories:
  • InfoSphere DataStage and QualityStage
  • InfoSphere Data Architect
  • InfoSphere Information Governance Catalog
  • InfoSphere Information Analyzer
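Moving assets together with their relationships means that the set you transfer must include whatever the selected assets depend on. Otherwise, those relationships break in the target repository. The following Python sketch computes such a transitive dependency set. It is a conceptual illustration only. It does not show istool syntax or options, and the asset labels and the `dependencies` map are hypothetical.

```python
# Hypothetical dependency map: each asset lists the assets that it relies on.
dependencies = {
    "job:LoadStudents": ["tabledef:STUDENT", "parameterset:DB"],
    "tabledef:STUDENT": ["table:STUDENT"],
    "table:STUDENT": [],
    "parameterset:DB": [],
    "term:Retention Rate": ["category:Student Metrics"],
    "category:Student Metrics": [],
}

def export_set(selected, deps):
    """Return the selected assets plus everything that they transitively depend on."""
    result = set()
    stack = list(selected)
    while stack:
        asset = stack.pop()
        if asset in result:
            continue  # already collected; also guards against cycles
        result.add(asset)
        stack.extend(deps.get(asset, []))
    return result

print(sorted(export_set(["job:LoadStudents"], dependencies)))
```

Selecting only the job still pulls in its table definition, the table behind that definition, and its parameter set. The unrelated glossary term is left out.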

Scenario for metadata management

The comprehensive metadata management capability provides users of InfoSphere Information Server with a common way to manage descriptive information about data and how it is used. The following scenario describes one use of this capability.

Business analytics
A large, for-profit education provider needed to devise a strategy for better student retention. Business managers needed to analyze the student life cycle from application to graduation to direct their recruiting efforts at students with the best chance of success.

To meet this business imperative, the company designed and delivered a business intelligence solution using a data warehouse. The warehouse contains a single view of student information that is populated from operational systems.

The IT organization uses InfoSphere Information Server and its metadata repository to coordinate metadata throughout the project. Other tools that are used include Embarcadero ER/Studio for data modeling and IBM Cognos for business intelligence. The reports that are produced show an accurate view of student trends over the life cycle from application to graduation.

Report consumers can understand the meaning of the fields in their BI reports by accessing the business definitions in InfoSphere Information Governance Catalog. These definitions help them identify the key factors that link student characteristics to retention. Consumers can also understand the origin of the data in the reports by using business lineage, which lets them trust the sources and flow of the data that they are looking at. The net result is the ability to make better decisions with more confidence, so the education provider can design and implement effective initiatives to retain students.