Establish an information governance policy framework in InfoSphere Information Governance Catalog
Using policies and rules
Before you start
Learn how to apply pre-built policy and rule content for InfoSphere Information Governance Catalog (IGC) to get an information governance initiative under way.
With the substantial growth in data volume, velocity, and variety comes a corresponding need to govern and manage the risk, quality, and cost of that data and provide higher confidence for its use. This is the domain of information governance, a focal area for InfoSphere Information Server. This is a broad topic, and further details on information governance practices and solutions can be found in the Resources.
IGC provides a meaningful directory of governed information. It supports this through a metadata repository that can include governed business vocabulary (a business glossary); semantic policies and governance rules; stewardship assignment for information domains; a catalog of information assets; relationships and linkage across the vocabulary and assets; and a range of tools to understand those relationships, including impact analysis, business and data lineage, and queries and reports.
By taking advantage of these capabilities, organizations can:
- Enable information governance
  - Semantic policies and rules enable precise communication of governance requirements.
  - Common language streamlines information development for business requirements.
  - Stewardship at all levels of the information supply chain.
  - End-to-end data lineage and impact analysis.
- Support accountability and responsibility
  - Assign stewards as single point of contact.
  - Link between business metadata and technical metadata to ensure compliance.
- Improve information accessibility
  - Administrators can tailor the tool to the needs of business users.
  - Access enterprise information you need when you need it.
  - Use and reuse information assets based on a common semantic hub.
- Enable collaboration
  - Capture and share annotations between team members.
  - Greater understanding of the context of information.
  - More prevalent use and reuse of trusted information.
One challenge for organizations starting an information governance initiative is establishing a foundation and structure to get under way, with enough components in place to show how the various parts of the catalog fit together and how they can support the initiative.
In this tutorial, you learn how to install the pre-built package of IGC terms, policies, and governance rules to jump-start your information governance efforts. We will show what the available content includes, how to install the content for immediate use, and how to delve deeper into usage of the content.
Overall the goal of this package is to provide insight on how to leverage business information in an information governance context, specifically to:
- Provide base content to facilitate getting started with:
  - Working models with relevant assets.
  - Working models that span components of information governance.
  - A framework for constructing and expanding information governance.
  - Examples for educational purposes among members of your teams.
- Recommend approaches for policy and rule entry and creation.
- Use the policy tree and referenced rules, including browse and search.
- Incorporate naming standards plus required or desired attributes.
Note that the content package is not intended as a full end-to-end solution and does not reflect all possible requirements for information governance.
This tutorial is written for users of InfoSphere Information Server V11.3 who are learning or are familiar with InfoSphere Information Governance Catalog and its use.
To utilize the pre-built content, you need an InfoSphere Information Server V11.3 platform with Information Governance Catalog installed. The following patch release for IGC available on IBM Fix Central should also be installed before importing the pre-built content: is113_IGC_ru5_server_client_multi.
IGC is an interactive browser-based tool that enables users to create, manage, and share an enterprise vocabulary and classification system. It also provides a framework for understanding and managing information governance policies, requirements, and data stewards; a repository of metadata assets (such as the database tables that hold key business data); and a query capability to report on relationships within the catalog.
A business glossary is designed to help users understand business language and the business meaning of information assets like databases, jobs, database tables and columns, and business intelligence reports. In addition to categories and terms, the catalog also contains information about other assets such as database tables, jobs, and reports in the metadata repository.
A catalog of information governance policies and rules provides an interactive environment to communicate precise intent for how information must be managed throughout its life cycle. Such policies may represent government regulations, corporate standards, or line-of-business processes at a broad level. The governance rules provide detail to describe the specific requirements, the terms or assets that are governed, and the assets that implement the requirements.
Underlying the information governance capabilities are metadata assets. These include the noted glossary terms, governance policies and rules, the representation of databases with their tables and columns, logical and physical data models, applications, business intelligence reports, data integration (ETL) jobs, and many other assets. Impact analysis and end-to-end data lineage across data sources and assets allows users of the catalog to understand the relationships between the business language and the technical implementations, a critical part of the information governance story.
The material in this package includes:
- A sample glossary to facilitate understanding information governance concepts.
- An information governance policy structure with associated policies and governance rules.
- A set of metadata queries to review contents and their relationships, and allow you to start defining approaches for information governance policy development and administration.
The glossary provided for the IGC base content package is derived from larger industry-specific models available from IBM, but is focused on a limited subset for the specific subject areas of person, location, and customer information. Subsequent versions and releases of this content may expand on these dimensions.
This tutorial describes the steps to import the information governance content and outlines potential uses of that content, which are detailed further in the included PDF document (see Downloadable resources).
IGC package content
The IGC Base Content package consists of a compressed (.zip) file named IGC_OOTB_v1.zip containing a series of associated assets, including:
- POLICIES — A policy framework for organizing information governance policies plus example governance policies and rules across three focus domains: Master Data Management subjects, data privacy, and information quality.
- SUBJECT CONTENT — A set of categories and terms that include person-, location-, and customer-related terms; and information governance-related terms.
- RELATIONSHIPS — Relationships that span the above artifacts, including policies to rules and rules to terms and assets.
- QUERIES — A set of queries that allow you to see some of the relationships and connections between the artifacts.
The contents are stored as XML files, which are imported into the IGC through the Administration tab of the UI by someone with IGC administrator privileges. The individual files are named in the import steps that follow.
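Before importing, it can help to confirm what the downloaded archive actually contains. The following is a minimal sketch using only the Python standard library; the function name `list_package_files` is made up for illustration, and nothing here is part of IGC itself. It groups the archive members by file extension so the XML content files, the query file, and the PDF are easy to spot.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def list_package_files(zip_source):
    """Return {extension: [member names]} for the files inside a .zip archive.

    zip_source may be a path (for example "IGC_OOTB_v1.zip") or a file-like object.
    """
    by_extension = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep only real files
            suffix = PurePosixPath(info.filename).suffix.lower() or "(none)"
            by_extension[suffix].append(info.filename)
    return {ext: sorted(names) for ext, names in sorted(by_extension.items())}
```

Calling `list_package_files("IGC_OOTB_v1.zip")` from the directory where you saved the download returns a dictionary keyed by extension; the exact member names depend on the package you downloaded.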
Importing IGC content
IGC content is imported with its import function. To import the business glossary content, you must have IGC administrator privileges. For more examples of how to import content into the catalog, see Importing and exporting glossary content of the catalog.
The subsequent import steps assume that you download, extract, and save the content XML files to wherever the IGC browser can access them.
To perform the import of the IGC terms content:
- Open the IGC and select the Administration tab.
- Select Tools > Import.
- Choose XML as the type of file to import and click Next, as shown in Figure 1.
- Choose the Merge option or, if other glossary content already exists, the recommended Ignore option to avoid overwriting someone else's work, then click Next.
- Browse to the directory location of the IGC-governance-base-xml-export-terms-2014-09-23.xml file and click Import.
- Review the import summary as shown in Figure 2. There should be 37 categories and 195 terms.
- Click Close.
Figure 1. XML file selection
Figure 2. Summary of terms import
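If an import fails or the summary looks wrong, one quick check outside the tool is whether the XML file is well formed and roughly what it contains. The sketch below is an assumption-laden helper, not an IGC utility: it makes no assumption about the element names used in the export and simply counts whatever tags it finds. Whether any of those counts line up with the categories and terms in the import summary depends on how the export is structured, which this tutorial does not document.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(xml_source):
    """Count elements by local tag name, streaming so large exports stay cheap.

    xml_source may be a file path or a binary file-like object.
    Raises xml.etree.ElementTree.ParseError if the XML is not well formed.
    """
    counts = Counter()
    for _event, element in ET.iterparse(xml_source, events=("end",)):
        local_name = element.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        counts[local_name] += 1
        element.clear()  # free memory for elements that are already counted
    return counts
```

For example, `count_elements("IGC-governance-base-xml-export-terms-2014-09-23.xml").most_common(10)` lists the ten most frequent element names in the terms file.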
To perform the import of the IGC policy and rules content:
- Repeat the process for the IGC-governance-base-xml-export-rulesassets-2014-09-23.xml file and click Import.
- Review the import summary as shown in Figure 3. There should be 72 policies, 110 governance rules, and updates to 62 terms.
- Click Close.
Figure 3. Summary of policies import
Review the imported IGC content
After import, you can browse the glossary and review the sample content.
Review the categories and terms
From the Catalog tab, select the Glossary tab, then choose Browse Category Hierarchy. Depending on your environment, your glossary may contain other content, but you should find two categories: one called Business Information and one called Information Governance.
- Business Information: This category centers on terms for person, location, and general transactions for a person acting as a customer, and provides example insight into the relationships available for terms within the glossary.
- Information Governance: This category focuses on categories and terms relevant for information governance and provides insight into useful information governance concepts, particularly the classification of information, which is important in discerning key governance focal areas.
The Business Information category, for instance, contains example content for Calendar, Customer, Location, Organization, Payment Card, Person, and Transaction.
Figure 4. Business information category
Expand the view for a category such as Location Information, then select one of the subcategories such as Physical Address. This highlights the category overview and lists the associated terms. You can then select one of the associated terms, such as Street Address, and review the description and general information provided, as shown below.
Figure 5. Business information term
The business information terms provide examples of the types of relationships available in the glossary. Relationships expand on the understanding of a given term such as whether it has or encompasses other terms; is a specific type of term; or simply falls within a common category of terms. For instance, the term Street Address illustrates a number of these relationships:
- Street Address is a term in the category Physical Address, which in turn is part of the category Location Information. Categories are a natural organization of related terms.
- Street Address is governed by two governance rules (e.g., address must be validated and verified against a postal reference source). It is a bi-directional relationship, so if it is set in one location (whether in the term or the governance rule), it is visible in both.
- Street Address is a type of Address. This is the converse of the has types relationship and is set bi-directionally. The term Address has two types in this content set: Street Address and Box Address. Address is the broad term, while Street Address and Box Address provide more specific terminology for two mutually distinct concepts.
- Street Address also uses the has a relationship. Has a describes components included in a larger term. In this instance, Street Address has a City, State, Postal Code, and Country (as well as several other components). The converse of this relationship is the is of relationship.
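The paired relationships described above (has types with is a type of, and has a with is of) can be pictured as links that are always written on both ends at once. The following sketch is purely illustrative: the `Term` and `Glossary` classes and their attribute names are hypothetical and do not reflect how IGC stores its metadata. It only uses the Street Address example from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    category: str = ""
    is_a_type_of: set = field(default_factory=set)  # broader terms
    has_types: set = field(default_factory=set)     # more specific terms
    has_a: set = field(default_factory=set)         # component terms
    is_of: set = field(default_factory=set)         # terms this one is a component of


class Glossary:
    """Toy in-memory glossary that keeps both sides of each relationship in sync."""

    def __init__(self):
        self.terms = {}

    def term(self, name, category=""):
        if name not in self.terms:
            self.terms[name] = Term(name, category)
        elif category:
            self.terms[name].category = category
        return self.terms[name]

    def add_type(self, broad, specific):
        # Setting the link once makes it visible from both terms.
        self.term(broad).has_types.add(specific)
        self.term(specific).is_a_type_of.add(broad)

    def add_component(self, whole, part):
        self.term(whole).has_a.add(part)
        self.term(part).is_of.add(whole)


glossary = Glossary()
glossary.term("Street Address", category="Physical Address")
glossary.add_type("Address", "Street Address")
glossary.add_type("Address", "Box Address")
for part in ("City", "State", "Postal Code", "Country"):
    glossary.add_component("Street Address", part)

print(sorted(glossary.terms["Address"].has_types))
print(sorted(glossary.terms["City"].is_of))
```

The point of the design is that neither side of a relationship is ever set alone, which mirrors the bi-directional behavior the catalog shows in its UI.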
You can continue to review other terms. Initially, the terms are not linked to any associated assets, but as such content becomes available in the metadata repository, it is possible to connect or assign the terms to assets to get a broader understanding of what data is associated with key business concepts.
Review the governance policies and rules
From the Catalog tab, select the Glossary tab, then choose Browse Policy Hierarchy. Depending on your environment, your glossary may contain other content, but you should find five high-level policies, as shown below.
Figure 6. Governance policy hierarchy
These top-level policies in the policy tree align with the main groups of policies outlined for an information governance program:
- Information governance approaches
  - Standards adopted by the information governance program to increase consistency, reduce discrepancies, and remove unnecessary processing. These could include, for example, the practices and processes used to manage the Information Governance Catalog or to monitor assignment of data stewards to terms or assets.
- Information governance delegations
  - The set of core policies delegated to another governance domain (for example, Audit or Risk Management governance domains). For example, the validation of fraud reporting might be considered part of information governance, but in your organization may be part of the fraud and risk management department, so these policies are considered delegated to that area.
- Information governance domain policies
  - The set of core policies that cover the basic information domains of the business (for example, Customer, Employee, Product). These information governance policies may fall primarily within a specific line of business, but because they span multiple points in the business, they are considered core information domains that must be included in the information governance program.
- Information governance obligations
  - The set of core policies delegated to the Information Governance organization from other governance focus areas and domains, including:
    - Corporate requirements: Obligations defined between one or more groups within an organization (for example, sales and IT or security enforcement over data stores).
    - Government regulations: Mandated laws and requirements for an organization from a national entity or its departments and agencies.
    - Industry standards: Formalized standards, often from a standards body, that provide best-practice, but not mandated, guidance for a specific topic.
    - Service-level agreements: Obligations to meet a specified service level (for example, delivery of data by 10 p.m.).
    - Third-party contracts: Contracted obligations between an organization and other third parties.
- Information governance principles
  - Principles define the high-level goals and approaches of the information governance program. Such principles are the overarching goals and directives of your information governance efforts and should be understood by everyone in your organization. Some of these principles may incorporate specific policies and rules; others will not.
You can review the policy tree and policies provided in more detail. Use the governance policy to summarize a specific organizational obligation, whether external or internal, with its objective. A policy should include a short identifiable name, a short description of its intent, a long description and a data steward before publication, and a custom attribute such as the included "Link to more information" for URL links to the actual policy for reference (many policies are simply too long to include all detail in the catalog). You may find it useful to add a label to associate related or transient links (for example, project or issuing agency).
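The attribute guidance above lends itself to a simple pre-publication checklist. The sketch below is one hypothetical way to express it; the `PolicyDraft` class and its field names are invented for illustration and are not IGC attribute identifiers. It reports which of the recommended attributes are still empty on a draft policy.

```python
from dataclasses import dataclass, field

# Attributes the guidance above recommends filling in (field names are hypothetical).
CHECKED_ATTRIBUTES = (
    "name",
    "short_description",
    "long_description",
    "steward",
    "link_to_more_information",
)


@dataclass
class PolicyDraft:
    name: str = ""
    short_description: str = ""
    long_description: str = ""
    steward: str = ""
    link_to_more_information: str = ""          # stands in for the custom attribute
    labels: list = field(default_factory=list)  # optional, e.g. project or issuing agency


def missing_attributes(policy):
    """Return the checked attributes that are empty or whitespace-only."""
    return [attr for attr in CHECKED_ATTRIBUTES if not getattr(policy, attr).strip()]


draft = PolicyDraft(name="Know Your Customer", short_description="Placeholder summary")
print(missing_attributes(draft))
```

Labels are deliberately left out of the check because the text treats them as optional.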
Many of the policies will contain one or more governance rules. These governance rules may be declarations of how a policy's goals are to be achieved or discrete specifications of how some data will be processed, evaluated, monitored, or remediated to comply with a policy's goals. Governance rules provide linkage between the policies and associated terms and data assets. The governance rules include two relationships to support this linkage: Governs and Implemented by. The former relationship describes those terms and assets that fall under, or are governed by, the rule. The latter relationship describes assets used to actually implement the governance rule (as the governance rule is descriptive in nature, it cannot by itself be used to process, validate, monitor, or otherwise affect data). Use the governance rules to delineate specific requirements of the policy rather than putting those details in the policy. Generally, you should avoid embedding rules or requirements at the policy level as these cannot be linked to other catalog assets. You can usually recognize such rules by the use of action-oriented verbs: must be masked, must be validated, must be monitored, etc.
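To make the two rule relationships concrete, here is a small sketch under stated assumptions: the `GovernanceRule` class is a hypothetical stand-in rather than an IGC type, and the "must be" pattern is only a rough heuristic for the action-oriented phrasing mentioned above. It shows a rule that governs a term but has no implementing asset yet, and a helper that flags requirement-like wording inside policy text so it can be moved into governance rules.

```python
import re
from dataclasses import dataclass, field


@dataclass
class GovernanceRule:
    name: str
    governs: set = field(default_factory=set)         # terms and assets the rule applies to
    implemented_by: set = field(default_factory=set)  # assets that carry out the rule


# Rough heuristic: phrases such as "must be masked" or "must be validated".
REQUIREMENT_PATTERN = re.compile(r"\bmust be \w+", re.IGNORECASE)


def embedded_requirements(policy_text):
    """Return requirement-like phrases found in a policy description."""
    return REQUIREMENT_PATTERN.findall(policy_text)


def unimplemented(rules):
    """Return names of rules that have no implementing asset linked yet."""
    return [rule.name for rule in rules if not rule.implemented_by]


address_rule = GovernanceRule(
    name="Address must be validated and verified against a postal reference source",
    governs={"Street Address"},
)
print(unimplemented([address_rule]))
print(embedded_requirements("Customer identifiers must be masked in test data."))
```

A real policy review would need more than a regular expression, but even this simple check surfaces descriptions that are carrying requirements better expressed as linked rules.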
From the Catalog tab, select the Glossary tab, then choose Browse Policies. Scroll down (or to the next page) until you find the Know Your Customer (KYC) policy, and click the policy name to open it.
Figure 7. Information Governance policy — Know Your Customer
In this policy example, you can see a number of features of the policy and review associated governance rules:
- Know Your Customer is a policy specific to the domain of customer. The parent policy describes where it exists in the policy hierarchy (it can only have one parent, so you do have to determine the most logical location to place it).
- The policy includes a name and short and long descriptions. There is a link to an external reference, in this instance a Wikipedia reference. A link could be made to an accessible site internal to your organization instead.
- The policy references 25 specific governance rules. These are the details or requirements of the policy. For instance, the first rule listed is that address must be validated and verified against a postal reference source. If you click this governance rule, you will find specifics of that rule such as its name and description. It may also contain references to implementations and terms that it governs (for example, the term Street Address that you reviewed earlier with the term content).
You can continue to review other governance policies and rules and their relationships to get a broader understanding of how the components of the IGC connect with key business concepts. This set of information governance content, both terminology and policies, is a foundation allowing you to begin focusing on key roles, processes, or information areas while continuing to expand understanding of information governance through your organization.
Refer to the PDF included in the downloads for further discussion on the creation, development, and usage of the IGC content package.
Import the IGC queries
IGC provides the ability to query or report on all the content and relationships in its repository, including policies, governance rules, terms, and assets. These queries are powerful tools to help with policy administration, implementation, monitoring, and enforcement.
The subsequent import steps assume that you download, extract, and save the query content file IGC-governance-base-GovQueries-2014-09-24.wbq to wherever the IGC browser can access it. To import the catalog queries content, you must have Information Governance Catalog Glossary Administrator or Information Asset Administrator privileges.
To perform the import of the IGC query content:
- Open the IGC and select the Catalog tab, then the Queries tab.
- Click Import.
- Browse to the directory location of the IGC-governance-base-GovQueries-2014-09-24.wbq and click Import.
- Review the imported query list as shown below. There should be at least 10 queries present, although there may be others depending on your environment.
Figure 8. Imported queries
The queries provide a means to search and present information pertinent to your information governance initiative. The functionality can provide details as simple as terms within categories (the Glossary Categories and Terms query) or more complex output with specific filters, such as finding policies that do not yet have associated rules (the Governing Policies without associated Rules query). The queries can become an active part of how you implement your information governance program and of the tools you use to monitor the environment.
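The logic behind the two queries named above is simple to express over plain data. The sketch below is not the IGC query format and does not read a .wbq file; it is an in-memory illustration with hypothetical function names, showing a grouping of terms by category and a filter for policies with no linked rules.

```python
def terms_by_category(term_categories):
    """Group (term, category) pairs into {category: sorted list of terms}."""
    grouped = {}
    for term, category in term_categories:
        grouped.setdefault(category, []).append(term)
    return {category: sorted(terms) for category, terms in sorted(grouped.items())}


def policies_without_rules(policy_rules):
    """Return names of policies whose list of governance rules is empty.

    policy_rules maps a policy name to the list of rule names linked to it.
    """
    return sorted(name for name, rules in policy_rules.items() if not rules)


# Sample data: the first entry comes from the tutorial, the second is a made-up placeholder.
sample_policies = {
    "Know Your Customer": [
        "Address must be validated and verified against a postal reference source",
    ],
    "Example policy with no rules yet": [],
}
print(policies_without_rules(sample_policies))
print(terms_by_category([("Street Address", "Physical Address")]))
```

In the catalog itself the imported queries do this work against the metadata repository; the sketch only shows the kind of filter each one applies.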
In this tutorial, you have learned how to import and review content for the InfoSphere Information Governance Catalog that can help you jump-start an information governance initiative. You can now apply this knowledge to develop and use relevant governance terms, policies, and rules based on your needs. For additional usage of the IGC content, please review the document IBMInfoSphereInformationServer_IGC_OOTB_Usage_v1.pdf included (see Downloadable resources).
- "The data governance story: Building a business language glossary"
- IBM InfoSphere Information Governance Catalog support
- "Develop an iOS application with the InfoSphere Business Glossary REST API" is a guide to using the glossary in custom iOS applications.
- "Develop an Android application with the InfoSphere Business Glossary REST API" is a guide to using the glossary in custom Android applications.
- "Use InfoSphere Business Glossary to define a common business language among modeling tools" explains using the glossary to support data modeling.