Getting started with data governance and catalog
How you implement data governance depends on the needs of your organization. You can implement data governance in a linear or iterative manner. You can rely on default features and predefined artifacts, or customize your solution.
To implement data governance, your organization might follow this process:
- Establish your business vocabulary
- Define rules to protect your data
- Curate and consolidate your data
- Share your data in catalogs
IBM watsonx.data intelligence provides the tools and processes that your organization needs to implement a data intelligence solution.
1. Establish your business vocabulary
Your team needs to establish a business vocabulary by importing or creating governance artifacts that act as metadata to classify and describe the data. This vocabulary is the foundation for the later steps:
- Before you can automate data privacy, your team needs to ensure that the data to control is accurately identified.
- Before you can analyze data quality, you need to identify the format of the data.
- To make data easy to find, your team needs to ensure that the content of the data is accurately described.
In this first step of the process, your governance team can build on the foundation of the predefined governance artifacts and create custom governance artifacts that are specific to your organization. You can create artifacts to describe the format, business meaning, sensitivity, range of values, and governance policies of the data. You can also use Knowledge Accelerators, which offer pre-created, extensive, curated, industry-specific vocabularies to improve data classification, regulatory compliance, self-service analytics, and other governance operations.
| What you can use | What you can do | Best to use when |
|---|---|---|
| Categories | Use the predefined category to store your governance artifacts. Create categories to organize governance artifacts in a hierarchical structure similar to folders. Add collaborators with roles that define their permissions on the artifacts in the category. | You need more than the predefined category. You want fine-grained control of who can own, author, and view governance artifacts. |
| Governance artifacts | Use the predefined business terms, data classes, and classifications. Create governance artifacts that act as metadata to enrich, define, and control data assets. | You want to add knowledge and meaning to assets to help people understand the data. You want to improve data quality analysis. |
| Knowledge Accelerators | Import a set of predefined governance artifacts to improve data classification, regulatory compliance, self-service analytics, and other governance operations. | You need a standard vocabulary to describe business issues, business performance, industry standards, and regulations. You want to save time by importing pre-created governance artifacts. |
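The category model in the table above can be pictured with a minimal Python sketch. It is an illustration only and does not use any watsonx.data intelligence API: the `Category` class, the role names, and the permission sets are all hypothetical, chosen to mirror the "own, author, and view" distinction.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "owner": {"view", "author", "manage"},
    "editor": {"view", "author"},
    "viewer": {"view"},
}

@dataclass
class Category:
    """A folder-like container for governance artifacts."""
    name: str
    parent: Optional["Category"] = None
    collaborators: dict = field(default_factory=dict)  # user name -> role name
    artifacts: list = field(default_factory=list)      # artifact names

    def path(self) -> str:
        # Walk up the hierarchy to build a folder-style path.
        if self.parent is None:
            return self.name
        return f"{self.parent.path()}/{self.name}"

    def can(self, user: str, action: str) -> bool:
        # A user may act only if their role in this category grants the action.
        role = self.collaborators.get(user)
        return role is not None and action in ROLE_PERMISSIONS[role]

root = Category("Finance", collaborators={"dana": "owner"})
loans = Category("Loans", parent=root, collaborators={"ana": "editor"})
loans.artifacts.append("Business term: Loan-to-value ratio")

print(loans.path())
print(loans.can("ana", "author"), loans.can("ana", "manage"))
```

This sketch checks roles per category only. Whether and how roles carry over from parent categories is a product behavior that it does not attempt to model.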
2. Define rules to protect your data
In the next step of the process, your team defines rules to ensure compliance with data privacy regulations by controlling who can see what data. Your team creates data protection rules to define how to protect data in governed catalogs. Your team can use these data protection rules to mask sensitive data based on the content, format, or meaning of the data, or the identity of the users who access the data.
| What you can use | What you can do | Best to use when |
|---|---|---|
| Data protection rules | Protect sensitive information from unauthorized access in governed catalogs by denying access to data, masking data values, or filtering rows in data assets. Dynamically and consistently mask data in governed catalogs at a user-defined granular level. | You need to automatically enforce data privacy across your governed catalogs. You want to retain availability and utility of data while you also comply with privacy regulations. |
| Policies and governance rules | Describe and document your organization’s guidelines, regulations, standards, or procedures for data security. Describe the required behavior or actions to implement the governance policy. | You want the people who use the data to understand the data governance policies. |
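To make the idea of a data protection rule concrete, the following Python sketch masks or denies access to columns based on their assigned data class and on the roles of the requesting user. It is a conceptual illustration under stated assumptions, not the product's rule engine: `ProtectionRule`, `apply_rule`, the role names, and the "Email Address" data class label are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProtectionRule:
    data_class: str          # columns tagged with this data class are protected
    exempt_roles: frozenset  # users with any of these roles see raw values
    action: str              # "redact" or "deny"

def apply_rule(row: dict, column_classes: dict, user_roles: set, rule: ProtectionRule) -> dict:
    """Return a copy of the row as the given user is allowed to see it."""
    if user_roles & rule.exempt_roles:
        return dict(row)
    protected = {col for col, dc in column_classes.items() if dc == rule.data_class}
    if rule.action == "deny" and protected.intersection(row):
        raise PermissionError(f"Access denied: row contains {rule.data_class} data")
    # Redact protected values while keeping their length; pass the rest through.
    return {
        col: ("X" * len(str(val)) if col in protected else val)
        for col, val in row.items()
    }

row = {"name": "Ana", "email": "ana@example.com"}
column_classes = {"email": "Email Address"}
rule = ProtectionRule("Email Address", frozenset({"compliance"}), "redact")

masked = apply_rule(row, column_classes, {"analyst"}, rule)
unmasked = apply_rule(row, column_classes, {"compliance"}, rule)
print(masked)
print(unmasked)
```

Keying the rule on the data class rather than on a column name is what lets one rule apply consistently to every asset that carries the same classification.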
3. Curate data to share in catalogs
Data stewards curate high-quality data assets in projects and publish them to catalogs where the people who need the data can find them. Data stewards enrich the data assets by assigning governance artifacts as metadata that describes the data and informs the semantic search for data.
| What you can use | What you can do | Best to use when |
|---|---|---|
| Metadata import | Automatically import technical metadata for the data that is associated with a connection to create data assets. | You need to create many data assets from a data source. You need to refresh the data assets that you previously imported. |
| Metadata enrichment | Profile multiple data assets in a single run to automatically assign data classes and identify data types and formats of columns. Automatically assign business terms to assets and generate term suggestions based on data classification. Rerun the import and the enrichment jobs at intervals to discover and evaluate changes to data assets. | You need to curate and publish many data assets that you imported. |
| Data quality analysis | Run data quality checks on your data sets to scan for quality issues in your data. Continuously track changes to the content and structure of data, and analyze changed data on a recurring basis. | You need to know whether the quality of your data might affect the accuracy of your data analysis or models. Your users need to identify which data sets to remediate. |
| Data lineage | Track, visualize, transform, and optimize your data flow from origin to consumption. | You need to ensure accuracy, trust, and compliance by mapping the journey of your data. |
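As a rough picture of what profiling does during enrichment, the following Python sketch assigns a data class to a column when enough of its values match a pattern, and computes a simple completeness score as one example of a quality check. The patterns, the 0.8 threshold, and the function names are assumptions made for illustration. The product's actual matching methods are not described here.

```python
import re

# Hypothetical data classes, each defined by a simple value pattern.
DATA_CLASSES = {
    "Email Address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "US Zip Code": re.compile(r"\d{5}(-\d{4})?"),
}

def non_empty(values):
    return [v for v in values if v not in (None, "")]

def profile_column(values, threshold=0.8):
    """Return the best-matching data class, or None if no class reaches the threshold."""
    present = non_empty(values)
    if not present:
        return None
    best_name, best_share = None, 0.0
    for name, pattern in DATA_CLASSES.items():
        matches = sum(1 for v in present if pattern.fullmatch(str(v)))
        share = matches / len(present)
        if share > best_share:
            best_name, best_share = name, share
    return best_name if best_share >= threshold else None

def completeness(values):
    """Share of values that are neither None nor empty strings."""
    return len(non_empty(values)) / len(values) if values else 0.0

emails = ["a@b.com", "c@d.org", "bad", "e@f.io", "g@h.net"]
codes = ["12345", "n/a"]
print(profile_column(emails), profile_column(codes))
print(completeness(["x", "", None, "y"]))
```

The threshold reflects a design trade-off: a lower value classifies more columns automatically but risks wrong assignments that a data steward then has to review.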
4. Share or work with your data
The catalog helps your teams understand your data and makes the right data available for the right use. Data scientists and other types of users can help themselves to the data that they need while they remain compliant with corporate access and data protection policies. They can add data assets from a catalog into a project, where they collaborate to prepare, analyze, and model the data.
| What you can use | What you can do | Best to use when |
|---|---|---|
| Catalogs | Organize your assets to share among the collaborators in your organization. Take advantage of AI-powered semantic search and recommendations to help users find what they need. | Your users need to easily understand, collaborate on, enrich, and access high-quality data. You want to increase visibility of data and collaboration between business users. You need users to view, access, manipulate, and analyze data without understanding its physical format or location, and without having to move or copy it. You want users to enhance assets by rating and reviewing them. |
| Global search | Search for assets across all the projects, catalogs, and deployment spaces to which you have access. Search for governance artifacts across the categories to which you have access. | You need to find data or another type of asset, or a governance artifact. |
| Data Product Hub | Share data products. Data producers can publish curated data products to share with data consumers in their community, and data consumers can easily access data products for their business needs. | You need to package, productize, and share your data-rich assets. |
| Data Refinery | Cleanse data to fix or remove data that is incorrect, incomplete, improperly formatted, or duplicated. Shape data to customize it by filtering, sorting, combining, or removing columns. | You need to improve the quality or usefulness of data. |
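The cleanse and shape operations described for Data Refinery can be illustrated with plain Python. This is a generic sketch of the two ideas, not Data Refinery itself: the `cleanse` and `shape` functions and the sample rows are hypothetical.

```python
def cleanse(rows, required):
    """Trim string values, drop rows missing a required value, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Fix improperly formatted values by stripping surrounding whitespace.
        normalized = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        # Remove incomplete rows.
        if any(normalized.get(col) in (None, "") for col in required):
            continue
        # Remove duplicates (values must be hashable).
        key = tuple(sorted(normalized.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned

def shape(rows, keep_columns, sort_by):
    """Keep only the listed columns and sort by one of them."""
    projected = [{col: row[col] for col in keep_columns} for row in rows]
    return sorted(projected, key=lambda r: r[sort_by])

rows = [
    {"id": 2, "city": " Paris "},
    {"id": 1, "city": "Rome"},
    {"id": 2, "city": "Paris"},   # duplicate once whitespace is trimmed
    {"id": 3, "city": ""},        # incomplete
]
cleaned = cleanse(rows, required=["city"])
shaped = shape(cleaned, keep_columns=["id", "city"], sort_by="id")
print(shaped)
```

Note that `sort_by` must be one of `keep_columns`, because sorting happens after the unwanted columns are removed.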