Governing and curating data (Watson Knowledge Catalog)
With the Watson Knowledge Catalog service, you can have catalogs of curated assets that are supported by a governance framework.
This service is not available by default. An administrator must install this service on the IBM Cloud Pak for Data platform. To determine whether the service is installed, open the Services catalog and check whether the service is enabled.
Watch the following video for an overview of Watson Knowledge Catalog features.
How you get started depends on your user role, your permissions, and your goal. To see which roles and permissions you have, click your user avatar, select Profile and settings, and then view the Permissions page. If you need more permissions, contact your Cloud Pak for Data administrator.
Create governance artifacts to govern data assets
To view or create governance artifacts, open the main menu and choose Organize > Data and AI governance, and then the name of the artifact.
Data Stewards and Data Quality Analysts can govern data assets in catalogs by using governance artifacts in these ways:
- Protect sensitive information from unauthorized access by creating policies and data protection rules that deny access or mask data values in data assets. Policies and data protection rules apply to all catalogs that have data protection enabled.
- Describe data by associating classifications, data classes, business terms, and governance rules with data assets so that catalog users can understand the data.
- Identify data for other governance artifacts. Classifications, data classes, and business terms can identify the data to protect in data protection rules. Reference data sets help match data classes to data columns and manage standards for data consistency and quality.
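To make the deny and mask behavior of data protection rules more concrete, the following Python sketch models the idea in a simplified form. It is an illustration only, not how Watson Knowledge Catalog implements rules: the `ProtectionRule` class, the `apply_rules` and `redact` functions, and the `"Email Address"` data class name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionRule:
    """Hypothetical rule that targets columns assigned a given data class."""
    data_class: str  # for example "Email Address" (assumed name)
    action: str      # "deny" or "mask"


def redact(value: str) -> str:
    """Mask a value by replacing every character with 'X'."""
    return "X" * len(value)


def apply_rules(row, column_classes, rules):
    """Return a masked copy of the row, or None if any rule denies access.

    row            -- dict of column name to value
    column_classes -- dict of column name to assigned data class
    rules          -- list of ProtectionRule objects
    """
    rules_by_class = {rule.data_class: rule for rule in rules}
    result = {}
    for column, value in row.items():
        rule = rules_by_class.get(column_classes.get(column))
        if rule is None:
            result[column] = value            # no rule applies to this column
        elif rule.action == "deny":
            return None                       # deny access to the whole asset
        elif rule.action == "mask":
            result[column] = redact(str(value))
        else:
            raise ValueError(f"Unknown action: {rule.action}")
    return result


row = {"name": "Ana", "email": "ana@example.com"}
column_classes = {"email": "Email Address"}
masked = apply_rules(row, column_classes, [ProtectionRule("Email Address", "mask")])
denied = apply_rules(row, column_classes, [ProtectionRule("Email Address", "deny")])
```

In this sketch the data class is what identifies the data to protect. The rule never names a column directly, which mirrors how classifications, data classes, and business terms identify data in data protection rules.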
Administrators can configure workflows for governance artifacts to require explicit approvals for new or updated artifacts.
Curate data assets
To curate data, open the main menu and choose Organize and then Curation or Data quality.
Basic data curation tools are available to all catalog collaborators. Basic data curation tools help you develop valuable data assets in these ways:
- Leave your data where it is, in the cloud or on premises, and add only the connection information that is needed to access it from a catalog or an analytics project.
- Publish assets from an analytics project to a catalog.
- Upload files to the file storage that’s associated with the catalog.
- Automatically discover all data sets associated with a connection to a relational data source and add them to an analytics project.
- Refine data to improve its quality and usefulness in an analytics project.
- Assign classifications, data classes, and business terms to data assets within a catalog.
- Automatically profile data sets to assign data classes within a catalog.
- Rate and review data assets within a catalog.
- Create tags and add them to data assets within a catalog.
Advanced data curation tools are reserved for the default catalog, and require the Data Steward and Data Quality Analyst roles. Advanced data curation tools help you develop valuable data assets in these ways:
- Import data assets to the default catalog.
- Automatically discover all data sets associated with a connection to a data source and add them to a project. Alternatively, quickly analyze a data sample from large data sources to get a general overview of the data quality, and later add selected data sets to a project.
- Analyze data quality of data sets in the project to determine whether the assets have sufficient quality to publish to a catalog.
- Run custom data rules and rule sets to check specific conditions of your data.
- Customize analysis settings and results in the project.
- Automatically suggest and assign business terms and data classes to data assets in a project.
- Publish assets from a project to the default catalog.
See Curate data.
Find assets in catalogs
You can search across all catalogs that you are a member of by entering one or more words in the global search field.
To open a catalog, open the main menu and choose Organize > All catalogs, and then click the name of a catalog.
You can find assets within a catalog in these ways:
- Search with keywords and filters that are based on subject tags, business terms, and other asset properties.
- Look at the previews of asset contents to make sure you pick the correct assets.
- Read reviews about assets that are provided by catalog collaborators.
- Choose from recommended assets that are automatically compiled based on your usage history, similar assets, and other factors.
- Choose from the most highly rated assets.
If you have the Business Analyst, Data Steward, or Data Quality Analyst role, you can view more information about assets that are in the default catalog in the Information assets view:
- View lineage reports that analyze the flow of data from data sources, through jobs and stages, and into databases, data files, business intelligence reports, and other assets.
- View custom attributes of data assets.
- View the results of data quality analysis for data assets.
- View relationships between data assets.
Work with assets in analytics projects
To discover insights by working with data, you need to move the assets to an analytics project. You can also use a project as a staging area to curate data assets before publishing them to the catalog. Projects contain a select subset of catalog collaborators.
To open or create a project, open the main menu and choose Projects.
With Watson Knowledge Catalog, you can work with assets in projects in these ways:
- Add assets from a catalog to a project to work with them.
- Publish assets from a project to a catalog to make them available for others to use.
- Discover assets from a connection to automatically create them in a project before publishing them to the catalog.
- Cleanse and shape relational data assets with the Data Refinery tool.
See Analytics projects.
If you have the Watson Studio service installed, you can analyze data and build models. See Overview of Watson Studio.
View Watson Knowledge Catalog APIs
To use Watson Knowledge Catalog APIs in your application, view the API documentation at this URL:
Your base URL is the IP address or name of your Cloud Pak for Data application server.
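As a minimal sketch of how the base URL combines with an API path, the following Python helper builds a full request URL from a server name or IP address. The helper name `build_api_url`, the host `cpd.example.com`, the use of `https`, and the example path `/v2/catalogs` are assumptions for illustration. Confirm the actual paths in the API documentation.

```python
from urllib.parse import urlsplit, urlunsplit


def build_api_url(server: str, path: str) -> str:
    """Join a Cloud Pak for Data server name or IP address with an API path.

    Accepts a bare host ("cpd.example.com"), a host with a port, or a full
    URL ("https://cpd.example.com/"). HTTPS is assumed when no scheme is given.
    """
    if "://" not in server:
        server = "https://" + server
    parts = urlsplit(server)
    # Keep only the scheme and host[:port], then append the API path.
    return urlunsplit((parts.scheme, parts.netloc, "/" + path.lstrip("/"), "", ""))


# Hypothetical host and path, shown only to illustrate the URL shape.
catalogs_url = build_api_url("cpd.example.com", "/v2/catalogs")
```

The helper normalizes trailing and leading slashes so that the server value can be entered with or without a scheme.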