Frequently asked questions
What is IBM Watson® Knowledge Catalog?
IBM Watson Knowledge Catalog is a cloud-based enterprise metadata repository that lets you catalog your knowledge and analytics assets, including machine learning models and structured and unstructured data wherever they reside. This enables these assets to be more easily accessed and used to fuel data science and all forms of AI.
For selected source types, IBM Watson Knowledge Catalog can automatically discover and register data assets at the provided connection. As assets are added to the catalog, they are automatically indexed and classified, making it easy for users such as data engineers, data scientists, data stewards and business analysts to find, understand, share and use the assets. AI-powered search and recommendations guide users to the most relevant assets in the catalog based on an understanding of the relationships between assets, how those assets are used and the social connections between users.
IBM Watson Knowledge Catalog also provides an intelligent and robust governance framework that lets you define and enforce data and access policies to help ensure that the right data goes to the right people.
Through the IBM Watson Knowledge Catalog business glossary, users can create a common business vocabulary and associate it with your assets, policies and rules, providing the bridge between the business domain and your technical assets.
Do I need to move my data into IBM Watson Knowledge Catalog?
No. You can keep your data in its existing repositories. Watson Knowledge Catalog stores the metadata of your assets.
What data sources and asset types are supported?
IBM provides over 30 connectors to cloud or on-premises data sources, allowing you to connect to your remote data assets. For example, connectors to IBM® Db2® in the cloud or on premises, IBM Cloudant®, IBM Cloud® Object Storage, Oracle, Microsoft SQL Server, Microsoft Azure, Amazon S3, Salesforce.com, Hortonworks HDFS, Sybase and many more are available from IBM.
In addition to assets from remote data sources, IBM Watson Knowledge Catalog supports other asset types, such as structured (row/column), semi-structured and unstructured data. For example, you can add CSV, Microsoft Excel, PDF, Text, Microsoft Word, Jupyter Notebook (IPYNB), image and HTML files, to name a few, to the catalog to profile and share with other users.
What is the maximum number of assets I can have in IBM Watson Knowledge Catalog?
With the Professional plan, there is no limit on the number of assets you can have in IBM Watson Knowledge Catalog. With the Standard and Lite plans, the limits are 1,000 and 50 assets, respectively.
Does IBM Watson Knowledge Catalog provide governance services?
IBM Watson Knowledge Catalog includes an automated policy enforcement engine that determines outcomes based upon the policies and the action that has taken place. IBM Watson Knowledge Catalog provides the ability to set up your governance policies within the system, so that you can restrict access to data or transform the data by masking sensitive content.
Can you delete or change the original source of data with a data policy that masks data?
No. When a data protection policy anonymizes sensitive data in the catalog, only the preview data that is managed by the application is transformed. The original source data is not modified.
Are there best practices for managing governance artifacts in IBM Watson Knowledge Catalog?
In IBM Watson Knowledge Catalog in IBM Cloud Pak for Data v3.5, you can now assign users and data stewards to categories so you can determine who can view or manage governance artifacts owned by the category. The category collaborator roles can also be leveraged in workflows to automatically direct workflow steps to the right people for reviews and approvals. With this new capability, the business community can be empowered to self-govern their own business assets.
Learn about some key steps as you plan and implement the stewardship of your governance artifacts.
Does IBM Watson Knowledge Catalog provide classification services?
IBM Watson Knowledge Catalog can automatically classify columns in your data assets when they are added to the catalog. Built-in components provide over 160 attribute classifiers, including names, emails, postal addresses, credit card numbers, driver's license numbers, government identification numbers, dates of birth, demographic information, Data Universal Numbering System (DUNS) numbers and more. Catalogs also profile unstructured data assets and extract metadata from content, such as categories, concepts, sentiment and emotion.
Are there data preparation capabilities in IBM Watson Knowledge Catalog?
Yes. Data preparation capabilities are available through IBM Data Refinery, which is part of IBM Watson Knowledge Catalog. Data Refinery provides a rich set of capabilities that allows you to discover, cleanse and transform your data with built-in operations. It also comes with powerful profiling and visualization tools, such as charts, graphs and stats to help you interact with and understand your data. Data access and transform policies defined in IBM Watson Knowledge Catalog are also enforced in Data Refinery to help ensure that sensitive data that originated from governed catalogs remains protected.
How do you access the reference data management capabilities?
To access the reference data management feature, log in to your IBM Cloud Pak for Data instance. From the left-hand navigation bar, access "Reference Data" under the "Governance" section.
The "Reference Data" page lists all Published and Draft reference data sets defined in the system. Initially, the list is empty. To create a reference data set, select "Add Reference Data set" > "New Reference Data Set."
Can you set up access groups for people in different lines of business?
Yes. Access groups can be set up through IBM Cloud® Identity and Access Management. In the Access Control module of IBM Watson Knowledge Catalog, you can add a collaborator or a user group.
What are capacity unit hours (CUH)?
Data Refinery flows, the Data Refinery interactive UI, and profiling jobs are charged for the number of whole or fractional capacity units required per hour for each capacity type:
- Data Refinery flows require 1.5 capacity units per hour with a default Spark environment. For other custom environments, the calculation depends on the number of executors and the resources used for the Spark driver and executors.
- Data Refinery interactive UI requires 1.5 capacity units per hour, beginning when the Data Refinery UI starts and ending when it is closed.
- Profiling jobs require 6 capacity units per hour. A minimum charge of 0.96 CUHs (0.16 hours, or approximately 10 minutes) applies to each job execution.
A set number of free capacity unit hours is included in each plan for the month. For Standard and Professional plans, charges will apply after the plan limit is reached for that month. For a Lite plan, after the plan limit for that month is reached, no Data Refinery flows or profiling jobs can be run until the next month, or until the plan is upgraded to the Standard or Professional plan.
Data Refinery flow examples using default Capacity Type 3:
- One Data Refinery flow runs for 1 hour: 1.5 CUHs
- Two Data Refinery flows run for 1 hour each: 2 hours * 1.5 CUHs = 3 CUHs
- One Data Refinery flow runs for 30 minutes: 0.5 hours * 1.5 CUHs = 0.75 CUHs
- Interactive Data Refinery UI is used for 1 hour: 1.5 CUHs
Profiling examples (profiling jobs can be automatically or manually triggered):
- A Profiling job runs for 30 minutes: 0.5 hours * 6 CUHs = 3 CUHs
- A Profiling job runs for 9 minutes. The minimum charge applies in this scenario: 0.16 hours * 6 CUHs = 0.96 CUHs
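The examples above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function and constant names are hypothetical and are not part of any IBM API. The rates and the minimum profiling charge are taken from this FAQ, and the sketch assumes that usage is charged in linear proportion to run time and that the minimum applies to each profiling job execution. Custom Spark environments, which use a different rate, are not covered.

```python
# Capacity units per hour, as stated in this FAQ (default environments only).
CU_PER_HOUR = {
    "refinery_flow": 1.5,  # Data Refinery flow, default Spark environment
    "refinery_ui": 1.5,    # Data Refinery interactive UI
    "profiling": 6.0,      # profiling job
}

# Minimum charge per profiling job execution, in CUHs (from this FAQ).
PROFILING_MIN_CUH = 0.96


def cuh_for_run(kind: str, minutes: float) -> float:
    """Return the capacity unit hours (CUHs) for a single run.

    Assumes linear proration by run time; profiling jobs are never
    charged less than PROFILING_MIN_CUH.
    """
    cuh = CU_PER_HOUR[kind] * minutes / 60
    if kind == "profiling":
        cuh = max(cuh, PROFILING_MIN_CUH)
    return round(cuh, 2)


def total_cuh(runs) -> float:
    """Sum the CUHs for an iterable of (kind, minutes) pairs."""
    return round(sum(cuh_for_run(kind, minutes) for kind, minutes in runs), 2)
```

Under these assumptions, `cuh_for_run("refinery_flow", 30)` returns 0.75 and `cuh_for_run("profiling", 30)` returns 3.0, matching the examples above. A 9-minute profiling job would come to 0.9 CUHs at the hourly rate, so `cuh_for_run("profiling", 9)` returns the 0.96 minimum instead.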
After purchase of a Standard or Professional plan, how much setup is required in order to take full advantage of the product?
IBM Watson Knowledge Catalog is all self-service, so an administrator can start by creating a catalog, then adding and curating assets right away. Additional tasks can include:
- Building a business glossary
- Defining data protection policies to govern access to data
- Inviting users to the catalog
Is this available on IBM Cloud Pak® for Data?
Yes. Explore more about this integrated data and AI platform from IBM.