What is IBM Watson Knowledge Catalog?

IBM Watson® Knowledge Catalog is a cloud-based enterprise metadata repository that lets you catalog your knowledge and analytics assets, including machine learning models and structured and unstructured data wherever they reside, so that they can be easily accessed and used to fuel data science and all forms of AI.

For selected source types, Watson Knowledge Catalog can automatically discover and register data assets at the provided connection. As assets are added to the catalog, they are automatically indexed and classified, making it easy for users such as data engineers, data scientists, data stewards and business analysts to find, understand, share and use the assets. AI-powered search and recommendations guide users to the most relevant assets in the catalog based on understanding of relationships between assets, how those assets are used, and social connections between users.

Watson Knowledge Catalog also provides an intelligent and robust governance framework that lets you define and enforce data and access policies to ensure that the right data goes to the right people.

Through the Watson Knowledge Catalog Business Glossary, you can create a common business vocabulary and associate it with your assets, policies and rules, providing the bridge between the business domain and your technical assets.

To what regions can you deploy Watson Knowledge Catalog?

If you have regional data restrictions, you can deploy to one of the following cities: Dallas, London, Frankfurt or Tokyo.

Is Watson Knowledge Catalog available anywhere else in the world?

Yes. In addition to the US, you can sign up for Watson Knowledge Catalog in the UK and in Germany.

Do I need to move my data into Watson Knowledge Catalog?

No. You can keep your data in its existing repositories. Watson Knowledge Catalog stores the metadata of your assets.

What data sources and asset types are supported?

IBM provides over 30 connectors to cloud and on-premises data-source types that let you connect to your remote data assets. Available connectors include IBM Db2® in the cloud or on premises, IBM Cloudant®, IBM Cloud™ Object Storage, Oracle, Microsoft SQL Server, Microsoft Azure, Amazon S3, Salesforce.com, Hortonworks HDFS, Sybase and many more.

In addition to assets from remote data sources, Watson Knowledge Catalog supports other asset types, such as structured (row/column), semi-structured and unstructured data. For example, you can add CSV, Microsoft Excel, PDF, Text, Microsoft Word, Jupyter Notebook (IPYNB), image and HTML files, to name a few, to the catalog to profile and share with other users.

What is the maximum number of assets I can have in Watson Knowledge Catalog?

With the Professional plan, there is no limit on the number of assets you can have in Watson Knowledge Catalog. With the Standard and Lite plans, the limits are 500 and 50 assets, respectively.

Does Watson Knowledge Catalog provide governance services?

Yes. Watson Knowledge Catalog includes an automated policy-enforcement engine that determines outcomes based on the policies you have defined and the action being performed. You can set up your governance policies within the system to restrict access to data or to transform the data by masking sensitive content.

Can you delete or change the original source of data with a data policy that masks data?

No. When a data-protection policy anonymizes sensitive data in the catalog, only the preview data, which is managed by the application, is transformed. The original source data is not modified.
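
The idea of transforming only the preview can be illustrated with a minimal Python sketch. This is not Watson Knowledge Catalog code or its API: the function names, the column names and the masking rule (keep the last four characters) are all hypothetical, chosen only to show that masking works on a copy and leaves the source rows unchanged.

```python
import copy

def mask_value(value: str, keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` characters with asterisks (hypothetical rule)."""
    if len(value) <= keep_last:
        return "*" * len(value)
    return "*" * (len(value) - keep_last) + value[-keep_last:]

def masked_preview(rows, sensitive_columns):
    """Return a masked copy of `rows`; the original rows are never modified."""
    preview = copy.deepcopy(rows)  # work on a copy, not on the source data
    for row in preview:
        for column in sensitive_columns:
            if column in row:
                row[column] = mask_value(str(row[column]))
    return preview

# Hypothetical source rows standing in for data held in an external repository.
source_rows = [{"name": "Ana", "card_number": "4111111111111111"}]
preview_rows = masked_preview(source_rows, {"card_number"})
print(preview_rows[0]["card_number"])
print(source_rows[0]["card_number"])
```

In this sketch the preview shows a masked card number while `source_rows` still holds the original value, mirroring the behavior described in the answer above.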

Does Watson Knowledge Catalog provide classification services?

Watson Knowledge Catalog can automatically classify columns in your data assets when they are added to the catalog. Built-in components provide over 160 attribute classifiers, including names, emails, postal addresses, credit card numbers, driver's license numbers, government identification numbers, dates of birth, demographic information, Data Universal Numbering System (DUNS) numbers and more. Catalogs also profile unstructured data assets and extract metadata from content, such as categories, concepts, sentiment and emotion. See Profile data assets.

Are there data-preparation capabilities in Watson Knowledge Catalog?

Yes. Data-preparation capabilities are available through Data Refinery, which is part of Watson Knowledge Catalog. Data Refinery provides built-in operations that let you discover, cleanse and transform your data, along with profiling and visualization tools, such as charts, graphs and statistics, that help you interact with and understand your data. Data access and transformation policies defined in Watson Knowledge Catalog are also enforced in Data Refinery to ensure that sensitive data that originated from governed catalogs remains protected.

Can you set up access groups for people in different lines of business?

Yes. Access groups can be set up through IBM Cloud Identity and Access Management. In the Access Control module of Watson Knowledge Catalog, you can add a collaborator or a user group.

What are Capacity Unit-Hours?

Data Refinery Flow, Profiling and Sampling jobs are charged for the number of whole or partial hours they run, multiplied by the number of Capacity Units required per hour for each capacity type. Currently, IBM supports only one capacity type in Watson Knowledge Catalog, which requires six Capacity Units per hour. A minimum charge of 0.96 Capacity Unit-Hours (0.16 hours at six Capacity Units per hour) applies to each job execution. Each plan includes a set number of free Capacity Unit-Hours per month. For the Professional plan, charges apply once the plan limit is reached for that month. For the Lite plan, once the plan limit for that month is reached, a data-flow or profiling job cannot be run until the next month or until the plan is upgraded to the Professional plan.

Examples:

1) A Data Refinery Flow job runs for 30 minutes

0.5 hours x 6 Capacity Units Required Per Hour x USD 0.50 per Capacity Unit-Hour = USD 1.50

2) A Profiling job runs for nine minutes (profiling jobs can be automatically or manually triggered)

The minimum charge applies in this scenario:

0.16 hours x 6 Capacity Units Required Per Hour x USD 0.50 per Capacity Unit-Hour = USD 0.48

3) A Data Refinery Flow job runs for 12 hours

12 hours x 6 Capacity Units Required Per Hour x USD 0.50 per Capacity Unit-Hour = USD 36.00
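
The arithmetic in these examples can be sketched in Python. This is an illustration only, not an IBM billing API: the function and constant names are hypothetical, the USD 0.50 rate is the one used in the examples above, and the sketch assumes that partial hours are prorated by the minute, as the 30-minute example suggests. Free Capacity Unit-Hours included in a plan are not modeled.

```python
CAPACITY_UNITS_PER_HOUR = 6   # the single capacity type described above
MIN_CUH_PER_JOB = 0.96        # minimum Capacity Unit-Hours charged per job execution
USD_PER_CUH = 0.50            # rate taken from the examples above (assumption)

def capacity_unit_hours(runtime_minutes: float) -> float:
    """Capacity Unit-Hours billed for one job, applying the per-job minimum."""
    hours = runtime_minutes / 60
    return max(hours * CAPACITY_UNITS_PER_HOUR, MIN_CUH_PER_JOB)

def job_cost_usd(runtime_minutes: float) -> float:
    """Cost in USD for one job, rounded to cents."""
    return round(capacity_unit_hours(runtime_minutes) * USD_PER_CUH, 2)

# The three scenarios from the examples: 30 minutes, 9 minutes, 12 hours.
for minutes in (30, 9, 12 * 60):
    print(f"{minutes} min -> USD {job_cost_usd(minutes):.2f}")
```

For the nine-minute job, the prorated usage (0.15 hours x 6 = 0.90 Capacity Unit-Hours) falls below the 0.96 minimum, so the minimum is charged instead.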

After purchasing a Standard or Professional plan, how much setup is required to take full advantage of the product?

Watson Knowledge Catalog is fully self-service, so an administrator can start by creating a catalog, then add and curate assets right away. Additional tasks can include:

  • Building a business glossary
  • Defining data protection policies to govern access to data
  • Inviting users to the catalog

Try Watson Knowledge Catalog

Take advantage of machine learning and AI to analyze your data. Catalog your data to make it easy to find.