What is IBM Watson® Knowledge Catalog?
Watson Knowledge Catalog is a cloud-based enterprise metadata repository that lets you catalog your knowledge and analytics assets, including machine learning models and structured and unstructured data wherever they reside, so that they can be easily accessed and used to fuel data science and all forms of AI.
For selected source types, Knowledge Catalog can automatically discover and register data assets at the provided connection. As assets are added to the catalog, they are automatically indexed and classified, making it easy for users such as data engineers, data scientists, data stewards and business analysts to find, understand, share and use the assets. AI-powered search and recommendations guide users to the most relevant assets in the catalog, based on an understanding of the relationships between assets, how those assets are used, and the social connections between users.
Knowledge Catalog also provides an intelligent and robust governance framework that lets you define and enforce data and access policies to ensure that the right data go to the right people.
Through Knowledge Catalog's business glossary, users can create a common business vocabulary and associate its terms with your assets, policies and rules, providing the bridge between the business domain and your technical assets.
Do I need to move my data into Watson Knowledge Catalog?
No, you can keep your data in their existing repositories. Knowledge Catalog stores the metadata of your assets.
What data sources and asset types are supported?
We provide over 30 connectors to cloud and on-premises data source types so that you can connect to your remote data assets. For example, we provide connectors to IBM Db2 in the cloud or on-premises, Cloudant, Cloud Object Storage, Oracle, Microsoft SQL Server, Microsoft Azure, Amazon S3, Salesforce.com, Hortonworks HDFS, Sybase and many more.
In addition to assets from remote data sources, Knowledge Catalog supports other asset types such as structured (row/column), semi-structured, and unstructured data. For example, you can add csv, Excel, pdf, text, Microsoft Word, ipynb, image and html files to the catalog to profile and share with other users.
What is the maximum number of assets I can have in Watson Knowledge Catalog?
With the Professional plan, there is no limit on the number of assets you can have in Knowledge Catalog. With the Lite and Standard plans, the limits are 50 and 500 assets, respectively.
Does Watson Knowledge Catalog provide governance services?
Yes. Knowledge Catalog includes an automated policy enforcement engine that determines outcomes based on the policies you have defined and the action being taken. You can set up your governance policies within the system to restrict access to data or to transform the data by masking sensitive content.
Does Watson Knowledge Catalog provide profiling and classification services?
Yes. Knowledge Catalog automatically classifies your data assets when they are added to the catalog. Out of the box, it provides over 160 classifiers including names, emails, postal addresses, credit card numbers, driver's licenses, government identification numbers, date of birth, demographic information, DUNS number and more. It can also profile unstructured assets and extract metadata from content such as concepts, entities, keywords, sentiment and emotion.
Are there data preparation capabilities in Watson Knowledge Catalog?
Yes, data preparation capabilities are available through Data Refinery, which is part of Watson Knowledge Catalog. Data Refinery not only lets you discover, cleanse, and transform your data with built-in operations, but also comes with powerful profiling and visualization tools such as charts, graphs and stats to help you interact with and understand your data. Data access and transform policies defined in Knowledge Catalog are also enforced in Data Refinery to ensure that sensitive data originating from governed catalogs remains protected.
What are Capacity Unit-Hours?
Data Refinery Flow, Profiling and Sampling jobs are charged for the number of whole or partial hours they run multiplied by the number of Capacity Units required per hour for each capacity type. At the moment, we support only one capacity type in Watson Knowledge Catalog, which requires 6 Capacity Units per hour. A minimum charge of 0.96 Capacity Unit-Hours applies to each job execution. A set number of free Capacity Unit-Hours is included in each plan for the month. For the Professional plan, charges apply once the plan limit is reached for that month. For the Lite plan, once the plan limit for that month is reached, a data flow or profiling job cannot be run until the next month or until the plan is upgraded to the Professional plan.
The following examples assume a rate of 0.5 USD per Capacity Unit-Hour:
1) A Data Refinery Flow job runs for 30 minutes
0.5 hours * 6 Capacity Units Required Per Hour * 0.5 USD per Capacity Unit-Hour = $1.50
2) A Profiling job runs for 9 minutes (profiling jobs can be automatically or manually triggered)
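The job uses 0.15 hours * 6 = 0.9 Capacity Unit-Hours, which is below the 0.96 minimum, so the minimum charge (equivalent to 0.16 hours) applies in this scenario: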
0.16 hours * 6 Capacity Units Required Per Hour * 0.5 USD per Capacity Unit-Hour = $0.48
3) A Data Refinery Flow job runs for 12 hours
12 hours * 6 Capacity Units Required Per Hour * 0.5 USD per Capacity Unit-Hour = $36
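The three scenarios above can be reproduced with a short Python sketch. The constants come from this FAQ (6 Capacity Units per hour, a 0.96 Capacity Unit-Hour minimum per job, and the 0.5 USD rate used in the examples). The function names are hypothetical, and the sketch assumes partial hours are billed proportionally to runtime and ignores the free Capacity Unit-Hours included in each plan.

```python
# Values taken from the FAQ text; the rate is the one used in its examples.
CAPACITY_UNITS_PER_HOUR = 6
MIN_CUH_PER_JOB = 0.96
USD_PER_CUH = 0.5


def capacity_unit_hours(runtime_minutes: float) -> float:
    """Capacity Unit-Hours consumed by one job, with the per-job minimum applied."""
    # Assumption: partial hours are charged in proportion to the runtime.
    cuh = runtime_minutes / 60 * CAPACITY_UNITS_PER_HOUR
    return max(cuh, MIN_CUH_PER_JOB)


def job_cost_usd(runtime_minutes: float) -> float:
    """Cost of one job in USD, before any free plan allowance is deducted."""
    return round(capacity_unit_hours(runtime_minutes) * USD_PER_CUH, 2)


if __name__ == "__main__":
    for label, minutes in [("Data Refinery Flow, 30 min", 30),
                           ("Profiling, 9 min", 9),
                           ("Data Refinery Flow, 12 h", 12 * 60)]:
        print(f"{label}: ${job_cost_usd(minutes):.2f}")
```

The 9-minute job consumes only 0.9 Capacity Unit-Hours on its own, so the `max` call is what raises it to the 0.96 minimum.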
Watson Knowledge Catalog is generally available.