What is Data Catalog?

Data Catalog is a cloud-based enterprise metadata repository that lets you catalog your data sources and assets wherever they reside. For selected source types, Data Catalog can automatically discover and register the data assets available through a connection you provide. As assets are added to the catalog, they are automatically indexed and classified. This makes it easy for users such as data engineers, data scientists, data stewards and business analysts to find, understand, share and use the assets.

Data Catalog also provides a governance framework that lets you define and enforce governance policies to ensure that the right data reaches the right people.

Through Data Catalog's business glossary, you can create a common business vocabulary and associate its terms with your assets, policies and rules. This bridges the business domain and your technical assets.

Do I need to move my data into Data Catalog?

No, you can keep your data in its existing repositories. Data Catalog stores the metadata about your assets.

What data sources and asset types are supported?

We provide close to 30 connectors for cloud and on-premises data source types, which let you connect to your remote data assets. For example, we provide connectors for IBM Db2 (in the cloud or on premises), Cloudant, Cloud Object Storage, Oracle, Microsoft SQL Server, Microsoft Azure, Amazon S3, Salesforce.com, Hortonworks HDFS, Sybase and many more.

In addition to assets from remote data sources, Data Catalog supports other asset types, such as structured data (rows and columns), semi-structured data (social media content, memos and so on), CSV and Excel files, Jupyter notebooks (.ipynb files) and images.

What is the maximum number of assets I can have in Data Catalog?

There is no limit on the number of assets you can have in Data Catalog. This includes the Lite (free) plan.

Does Data Catalog provide governance services?

Yes. Data Catalog includes an automated policy enforcement engine that determines the outcome of an action based on your policies and the action being taken. You can set up your governance policies within the system and use them to restrict access to data.

Does Data Catalog provide classification services?

Data Catalog automatically classifies your data assets when they are added to the catalog. Out of the box, it provides over 160 classifiers, covering names, email addresses, postal addresses, credit card numbers, driver's license numbers, government identification numbers, dates of birth, demographic information, DUNS numbers and more.

Are there data wrangling capabilities in Data Catalog?

No, but data wrangling capabilities are available through Data Refinery, another offering on Watson Data Platform. Once you provision Data Refinery through IBM Cloud, its data wrangling capabilities become visible immediately.

Data Catalog, Data Refinery and Data Science Experience are all part of Watson Data Platform. The three offerings work together in one integrated experience, allowing data professionals in different roles to collaborate effectively.

Get started

Data Catalog is generally available.