Profiles of assets (Watson Knowledge Catalog)
The profile of a data asset includes generated metadata and statistics about its content. You can see the profile on the asset’s Profile page in a catalog or in an analytics project. All catalog or project members can see data asset profiles.
The profile of a data asset that contains relational or structured data shows information about each column in the data set. By default, the profile is created based on the first 5,000 rows of data. However, if the data asset has more than 250 columns, the profile is created based on the first 1,000 rows of data. The profile shows the inferred data classes and statistics about the data for each column. Data classes describe the contents of the data in the column: for example, city, account number, or credit card number. Data classes are required to mask data with policies, and they can also be used to restrict access to data assets with policies. The data classes appear for each column on the asset’s Overview page and on the Profile page.
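The row-sampling rule described above can be sketched as a small function. This is only an illustration of the documented thresholds, not product code: the function name `profile_sample_rows` and the constant names are hypothetical.

```python
# Illustrative sketch of the documented sampling rule; all names are hypothetical.
DEFAULT_SAMPLE_ROWS = 5000      # rows profiled by default
WIDE_ASSET_SAMPLE_ROWS = 1000   # rows profiled for assets with many columns
WIDE_ASSET_COLUMN_LIMIT = 250   # "more than 250 columns" triggers the smaller sample


def profile_sample_rows(num_columns: int) -> int:
    """Return how many leading rows the profile is based on."""
    if num_columns > WIDE_ASSET_COLUMN_LIMIT:
        return WIDE_ASSET_SAMPLE_ROWS
    return DEFAULT_SAMPLE_ROWS


print(profile_sample_rows(40))   # asset with 40 columns
print(profile_sample_rows(300))  # asset with 300 columns
```

Note that an asset with exactly 250 columns still falls under the default, because the smaller sample applies only when the column count exceeds 250.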
These types of relational and structured data are profiled by column:
- Data assets from relational databases from a connection to the data sources listed here.
- Data assets from partitioned data sets, where a partitioned data set consists of multiple files and is represented by a single folder uploaded from the local file system or from file-based connections to the data sources listed here.
- Data assets from files uploaded from the local file system or from file-based connections to the data sources listed here, with these formats:
  - CSV
  - Avro
  - Parquet
However, structured data files are not profiled when data assets do not explicitly reference them, such as in these circumstances:
- The files are within a folder asset. Files that are accessible from a folder asset are not treated as assets and are not profiled.
- The files are within an archive file. The data asset references the archive file itself, so the compressed files inside it are not profiled.
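As a rough illustration of the file rules above, the following sketch decides whether a file would be profiled by column. It rests on assumptions that are not part of the product: the format is guessed from the file extension, and the function name and the `inside_folder_asset` and `inside_archive` flags are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: the file format is inferred from the extension.
PROFILED_FORMATS = {".csv", ".avro", ".parquet"}


def is_profiled_by_column(path: str,
                          inside_folder_asset: bool = False,
                          inside_archive: bool = False) -> bool:
    """Return True if the file matches the documented conditions for column profiling."""
    # Files reached only through a folder asset or an archive are not
    # explicitly referenced by a data asset, so they are not profiled.
    if inside_folder_asset or inside_archive:
        return False
    return PurePosixPath(path).suffix.lower() in PROFILED_FORMATS


print(is_profiled_by_column("sales/2021.csv"))
print(is_profiled_by_column("sales/2021.csv", inside_archive=True))
```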
In governed catalogs, profiles for data assets are created by default.
In projects and in catalogs without data protection rule enforcement, you must manually create profiles for data assets.