Profiles of assets
The profile of a data asset includes generated metadata and statistics about its content. You can see the profile on the asset's Profile page in a catalog or in an analytics project. All catalog or project members can see data asset profiles.
The profile of a data asset that contains relational or structured data shows information about each column in the data set. By default, the profile is created based on the first 5,000 rows of data. However, if the data asset has more than 250 columns, the profile is created based on the first 1,000 rows of data. For each column, the profile shows the inferred data classes and statistics about the data. Data classes describe the contents of the data in the column: for example, city, account number, or credit card number. Data classes can be used to mask data with data protection rules and to restrict access to data assets with policies. The data classes for each column appear on the asset's Overview page and on the Profile page.
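The row limits described above can be sketched as a small function. This is only an illustration of the stated rule, not the service's actual implementation; the function and constant names are hypothetical.

```python
# Illustrative sketch of the documented sampling rule.
# All names here are hypothetical and are not part of any product API.

DEFAULT_SAMPLE_ROWS = 5000         # rows profiled by default
WIDE_ASSET_SAMPLE_ROWS = 1000      # rows profiled for assets with many columns
WIDE_ASSET_COLUMN_THRESHOLD = 250  # the smaller sample applies above this count


def profile_sample_rows(column_count: int) -> int:
    """Return how many leading rows the profile is based on."""
    if column_count < 0:
        raise ValueError("column_count must not be negative")
    if column_count > WIDE_ASSET_COLUMN_THRESHOLD:
        return WIDE_ASSET_SAMPLE_ROWS
    return DEFAULT_SAMPLE_ROWS
```

For example, an asset with exactly 250 columns still falls under the default of 5,000 rows, because the smaller sample applies only to assets with more than 250 columns.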
These types of relational and structured data are profiled by column:
- Data assets from relational databases that are accessed through a connection to the data source.
- Data assets from partitioned data sets, where a partitioned data set consists of multiple files and is represented by a single folder uploaded from the local file system or from file-based connections to data sources.
- Data assets from files uploaded from the local file system or from file-based connections to the data sources, with these formats:
  - CSV
  - XLSX (Only the first sheet in a workbook is profiled.)
  - Avro
  - Parquet
However, structured data files are not profiled when data assets do not explicitly reference them, such as in these circumstances:
- The files are within a folder asset. Files that are accessible from a folder asset are not treated as assets and are not profiled.
- The files are within an archive file. The data asset references the archive file itself, so the compressed files inside it are not profiled.
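The file rules above can be summarized in a short sketch. It assumes that the format is recognized by file extension, which this topic does not state, and every name in it is hypothetical rather than part of a product API.

```python
# Illustrative sketch of which files are profiled by column.
# Assumption: the format is identified by file extension; the actual
# service may detect formats differently. All names are hypothetical.
from pathlib import PurePosixPath

PROFILED_EXTENSIONS = {".csv", ".xlsx", ".avro", ".parquet"}


def is_profiled_by_column(path: str,
                          in_folder_asset: bool = False,
                          in_archive: bool = False) -> bool:
    """Return True if a structured data file would be profiled by column."""
    # Files reached only through a folder asset or packed inside an archive
    # are not explicitly referenced by a data asset, so they are not profiled.
    if in_folder_asset or in_archive:
        return False
    return PurePosixPath(path).suffix.lower() in PROFILED_EXTENSIONS
```

With these assumptions, a CSV file that is referenced directly by a data asset is profiled, but the same file is skipped when it is reachable only through a folder asset or is compressed inside an archive.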
In governed catalogs, profiles for data assets are created by default.
In projects and in catalogs without data protection rule enforcement, you must manually create profiles for data assets.