Advanced data profiling
Advanced profiling provides more accurate results than regular profiling but takes longer to complete because large amounts of data must be processed.
Editions: Base, Premium, Standard. Unless otherwise noted, this information applies to all editions of IBM Knowledge Catalog.
The DataStage service must be deployed to run advanced profiling.
If any of the connections to the data sources are locked, you are asked to enter your personal credentials. This is a one-time step that permanently unlocks the connections for you.
To run advanced data profiling on one or more assets:
- Open the metadata enrichment asset.
- On the Assets tab, select assets as required.
- Select Enrich > Run advanced data profiling from the toolbar.
- Optional: Customize settings.
  - Select whether you want to write frequency distribution information to a database table and determine how many distinct values you want to capture.
    Without an output table, the first 100 distinct values are captured and stored internally. You can view and download that information from the Statistics page of a column profile.
    If you choose to write frequency distribution information to a table, enable the External output option. The section is prepopulated with the default enrichment settings. See Advanced profiling settings. You can change the settings as required for this individual advanced profiling run. If you change the output table, you can also set this table as the new default location, thus overwriting the previous default setting.
    You can access this table by using standard database queries or through the detailed column profile. For more information, see Frequency distributions.
    If a Google BigQuery output table is configured, remember that it might take some time until the data is updated. To avoid results accumulating in the output table, wait at least 90 minutes before you rerun advanced profiling. For more information, see Stream data availability. Alternatively, you can define a different output table.
  - Select a sampling type. See Designing metadata enrichment.
- Click Run. You are notified when the analysis is complete.
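To make the idea of a capped frequency distribution more concrete, the following Python sketch counts occurrences for a limited number of distinct values in a column. It is an illustration only and not how IBM Knowledge Catalog implements profiling. The function name `capture_frequency_distribution`, the `max_distinct` parameter, and the reading of "first" as "first encountered in scan order" are assumptions made for this example.

```python
from typing import Dict, Hashable, Iterable


def capture_frequency_distribution(
    values: Iterable[Hashable], max_distinct: int = 100
) -> Dict[Hashable, int]:
    """Count occurrences for at most `max_distinct` distinct values.

    Assumption for this sketch: "first" means first encountered in scan
    order. Values that first appear after the cap is reached are ignored.
    """
    counts: Dict[Hashable, int] = {}
    for value in values:
        if value in counts:
            # Value is already tracked: increase its frequency.
            counts[value] += 1
        elif len(counts) < max_distinct:
            # New distinct value and the cap is not reached yet: track it.
            counts[value] = 1
        # Otherwise the cap is reached and the new value is skipped.
    return counts


# Hypothetical column data with a cap of 3 distinct values.
column = ["DE", "US", "US", "FR", "DE", "US", "JP"]
distribution = capture_frequency_distribution(column, max_distinct=3)
print(distribution)
```

In this example, `JP` is not counted because three distinct values were already captured before it appeared. A larger cap, or an output table as described in the settings above, would retain more distinct values.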
Parent topic: Enriching your data assets