Frequency distribution tables

When you run advanced profiling on a data asset, a detailed frequency distribution is determined for the distinct values in each column of the asset based on the source data.

When you configure the settings for an advanced profiling run, you can choose to write all or part of the frequency distribution information to a database table. See Advanced data profiling. You can access this table with standard database queries, through the Watson Data API, or through the detailed column profile. However, the column profile shows only the first 100 distinct values, regardless of how many values are stored in the table.
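
As an illustration of the standard database query route, the following sketch reads the most frequent values for one column and compares string sort order with numeric sort order. It is a minimal example under stated assumptions, not the product's own tooling: an in-memory SQLite database stands in for the real target database, the table name FREQ_DIST and the helper functions are hypothetical, the sample rows are made up, and only a few of the documented columns are modeled. SQL details such as LIMIT and CAST can differ between database systems.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the database that
# was selected in the profiling settings.
conn = sqlite3.connect(":memory:")

# Assumption: FREQ_DIST is a hypothetical table name. Only a subset of the
# documented columns is modeled here.
conn.execute(
    """
    CREATE TABLE FREQ_DIST (
        ProjectId      TEXT,
        AssetId        TEXT,
        ColumnName     TEXT,
        DistinctValue  TEXT,    -- values are stored as strings
        FrequencyCount INTEGER
    )
    """
)

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO FREQ_DIST VALUES (?, ?, ?, ?, ?)",
    [
        ("p1", "a1", "AGE", "9", 40),
        ("p1", "a1", "AGE", "10", 25),
        ("p1", "a1", "AGE", "100", 1),
        ("p1", "a1", "CITY", "Paris", 12),
    ],
)


def top_values(conn, asset_id, column_name, limit=10):
    """Return (value, count) pairs for one column, most frequent first."""
    cur = conn.execute(
        """
        SELECT DistinctValue, FrequencyCount
        FROM FREQ_DIST
        WHERE AssetId = ? AND ColumnName = ?
        ORDER BY FrequencyCount DESC
        LIMIT ?
        """,
        (asset_id, column_name, limit),
    )
    return cur.fetchall()


def sorted_values(conn, asset_id, column_name, numeric=False):
    """Return the distinct values in string order, or numeric order on request."""
    # The ORDER BY expression is chosen from two fixed strings, never from user input.
    order_by = "CAST(DistinctValue AS INTEGER)" if numeric else "DistinctValue"
    cur = conn.execute(
        f"""
        SELECT DistinctValue
        FROM FREQ_DIST
        WHERE AssetId = ? AND ColumnName = ?
        ORDER BY {order_by}
        """,
        (asset_id, column_name),
    )
    return [row[0] for row in cur.fetchall()]


print(top_values(conn, "a1", "AGE"))
print(sorted_values(conn, "a1", "AGE"))
print(sorted_values(conn, "a1", "AGE", numeric=True))
```

With the sample rows, the first call returns [('9', 40), ('10', 25), ('100', 1)]. The string-ordered list is ['10', '100', '9'], and the list with the numeric cast is ['9', '10', '100']. Unlike the detailed column profile, a direct query is not limited to the first 100 distinct values.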

For each distinct value, the table contains the following information:

Frequency distribution table
Column name Description
AssetId The ID of the data asset in the project.
ChangeDate The date on which the information was updated.
ColumnName The name of the column in the data asset.
DataClassification A comma-separated list of the IDs of the data classes that are assigned to the column in the data asset. If no data class is assigned to the column, the table shows U.
DistinctValue The actual data value in the column. The maximum length is 4,096 bytes, or 2,048 characters for Unicode. All values are stored as strings irrespective of the actual data type. Thus, string sort order is applied when you sort the values in the detailed column profile; for example, 10 sorts before 9.
FrequencyCount The number of times this value occurs in the column.
GeneralFormat The format that represents the character pattern of a data value. Every alphabetic character is represented by an uppercase A or a lowercase a, depending on the capitalization of the character. Every numeric character is represented by the number 9. Spaces and special characters are shown as they appear.
InferredDataType The inferred data type, such as integer, string, or date.
ProjectId The ID of the project in which the analysis was run.
PropertyLength The length of a string field.
PropertyPrecision The total length of a numeric field.
PropertyScale The number of digits in the decimal component of a numeric field. For example, the value 123.45 has a scale of 2.
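
The GeneralFormat rule described in the table can be sketched in a few lines. This is an illustrative reimplementation based only on the description above, not the profiler's actual code. The function name general_format is hypothetical, and the use of Python's Unicode-aware isalpha, isupper, and isdigit methods is an assumption; the profiler might treat non-Latin letters and digits differently.

```python
def general_format(value: str) -> str:
    """Map a data value to its character pattern.

    Letters become "A" (uppercase) or "a" (any other letter), digits become
    "9", and every other character is kept as it appears.
    """
    pattern = []
    for ch in value:
        if ch.isalpha():
            # Assumption: letters without case are treated like lowercase letters.
            pattern.append("A" if ch.isupper() else "a")
        elif ch.isdigit():
            pattern.append("9")
        else:
            # Spaces and special characters are shown unchanged.
            pattern.append(ch)
    return "".join(pattern)


print(general_format("Ab-12 x"))
print(general_format("SW1A 2AA"))
print(general_format("555-1234"))
```

The three calls print Aa-99 a, AA9A 9AA, and 999-9999. Values that share a pattern, such as all seven-digit phone numbers written with one hyphen, collapse to the same GeneralFormat string.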

These additional columns are reserved for internal use and are subject to change without notice:

  • Class
  • ChangedByUser
  • DataClassificationStatusFlag
  • DomainPattern
  • DomainValueFlag
  • DomainValueFlagDate
  • DomainValueFlaggedByUser
  • FieldNumber
  • FormatFlag
  • FormatFlagDate
  • FormatFlaggedByUser
  • InvalidReasonCode
  • ODBCType
  • SourceOfDistinctValue
  • TypeCode
  • TypeOfDomainValue
