Processed data in the reports - examples

When you review the SQA reports, decide if the values in the dictionary columns of the records make sense. If the values do not make sense, review the patterns of those values to decide how to modify the rule sets to get the results that you need.

Standardization Quality Assessment (SQA) report

The following sections of the SQA summary report identify data in your records processed by the Standardize stage:
Standardization Summary
Figure: Pie chart that shows 96.73% fully standardized records (light blue) and 3.27% partially or nonstandardized records (dark blue).
The pie chart shows the percentage of records that the job fully standardized and the percentage of records that it either partially standardized or did not standardize at all. Fully standardized means that the job was able to place all the record values into dictionary columns other than the UnhandledData column.
In the Standardization Quality Assessment Record Examples report, look at the values in the fully standardized data columns to decide what to do next:
  • If the values in the columns make sense, you might decide that your current rule set is good enough and that no adjustments are necessary.
  • If the values in the columns are not consistent with your understanding of the data, adjust the rule set to better standardize the data.
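As a rough illustration of how the pie chart percentages relate to the definition above, the following sketch computes them from a list of standardized records. It is not how the report itself is generated. It assumes that each record is a Python dictionary keyed by dictionary column name and that a blank or missing UnhandledData value means the record is fully standardized. The function name and sample data are hypothetical.

```python
def standardization_summary(records):
    """Return (fully_pct, partial_pct) for a list of record dicts.

    Assumption: a record counts as fully standardized when its
    UnhandledData value is missing, None, or blank.
    """
    total = len(records)
    if total == 0:
        return 0.0, 0.0
    fully = sum(
        1 for rec in records
        if not (rec.get("UnhandledData") or "").strip()
    )
    fully_pct = round(100.0 * fully / total, 2)
    return fully_pct, round(100.0 - fully_pct, 2)


# Hypothetical sample output of a Standardize stage.
sample = [
    {"HouseNumber": "12", "StreetName": "MAIN", "UnhandledData": ""},
    {"HouseNumber": "7", "StreetName": "OAK"},
    {"StreetName": "ELM", "UnhandledData": None},
    {"StreetName": "PINE", "UnhandledData": "C/O"},
]
print(standardization_summary(sample))
```

In this sample, only the last record has a value in UnhandledData, so it is the only record that counts as partially or nonstandardized.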
Frequency of Records by Populated Dictionary Column

Figure: Example of a populated dictionary column percentage for the bar labeled HouseNumber (528).

The bar chart shows the number of unique values (in parentheses) for a given dictionary column. It also shows the percentage of total records that contain a value in that dictionary column. For example, a bar labeled HouseNumber (528) that shows 78.39% means that the HouseNumber column contains 528 unique values and that 78.39% of the processed records contain a value in that column.
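The two numbers on each bar can be sketched as follows. This is an illustration under assumptions, not the report's implementation: records are assumed to be dictionaries keyed by dictionary column name, and a blank or missing value is treated as unpopulated. The function name and sample data are hypothetical.

```python
def column_frequency(records, column):
    """Return (unique_count, populated_pct) for one dictionary column."""
    total = len(records)
    # Keep only the non-blank values for the requested column.
    populated = [
        rec[column].strip()
        for rec in records
        if (rec.get(column) or "").strip()
    ]
    unique_count = len(set(populated))
    pct = round(100.0 * len(populated) / total, 2) if total else 0.0
    return unique_count, pct


# Hypothetical sample: three of four records have a HouseNumber,
# and two of those three share the same value.
sample = [
    {"HouseNumber": "12", "StreetName": "MAIN"},
    {"HouseNumber": "12", "StreetName": "OAK"},
    {"HouseNumber": "7", "StreetName": "ELM"},
    {"StreetName": "PINE"},
]
unique_count, pct = column_frequency(sample, "HouseNumber")
print(f"HouseNumber ({unique_count}) {pct}%")
```

The unique count is taken over populated values only, while the percentage is taken over all records.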

Composition Sets
Figure: Example of composition sets.
The data grid shows one column for each set. A set contains the records for which the job populated the same dictionary columns. The percentage shown beneath the set heading is the percentage of total records that fall within that set.

For example, Set 1 contains 72.03% of the total records, and every record in the set has values in the same eight dictionary columns. This percentage does not mean that the values in each column are the same, only that the job populated the same columns for every record in the set. The remaining sets continue in descending order of percentage of the total data.

The heading of the Composition Sets page shows you the percentage of the processed records that are represented by the composition sets in the report. The higher the set number, the smaller the percentage of total records that the set contains. For example, the percentage for Set 20 is typically less than 1%, and Set 20 might contain only two records for which the job populated the same columns.
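The grouping behind composition sets can be sketched as follows: records are grouped by which dictionary columns are populated, regardless of the values, and the groups are ordered by descending share of the total. This is a minimal sketch under the assumption that records are dictionaries keyed by dictionary column name. The function name and sample data are hypothetical, and the actual report might order ties differently.

```python
from collections import Counter


def composition_sets(records):
    """Return [(populated_columns, pct_of_total), ...] in descending order."""
    total = len(records)
    # The pattern of a record is the set of columns that hold a non-blank value.
    patterns = Counter(
        frozenset(col for col, val in rec.items() if (val or "").strip())
        for rec in records
    )
    # most_common() yields patterns from the most to the least frequent.
    return [
        (sorted(cols), round(100.0 * count / total, 2))
        for cols, count in patterns.most_common()
    ]


# Hypothetical sample: the values differ, but three records populate the
# same two columns, so they fall into the same set.
sample = [
    {"HouseNumber": "12", "StreetName": "MAIN"},
    {"HouseNumber": "7", "StreetName": "OAK"},
    {"HouseNumber": "301", "StreetName": "ELM"},
    {"HouseNumber": "", "StreetName": "PINE"},
]
for number, (cols, pct) in enumerate(composition_sets(sample), start=1):
    print(f"Set {number}: {pct}% {cols}")
```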

Standardization Quality Assessment (SQA) Record Examples report

The SQA Record Examples report contains sheets that correspond to composition sets as summarized in the SQA summary report.

Figure: Set 2 of a Record Examples report, a spreadsheet shown in two parts: columns A through F and columns G through L of 10 record examples in Set 2.
The report shows examples of the records in Set 2 that the stage processed. The set numbers correspond to the sheet numbers in the Record Examples report.

Compare Set 2 in the Record Examples report to the first page of composition sets in the SQA summary report. Notice that this set contains 11.69% of the total records, which is the same percentage that the summary report shows for the set.

In this example, the fifth dictionary column in the summary report, StreetName, contains a check mark for Set 2. In the Record Examples report, the first column of the report table contains the input records, and the other columns are the dictionary columns. StreetName is the first dictionary column for the input records in Set 2 of the Record Examples report, and each input record contains a value in that column.

The remaining check marks in the Set 2 column of the summary report follow the same pattern. The second check mark corresponds to the second dictionary column in the Record Examples report, and so on, until the Record Examples report shows you the data values for the UserOverrideFlag column. The subsequent sheets correspond to the remaining sets in the summary report.
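The relationship between the check marks in the summary report and a sheet in the Record Examples report can be sketched as follows. Given the columns that are checked for a set, the sketch selects records that populate exactly those columns and lays out each row as the input record followed by the dictionary column values. It assumes that records are dictionaries, that the original input is stored under a hypothetical key named InputRecord, and that a sheet holds at most 10 examples, as in the figure above. The function name and sample data are also hypothetical.

```python
def record_examples(records, set_columns, input_column="InputRecord", limit=10):
    """Return up to `limit` rows: [input record, value per checked column]."""
    wanted = frozenset(set_columns)
    rows = []
    for rec in records:
        # Columns that hold a non-blank value, excluding the input record.
        populated = frozenset(
            col for col, val in rec.items()
            if col != input_column and (val or "").strip()
        )
        if populated == wanted:
            rows.append([rec.get(input_column, "")] + [rec[c] for c in set_columns])
            if len(rows) == limit:
                break
    return rows


# Hypothetical sample data and column names.
sample = [
    {"InputRecord": "12 MAIN ST", "HouseNumber": "12",
     "StreetName": "MAIN", "StreetSuffixType": "ST"},
    {"InputRecord": "OAK AVE", "HouseNumber": "",
     "StreetName": "OAK", "StreetSuffixType": "AVE"},
    {"InputRecord": "ELM RD", "StreetName": "ELM", "StreetSuffixType": "RD"},
]
for row in record_examples(sample, ["StreetName", "StreetSuffixType"]):
    print(row)
```

Scanning a column of such rows is one way to check whether the values make sense for that dictionary column.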

Use the Record Examples report to verify that the data values make sense for a particular column. If the values are not what you expect, modify the rule set to better handle the data.