Fields and Field Details

Figure 1. Fields

The Fields view displays the processed fields and indicates whether ADP recommends using each one for model building. Click any field name to display more information about that field in the linked view.

  1. Click income.
    Figure 2. Field details for Household income in thousands

    The Field Details view shows the distributions of the original and transformed Household income in thousands. According to the processing table, records identified as outliers were trimmed (by setting their values equal to the cutoff for determining outliers) and the field was standardized to have mean 0 and standard deviation 1. The "bump" at the far right of the histogram for the transformed field shows that a number of records, perhaps more than 200, were identified as outliers. Income has a heavily skewed distribution, so this may be a case in which the default cutoff is too aggressive in determining outliers.

    Also note the increase in predictive power of the transformed field over the original field. This appears to be a useful transformation.

  2. In the Fields view, click job_start_date_day. (Note that this is different from job_start_date_days.)
    Figure 3. Field details for job_start_date_day

    The field job_start_date_day is the day extracted from Employment starting date [job_start_date]. It is highly unlikely that this field has any real bearing on whether a claim is fraudulent, and so the insurance company wants to remove it from consideration for model building.

    Figure 4. Field details for Household income in thousands
  3. In the Fields view, select Do not use from the Version to Use dropdown in the job_start_date_day row. Perform the same operation for all fields with the _day and _month suffixes.
  4. To apply the transformations, click Run.
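The trimming and standardization described in step 1 can be illustrated outside the product with a short Python sketch. This is not ADP's actual implementation. It assumes that outliers are defined as values more than `cutoff` standard deviations from the mean (the value 3.0 used here is an assumption), and the function name `trim_and_standardize` and the sample `incomes` list are hypothetical.

```python
import statistics

def trim_and_standardize(values, cutoff=3.0):
    """Clip values beyond `cutoff` standard deviations, then z-score them.

    Illustrative only: the cutoff rule and its default are assumptions,
    not a statement of how ADP computes them.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - cutoff * sd, mean + cutoff * sd

    # Trimming: set each outlier equal to the cutoff it exceeded.
    trimmed = [min(max(v, lower), upper) for v in values]

    # Standardizing: rescale the trimmed values to mean 0, std. dev. 1.
    t_mean = statistics.fmean(trimmed)
    t_sd = statistics.stdev(trimmed)
    return [(v - t_mean) / t_sd for v in trimmed]

# Hypothetical data: 30 typical incomes plus one extreme value.
incomes = [40, 45, 50, 55, 60] * 6 + [1000]
z = trim_and_standardize(incomes)
```

In this sample the single extreme value is pulled back to the upper cutoff before rescaling. When many records are clipped to the same cutoff, they pile up at one value, which is what produces a "bump" at the edge of the transformed histogram.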

The dataset is now ready for model building, in the sense that all recommended predictors (both new and old) have their role set to Input, while non-recommended predictors have their role set to None. To create a dataset with only the recommended predictors, use the Apply Transformations settings in the dialog.
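The manual exclusion in step 3 amounts to a suffix rule over field names. The following Python sketch shows that rule in isolation. It does not call any SPSS Modeler API: the function `assign_roles` is hypothetical, and the field list is an illustrative sample in which only `income`, `job_start_date_day`, and `job_start_date_days` are names taken from the steps above.

```python
def assign_roles(field_names, excluded_suffixes=("_day", "_month")):
    """Map each field name to "None" if it ends with an excluded suffix,
    otherwise to "Input". A sketch of the manual selection, not ADP itself.
    """
    return {
        name: "None" if name.endswith(excluded_suffixes) else "Input"
        for name in field_names
    }

# Illustrative field list (job_start_date_month is an assumed name).
fields = [
    "income",
    "job_start_date_day",
    "job_start_date_month",
    "job_start_date_days",
]
roles = assign_roles(fields)
```

Because the match is on the exact suffix, `job_start_date_days` is kept as an input while `job_start_date_day` is excluded, which mirrors the distinction noted in step 2.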
