Choosing Between Objectives
- To run Automated Data Preparation interactively, from the menus choose:
Figure 1. Objective tab
The first tab asks for an objective that controls the default settings, but what is the practical difference between the objectives? By running the procedure with each objective in turn, we can see how the results differ.
- Make sure Balance speed & accuracy is selected and click Analyze.
Figure 2. Analysis tab, field processing summary for balanced objective
Focus automatically switches to the Analysis tab while the procedure processes the data. The default main view is the Field Processing Summary, which gives an overview of how automated data preparation processed the fields. There are a single target, 18 inputs, and 18 fields recommended for model building. Of the fields recommended for modeling, 9 are original input fields, 4 are transformations of original input fields, and 5 are derived from date and time fields.
Figure 3. Analysis tab, predictive power for balanced objective
The default auxiliary view is the Predictive Power chart, which quickly gives you an idea of which recommended fields will be most useful for model building. Note that while 18 predictors are recommended for analysis, only the first 10 are shown by default in the predictive power chart. To show more or fewer fields, use the slide control below the chart.
With Balance speed & accuracy as the objective, Type of claim is identified as the "best" predictor, followed by Number of people in household and the claimant's current age in months (the computed duration from the date of birth to the current date).
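The "current age in months" field is described as a duration computed from the date of birth to the current date. A minimal Python sketch of one way to compute such a duration follows. The function name `months_between`, the sample dates, and the convention of counting only completed months are assumptions made for illustration; the procedure's exact calculation is not stated here and may differ (for example, it may produce fractional months).

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Return the number of completed months from start to end.

    Assumed convention: a month counts only once the day-of-month
    of `start` has been reached in the later month.
    """
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        # The final month has not been completed yet.
        months -= 1
    return months


# Hypothetical date of birth and reference ("current") date.
date_of_birth = date(1980, 5, 17)
reference_date = date(2024, 3, 1)
age_in_months = months_between(date_of_birth, reference_date)
```

Passing the reference date explicitly, rather than calling `date.today()` inside the function, keeps the result reproducible when the same data are prepared again later.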
- Click Clear Analysis, then click the Objective tab.
- Select Optimize for speed and click Analyze.
Figure 4. Analysis tab, field processing summary when optimized for speed
Focus again automatically switches to the Analysis tab while the procedure processes the data. In this case, only 2 fields are recommended for model building, and both are transformations of the original fields.
Figure 5. Analysis tab, predictive power when optimized for speed
With Optimize for speed as the objective, claim_type_transformed is identified as the "best" predictor, followed by income_transformed.
- Click Clear Analysis, then click the Objective tab.
- Select Optimize for accuracy and click Analyze.
Figure 6. Analysis tab, field processing summary when optimized for accuracy
With Optimize for accuracy as the objective, 32 fields are recommended for model building, because more fields are derived from dates and times: days, months, and years are extracted from dates, and hours, minutes, and seconds from times.
Figure 7. Analysis tab, predictive power when optimized for accuracy
Type of claim is identified as the "best" predictor, followed by the number of days since the claimant started their most recent job (the computed duration from the job start date to the current date) and the year the claimant started their current job (extracted from the job start date).
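The date-derived predictors above are of two kinds: parts extracted from a date or time, and durations measured up to a reference date. The following Python sketch illustrates both using only the standard library. The function names, dictionary keys, and sample values are hypothetical and do not correspond to the field names the procedure generates.

```python
from datetime import date, datetime


def date_time_parts(stamp: datetime) -> dict:
    """Extract the calendar and clock parts of a timestamp."""
    return {
        "year": stamp.year,
        "month": stamp.month,
        "day": stamp.day,
        "hour": stamp.hour,
        "minute": stamp.minute,
        "second": stamp.second,
    }


def days_since(start: date, reference: date) -> int:
    """Return the number of days elapsed from start to reference."""
    return (reference - start).days


# Hypothetical job start timestamp and reference ("current") date.
job_start = datetime(2015, 9, 14, 8, 30, 45)
parts = date_time_parts(job_start)
tenure_days = days_since(job_start.date(), date(2015, 10, 14))
```

Each extracted part becomes a separate candidate predictor, which is consistent with the recommended field count rising when more parts are derived.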
To summarize:
- Balance speed & accuracy creates fields usable in modeling from dates, and may transform continuous fields like reside to make them more normally distributed.
- Optimize for accuracy creates some extra fields from dates. It also checks for outliers and, if the target is continuous, may transform the target to make it more normally distributed.
- Optimize for speed does not prepare dates and does not rescale continuous fields. When the target is categorical, it merges categories of categorical predictors and bins continuous predictors; when the target is continuous, it performs feature selection and construction.
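For quick reference, the summary above can be written down as a small lookup table. The Python sketch below encodes only what is stated in this section; the step labels and the dictionary layout are invented for illustration and are not settings exposed by the procedure. The recommended field counts are the ones observed in this example and will differ for other data.

```python
# Summary of the three objectives as described in this section.
# Step labels are informal paraphrases, not option names from the software.
OBJECTIVE_SUMMARY = {
    "Balance speed & accuracy": {
        "recommended_fields": 18,
        "does": [
            "derive fields from dates",
            "may transform continuous fields toward normality",
        ],
    },
    "Optimize for speed": {
        "recommended_fields": 2,
        "does": [
            "merge categories and bin predictors (categorical target)",
            "feature selection and construction (continuous target)",
        ],
    },
    "Optimize for accuracy": {
        "recommended_fields": 32,
        "does": [
            "derive fields from dates",
            "derive extra date and time parts",
            "check for outliers",
            "may transform a continuous target toward normality",
        ],
    },
}


def objectives_doing(step: str) -> list:
    """Return the objectives whose summary lists the given step."""
    return [name for name, info in OBJECTIVE_SUMMARY.items() if step in info["does"]]


date_objectives = objectives_doing("derive fields from dates")
```

Here `date_objectives` contains the balanced and accuracy objectives but not the speed objective, which matches the statement that Optimize for speed does not prepare dates.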
The insurance company decides to explore the Optimize for accuracy results further.
- Select Fields from the main view dropdown.