Basic syntax and parameters for the PredictColValue procedure

To predict an outcome, call the PredictColValue procedure. The basic call takes only the required parameters and creates a view and a model.

Basic Syntax

IDMMX.PredictColValue(viewName,
                      inputTable,
                      targetColumn,
                      targetValue)

You can choose to specify optional parameters in addition to the basic syntax.

Parameters

With the PredictColValue procedure, you must specify the following parameters:

viewName

The name of the view that you want to build.

The PredictColValue procedure creates a view and a model. The model is stored in the table IDMMX.PredictColValue under the same name as the generated view. If a model with the same name already exists, the previous model is replaced with the new model. If a view with the same name already exists, the previous view is replaced with the new view.

This parameter is of type VARCHAR. Its size is 240.

inputTable

The name of the input table or the input view.

The Easy Mining procedure ignores columns of the input table that are unlikely to be useful for creating a model, for example, key columns.

This parameter is of type VARCHAR. Its size is 257.

targetColumn

The name of the column whose values are to be predicted.

This parameter is of type VARCHAR. Its size is 128.

For information about the valid SQL types of categorical and numerical fields, see Mining field types.

targetValue

The name of a value in the target column, for example, YES or NO.

This parameter is of type VARCHAR. Its size is 1024.
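
As an illustration, a call with the four required parameters might look like the following sketch. The view name CHURN_PRED, the input table BANK.CUSTOMERS, and the target column STATUS are hypothetical names chosen for this example only; CANCELED is one of the sample target values mentioned in this topic.

```sql
-- Sketch only: CHURN_PRED, BANK.CUSTOMERS, and STATUS are assumed names.
CALL IDMMX.PredictColValue('CHURN_PRED',      -- viewName: view (and model) to create
                           'BANK.CUSTOMERS',  -- inputTable: input table or input view
                           'STATUS',          -- targetColumn: column to be predicted
                           'CANCELED');       -- targetValue: value of interest
```

Because an existing view or model with the same name is replaced, a call like this overwrites any earlier CHURN_PRED view and model.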

Output

The PredictColValue procedure creates a view. This view contains the columns of the input table and the following additional columns:

DATA_SET

This column indicates whether the record is used for training or for validation.

TARGET_VALUE

This column contains the target value of the target column, for example, SIGNED, PROLONGED, or CANCELED.

CONFIDENCE

This column contains the confidence, estimated by the Classification mining function, that the target column contains the target value. It indicates how reliable the prediction is.

The values in this column can range from 0 to 1.

A value close to 0 indicates a low certainty that the prediction is correct.
A value close to 1 indicates that the prediction is reliable.

With the confidence value, you can determine whether an individual prediction is sufficiently reliable for your application.
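
For example, a query along the following lines could keep only the predictions that meet a minimum confidence. This is a sketch: the view name CHURN_PRED is hypothetical, and the threshold 0.8 is an arbitrary value that you would replace with one suited to your application.

```sql
-- Sketch only: CHURN_PRED is an assumed view name created by PredictColValue.
-- The threshold 0.8 is an arbitrary example value, not a recommendation.
SELECT *
  FROM CHURN_PRED
 WHERE CONFIDENCE >= 0.8
 ORDER BY CONFIDENCE DESC;
```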

PRED_VALUE

This column contains the estimated confidence, calculated by the Regression mining function, that the target column contains the target value. It has the same value range and meaning as the CONFIDENCE column.

It might be useful to compare the results of the different prediction techniques. For example, for a mailing campaign with a target audience of a given size, the audience selected by the Classification mining function might be more appropriate than the audience selected by the Regression mining function. For a target audience of a different size, the audience selected by the Regression mining function might be more appropriate. The values in the CONFIDENCE column and the PRED_VALUE column help you to make this decision.
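
One way to compare the two techniques for a given audience size is to select the top-ranked records by each column and inspect the results side by side. The following sketch uses the CONFIDENCE and PRED_VALUE columns listed under Output. The view name CHURN_PRED and the audience size of 1000 are assumptions made for illustration.

```sql
-- Sketch only: CHURN_PRED and the audience size 1000 are assumed values.

-- Target audience ranked by the Classification mining function
SELECT *
  FROM CHURN_PRED
 ORDER BY CONFIDENCE DESC
 FETCH FIRST 1000 ROWS ONLY;

-- Target audience ranked by the Regression mining function
SELECT *
  FROM CHURN_PRED
 ORDER BY PRED_VALUE DESC
 FETCH FIRST 1000 ROWS ONLY;
```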

For more information, see Determining the best suited model.

Data Flow of the PredictColValue Procedure

The PredictColValue procedure does not use the original target column for the mining runs. To build a classification model, it creates a new categorical target column that includes the following values:

<targetValue>
!=<targetValue>

To build a regression model, the PredictColValue procedure creates a numeric column. This column can contain the values 1 or 0.

It contains 1 if the value of the original target column is equal to the target value.
It contains 0 if the value of the original target column differs from the target value.
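
The two derived target columns can be pictured with plain SQL CASE expressions. This sketch only illustrates the mapping described above and is not the procedure's actual implementation. The table BANK.CUSTOMERS, the column STATUS, and the result column names CLASS_TARGET and REGR_TARGET are hypothetical. How the procedure treats NULL values is not stated here, so the ELSE branches are an assumption.

```sql
-- Sketch only: all table and column names are assumed for illustration.
SELECT STATUS,
       -- Categorical target for the classification model
       CASE WHEN STATUS = 'CANCELED' THEN 'CANCELED'
            ELSE '!=CANCELED'
       END AS CLASS_TARGET,
       -- Numeric target for the regression model
       CASE WHEN STATUS = 'CANCELED' THEN 1
            ELSE 0
       END AS REGR_TARGET
  FROM BANK.CUSTOMERS;
```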

Prediction models are computed on a subset of the input data. The PredictColValue procedure splits the input data into the following disjoint data sets:

Training data set: The training data set is used to compute the prediction model.
Validation data set: The quality of the prediction model is assessed on the records of the validation data set.

The model quality indicates how well the model might perform on unknown data. You can use the validation results to select the model that is best suited according to the requirements of your application. For more information, see Splitting tables into training data sets and test data sets.
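
To see how the records were distributed across the data sets, you could group the generated view by its DATA_SET column. In this sketch, the view name CHURN_PRED is hypothetical. The query makes no assumption about the literal values stored in DATA_SET and simply counts the records for each value.

```sql
-- Sketch only: CHURN_PRED is an assumed view name created by PredictColValue.
SELECT DATA_SET,
       COUNT(*) AS NUM_RECORDS
  FROM CHURN_PRED
 GROUP BY DATA_SET;
```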

Figure 1 shows the data flow of the PredictColValue procedure.

Figure 1. Data flow of the PredictColValue procedure. The procedure uses the input table to generate a model and an output view.