Basic syntax and parameters for the PredictColumn procedure
The basic call of the PredictColumn procedure includes required parameters. It creates a view.
Basic Syntax
IDMMX.PredictColumn(viewName,
inputTable,
targetColumn)
You can choose to specify optional parameters in addition to the basic syntax.
Parameters
- viewName
- The name of the view that you want to build. The PredictColumn procedure creates a view and a model. Depending on the mining function that is used to build the model, the model is stored in one of the following tables under the same name as the generated view:
- IDMMX.ClassifModels if the target column is categorical
- IDMMX.RegressionModels if the target column is numeric
This parameter is of type VARCHAR. Its size is 240.
- inputTable
- The name of the input table or the input view.
The Easy Mining procedure ignores columns of the input table that are unlikely to be useful for creating a model, for example, key columns.
This parameter is of type VARCHAR. Its size is 257.
- targetColumn
- The name of the target column.
The PredictColumn procedure derives the values in this column from the values of the other columns in the input table. If the target column is categorical, the Classification mining function is used. If the target column is numeric, the Regression mining function is used.
This parameter is of type VARCHAR. Its size is 128.
For information about the valid SQL types of categorical and numerical fields, see Mining field types.
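The basic call and the parameter sizes listed above can be sketched from a client program as follows. This is only an illustration: the helper names `sql_literal` and `build_predict_column_call`, and the view, table, and column names in the usage line, are hypothetical. The sketch also assumes that the procedure is invoked with an SQL CALL statement that takes string literals, and it compares character counts against the documented VARCHAR sizes, which might differ from byte lengths for non-ASCII names.

```python
# Maximum VARCHAR sizes taken from the parameter descriptions.
MAX_LEN = {"viewName": 240, "inputTable": 257, "targetColumn": 128}


def sql_literal(value: str) -> str:
    """Quote a string as an SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_predict_column_call(view_name: str, input_table: str, target_column: str) -> str:
    """Build the text of a basic PredictColumn call (hypothetical helper).

    Raises ValueError if an argument is empty or longer than the documented
    size. Length is measured in characters, which is an assumption.
    """
    args = {
        "viewName": view_name,
        "inputTable": input_table,
        "targetColumn": target_column,
    }
    for name, value in args.items():
        if not value:
            raise ValueError(f"{name} must not be empty")
        if len(value) > MAX_LEN[name]:
            raise ValueError(f"{name} exceeds {MAX_LEN[name]} characters")
    literals = ", ".join(sql_literal(v) for v in args.values())
    return f"CALL IDMMX.PredictColumn({literals})"


# Hypothetical view, table, and column names.
print(build_predict_column_call("CHURN_PRED", "SALES.CUSTOMERS", "CHURNED"))
```

The resulting statement text can then be passed to whatever database client you use. Validating the lengths before the call gives an early, readable error instead of a failure inside the procedure.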
Output
- PREDICTION
- This column contains the predicted values of the target column. These values are derived from the values of the input table.
- CONFIDENCE
- This column contains the confidence value of the prediction. If the target column is categorical, the confidence value can range from 0 to 1.
- A value close to 0 indicates a low probability that the prediction is correct.
- A value close to 1 indicates a high probability that the prediction is correct.
If the target column is numeric, this column contains only null values.
With the prediction confidence, you can select the most reliable predictions.
To analyze the prediction model in detail, you can use the visualizer in the Design Studio.
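Selecting the most reliable predictions by their confidence value might look like the following sketch. It assumes that the rows of the generated view have already been fetched into Python dictionaries keyed by column name. The function name `most_reliable`, the `ID` column, the sample rows, and the threshold of 0.8 are all hypothetical. Rows with a null confidence, as produced for a numeric target column, are skipped.

```python
def most_reliable(rows, min_confidence=0.8):
    """Return rows with CONFIDENCE >= min_confidence, highest confidence first.

    Rows whose CONFIDENCE is None (null) are dropped because they carry
    no confidence information.
    """
    kept = [
        row for row in rows
        if row["CONFIDENCE"] is not None and row["CONFIDENCE"] >= min_confidence
    ]
    return sorted(kept, key=lambda row: row["CONFIDENCE"], reverse=True)


# Hypothetical rows fetched from the generated view.
rows = [
    {"ID": 1, "PREDICTION": "yes", "CONFIDENCE": 0.81},
    {"ID": 2, "PREDICTION": "no", "CONFIDENCE": 0.55},
    {"ID": 3, "PREDICTION": "yes", "CONFIDENCE": 0.95},
    {"ID": 4, "PREDICTION": "no", "CONFIDENCE": None},
]

for row in most_reliable(rows):
    print(row["ID"], row["PREDICTION"], row["CONFIDENCE"])
```

With these sample rows, only the records with IDs 3 and 1 pass the threshold, in that order. The same filter could also be expressed directly in a query against the view.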
Data Flow of the PredictColumn Procedure
- Training data set
- The training data set is used to compute the prediction model.
- Validation data set
- The quality of the prediction model is computed on the records of the validation data set.
The model quality indicates how well the model might perform on unknown data. Typically, the model quality is better on the training data than on the validation data because the model might be tuned towards the records of the training data set.
In the extreme case, the model behaves as if it had learned all records of the training data set by heart. Such a model would have optimal quality on the training data set because the prediction for every training record would be correct. However, it would not know what to predict for a record of the validation data set unless that record had the same values as a record in the training data set. Therefore, it is better to compute the quality of the model on data records that were not used in the training phase.
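The extreme case described above can be made concrete with a small sketch. The "model" here is a plain lookup table that memorizes the training records. It is not the algorithm that the PredictColumn procedure uses. The function names and the sample records are hypothetical and serve only to show why quality is measured on records that were not used for training.

```python
def train_memorizer(records):
    """Learn the training records by heart: map each feature tuple to its target."""
    return {features: target for features, target in records}


def accuracy(model, records):
    """Fraction of records for which the memorized target matches the true target.

    Feature tuples that were never seen yield no prediction (None) and
    therefore count as wrong.
    """
    correct = sum(1 for features, target in records if model.get(features) == target)
    return correct / len(records)


# Hypothetical records: ((age group, income level), target value).
training = [
    (("young", "low"), "no"),
    (("young", "high"), "yes"),
    (("old", "low"), "no"),
    (("old", "high"), "yes"),
]
validation = [
    (("young", "high"), "yes"),   # identical to a training record
    (("middle", "high"), "yes"),  # unseen combination
    (("middle", "low"), "no"),    # unseen combination
    (("old", "medium"), "yes"),   # unseen combination
]

model = train_memorizer(training)
print(accuracy(model, training))    # 1.0
print(accuracy(model, validation))  # 0.25
```

The memorizing model is perfect on the training data but correct on only one of the four validation records, the one that repeats a training record. Measuring quality on the training data alone would hide this.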