Data mining — Properties view of the Predictor operator

Use this view to define the properties for the Predictor operator.

A Predictor operator is a graphical icon that represents a mining task. You place it on the mining editor canvas to create a prediction model that predicts values for the selected target column.

In the Properties view, set the properties for this operator by completing the fields in the following tabs:

General
Model Name
Mining Settings
Cost Matrix
Column Properties

General tab

Label: You can rename the operator by specifying a new name. This new name appears on the operator icon in the mining editor canvas.

Description: You can add a description for the operator. When the generated model is stored in the IDMMX.ClassifModels table (for classification) or the IDMMX.RegressionModels table (for regression), the description is copied into the DESCRIPTION column of the table.

Model Name tab

Prefix: The prefix for the name of the model created by the operator.
Default value: Name of the data warehousing project for the mining flow

Model name: The name of the model that you want to build. The model is stored in the IDMMX.ClassifModels table for classification and the IDMMX.RegressionModels table for regression. If a model with the same name already exists, the previous model is replaced with the new model. The maximum name length is 240 characters.
Default value: System determined

Mining Settings tab

Target column: The column for which you want to predict values. You must select a value.

Optional Parameters: With optional parameter strings, you can modify the default parameters of the Easy Mining procedures. This setting is intended for advanced users only. For the supported optional parameters, refer to the Intelligent Miner® Easy Mining Procedures documentation.

Classification-specific settings (when a categorical target field is selected)

Algorithm

The following classification algorithms are available:

Tree

The Tree algorithm provides the following settings:

Maximum purity: The maximum purity value for the decision tree that is built by tree classification. You can customize the binary decision tree by specifying the maximum purity per internal node. This value is a limit that stops further splitting of a node that has reached the specified purity value when the initial decision tree is being built. Valid values are percentages between 0 and 100.
Default value: 0, which means 100%

Maximum depth: The maximum depth of the decision tree. You can customize the binary decision tree by specifying the tree depth. The maximum tree depth is a limit that stops further splitting of nodes when the specified tree depth has been reached while the initial decision tree is being built. Valid values are positive integers.
Default value: 0, which means unlimited depth

Minimum number of records per leaf node: The minimum number of records that must be in each leaf node of the generated decision tree. You can customize the binary decision tree by specifying this minimum. The value is a limit that prevents further splitting of a node that has reached the specified minimum size when the initial decision tree is being built. Valid values are positive integers.
Default value: 0, which means that the algorithm's default value of 5 records is used
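The three Tree settings above act as stopping limits while the initial tree is built. The following Python fragment is a minimal sketch of how the special value 0 could be resolved for each setting and how the limits might combine. It is not the Intelligent Miner implementation: the names TreeSettings and should_stop_splitting are hypothetical, and the exact comparisons (for example, whether a node of exactly the minimum size is still split) are assumptions.

```python
from dataclasses import dataclass

# Default stated in the text: a setting of 0 means 5 records per leaf node.
ALGORITHM_DEFAULT_MIN_RECORDS = 5


@dataclass
class TreeSettings:
    """Hypothetical container for the three Tree settings."""
    max_purity: int = 0   # percent; 0 means 100%
    max_depth: int = 0    # 0 means unlimited depth
    min_records: int = 0  # 0 means the algorithm default (5 records)

    def effective_purity(self) -> float:
        return 100.0 if self.max_purity == 0 else float(self.max_purity)

    def effective_min_records(self) -> int:
        if self.min_records == 0:
            return ALGORITHM_DEFAULT_MIN_RECORDS
        return self.min_records


def should_stop_splitting(settings, node_purity, node_depth, node_records):
    """Return True if any limit forbids splitting the node further.

    The comparison operators are assumptions made for this sketch.
    """
    if node_purity >= settings.effective_purity():
        return True  # node has reached the maximum purity
    if settings.max_depth > 0 and node_depth >= settings.max_depth:
        return True  # node has reached the maximum depth
    if node_records <= settings.effective_min_records():
        return True  # node has reached the minimum size
    return False
```

With all three settings left at 0, a node stops splitting in this sketch only when it is 100% pure or holds 5 records or fewer.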

Naive Bayes

The Naive Bayes algorithm provides the parameter Probability Threshold. This parameter specifies a value that is used whenever a probability of zero is encountered in the model equations. The default value is 0.001.
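To illustrate why such a threshold matters, the following Python fragment is a minimal sketch, not the product's implementation. It assumes the usual Naive Bayes scoring, in which a class prior is multiplied by per-field conditional probabilities; the function names apply_threshold and class_score are hypothetical.

```python
PROBABILITY_THRESHOLD = 0.001  # default value stated in the text


def apply_threshold(probability, threshold=PROBABILITY_THRESHOLD):
    """Substitute the threshold whenever a probability of zero is encountered."""
    return threshold if probability == 0.0 else probability


def class_score(prior, conditional_probabilities, threshold=PROBABILITY_THRESHOLD):
    """Multiply the prior by each conditional probability (assumed scoring rule)."""
    score = prior
    for p in conditional_probabilities:
        score *= apply_threshold(p, threshold)
    return score
```

Without the substitution, a single field value that never occurred with a class in the training data would force the score for that class to zero, regardless of the other fields.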

Logistic regression

This classification algorithm does not provide specific settings.

By default, the Tree algorithm is used.

Regression-specific settings (when a numeric target field is selected)

Algorithm

Intelligent Miner supports the following Regression algorithms:

Polynomial Regression
Linear Regression
Transform Regression
Radial Basis Function (RBF) Regression

By default, the Transform Regression algorithm is used. To use the Polynomial Regression algorithm, the Linear Regression algorithm, or the RBF algorithm instead, specify DM_setAlgorithm('Polynomial'), DM_setAlgorithm('Linear'), or DM_setAlgorithm('RBF') in the DM_RegSettings object.

Algorithm settings: With algorithm settings, you can set values for algorithm-specific parameters. For supported algorithm-specific parameters, refer to the IBM® DB2® Intelligent Miner Modeling documentation.

Cost Matrix tab

Cost Matrix parameters are only available for the Tree classification algorithm.

By default, all classifications have an equal weight. The cost matrix is automatically adjusted so that values of the target field that appear less frequently are not discriminated against. You can also adjust the cost matrix manually.

With a cost matrix, you assign weights to combinations of actual and predicted values. For example, what is the cost if the actual risk class for a customer is "high" but the model predicts "low"?
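As an illustration of the idea, the following Python fragment is a minimal sketch that sums the cost of a set of predictions under a cost matrix. The weights, the class labels, and the function name total_cost are made up for this example and do not come from the product.

```python
# Hypothetical cost matrix keyed by (actual value, predicted value).
# Correct predictions cost nothing; predicting "low" for an actual
# "high" risk is weighted more heavily than the opposite error.
cost_matrix = {
    ("high", "high"): 0.0,
    ("high", "low"): 5.0,
    ("low", "high"): 1.0,
    ("low", "low"): 0.0,
}


def total_cost(pairs, costs):
    """Sum the cost of each (actual, predicted) pair."""
    return sum(costs[(actual, predicted)] for actual, predicted in pairs)


# Illustrative (actual, predicted) pairs for four customers.
predictions = [
    ("high", "low"),
    ("low", "low"),
    ("low", "high"),
    ("high", "high"),
]

print(total_cost(predictions, cost_matrix))
```

In this sketch, raising the weight for ("high", "low") makes that kind of misclassification count more heavily in the total.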

You can either add Actual values to the cost matrix manually by clicking the Add a new value icon, or you can load the values from a table by clicking the Load values from categorical column icon on the toolbar. The other icons on the toolbar are self-explanatory.

Refer to Cost matrix for more information.

Column Properties tab

The tab shows a table that allows you to define various properties for each column in the input (virtual) table. The table has four columns:

Input Column: This column contains the names of all columns of the input table. For all the other columns of the input table, column properties can be defined here. The values in this column cannot be edited.
SQL Type: This column contains the SQL data types of all columns of the input table. The values in this column cannot be edited.
Field Type: Field types are categorical or numerical. You can set the field type to System Determined or Categorical. By default, the system treats numeric database fields like INTEGER or DOUBLE as numerical and character fields as categorical. Tip: If you encode categorical information like marital status in a numeric field by using a code like 1 for 'married' and 2 for 'single', override the default value by explicitly selecting Categorical. If you use the Naive Bayes algorithm for the Predictor operator, you can also set the field type to set-valued categorical.

Field Usage Type: The type of field usage can be user-specified as System determined, Active, or Inactive. System determined means that the algorithm decides whether the field should be included in the training run. Active means the field must be used for the training run. Inactive means the field must not be used for the training run.
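The Field Type defaults described above can be pictured with the following Python fragment. It is a minimal sketch under stated assumptions, not the product's logic: the function name resolve_field_type is hypothetical, the numeric set contains only the two SQL types named in the text, and the character set (CHAR, VARCHAR) is an assumption.

```python
# Only INTEGER and DOUBLE are named in the text; CHAR and VARCHAR are
# assumed examples of character fields.
NUMERIC_SQL_TYPES = {"INTEGER", "DOUBLE"}
CHARACTER_SQL_TYPES = {"CHAR", "VARCHAR"}


def resolve_field_type(sql_type, field_type="System Determined"):
    """Return the field type that would apply to a column in this sketch."""
    if field_type != "System Determined":
        return field_type.lower()  # an explicit selection overrides the default
    name = sql_type.upper()
    if name in NUMERIC_SQL_TYPES:
        return "numerical"
    if name in CHARACTER_SQL_TYPES:
        return "categorical"
    raise ValueError(f"SQL type not covered by this sketch: {sql_type}")


# A marital status code stored as INTEGER defaults to numerical,
# so it is overridden explicitly, as the tip recommends.
print(resolve_field_type("INTEGER"))
print(resolve_field_type("INTEGER", "Categorical"))
```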