Use this view to define the properties for the Predictor
operator.
A Predictor operator is a graphical icon representing
a mining task that you place on the mining editor canvas
to create a prediction model that can predict values for the selected
target column.
In the Properties view, set the properties for
this operator by completing the fields in the following tabs:
- General
- Model Name
- Mining Settings
- Cost Matrix
- Column Properties
General tab
- Label
- You can rename the operator by specifying a new name. This new
name appears on the operator icon in the mining editor canvas.
- Description
- You can add a description for the operator. When the generated
model is stored in the IDMMX.ClassifModels table (for classification)
or the IDMMX.RegressionModels table (for regression), the description
is copied into the DESCRIPTION column of the table.
Model Name tab
- Prefix
- The prefix for the name of the model created by the operator.
Default value: Name of the data warehousing project for the mining flow
- Model name
- The name of the model that you want to build. The model is stored
in the IDMMX.ClassifModels table for classification and the IDMMX.RegressionModels
table for regression. If a model with the same name already exists,
the previous model is replaced with the new model. The maximum name
length is 240 characters.
Default value: System determined
Mining Settings tab
- Target column
- The column whose values you want to predict. You must select a value.
- Optional Parameters
- With optional parameter strings, you can modify default parameters
of the Easy Mining procedures. This setting is intended for advanced
users only. For the supported optional parameters, refer to the
Intelligent Miner® Easy Mining Procedures documentation.
- Classification-specific settings (when you select a categorical
target field)
- Algorithm
- The following classification algorithms are available:
- Tree
- The Tree algorithm provides the following settings:
- Maximum purity
- The maximum purity value for the decision tree that is built by
tree classification. You can customize the binary decision tree by
specifying the maximum purity per internal node. This value is a limit
to stop further splitting of a node that has reached the specified
purity value when the initial decision tree is being built. Valid
values are percentages between 0 and 100.
- Default value: 0, which means 100%
- Maximum depth
- The maximum depth of the decision tree. You can customize the
binary decision tree by specifying the tree depth. Maximum tree depth
is a limit to stop further splitting of nodes when the specified tree
depth has been reached during the building of the initial decision
tree. Valid values are positive integers.
- Default value: 0, which means unlimited depth
- Minimum number of records per leaf node
- The minimum number of records that must be in each leaf node of
the generated decision tree. You can customize the binary decision
tree by specifying the minimum number of records per internal node.
This sets a limit value. It prevents further splitting of a node that
has reached the specified minimal size when the initial decision tree
is being built. Valid values are positive integers.
- Default value: 0, which means use the algorithm's default value,
which is 5 records
- Naive Bayes
- The Naive Bayes algorithm provides the parameter Probability Threshold.
This parameter specifies a value that is used whenever a probability
of zero is encountered in the model equations. The default value is
0.001.
- Logistic regression
- This classification algorithm does not provide specific settings.
By default, the Tree algorithm is used.
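The three Tree settings above act as stopping rules while the initial decision tree is built, and each treats 0 as a placeholder for a default. The following Python sketch illustrates that reading only. It is not Intelligent Miner code: the names TreeSettings, effective, and stop_splitting are hypothetical, and the exact comparison used for each limit is an assumption based on the wording above.

```python
from dataclasses import dataclass


@dataclass
class TreeSettings:
    """Hypothetical container for the three Tree settings (0 = use default)."""
    max_purity: float = 0   # percent; 0 is read as 100
    max_depth: int = 0      # 0 is read as unlimited depth
    min_records: int = 0    # 0 is read as the algorithm default of 5


def effective(settings):
    """Resolve the 0 placeholders to the values they stand for."""
    purity = 100.0 if settings.max_purity == 0 else float(settings.max_purity)
    depth = None if settings.max_depth == 0 else settings.max_depth  # None = unlimited
    records = 5 if settings.min_records == 0 else settings.min_records
    return purity, depth, records


def stop_splitting(settings, node_purity, node_depth, node_size):
    """Return True if any of the three limits forbids splitting this node further.

    Assumption: a limit counts as reached when the node's value meets it exactly.
    """
    purity, depth, records = effective(settings)
    if node_purity >= purity:
        return True   # node is already as pure as requested
    if depth is not None and node_depth >= depth:
        return True   # maximum tree depth reached
    if node_size <= records:
        return True   # node has reached the minimal size
    return False
```

With all settings left at 0, a node stops splitting only when it is 100% pure or holds 5 records or fewer; depth never limits the tree.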
- Regression-specific settings (when you select a numeric target field)
- Algorithm
- Intelligent Miner supports the following regression algorithms:
- Polynomial Regression
- Linear Regression
- Transform Regression
- Radial Basis Function (RBF) Regression
By default, the Transform Regression algorithm is used. To use the
Polynomial Regression algorithm, the Linear Regression algorithm, or
the RBF algorithm instead, specify DM_setAlgorithm('Polynomial'),
DM_setAlgorithm('Linear'), or DM_setAlgorithm('RBF') in the
DM_RegSettings object.
- Algorithm settings
- With algorithm settings, you can set values for algorithm-specific
parameters. For supported algorithm-specific parameters, refer to
the IBM® DB2® Intelligent Miner Modeling documentation.
Cost Matrix tab
Cost Matrix parameters are only available for the Tree classification
algorithm.
By default, all classifications have an equal weight. The cost matrix
is automatically adjusted so that values of the target field that
appear less frequently are not discriminated against. You can also
adjust the cost matrix manually.
With a cost matrix, you assign weights to combinations of actual and
predicted values. For example, what is the cost if the actual risk
class for a customer is "high" but the model predicts "low"?
You can either add Actual values to the cost matrix manually by
clicking the Add a new value icon, or load the values from a table by
clicking the Load values from categorical column icon on the toolbar.
The other icons on the toolbar are self-explanatory.
Refer to Cost matrix for more information.
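To make the idea of weighting actual/predicted combinations concrete, here is a minimal Python sketch. It only illustrates the concept and does not reflect how Intelligent Miner stores or applies the matrix. The function name total_cost, the dictionary layout, and the sample weights are hypothetical; the default of 1.0 for unlisted misclassifications and 0 for correct predictions is an assumption standing in for "equal weight".

```python
def total_cost(actual, predicted, cost_matrix, default_cost=1.0):
    """Sum the cost of every misclassified record.

    cost_matrix maps (actual value, predicted value) to a weight.
    Pairs that are not listed fall back to default_cost (an assumption).
    Correct predictions are assumed to cost nothing.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == p:
            continue  # correct prediction, no cost
        total += cost_matrix.get((a, p), default_cost)
    return total


# Hypothetical weights: missing a high-risk customer is five times as
# costly as flagging a low-risk customer by mistake.
costs = {("high", "low"): 5.0, ("low", "high"): 1.0}

actual = ["high", "low", "high", "low"]
predicted = ["low", "low", "high", "high"]

print(total_cost(actual, predicted, costs))  # 6.0
```

In this sample, one "high" record predicted as "low" contributes 5.0 and one "low" record predicted as "high" contributes 1.0, so a model that avoids the first kind of error scores better even if it makes more of the second.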
Column Properties tab
The tab shows a table that allows you to define various properties
for each column in the input (virtual) table. The table has four columns:
- Input Column
- This column contains the names of all columns of the input table.
- For all the other columns of the input table, column properties
can be defined here. The values in this column cannot be edited.
- SQL Type
- This column contains the SQL data types of all columns of the
input table. The values in this column cannot be edited.
- Field Type
- Field types are categorical or numerical. You can set the field
types to System Determined or Categorical.
- By default, the system treats numeric database fields like INTEGER
or DOUBLE as numerical and character fields as categorical.
Tip: If you encode categorical information like marital
status in a numeric field by using a code like 1 for
'married' and 2 for 'single', override the default
value by explicitly selecting Categorical.
- If you use the Naive Bayes algorithm for the Predictor operator,
you can also set the field type to set-valued categorical.
- Field Usage Type
- You can set the field usage type to System determined,
Active, or Inactive.
- System determined means that the algorithm decides whether
the field should be included in the training run.
- Active means the field must be used for the training run.
- Inactive means the field must not be used for the training
run.
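The default Field Type rule described above (numeric SQL types are treated as numerical, character types as categorical, unless you explicitly select Categorical) can be sketched in Python as follows. This is an illustration under assumptions, not the product's logic: the function name resolve_field_type and the list of numeric SQL types are hypothetical, and treating every non-numeric type as categorical is a simplification, because the text above only mentions character fields.

```python
# Assumed set of numeric SQL types; the text above names only INTEGER and DOUBLE.
NUMERIC_SQL_TYPES = {"SMALLINT", "INTEGER", "BIGINT", "DECIMAL", "REAL", "DOUBLE"}


def resolve_field_type(sql_type, setting="System Determined"):
    """Return 'numerical' or 'categorical' for one input column."""
    if setting == "Categorical":
        return "categorical"  # explicit override, e.g. for coded values
    # Drop any length or precision suffix such as VARCHAR(20) or DECIMAL(10,2).
    base = sql_type.split("(")[0].strip().upper()
    return "numerical" if base in NUMERIC_SQL_TYPES else "categorical"


# A marital status code stored as INTEGER is numerical by default,
# so it needs the explicit Categorical setting.
print(resolve_field_type("INTEGER"))                 # numerical
print(resolve_field_type("INTEGER", "Categorical"))  # categorical
print(resolve_field_type("VARCHAR(20)"))             # categorical
```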