Data mining — Easy Mining optional parameters for the PredictColumn procedure

In addition to the basic parameters for the PredictColumn procedure, you can specify optional parameters. Optional parameters modify the default settings of the PredictColumn procedure. For example, you can use an optional parameter to specify a subset of columns for the input table to the PredictColumn procedure.

The required quality of a prediction model depends on the application for which you want to use it. For some applications, a model that is significantly better than random guessing is already a good result. For example, if you want to predict cross-selling opportunities, you want your prediction model to predict with reasonable quality which products a customer might buy together.

In other application areas, a very high prediction quality is mandatory, for example, if you want to predict diseases from symptoms.

Removing fields from the input table

You might want to remove fields from the input table if causal relationships between fields distort the prediction quality of your model. This section describes a scenario in which you want to remove fields from your input table.

You might think that the prediction quality should always be as good as possible. In general, this is true. However, sometimes the result might be too good. A result that is too good can be caused by causal dependencies between the target field and some input fields.

For example, possessing a bank card leads to an increased number of debit transactions. Therefore, a high number of debit transactions seems to be a good indicator for possessing a bank card. However, the causal relationship runs the other way. The possession of a bank card implies a higher number of debit transactions. A high number of debit transactions does not imply that the customer has a bank card, because there are other reasons for a high number of debit transactions.

Always examine the tests near the root node of the classification tree, and check whether they use columns that have a causal relationship with the target field. Whenever you identify such columns, remove them from the input fields.

You might want to remove one or more fields from the input table because you do not want to use all fields to compute the model. To remove one field, you can use the DM_remDataSpecFld option. For example, to remove the column NBR_YEARS_CLI from the input table, you can use the following optional parameter string:

'DM_remDataSpecFld(''NBR_YEARS_CLI'')'
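The single quotes around the column name are doubled because the whole option is passed as an SQL string literal, and SQL escapes a single quote inside a literal by writing it twice. The following shell sketch illustrates that quoting rule. The helper name sql_literal is hypothetical and is not part of Intelligent Miner.

```shell
#!/bin/sh
# sql_literal is a hypothetical helper, not an Intelligent Miner command.
# It wraps its argument in single quotes and doubles every embedded
# single quote, which is how SQL escapes quotes inside a string literal.
sql_literal() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# The option as you would write it on its own.
option="DM_remDataSpecFld('NBR_YEARS_CLI')"

# The same option as an optional parameter string.
param=$(sql_literal "$option")
printf '%s\n' "$param"
```

The script prints the optional parameter string in the same form as shown above.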

To remove several fields from the input table, you can use several DM_remDataSpecFld options in an Easy Mining procedure, or you can create a view that contains only the columns that you want to use. Use this view as input for the Easy Mining procedure.
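The view approach might look like the following sketch. The view name BANK.BANKCUSTOMERS_V and the columns AGE and GENDER are assumptions made for illustration; only BANK.BANKCUSTOMERS, BANK.BANKCARD_PRED, and BANKCARD come from this topic. The script prints the two commands for review instead of running them.

```shell
#!/bin/sh
# Assumptions: the view name BANK.BANKCUSTOMERS_V and the columns AGE and
# GENDER are hypothetical; substitute the columns of your own input table.
# BANKCARD is the target column, so the view must keep it.
view_sql="CREATE VIEW BANK.BANKCUSTOMERS_V AS SELECT AGE, GENDER, BANKCARD FROM BANK.BANKCUSTOMERS"

# The view replaces the table as the second argument of the procedure.
call_sql="call IDMMX.PredictColumn('BANK.BANKCARD_PRED', 'BANK.BANKCUSTOMERS_V', 'BANKCARD')"

# Print one db2 command per statement instead of running it.
printf 'db2 "%s"\n' "$view_sql" "$call_sql"
```

Because the view omits the unwanted columns, no DM_remDataSpecFld option is needed in the call.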

Setting the depth of a classification tree

You can set more parameters of the Tree Classification mining function by using optional parameter strings. For example, if the target field is categorical, you might want to set the maximum tree depth.

Intelligent Miner® uses the Tree Classification mining function to predict the values of a categorical column. Classification trees can get very complex. You can limit the tree depth of classification trees by using the DM_setTreeClasPar option. For example, to set the tree depth to 5, the DM_setTreeClasPar option looks like this:

DM_setTreeClasPar('MaxDth', 5)

Example command with optional parameters

To determine the characteristics of bank card holders and non-bank card holders, you can use the PredictColumn procedure.

Use the following command to run the Easy Mining procedure:

db2 "call IDMMX.PredictColumn('BANK.BANKCARD_PRED',
                              'BANK.BANKCUSTOMERS',
                              'BANKCARD',
                              'DM_setTreeClasPar(''MaxDth'', 5)')"

Where:

BANK.BANKCARD_PRED is the name of the result view to be generated
BANK.BANKCUSTOMERS is the name of the input table
BANKCARD is the name of the target column
'DM_setTreeClasPar(''MaxDth'', 5)' is the optional parameter string that limits the tree depth to 5
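If the nesting of shell quotes and SQL quotes becomes hard to read, one option is to keep the statement in a file and pass the file to the command line processor. The following sketch assumes a file name (predict_bankcard.sql), a variable (max_depth), and a placeholder (@DEPTH@) that are all chosen for this example only. It writes the file and displays it; running it against a database is left as a commented step.

```shell
#!/bin/sh
# Sketch: keep the CALL statement in a file so that only SQL quoting applies.
# The file name, the variable max_depth, and the @DEPTH@ placeholder are
# hypothetical choices for this example.
max_depth=5
sql_file=predict_bankcard.sql

# The quoted heredoc delimiter stops the shell from interpreting the quotes;
# sed then replaces the placeholder with the tree depth.
sed "s/@DEPTH@/$max_depth/" > "$sql_file" <<'EOF'
call IDMMX.PredictColumn('BANK.BANKCARD_PRED',
                         'BANK.BANKCUSTOMERS',
                         'BANKCARD',
                         'DM_setTreeClasPar(''MaxDth'', @DEPTH@)');
EOF

# Show the generated statement for review.
cat "$sql_file"

# With a database connection in place, the file could be run with:
#   db2 -tvf predict_bankcard.sql
```

The -t option tells the command line processor that statements end with a semicolon, -v echoes each statement, and -f reads the statements from the file.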

For a listing of optional parameter strings, see Optional parameter strings.