The required quality of a prediction model depends on the application for which you want to use it. For some applications, being significantly better than random guessing is already a good result. For example, if you want to predict cross-selling opportunities, it is sufficient that your prediction model predicts with reasonable quality which products a customer might buy together.
In other application areas, a very high prediction quality is mandatory, for example, if you want to predict diseases from symptoms.
At times you might want to remove fields from the input table because causal relationships between these fields and the target field distort the prediction quality of your model. This section describes a scenario in which you want to remove fields from your input table.
You might think that the prediction quality should always be as good as possible. In general, this is true. However, sometimes the result is too good to be useful. A common reason for such a result is a causal dependency between the target field and some of the input fields.
For example, possessing a bank card leads to an increasing number of debit transactions. Therefore, a high number of debit transactions seems to be a good indicator for possessing a bank card. However, the causal relationship runs in the opposite direction. Possessing a bank card implies a higher number of debit transactions, but a high number of debit transactions does not imply that the customer has a bank card.
There can be other reasons for a high number of debit transactions.
You should always examine the tests near the root node of the classification tree and check whether they use columns that are causally dependent on the target field. Whenever you identify such columns, you must remove them from the input fields.
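As a complementary check to inspecting the tree, you might compare how a suspicious column is distributed for each value of the target field. The following query is only a sketch: NBR_DEBIT_TRANS is a hypothetical column name that stands in for the number of debit transactions, and it assumes that this column is numeric. BANK.BANKCUSTOMERS and BANKCARD are the table and the target field that are used elsewhere in this section.

```sql
-- Sketch only: NBR_DEBIT_TRANS is a hypothetical column name.
-- Compare the distribution of a candidate column for each value
-- of the target field BANKCARD.
SELECT BANKCARD,
       COUNT(*)                             AS CUSTOMERS,
       AVG(CAST(NBR_DEBIT_TRANS AS DOUBLE)) AS AVG_DEBIT_TRANS,
       MIN(NBR_DEBIT_TRANS)                 AS MIN_DEBIT_TRANS,
       MAX(NBR_DEBIT_TRANS)                 AS MAX_DEBIT_TRANS
FROM BANK.BANKCUSTOMERS
GROUP BY BANKCARD;
```

If the value ranges for the groups barely overlap, the column separates the target field almost perfectly. That alone does not prove a causal dependency, but it is a hint that the column deserves a closer look before you keep it as an input field.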
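For example, the following optional parameter string removes the field NBR_YEARS_CLI from the input fields:
'DM_remDataSpecFld(''NBR_YEARS_CLI'')'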
To remove more fields from the input table, you can use several DM_remDataSpecFld options in an Easy Mining procedure, or you can create a view that contains only the columns that you want to use. Use this view as input for the Easy Mining procedure.
You can set more parameters of the Tree Classification mining function by using optional parameter strings. For example, if the target field is categorical, you might want to set the maximum tree depth.
DM_setTreeClasPar('MaxDth', 5)
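As an alternative to listing several DM_remDataSpecFld options, the view-based approach that is mentioned above might look like the following sketch. The view name BANK.BANKCUSTOMERS_INPUT and the columns CUSTOMER_ID, AGE, and INCOME are hypothetical and serve only as an illustration. Replace them with the columns of your own input table.

```sql
-- Sketch only: the view name and the columns CUSTOMER_ID, AGE, and INCOME
-- are hypothetical. NBR_YEARS_CLI is left out on purpose so that it is
-- not available as an input field.
CREATE VIEW BANK.BANKCUSTOMERS_INPUT AS
  SELECT CUSTOMER_ID,
         AGE,
         INCOME,
         BANKCARD   -- the target field must remain in the view
  FROM BANK.BANKCUSTOMERS;
```

You could then specify the view instead of the original table as input for the Easy Mining procedure. Because a view does not copy the data, this approach keeps the list of excluded columns in one place without duplicating the table.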
To determine the characteristics of bank card holders and non-bank card holders, you can use the PredictColumn procedure.
Use the following command to run the Easy Mining procedure:
db2 "call IDMMX.PredictColumn('BANK.BANKCARD_PRED',
'BANK.BANKCUSTOMERS',
'BANKCARD',
'DM_setTreeClasPar(''MaxDth'', 5)')"
For a listing of optional parameter strings, see Optional parameter strings.