This tutorial introduces you to the prediction data mining
function. The sample data contains customer information for a bank.
This tutorial shows how to build and test a mining flow that
predicts the customer lifetime (how long a customer remains a customer of the bank).
By using a set of historic records about existing customers, you can
train and test a prediction model that can be used to predict the
customer lifetime for new customers based on facts you know about
existing customers.
- Create a new mining flow in the MiningTutorial project
for the prediction model.
- Right-click the Mining Flows folder
under the MiningTutorial project that was created
and click New. The New wizard opens.
- Click Mining Flow and then click Next.
The New File window opens.
- Type the name of the mining flow. For this tutorial, enter PredictCustomerLifetime as
the mining flow name, and then click Next. The Select
Connection window opens.
- Click the Use an existing connection option,
select DWESAMP, and click Finish.
The title in the palette area of the window and the Properties area
both reflect the mining flow name, in this case PredictCustomerLifetime.
- Add operators to the mining editor canvas:
- Place a Table source operator
on the canvas. The Select Database Table window opens.
- Expand the BANK schema, select the BANKCUSTOMERS table,
and then click Finish. Click the plus icon
to expand the table source operator and notice the target column NBR_YEARS_CLI
contained in the BANKCUSTOMERS table.
- To split the available records into a training set from
which to create the model and a test set from which to verify the
model quality, place the Random Split operator to the right of the Table source operator.
- Connect the operators on the canvas:
- From the Palette, click Connection.
- Click the Output port of the Table
source operator and then click the Input port
of the Random Split operator.
- View the Random Split operator properties:
- Select the Random Split operator.
- On the Properties page, note that the default value
of the Percentage of test data field is 50.
In this tutorial, 50% of the records are used for training and 50%
are used as a test set.
- Add a Predictor operator to the canvas. The Predictor creates a
classification model if the target field is categorical and a
regression model if the target field is numerical.
- Place a Predictor operator on the canvas to the right
of the Random Split operator.
- Connect the operators on the canvas. From the Palette,
click Connection. Click the Random Split Training output
port and then click the Predictor Input port.
- On the Properties page, click the Model Name tab
and change the Model name field to CustomerLifetime.
- Click the Mining Settings tab.
From the Target column list, select NBR_YEARS_CLI as
the field to be predicted.
Tip: Because this
is a numerical field, the Predictor builds a regression model
to predict the values of this field.
- To verify the model quality with a different set of records
from those used for training, add a Tester operator to the flow.
- Place a Tester operator on the canvas.
- Connect the Model port of the
Predictor operator to the Model port of the
Tester operator.
- Connect the Testing port of the
Random Split operator to the Input port of
the Tester operator.
- To display the mining model, add a Visualizer operator
to the flow.
- Place a Visualizer operator on the canvas.
- Connect the TestResult port of
the Tester operator to the Model port of the
Visualizer operator.
- Edit some of the properties of the Tester operator.
- Select the Tester operator.
- From the Properties view, click the Test
Result Name tab.
- Change the Test Result Name to CustomerLifetimeTest.
- Click the Save icon on the toolbar
to save the mining flow.
- Start the mining flow.
- Click the Execute Mining Flow icon
on the toolbar. The Execution of Flow window opens.
- Accept the default values and click Execute.
The Mining Flow Execution status window opens.
Note: You
can observe the mining flow progress on the Data Output page. First,
you see that the SplitData Mining and the BuildRegModel procedures are
called to build the regression model. Next, you see that IDMMX.TestRegModel
is called to process the test set of records and verify the model
quality.
Upon completion, the Quality page of the regression
visualizer is displayed. This page shows the overall quality of the
model, measured on the test set of records.
- Click the Gains/Lift tab. The
Lift Factor graph is displayed.
- Right-click the graph. The Customize Gains/Lift menu
is displayed. Click Gains. The
Gains/Lift View shows the ranking quality of the regression model.
The ranking quality indicates the capability of a model to correctly
order records based on a predicted property. The blue curve represents
our model and the green curve is the optimum that would be obtained
with a perfect prediction.
- Close the regression visualizer. In the Data Source
Explorer tree, expand . Both the regression
model, CustomerLifetime, and the test result, CustomerLifetimeTest,
are now available in the DWESAMP database.
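
To make the flow more concrete, the following Python sketch mirrors its four stages outside the product: a 50% random split, a regression model trained on the training half, a test on the held-out half, and a gains curve that shows ranking quality. It is only an illustration under stated assumptions. The data is synthetic, the single input field (age) is hypothetical, and the plain least-squares line is a stand-in, not the algorithm that the Predictor operator or the IDMMX procedures use.

```python
import random


def random_split(records, test_fraction=0.5, seed=42):
    """Shuffle the records and return (training, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Least-squares fit of y = a + b * x over (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(model, rows):
    """Root mean squared error of the model on (x, y) pairs."""
    a, b = model
    return (sum((a + b * x - y) ** 2 for x, y in rows) / len(rows)) ** 0.5


def gains_curve(model, rows):
    """Cumulative share of the actual target captured when the records
    are ranked by predicted value, highest first."""
    a, b = model
    ranked = sorted(rows, key=lambda r: a + b * r[0], reverse=True)
    total = sum(y for _, y in ranked)
    curve, running = [], 0.0
    for _, y in ranked:
        running += y
        curve.append(running / total)
    return curve


# Synthetic stand-in for the customer table (hypothetical values):
# x = customer age, y = number of years as a client.
data_rng = random.Random(0)
records = []
for _ in range(200):
    age = data_rng.uniform(20, 70)
    years = max(0.5, 0.4 * (age - 18) + data_rng.gauss(0, 3))
    records.append((age, years))

training, test = random_split(records, test_fraction=0.5)  # Random Split
model = fit_line(training)                                  # Predictor
test_rmse = rmse(model, test)                               # Tester
gains = gains_curve(model, test)                            # Gains view

print(f"training={len(training)} test={len(test)} rmse={test_rmse:.2f}")
print(f"share of target in top half of ranking: {gains[len(gains) // 2 - 1]:.2f}")
```

A model with no ranking ability would capture about half of the target in the top half of the ranking. The further the curve rises above that diagonal, the better the model orders the records, which is what the blue curve in the Gains view expresses relative to the green optimum.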