Building and testing a prediction model

This tutorial introduces you to the prediction data mining function. The sample data contains customer information for a bank.

Before you begin, the DWESAMP database must be prepared on the database server. See Preparing the data for the tutorials.

This tutorial shows how to build and test a mining flow that predicts customer lifetime, that is, how long a customer remains a customer. Using a set of historic records about existing customers, you train and test a prediction model that can then predict the customer lifetime for new customers.

Create a new mining flow in the MiningTutorial project for the prediction model.
1. Right-click the Mining Flows folder under the MiningTutorial project that was created and click New. The New wizard opens.
2. Click Mining Flow and then click Next. The New File window opens.
3. Type the name of the mining flow. For this tutorial, enter PredictCustomerLifetime, and then click Next. The Select Connection window opens.
4. Click the Use an existing connection option, select DWESAMP, and click Finish. The mining flow name, in this case PredictCustomerLifetime, is shown in the title of the palette area and in the Properties area.
Add operators to the mining editor canvas:
1. Place a Table source operator on the canvas. The Select Database Table window opens.
2. Expand the BANK schema, select the BANKCUSTOMERS table, and then click Finish. Click the plus icon to expand the Table source operator and note the target column NBR_YEARS_CLI, which is contained in the BANKCUSTOMERS table.
3. To split the available records into a training set from which to create the model and a test set from which to verify the model quality, place the Random Split operator to the right of the Table source operator.
Connect the operators on the canvas:
1. From the Palette, click Connection.
2. Click the Output port of the Table source operator and then click the Input port of the Random Split operator.
View the Random Split operator properties:
1. Select the Random Split operator.
2. On the Properties page, note that the default percentage for the Percentage of test data field is 50. In this tutorial, 50% of the records are used for training and 50% are used as a test set.
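The idea behind a 50% random split can be sketched in a few lines of Python. This is only an illustration of the concept, not the algorithm that the Random Split operator uses, which this tutorial does not describe. The names random_split, test_fraction, and seed are hypothetical.

```python
import random

def random_split(records, test_fraction=0.5, seed=42):
    """Shuffle the records and return (training, testing) lists.

    Hypothetical helper for illustration only; the Random Split
    operator may select records differently.
    """
    rng = random.Random(seed)       # fixed seed so the split is repeatable
    shuffled = list(records)        # copy, so the input is left unchanged
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(10))           # stand-in for customer records
training, testing = random_split(records)
```

Every record ends up in exactly one of the two sets, so the model is never tested on a record that it was trained on.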
To build the prediction model, add a Predictor operator to the canvas. The Predictor creates a classification model when the target field is categorical and a regression model when it is numerical.
1. Place a Predictor operator on the canvas to the right of the Random Split operator.
2. Connect the operators on the canvas. From the Palette, click Connection. Click the Random Split Training output port and then click the Predictor Input port.
3. On the Properties page, click the Model Name tab and change the Model name field to CustomerLifetime.
4. Click the Mining Settings tab. From the Target column list, select NBR_YEARS_CLI as the field to be predicted.
  Tip: Because NBR_YEARS_CLI is a numerical field, the Predictor builds a regression model to predict its values.
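The rule in the tip, numerical target means regression and categorical target means classification, can be written as a small Python check. This is a sketch of the decision only, under the assumption that the choice depends solely on the data type of the target values. The function name model_type_for is hypothetical and is not part of the product.

```python
from numbers import Number

def model_type_for(target_values):
    """Return 'regression' for numeric targets, 'classification' otherwise.

    Hypothetical helper; booleans are treated as categorical here.
    """
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool)
        for v in target_values
    )
    return "regression" if is_numeric else "classification"

years_as_client = [3, 12, 7.5]      # made-up values in the style of NBR_YEARS_CLI
kind = model_type_for(years_as_client)
```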
To verify the model quality with a different set of records from those used for training, add a Tester operator to the flow.
1. Place a Tester operator on the canvas.
2. Connect the Model port of the Predictor operator to the Model port of the Tester operator.
3. Connect the Testing port of the Random Split operator to the Input port of the Tester operator.
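Testing a regression model on held-out records amounts to comparing the predicted values with the actual values in the test set. The sketch below uses root mean squared error as the comparison. That choice is an assumption made for illustration, because this tutorial does not state which quality measures the Tester operator computes. The function name rmse and the sample values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up test-set values: actual customer lifetimes and model predictions.
actual_years = [4, 10, 7]
predicted_years = [5, 9, 7]
error = rmse(actual_years, predicted_years)
```

A lower value means that the predictions are closer to the actual values on records that the model did not see during training.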
To display the mining model, add a Visualizer operator to the flow.
1. Place a Visualizer operator on the canvas.
2. Connect the TestResult port of the Tester operator to the Model port of the Visualizer operator.
Edit some of the properties of the Tester operator.
1. Select the Tester operator.
2. From the Properties view, click the Test Result Name tab.
3. Change the Test Result Name to CustomerLifetimeTest.
4. Click the Save icon on the toolbar to save the mining flow.
Start the mining flow.
1. Click the Execute Mining Flow icon on the toolbar. The Execution of Flow window opens.
2. Accept the default values and click Execute. The Mining Flow Execution status window opens.
  Note: You can observe the mining flow progress in the Data Output page. First, the SplitData Mining and BuildRegModel procedures are called to build the regression model. Next, IDMMX.TestRegModel is called to process the test set of records and verify the model quality.
  Upon completion, the Quality page of the regression visualizer is displayed. This page shows the overall quality of the model, measured on the test set of records.
3. Click the Gains/Lift tab. The Lift Factor graph is displayed.
4. Right-click the graph. The Customize Gains/Lift menu is displayed. Click Gains. The Gains/Lift view shows the ranking quality of the regression model, that is, how well the model orders records by the predicted property. The blue curve represents the model, and the green curve is the optimum that a perfect prediction would achieve.
5. Close the regression visualizer. In the Data Source Explorer tree, expand Connections > DWESAMP > Data Mining Models > Regression. Both the regression model, CustomerLifetime, and the test result, CustomerLifetimeTest, are now available in the DWESAMP database.
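
The ranking quality shown in the Gains view can be illustrated with a small Python sketch. It orders records by predicted value, accumulates the actual values in that order, and compares the result with the curve for a perfect ordering. This is a common way to build a gains curve and is an assumption here; the tutorial does not specify how the visualizer computes its curves. The function name cumulative_gains and the sample values are hypothetical.

```python
def cumulative_gains(actual, predicted):
    """Running total of actual values, with records ordered by
    predicted value from highest to lowest."""
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    gains, total = [], 0.0
    for i in order:
        total += actual[i]
        gains.append(total)
    return gains

# Made-up test records: actual lifetimes and the model's predictions.
actual = [10, 2, 7, 4]
predicted = [8, 3, 9, 1]

model_curve = cumulative_gains(actual, predicted)   # ordering chosen by the model
optimum_curve = cumulative_gains(actual, actual)    # ordering of a perfect prediction
```

The closer the model curve stays to the optimum curve, the better the model ranks the records. Both curves end at the same total because they cover the same records.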
