Building the flow

Figure 1. Example flow to classify customers using binomial logistic regression
  1. Add a Data Asset node that points to telco.csv.
  2. Add a Type node, double-click it to open its properties, and make sure all measurement levels are set correctly. For example, most fields with values of 0 and 1 can be regarded as flags, but certain fields, such as gender, are more accurately viewed as nominal fields with two values.
    Figure 2. Measurement levels
  3. Set the measurement level for the churn field to Flag, and set the role to Target. Leave the role for all other fields set to Input.
  4. Add a Feature Selection modeling node after the Type node. You can use a Feature Selection node to screen out predictors that don't add any useful information about the predictor/target relationship.
  5. Run the flow. Right-click the resulting model nugget and select View Model. You'll see a list of the most important fields.
  6. Add a Filter node after the Type node. Not all of the fields in the telco.csv data file are useful for predicting churn. Use the filter to keep only the fields that were marked as Important in the model generated in the previous step.
  7. Double-click the Filter node to open its properties, select the option Retain the selected fields (all other fields are filtered), and add the following important fields from the Feature Selection model nugget:
    tenure
    age
    address
    income
    ed
    employ
    equip
    callcard
    wireless
    longmon
    tollmon
    equipmon
    cardmon
    wiremon
    longten
    tollten
    cardten
    voice
    pager
    internet
    callwait
    confer
    ebill
    loglong
    logtoll
    lninc
    custcat
    churn
  8. Add a Data Audit output node after the Filter node. Right-click the node and run it, then open the output that was added to the Outputs pane.
  9. Look at the % Complete column, which lets you identify any fields with large amounts of missing data. In this case, the only field you need to amend is logtoll, which is less than 50% complete.
  10. Close the output, and add a Filler node after the Filter node. Double-click the node to open its properties, click Add Columns, and select the logtoll field.
  11. Under Replace, select Blank and null values. Click Save to close the node properties.
  12. Right-click the Filler node you just created and select Create supernode. Double-click the supernode and change its name to Missing Value Imputation.
  13. Add a Logistic node after the Missing Value Imputation supernode (which now contains the Filler node). Double-click the Logistic node to open its properties. Under Model Settings, select the Binomial procedure and the Forwards Stepwise method.
    Figure 3. Choosing model settings
  14. Under Expert Options, select Expert.
    Figure 4. Choosing expert options
  15. Click Output to open the display settings. Select At each step, Iteration history, and Parameter estimates, then click OK.
    Figure 5. Choosing output options
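
The data preparation in steps 6 through 11 (retain selected fields, check the % Complete column, fill blanks in logtoll) can be approximated outside the flow to see what each node contributes. The following is a minimal sketch in plain Python, not how SPSS Modeler implements these nodes. The sample rows, the field name unused_field, and the helper functions are hypothetical, and replacing blanks with the mean of the observed values is an assumption, because the steps above do not state which replacement value the Filler node uses.

```python
# Hypothetical stand-in for a few rows of telco.csv. The values are invented
# for illustration, and None marks a blank or null cell.
rows = [
    {"tenure": 13, "logtoll": None, "unused_field": 2, "churn": 1},
    {"tenure": 11, "logtoll": 3.0, "unused_field": 3, "churn": 1},
    {"tenure": 68, "logtoll": None, "unused_field": 3, "churn": 0},
    {"tenure": 33, "logtoll": None, "unused_field": 2, "churn": 1},
]

# A small subset of the fields retained in step 7.
KEEP = ["tenure", "logtoll", "churn"]


def retain_fields(rows, keep):
    """Mimic the Filter node: keep only the named fields."""
    return [{name: row[name] for name in keep} for row in rows]


def percent_complete(rows, field):
    """Mimic the % Complete column: share of rows with a value present."""
    present = sum(1 for row in rows if row[field] is not None)
    return 100.0 * present / len(rows)


def fill_blanks(rows, field, value):
    """Mimic the Filler node: replace blank/null cells in one field."""
    return [
        {**row, field: value if row[field] is None else row[field]}
        for row in rows
    ]


filtered = retain_fields(rows, KEEP)
completeness = {name: percent_complete(filtered, name) for name in KEEP}

# Assumption: impute with the mean of the observed logtoll values.
observed = [row["logtoll"] for row in filtered if row["logtoll"] is not None]
mean_logtoll = sum(observed) / len(observed)
imputed = fill_blanks(filtered, "logtoll", mean_logtoll)

print(completeness)
print(imputed)
```

In this toy data, logtoll is the only field below 100% complete, which mirrors the situation described in step 9. After imputation, every row has a value for logtoll, so the Logistic node does not need to discard records because of that field.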