Preparing the Data for Analysis

Setting the random seed allows you to replicate the analysis exactly.

  1. To set the random seed, from the menus choose:

    Transform > Random Number Generators...

    Figure 1. Random Number Generators dialog box
  2. Select Set Starting Point.
  3. Select Fixed Value, and type 9191972 as the value.
  4. Click OK.
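To illustrate why a fixed starting point makes the analysis repeatable, here is a minimal Python sketch. It is an illustration only: Python's generator differs from the one used by the dialog box, so the same seed value will not reproduce the product's random numbers. The helper name `draw` is hypothetical.

```python
import random

SEED = 9191972  # the fixed value typed in the dialog box above


def draw(n, seed):
    """Return n uniform random numbers from a generator started at `seed`."""
    rng = random.Random(seed)  # independent generator with a fixed starting point
    return [rng.random() for _ in range(n)]


first = draw(5, SEED)
second = draw(5, SEED)

# Restarting from the same seed reproduces the same sequence.
print(first == second)
```

Any step that relies on random numbers, such as the partition variable created next, therefore yields the same result each time the seed is reset to the same value first.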

In the previous logistic regression analysis, approximately 70% of the past customers were assigned to the training sample and 30% to a holdout sample. A partition variable is needed to recreate exactly the samples used in that analysis.

  1. To create the partition variable, from the menus choose:

    Transform > Compute Variable...

    Figure 2. Compute Variable dialog box
  2. Type partition in the Target Variable text box.
  3. Type 2*rv.bernoulli(0.7)-1 in the Numeric Expression text box.

    This sets the values of partition to be randomly generated Bernoulli variates with a probability parameter of 0.7, modified so that it takes values 1 or −1, instead of 1 or 0. Recall that cases with positive values on the partition variable are assigned to the training sample, cases with negative values are assigned to the holdout sample, and cases with a value of 0 are assigned to the testing sample. For now, we won't specify a testing sample.

  4. Click OK in the Compute Variable dialog box.

Approximately 70% of the customers previously given loans will have a partition value of 1. These customers will be used to create the model. The remaining customers who were previously given loans will have a partition value of −1 and will be used to validate the model results.
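The arithmetic behind the expression 2*rv.bernoulli(0.7)-1 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the product's implementation: the function name `make_partition`, its parameters, and the case count of 10,000 are hypothetical, and Python's random stream will not match the values generated in the Compute Variable dialog box.

```python
import random


def make_partition(n_cases, p_train=0.7, seed=9191972):
    """Assign each case 1 (training) or -1 (holdout) at random."""
    rng = random.Random(seed)
    partition = []
    for _ in range(n_cases):
        # Bernoulli(p_train) draw: 1 with probability p_train, otherwise 0.
        b = 1 if rng.random() < p_train else 0
        # 2*b - 1 maps 1 -> 1 and 0 -> -1, so no case receives the value 0.
        partition.append(2 * b - 1)
    return partition


partition = make_partition(10_000)  # hypothetical number of cases
train_share = partition.count(1) / len(partition)
print(f"training share: {train_share:.3f}")
```

Because each case is assigned independently, the training share is only approximately 70%; it varies slightly around that figure depending on the seed and the number of cases.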
