Preparing the Data for Analysis

Setting the random seed allows you to replicate the random selection of cases in this analysis. If you are not replicating an analysis, there is generally no need to set the seed.

  1. To set the random seed, from the menus choose:

    Transform > Random Number Generators...

    Figure 1. Random Number Seed dialog box
  2. Select Set Starting Point.
  3. Select Fixed Value and type 9191972 as the value.
  4. Click OK.
  5. To create the selection variable for validation, from the menus choose:

    Transform > Compute Variable...

    Figure 2. Compute Variable dialog box
  6. Type training in the Target Variable text box.
  7. Type rv.bernoulli(0.7) in the Numeric Expression text box.

    This sets the values of training to randomly generated Bernoulli variates with probability parameter 0.7. Thus, approximately 70% of the 700 previous customers will be assigned to the training sample.

    The training variable is meant only for cases that could be used to create the model; that is, previous customers. However, the data file also contains 150 cases corresponding to potential customers. To prevent possible confusion, you want to make sure that cases with a missing value of default are not assigned to the training or test samples.

  8. To perform the computation only for previous customers, click If.
    Figure 3. If Cases dialog box
  9. Select Include if case satisfies condition.
  10. Type MISSING(default) = 0 as the conditional expression.

    This ensures that training is only computed for cases with non-missing values for default; that is, for customers who previously received loans.

  11. Click Continue.
  12. Click OK in the Compute Variable dialog box.
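For readers who want to see the logic of the steps above outside the dialog boxes, here is a minimal sketch in Python. It is an illustration only, not the software's own implementation: the function name assign_training, the dictionary layout, and the stand-in data are all assumptions, and Python's random number generator differs from the one used by the dialogs, so the same seed will not select the same cases. It does show the two ideas involved: a fixed seed makes the split repeatable, and the Bernoulli draw is made only for cases with a non-missing value of default.

```python
import random

SEED = 9191972        # fixed starting point, as typed in step 3
P_TRAINING = 0.7      # Bernoulli probability parameter from step 7


def assign_training(cases, seed=SEED, p=P_TRAINING):
    """Return copies of the cases with a 'training' flag added.

    Cases whose 'default' value is missing (None) get no draw and keep
    training = None, mirroring the MISSING(default) = 0 condition.
    """
    rng = random.Random(seed)  # private generator, so the split is repeatable
    flagged = []
    for case in cases:
        row = dict(case)
        if row["default"] is None:
            row["training"] = None  # potential customer: left unassigned
        else:
            row["training"] = 1 if rng.random() < p else 0  # Bernoulli(p) draw
        flagged.append(row)
    return flagged


# Hypothetical stand-in data: 700 previous customers with a known value of
# default, followed by 150 potential customers with a missing value.
cases = [{"default": i % 2} for i in range(700)]
cases += [{"default": None} for _ in range(150)]

flagged = assign_training(cases)
n_train = sum(1 for r in flagged if r["training"] == 1)
n_test = sum(1 for r in flagged if r["training"] == 0)
n_unassigned = sum(1 for r in flagged if r["training"] is None)
print(n_train, n_test, n_unassigned)
```

Calling assign_training again with the same seed returns the same flags, which is the point of setting the starting point; calling it with a different seed gives a different, equally valid split.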

Approximately 70 percent of the customers previously given loans will have a training value of 1. These customers will be used to create the model. The remaining customers who were previously given loans will be used to validate the model results.
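
To make "approximately 70 percent" concrete: each of the 700 previous customers is flagged independently with probability 0.7, so the size of the training sample follows a binomial distribution. As a rough worked calculation (the symbol N for the training-sample size is introduced here for illustration only):

```latex
\mathbb{E}[N] = 700 \times 0.7 = 490,
\qquad
\operatorname{SD}(N) = \sqrt{700 \times 0.7 \times 0.3} = \sqrt{147} \approx 12.1
```

A typical run therefore yields a training sample within a few dozen cases of 490, with the remaining previous customers (about 210) left for validation.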
