Preparing the Data for Analysis
Setting the random seed allows you to replicate the analysis exactly.
- To set the random seed, from the menus choose:
  Transform > Random Number Generators...
Figure 1. Random Number Generators dialog box
- Select Set Starting Point.
- Select Fixed Value, and type 9191972 as the value.
- Click OK.
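The effect of fixing the seed can be illustrated outside the product. The following is a minimal Python sketch, not the product's own syntax. It uses Python's standard `random` module, whose generator differs from the one configured in the dialog box, so the numbers themselves will not match; only the principle carries over, namely that the same starting value yields the same sequence. The helper name `draw_sequence` is hypothetical.

```python
import random

SEED = 9191972  # the same fixed value typed in the dialog box


def draw_sequence(seed, n=5):
    """Return n uniform draws from a generator started at the given seed."""
    rng = random.Random(seed)  # independent generator, so global state is untouched
    return [rng.random() for _ in range(n)]


first_run = draw_sequence(SEED)
second_run = draw_sequence(SEED)

# Identical seeds give identical sequences, which is what makes a rerun replicable.
print(first_run == second_run)
```

A different seed, or no fixed seed at all, would give a different sequence and therefore a different split of the cases later on.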
In the previous logistic regression analysis, approximately 70% of the past customers were assigned to the training sample and 30% to a holdout sample. A partition variable will be necessary to exactly recreate the samples used in those analyses.
- To create the partition variable, from the menus choose:
  Transform > Compute Variable...
Figure 2. Compute Variable dialog box
- Type partition in the Target Variable text box.
- Type 2*rv.bernoulli(0.7)-1 in the Numeric Expression text box.
This sets the values of partition to be randomly generated Bernoulli variates with a probability parameter of 0.7, modified so that it takes values 1 or −1, instead of 1 or 0. Recall that cases with positive values on the partition variable are assigned to the training sample, cases with negative values are assigned to the holdout sample, and cases with a value of 0 are assigned to the testing sample. For now, we won't specify a testing sample.
- Click OK in the Compute Variable dialog box.
Approximately 70% of the customers previously given loans will have a partition value of 1. These customers will be used to create the model. The remaining customers who were previously given loans will have a partition value of −1 and will be used to validate the model results.
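The arithmetic behind the expression 2*rv.bernoulli(0.7)-1 can be sketched in Python. This is an illustration under stated assumptions, not the product's implementation: the function name `bernoulli_partition`, the case count of 10,000, and the use of Python's `random` module are all assumptions made for the example, and the resulting values will not match those generated by the dialog box. A Bernoulli draw of 1 maps to 2*1-1 = 1 (training) and a draw of 0 maps to 2*0-1 = -1 (holdout).

```python
import random


def bernoulli_partition(n_cases, p_train=0.7, seed=9191972):
    """Return a list of 1 / -1 partition values, one per case.

    Each case gets a Bernoulli(p_train) draw b in {0, 1}, rescaled as 2*b - 1
    so that 1 marks the training sample and -1 marks the holdout sample.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_cases):
        b = 1 if rng.random() < p_train else 0  # Bernoulli variate
        values.append(2 * b - 1)                # map {0, 1} to {-1, 1}
    return values


# Hypothetical case count, chosen only to make the proportion visible.
partition = bernoulli_partition(10_000)
train_share = sum(1 for v in partition if v == 1) / len(partition)
print(f"training share: {train_share:.3f}")
```

Because each case is assigned independently, the training share is only approximately 70%; it varies from run to run unless the seed is fixed.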