Using Naive Bayes for Predictor Selection and Classification

If you are a loan officer at a bank, you want to identify the characteristics that indicate which people are likely to default on loans, and then use those characteristics to separate good credit risks from bad ones.

Suppose that bankloan.sav contains information on 850 past and prospective customers. See the topic Sample Files for more information. The first 700 cases are customers who were previously given loans. You will split these 700 customers into training and test samples to create and validate a model. Cases 701 to 850 are prospective customers, so they have missing values on the response variable, and the procedure automatically excludes them from model building. However, because they have valid values for the predictors, the procedure generates model-predicted probabilities for these cases when you save those values to the dataset. Command syntax for reproducing these analyses can be found in naivebayes_bankloan.sps.
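To make the data layout in the previous paragraph concrete, here is a minimal Python sketch of the same workflow: 700 labeled cases split into training and test samples, plus 150 prospective cases whose response is missing and which are therefore skipped during fitting but still scored. This is not the SPSS NAIVEBAYES procedure or the syntax in naivebayes_bankloan.sps; it swaps in a simple categorical naive Bayes classifier with add-one smoothing. The data are synthetic, and the predictor names (`debt`, `employed`), the response name `default`, the 70/30 split, the risk rule, and the random seeds are all hypothetical assumptions chosen for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def make_synthetic_cases(n_total=850, n_past=700, seed=1):
    """Build hypothetical loan cases; cases after n_past have no response."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_total):
        debt = rng.choice(["low", "medium", "high"])
        employed = rng.choice(["short", "long"])
        # Hypothetical rule: high debt and short employment raise default risk.
        risk = 0.1 + (0.4 if debt == "high" else 0.0) + (0.2 if employed == "short" else 0.0)
        # Prospective customers (i >= n_past) get a missing response (None).
        default = ("yes" if rng.random() < risk else "no") if i < n_past else None
        rows.append({"debt": debt, "employed": employed, "default": default})
    return rows


def fit_naive_bayes(rows, predictors, response):
    """Fit a categorical naive Bayes model with add-one (Laplace) smoothing.

    Rows whose response is None are skipped, so cases with a missing
    response never influence the fitted counts.
    """
    labeled = [r for r in rows if r[response] is not None]
    class_counts = Counter(r[response] for r in labeled)
    value_counts = defaultdict(Counter)  # (predictor, class) -> Counter of values
    levels = {p: set() for p in predictors}  # observed values per predictor
    for r in labeled:
        for p in predictors:
            value_counts[(p, r[response])][r[p]] += 1
            levels[p].add(r[p])
    return class_counts, value_counts, levels


def predict_proba(model, row, predictors):
    """Return the posterior probability of each class for one case."""
    class_counts, value_counts, levels = model
    n = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)  # log prior
        for p in predictors:
            count = value_counts[(p, c)][row[p]]
            # Smoothed log likelihood of this predictor value given the class.
            score += math.log((count + 1) / (n_c + len(levels[p])))
        log_scores[c] = score
    # Normalize in log space (log-sum-exp) to avoid underflow.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {c: math.exp(s - top) / total for c, s in log_scores.items()}


rows = make_synthetic_cases()
predictors = ["debt", "employed"]
past, prospective = rows[:700], rows[700:]

# Randomly split the 700 past customers into training (70%) and test (30%) samples.
random.Random(2).shuffle(past)
n_train = len(past) * 7 // 10
train, test = past[:n_train], past[n_train:]

# Prospective cases are passed in too, but the fit skips them (missing response).
model = fit_naive_bayes(train + prospective, predictors, "default")

# Validate on the held-out test sample: predicted class = most probable class.
correct = sum(
    max(predict_proba(model, r, predictors).items(), key=lambda kv: kv[1])[0] == r["default"]
    for r in test
)
accuracy = correct / len(test)

# Predicted default probabilities for the 150 prospective customers.
prospective_probs = [predict_proba(model, r, predictors)["yes"] for r in prospective]
print(f"test accuracy: {accuracy:.3f}")
print(f"first prospective P(default): {prospective_probs[0]:.3f}")
```

Passing the prospective cases to `fit_naive_bayes` mirrors the exclusion rule described above: they add nothing to the class counts, yet `predict_proba` can still score them because their predictor values are present.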
