The intent of my last blog was to use a straightforward neural net use case so we could focus on the nuts and bolts of TensorFlow, Python, Jupyter notebooks, and IBM Data Science Experience. However, for use cases involving neural nets and other statistical learning algorithms, IBM Data Science Experience offers good technical alternatives that don’t involve writing so much Python and TensorFlow library code. One such choice is to use SPSS Modeler Flows instead.
In my Sandbox project, I clicked the data button (top right, labeled 10 01), then clicked Load, which presents a UI for dragging and dropping my CSV data file. This is the same bankloan data file used in my last blog. Go get it so you too can have this IBM Data Science Experience.
In your project, go to the Assets view and then press ‘New Flow’ on the right side of the ‘SPSS Modeler flows’ section. Here’s how I filled in the ‘Create’ page:
Name: Predictive Model for Bank Loan Default
Description: Machine learn how to predict bank loan default based on predictor variables such as income, debt level, credit debt, etc.
Runtime: IBM SPSS Modeler
Once you hit ‘Create’, you get a blank SPSS canvas on which to start building your flow. Drag and drop the bank loan data file onto the left side of the canvas (this reads a file from Object Storage, but you could alternatively do a database select in this first step).
Now we’re going to select the fields we want to use and filter out unwanted columns. Drag and drop the ‘Type’ node from the ‘Field Operations’ palette, then click and drag the output connector of the bankloan data node to the input connector of the Type node. Now right-click the Type node, choose Open, press the Add Columns button, and check all the fields’ checkboxes so that every field is added. Then hit the left arrow to return to the Type node configuration.
Now, we only want the fields for age, education level, years of employment, years at current address, income, debt-to-income ratio, credit debt, other debt, and the bank loan default field. So, for each of the preddef fields, change the Role from Input to None. Then go to the ‘default’ field and change its Role from Input to Target, because this is the dependent variable that we want to machine learn how to predict. Hit OK to finish setting up the Type node.
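For readers who followed along in Python last time, this field-role setup is roughly the same as selecting predictor columns and a target column in pandas. This is just an illustrative sketch: the column names and toy values below are my assumptions based on the fields described above, not the real bankloan file.

```python
import pandas as pd

# Toy stand-in for the bankloan data; column names are assumptions
# based on the fields described above.
df = pd.DataFrame({
    "age": [41, 27, 40],
    "ed": [3, 1, 1],
    "employ": [17, 10, 15],
    "address": [12, 6, 14],
    "income": [176, 31, 55],
    "debtinc": [9.3, 17.3, 5.5],
    "creddebt": [11.36, 1.36, 0.86],
    "othdebt": [5.01, 4.00, 2.17],
    "preddef1": [1, 0, 0],  # Role = None: excluded from the model
    "default": [1, 0, 0],
})

# Role = Input: the predictor fields
predictors = ["age", "ed", "employ", "address",
              "income", "debtinc", "creddebt", "othdebt"]
X = df[predictors]

# Role = Target: the dependent variable we want to learn to predict
y = df["default"]
```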
Now we’re going to filter out unwanted records in the data. In this case, some records have no value for the default field, so we will filter them out because we can’t use them for machine learning how to predict the default field. Drag and drop the Select node from the Record Operations palette and connect it to the Type node. Right-click the node and choose Open. You can use “Include” mode with the condition default = "0" or default = "1", or you could use “Exclude” mode with the condition @NULL(default).
You can right-click and choose Preview on the Select node to see the results of the flow up to now. That’s how I found that I needed quotes around the 0 and 1 in the condition.
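Both Select-node modes do the same thing here. If it helps to see the equivalent logic outside SPSS, here is a sketch in pandas with a tiny made-up sample (not the real bankloan data):

```python
import pandas as pd

# Toy sample: one record has no value for 'default'
df = pd.DataFrame({
    "income": [45, 120, 33, 80],
    "default": ["1", "0", None, "1"],
})

# Include mode: keep records where default = "0" or default = "1"
included = df[df["default"].isin(["0", "1"])]

# Exclude mode: drop records where default is null (@NULL(default))
excluded = df[df["default"].notna()]

# Both approaches keep the same three usable records.
```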
Now that we’ve done some basic data preparation, it’s time to configure the machine learning.
Drag and drop a Partition node from the Field Operations palette, and connect the Select node output to the Partition node input. Right-click and open it so we can set the training and testing partition sizes. To match the TensorFlow sample in my previous blog, use 70 percent for training and 30 percent for testing.
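The Partition node plays the same role as a train/test split in Python. As a rough scikit-learn equivalent (with synthetic placeholder data, since this sketch doesn't read the real file):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic feature matrix and binary target, just to illustrate the split
rng = np.random.default_rng(0)
X = rng.normal(size=(700, 8))
y = rng.integers(0, 2, size=700)

# 70% training, 30% testing, matching the Partition node settings
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```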
Once you hit OK on that, it’s time to drag and drop a Neural Net node from the Modeling palette and connect its input to the output of the Partition node. The UI knows to use the ‘default’ field as the predictive target, with the other fields as inputs, except those we marked with a role of None. Now just hit OK.
Now we’re ready to do some machine learning. Right-click the ‘default’ Neural Net node and select “Run” from the menu. This generates a training results node. Here’s what the final flow looks like:
And now, you can right-click the yellow ‘default’ training node to see the results of training the neural net. Select the View Model menu item. You can see the accuracy values in the Model Evaluation tab. The Predictor Importance tab gives you a graph of which predictor variables had the most influence on the prediction. And last but not least, the Network Diagram shows you something like this:
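For comparison, the whole train-and-evaluate step the Neural Net node performs can be sketched in a few lines of scikit-learn. This is only a loose analogy on synthetic data: the single hidden layer and its size are my assumptions, not the actual SPSS Neural Net configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic data standing in for the prepared bankloan fields
rng = np.random.default_rng(1)
X = rng.normal(size=(700, 8))
# Make the target depend on two "predictors" so the net can learn something
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# A small feed-forward net; layer size here is an assumption for illustration
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=42)
clf.fit(X_train, y_train)

# The equivalent of the Model Evaluation tab's accuracy figure
accuracy = accuracy_score(y_test, clf.predict(X_test))
```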
In conclusion, both the TensorFlow model and this SPSS model achieved the same accuracy, just north of 80% on this data. The difference is that with SPSS I had a drag-and-drop canvas where I could simply configure the data preparation, the model, and the training and testing, so it took far less time to go from data to trained machine learning model.