In this article, we’ll cover how to measure the quality of the TensorFlow neural net model covered in this previous article. The code for this article can be obtained from the Jupyter notebook in my TensorFlow Samples repository. Although the machine learning model is written with TensorFlow, the code for this article is written generically so that you can apply it to machine learning models built with other libraries, too.

The neural net model in the previous article was a binary classifier that learned to distinguish those who were likely to default on a bank loan from those who weren't, based on a set of predictor variables. The most basic measure of accuracy for a binary classifier is the rate of correct classifications. This is what the neural net trains to optimize, and the model training in the code gets an accuracy result of just above 80%. For this data, is this a good result? An excellent result? Or could a much better model be produced using this data?

The basic accuracy result for our binary classifier came from choosing, for each sample, whichever class (defaulter or non-defaulter) received the higher confidence value. Put another way, a confidence threshold of 50% determined whether the model predicted that a loan applicant would default (true) or not (false).
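To make the role of the threshold concrete, here is a minimal sketch in plain Python. The confidence values and outcomes are made-up illustration data, not results from the bank loan model, and the helper names `classify` and `accuracy` are hypothetical. Treating a confidence exactly equal to the threshold as a positive is also an assumption.

```python
# Hypothetical confidence values for the "default" (true) class, one per applicant.
default_confidence = [0.91, 0.35, 0.62, 0.08, 0.55, 0.77]
# Hypothetical actual outcomes: 1 = defaulted, 0 = did not default.
actual = [1, 0, 0, 0, 1, 1]


def classify(confidences, threshold=0.5):
    """Predict 1 (default) when the confidence reaches the threshold, else 0."""
    return [1 if c >= threshold else 0 for c in confidences]


def accuracy(actual_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the actual label."""
    correct = sum(a == p for a, p in zip(actual_labels, predicted_labels))
    return correct / len(actual_labels)


predicted = classify(default_confidence)        # the default 50% threshold
stricter = classify(default_confidence, 0.7)    # a stricter 70% threshold

print("accuracy at 0.5:", accuracy(actual, predicted))
print("accuracy at 0.7:", accuracy(actual, stricter))
```

Raising the threshold flags fewer applicants as defaulters, so which samples are misclassified changes even when the overall accuracy does not.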

The accuracy on the positive samples is called the **true positive rate (TPR)**, and the accuracy on the negative samples is the true negative rate (TNR). The negative samples that the classifier gets wrong are the false positives, and so the **false positive rate (FPR)** is the number of false positive samples divided by the total number of negative samples (equivalently, 1 minus the TNR). In terms of the bank loan example, the true positives are the applicants who were predicted to default and did default, and the false positives are those who were predicted to default but did not. If needed, further details are here.
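As a rough illustration of these definitions, the sketch below counts the four outcomes and derives the rates from them. The labels are invented for the example, and `confusion_rates` is a hypothetical helper, not part of any library.

```python
def confusion_rates(actual_labels, predicted_labels):
    """Return (TPR, FPR, TNR) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual_labels, predicted_labels):
        if a == 1 and p == 1:
            tp += 1   # predicted to default and did default
        elif a == 0 and p == 1:
            fp += 1   # predicted to default but did not
        elif a == 0 and p == 0:
            tn += 1   # predicted not to default and did not
        else:
            fn += 1   # predicted not to default but did
    tpr = tp / (tp + fn)   # accuracy on the positive samples
    fpr = fp / (fp + tn)   # share of negative samples flagged as positive
    tnr = tn / (fp + tn)   # accuracy on the negative samples
    return tpr, fpr, tnr


# Made-up example: 3 actual defaulters, 5 actual non-defaulters.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

tpr, fpr, tnr = confusion_rates(actual, predicted)
print("TPR:", tpr, "FPR:", fpr, "TNR:", tnr)
```

Note that the FPR and TNR always sum to 1, because both are computed over the same set of negative samples.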

A **receiver operating characteristic (ROC) curve** is a plot of the TPR versus the FPR as we vary some variable related to the model being tested. In this case, we will vary the confidence threshold because it gives a fine-grained view of the model’s ability to distinguish, based on confidence values, the true (defaulter) case from the false (non-defaulter) case. In fact, the **Area Under the Curve (AUC)** corresponds to the probability that the model will produce a higher confidence value for a randomly selected true case than it will for a randomly selected false case. The diagram below shows the ROC curve and AUC value for the bank loan TensorFlow neural net:

Thanks to the sklearn and matplotlib packages, it is easy to write the code that calculates the data for the ROC curve and the AUC:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Actual labels first, then the confidence values for the true (1) class.
FPR, TPR, _ = roc_curve(dependent_test, dependent_prob[:, 1])
AUC = auc(FPR, TPR)
```

The first parameter to *roc_curve()* gives the actual class labels (the ground truth) for each sample, and the second parameter is the set of confidence values for the true (1) class for each sample. The function produces the FPR and TPR data that is used by *auc()* to determine the AUC and that is also used by the plotting code below:

```python
plt.figure()
plt.plot(FPR, TPR, label='ROC curve (area = %0.2f)' % AUC)
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.02])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.show()
```
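The probabilistic reading of the AUC mentioned earlier can be checked by hand on a small example. The sketch below compares every (true case, false case) pair directly. It uses invented data and a hypothetical helper named `pairwise_auc`. Counting ties as half a win is an assumption, chosen so that the result should agree with the area computed from the ROC curve.

```python
def pairwise_auc(actual_labels, confidences):
    """Fraction of (positive, negative) pairs in which the positive sample
    receives the higher confidence value; ties count as half a win."""
    positives = [c for a, c in zip(actual_labels, confidences) if a == 1]
    negatives = [c for a, c in zip(actual_labels, confidences) if a == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and confidence values for the true (1) class.
actual = [1, 1, 1, 0, 0, 0]
confidence = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print("pairwise AUC:", pairwise_auc(actual, confidence))
```

This brute-force version compares every pair, so it is only practical for small samples. It is meant as a sanity check on the interpretation, not as a replacement for *auc()*.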

With an AUC of 0.85, the bank loan TensorFlow neural net is arguably a good model, especially given the variables and data at hand. This suggests that it would be difficult to obtain much higher accuracy by simply tuning the model. Other variables and/or data would likely be needed to train a more accurate machine learning model for this problem.