ROC analysis

Receiver operating characteristic (ROC) analysis is a useful way to assess the accuracy of model predictions by plotting sensitivity versus (1 - specificity) of a classification test, as the threshold varies over the entire range of diagnostic test results. The area under the ROC curve (AUC) is an important summary statistic: it represents the probability that the predictions are in the correct order when a test variable is observed for one subject randomly selected from the case group and another randomly selected from the control group. ROC Analysis supports inference about a single AUC, produces precision-recall (PR) curves, and provides options for comparing two ROC curves that are generated from either independent groups or paired subjects.
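
As an illustration of these two quantities, the following sketch computes the points of an ROC curve and the corresponding AUC. It assumes a Python environment with NumPy and scikit-learn and uses synthetic placeholder data; it is not the SPSS procedure itself.

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    # Synthetic placeholder data: y_true plays the role of the state variable
    # (1 = positive state), y_score plays the role of a test variable.
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=200)
    y_score = y_true * 0.8 + rng.normal(0.0, 0.5, size=200)

    # Points of the ROC curve: fpr is (1 - specificity), tpr is sensitivity,
    # evaluated as the classification threshold varies over the score range.
    fpr, tpr, thresholds = roc_curve(y_true, y_score)

    # Area under the ROC curve.
    print(f"AUC = {roc_auc_score(y_true, y_score):.3f}")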

PR curves plot precision versus recall and provide an alternative to ROC curves that tends to be more informative when the class distribution of the observed data is highly skewed.
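
A minimal sketch of a PR curve, again assuming NumPy and scikit-learn with synthetic placeholder data, is shown below; the average-precision summary it prints uses a step-wise rather than linear interpolation between points.

    import numpy as np
    from sklearn.metrics import precision_recall_curve, average_precision_score

    # Synthetic placeholder data with a highly skewed class distribution
    # (roughly 5% positives), where a PR curve is most informative.
    rng = np.random.default_rng(1)
    y_true = (rng.random(1000) < 0.05).astype(int)
    y_score = y_true * 0.6 + rng.normal(0.0, 0.4, size=1000)

    # Points of the PR curve and its step-wise average-precision summary.
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    print(f"Average precision = {average_precision_score(y_true, y_score):.3f}")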

Example
It is in a bank's interest to correctly classify customers into those who will and those who will not default on their loans, so special models are developed for making these decisions. ROC Analysis can be used to assess the accuracy of the model predictions.
Statistics
AUC, negative group, missing values, positive classification, cutoff value, strength of conviction, two-sided asymptotic confidence interval, distribution, standard error, independent-group design, paired-sample design, nonparametric assumption, bi-negative exponential distribution assumption, midpoint, cut point, PR curve, stepwise interpolation, asymptotic significance (2-tail), Sensitivity and (1-Specificity), Precision and Recall.
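
The asymptotic standard error and two-sided confidence interval for an AUC can be illustrated with the Hanley and McNeil (1982) approximation. This is only one common choice; the procedure's own standard errors depend on the selected nonparametric or bi-negative exponential distribution assumption, and the numbers below are placeholder values.

    import numpy as np
    from scipy.stats import norm

    def auc_se_hanley_mcneil(auc, n_pos, n_neg):
        """Hanley & McNeil (1982) approximation to the standard error of an AUC."""
        q1 = auc / (2.0 - auc)
        q2 = 2.0 * auc ** 2 / (1.0 + auc)
        var = (auc * (1.0 - auc)
               + (n_pos - 1) * (q1 - auc ** 2)
               + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
        return np.sqrt(var)

    # Placeholder values for the observed AUC and group sizes.
    auc, n_pos, n_neg = 0.85, 60, 140
    se = auc_se_hanley_mcneil(auc, n_pos, n_neg)

    # Two-sided 95% asymptotic confidence interval.
    z = norm.ppf(0.975)
    print(f"SE = {se:.4f}, 95% CI = ({auc - z * se:.3f}, {auc + z * se:.3f})")
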
Methods
The areas under two ROC curves, generated from either independent groups or paired subjects, are compared. Comparing two ROC curves can provide more information about the difference in accuracy between two competing diagnostic approaches.
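
The sketch below (assuming NumPy and scikit-learn, with synthetic placeholder data) illustrates the paired-sample setting, in which two test variables are measured on the same subjects. It uses a simple bootstrap over subjects rather than the procedure's own asymptotic comparison methods.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Synthetic placeholder data: two test variables measured on the same
    # subjects, both evaluated against a single shared state variable.
    rng = np.random.default_rng(2)
    n = 300
    y_true = rng.integers(0, 2, size=n)
    test_a = y_true * 0.9 + rng.normal(0.0, 0.6, size=n)
    test_b = y_true * 0.5 + rng.normal(0.0, 0.6, size=n)

    # Bootstrap the AUC difference by resampling subjects, which keeps each
    # subject's pair of test values together (the paired-sample structure).
    diffs = []
    for _ in range(2000):
        idx = rng.integers(0, n, size=n)
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip resamples that contain only one class
        diffs.append(roc_auc_score(y_true[idx], test_a[idx])
                     - roc_auc_score(y_true[idx], test_b[idx]))

    lo, hi = np.percentile(diffs, [2.5, 97.5])
    print(f"AUC(A) - AUC(B): 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")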

Data considerations

Data
PR curves plot precision versus recall and tend to be more informative when the observed data samples are highly skewed. A simple linear interpolation between points may mistakenly yield an overly optimistic estimate of a PR curve, which is why stepwise interpolation is used instead.
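
The effect of the interpolation choice can be illustrated by summarizing the same PR curve twice, once with trapezoidal (linear) interpolation and once with the step-wise average-precision summary. The sketch assumes NumPy and scikit-learn and uses synthetic placeholder data.

    import numpy as np
    from sklearn.metrics import auc, average_precision_score, precision_recall_curve

    # Synthetic placeholder data with a large skew in the class distribution.
    rng = np.random.default_rng(3)
    y_true = (rng.random(2000) < 0.03).astype(int)
    y_score = y_true * 0.5 + rng.normal(0.0, 0.5, size=2000)

    precision, recall, _ = precision_recall_curve(y_true, y_score)

    # Trapezoidal area joins neighbouring PR points with straight lines (linear
    # interpolation); average precision uses a step-wise summary instead.
    linear = auc(recall, precision)
    stepwise = average_precision_score(y_true, y_score)
    print(f"linear interpolation = {linear:.3f}, step-wise = {stepwise:.3f}")
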
Assumptions
The predictions are assumed to be in the correct order when a test variable is observed for one subject randomly selected from the case group and another randomly selected from the control group. Each defined group must contain at least one valid observation. Only a single grouping variable can be used in a single procedure.
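
The ordering assumption can be checked directly on sample data: the nonparametric AUC equals the proportion of case-control pairs in which the case receives the higher score, with ties counted as one half. The sketch below (NumPy and scikit-learn, synthetic placeholder data) makes that comparison.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Synthetic placeholder data for a state variable and a test variable.
    rng = np.random.default_rng(4)
    y_true = rng.integers(0, 2, size=150)
    y_score = y_true * 0.7 + rng.normal(0.0, 0.6, size=150)

    cases = y_score[y_true == 1]      # subjects from the case group
    controls = y_score[y_true == 0]   # subjects from the control group

    # Probability that a randomly selected case outranks a randomly selected
    # control, with tied scores counted as one half.
    ordering = ((cases[:, None] > controls[None, :]).mean()
                + 0.5 * (cases[:, None] == controls[None, :]).mean())

    print(f"pairwise ordering probability = {ordering:.4f}")
    print(f"roc_auc_score                 = {roc_auc_score(y_true, y_score):.4f}")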

Obtaining an ROC analysis

This feature requires Statistics Base Edition.

  1. From the menus choose:

    Analyze > Classification > ROC analysis

  2. Click Select variables under the Test variables section, select one or more test probability variables, and then click OK.
  3. When the default Independent-group design setting is selected, you can optionally click Select variable under the Group variable section, select a single variable to group cases, and then click OK.

    When a numeric grouping variable is selected, you can click the Group: link next to the group variable to request the independent group design for the test variables, and to specify two values, a midpoint, or a cut point. For more information, see ROC analysis: Define groups.

  4. Click Select variable under the State variable section, select a single state variable, and then click OK.
  5. Click the Define state* link next to the state variable to identify the positive value for the state variable. Click OK after specifying the positive state value.
  6. Optionally select the Paired-sample design option to request the paired-sample design for the test variables. The paired-sample design compares two ROC curves when multiple test variables are measured on the same subjects, each associated with the same state variable.
    Note: When Paired-sample design is selected, the Group variable and Distribution Assumption (in the Classification dialog) options are disabled.
  7. Optionally, expand the Additional settings menu and click the following:
    • Click Classification to define the cutoff value, test direction, and standard error of area under the curve.
    • Click Statistics to select which statistics to include in the procedure.
    • Click Plots to define plotting for the ROC and Precision-Recall curves.
    • Click Options to specify missing values settings.
  8. Click Run analysis.

This procedure pastes ROC ANALYSIS command syntax.