ROC Analysis

Receiver operating characteristic (ROC) analysis assesses the accuracy of model predictions by plotting sensitivity against (1-specificity) for a classification test as the threshold varies over the entire range of diagnostic test results. The area under a ROC curve (AUC) is an important summary statistic: it represents the probability that, for one subject randomly selected from the case group and another randomly selected from the control group, the test variable places the two subjects in the correct order. ROC Analysis supports inference about a single AUC, produces precision-recall (PR) curves, and provides options for comparing two ROC curves that are generated from either independent groups or paired subjects.
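
The two readings of the AUC described above (area under the plotted curve, and probability of correct ordering) can be checked against each other on a small example. The following Python sketch is illustrative only: the function names and the toy data are assumptions made for this example, and it is not the algorithm or the syntax that the ROC Analysis procedure uses.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # A case is classified as positive when its score is >= the threshold.
    # Thresholds are swept from the highest observed score to the lowest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_by_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_by_ranking(scores, labels):
    """Share of (case, control) pairs ranked correctly; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test probabilities and state values (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(auc_by_trapezoid(roc_points(scores, labels)))  # 0.8125
print(auc_by_ranking(scores, labels))                # 0.8125
```

Here 13 of the 16 possible case-control pairs are ordered correctly, which gives the same value as the area under the curve.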

PR curves plot precision against recall. They tend to be more informative than ROC curves when the class distribution in the observed data is highly skewed, and so provide an alternative for such data.

Example
It is in a bank's interest to correctly classify customers into those who will and those who will not default on their loans, so special models are developed for making these decisions. ROC Analysis can be used to assess the accuracy of the model predictions.
Statistics
AUC, negative group, missing values, positive classification, cutoff value, strength of conviction, two-sided asymptotic confidence interval, distribution, standard error, independent-group design, paired-sample design, nonparametric assumption, bi-negative exponential distribution assumption, midpoint, cut point, PR curve, stepwise interpolation, asymptotic significance (2-tail), Sensitivity and (1-Specificity), Precision and Recall.
Methods
The areas under two ROC curves that are generated from either independent groups or paired subjects are compared. Comparing two ROC curves can provide more information about the relative accuracy of two competing diagnostic approaches.
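
As a rough illustration of what comparing two areas from independent groups involves, the Python sketch below forms a z statistic from two AUC estimates and their standard errors. It assumes the Hanley and McNeil (1982) approximation for the standard error, which is one common textbook choice and is not confirmed to be the formula this procedure applies. The function names and the sample values are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley and McNeil, 1982).

    Assumption: this approximation is used only for illustration here.
    """
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)


def compare_independent_aucs(auc1, se1, auc2, se2):
    """Return (z, two-sided p-value) for the difference of two independent AUCs."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical AUC estimates from two independent groups of 50 cases and 50 controls each.
se_a = hanley_mcneil_se(0.85, 50, 50)
se_b = hanley_mcneil_se(0.75, 50, 50)
z, p_value = compare_independent_aucs(0.85, se_a, 0.75, se_b)
print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")
```

This sketch covers only the independent-group case. With paired subjects the two AUC estimates are correlated, so the variance of their difference also needs a covariance term that is not shown here.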

ROC Analysis data considerations

Data
PR curves plot precision against recall and tend to be more informative when the observed data samples are highly skewed. Simple linear interpolation between points can yield an overly optimistic estimate of a PR curve.
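
To see why a straight line between two PR points can be optimistic, the Python sketch below compares linear interpolation with the precision that is actually achievable between two operating points. The achievable values are obtained by interpolating true-positive and false-positive counts rather than precision itself. The counts and function names are assumptions made for this illustration, and the sketch is not necessarily the stepwise interpolation that the procedure applies.

```python
def pr_between(tp_a, fp_a, tp_b, fp_b, n_pos):
    """Achievable (recall, precision) points between operating points A and B.

    Counts are interpolated: each additional true positive brings a fixed
    number of additional false positives, so precision is not linear in recall.
    """
    fp_per_tp = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for extra in range(tp_b - tp_a + 1):
        tp = tp_a + extra
        fp = fp_a + fp_per_tp * extra
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


def linear_precision(recall, a, b):
    """Precision read off the straight line that joins PR points a and b."""
    (r_a, p_a), (r_b, p_b) = a, b
    return p_a + (p_b - p_a) * (recall - r_a) / (r_b - r_a)


# Hypothetical skewed data with 20 positives.
# Point A: 5 true positives, 5 false positives.
# Point B: 10 true positives, 30 false positives.
n_pos = 20
curve = pr_between(5, 5, 10, 30, n_pos)
a, b = curve[0], curve[-1]
for recall, precision in curve:
    linear = linear_precision(recall, a, b)
    print(f"recall={recall:.2f}  achievable={precision:.3f}  linear={linear:.3f}")
```

At a recall of 0.30, for example, the achievable precision is 0.375, while the straight line suggests 0.45.
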
Assumptions
The prediction is assumed to be in the correct order when a test variable is observed for one subject randomly selected from the case group and another randomly selected from the control group. Each defined group must contain at least one valid observation. Only a single grouping variable can be used in a single procedure.

Obtaining an ROC Analysis

This feature requires the Statistics Base option.

  1. From the menus choose:

    Analyze > Classify > ROC Analysis

  2. Select one or more test probability variables.
  3. Select one state variable.
  4. Identify the positive value for the state variable.
  5. Optionally select the Paired-sample design option, or select a single grouping variable (you cannot select both options).
    • Select Paired-sample design to request the paired-sample design for the test variable(s). This design compares two ROC curves when multiple test values are measured on the same subjects and associated with a state variable.
      Note: When Paired-sample design is selected, the Grouping Variable and Distribution Assumption (in the Options dialog) options are disabled.
    • When a numeric grouping variable is selected, you can click Define Groups... to request the independent group design for the test variable(s), and to specify two values, a midpoint, or a cut point.
  6. Optionally, click Options to define the classification, test direction, standard error parameters, and missing values settings.
  7. Optionally, click Display to define the plotting and print settings (which include ROC Curve, Precision-Recall Curve and model quality settings).
  8. Click OK.

This procedure pastes ROC ANALYSIS command syntax.