Overview (ROC ANALYSIS command)

ROC ANALYSIS assesses the accuracy of model predictions by plotting sensitivity versus (1-specificity) of a classification test as the threshold varies over the entire range of diagnostic test results. The area under a given ROC curve, or AUC, is an important statistic: it represents the probability that two subjects, one randomly selected from the case group and the other randomly selected from the control group, are placed in the correct order by the test variable. ROC ANALYSIS supports inference about a single AUC, produces precision-recall (PR) curves, and provides options for comparing two ROC curves that are generated from either independent groups or paired subjects.
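The probability interpretation of the AUC can be illustrated with a minimal Python sketch. This is not the procedure's own implementation: the function name `auc_pairwise` and the sample values are hypothetical, and the sketch assumes that larger test values indicate a more positive result and that tied pairs count as one half.

```python
from itertools import product


def auc_pairwise(case_values, control_values):
    """Estimate the AUC as the share of (case, control) pairs in the correct order.

    Assumption: a larger test value is more indicative of a case.
    Tied pairs contribute one half.
    """
    if not case_values or not control_values:
        raise ValueError("both groups need at least one observation")
    concordant = 0.0
    for case, control in product(case_values, control_values):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(case_values) * len(control_values))


# Hypothetical test results for four cases and four controls.
cases = [0.9, 0.8, 0.6, 0.55]
controls = [0.7, 0.5, 0.4, 0.3]
print(auc_pairwise(cases, controls))  # 14 of 16 pairs are ordered correctly: 0.875
```

An AUC of 0.5 corresponds to a test that orders case-control pairs no better than chance, and 1.0 to a test that orders every pair correctly.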

The older ROC Curve procedure supports statistical inference about a single ROC curve. The ROC Analysis procedure reproduces that analysis and, in addition, can compare two ROC curves generated from either independent groups or paired subjects.

Options

Paired-sample design
Two ROC curves are often compared in a paired-sample scenario, where multiple test values are measured on the same subjects and are associated with a state variable. The paired-sample design requires two or more test measurements.
Grouping variable
When a grouping variable is specified, you can request the independent group design for the test variable(s). The groups can be defined by user-specified numeric or string values, or by the midpoint or cut point settings.
Classification
The cutoff value itself can be included in or excluded from the positive classification.
Test direction
The test direction can be set to have either larger or smaller test results indicate a more positive test.
Parameters for standard error of area
Provides options for defining the distribution assumption and confidence level percentage.
Missing values
Provides options for excluding both user-missing and system-missing values, or treating user-missing values as valid.
Plot options
Provides options for plotting the ROC and Precision-Recall curves, and controls whether or not a bar chart is created to display the value of the lower bound of the confidence interval of the estimated "Area Under the Curve".
Print options
Provides options for defining the output for the corresponding statistics, including which statistics display in the "Area Under the Curve" table, the coordinate points of ROC and Precision-Recall curves, and classifier evaluation metrics.
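
How the Classification and Test direction options affect a single point on the ROC curve can be sketched in Python. The helper names `is_positive` and `roc_point`, their parameters, and the data are hypothetical and only illustrate the idea; they are not part of the command syntax.

```python
def is_positive(value, cutoff, larger_is_positive=True, include_cutoff=True):
    """Classify one test result as positive or negative at a given cutoff."""
    if value == cutoff:
        # The cutoff value itself is positive only when it is included.
        return include_cutoff
    return value > cutoff if larger_is_positive else value < cutoff


def roc_point(values, states, cutoff, **options):
    """Return (1 - specificity, sensitivity) at one cutoff.

    Assumption: states are coded 1 for cases and 0 for controls.
    """
    tp = fp = n_pos = n_neg = 0
    for value, state in zip(values, states):
        flagged = int(is_positive(value, cutoff, **options))
        if state == 1:
            n_pos += 1
            tp += flagged
        else:
            n_neg += 1
            fp += flagged
    return fp / n_neg, tp / n_pos


# Hypothetical data: two controls followed by three cases.
values = [0.2, 0.4, 0.4, 0.7, 0.9]
states = [0, 0, 1, 1, 1]

print(roc_point(values, states, 0.4))                             # cutoff included
print(roc_point(values, states, 0.4, include_cutoff=False))       # cutoff excluded
print(roc_point(values, states, 0.4, larger_is_positive=False))   # smaller is positive
```

The three calls differ only in how observations equal to the cutoff are treated and in which direction counts as positive, which is enough to move the point on the curve.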

Basic specification

The basic specification is one or more numeric variables as the test result variable(s) and one variable as the actual state variable with one of its values. When /DESIGN PAIR = TRUE, at least two numeric variables must be defined as test result variables. ROC ANALYSIS uses the nonparametric (distribution-free) method to calculate the area under each ROC curve. The default and minimum output consists of charts of the ROC curves and tables of the areas under the curves.

The /PLOT subcommand provides options for defining ROC curves or Precision-Recall curves.
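
The coordinates behind both kinds of curve can be sketched in Python. This is an illustration under stated assumptions, not the procedure's algorithm: the function name `curve_points` is hypothetical, each distinct observed score is used as a cutoff, the cutoff is included in the positive classification, larger scores are treated as more positive, and labels are coded 1 for cases and 0 for controls.

```python
def curve_points(scores, labels):
    """Return (roc, pr) coordinate lists over all observed cutoffs.

    roc holds (1 - specificity, sensitivity) pairs.
    pr holds (recall, precision) pairs.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both cases and controls are required")

    roc = [(0.0, 0.0)]  # strictest cutoff: nothing is classified as positive
    pr = []
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        roc.append((fp / n_neg, tp / n_pos))
        # tp + fp >= 1 here because the cutoff is an observed score.
        pr.append((tp / n_pos, tp / (tp + fp)))
    return roc, pr


# Hypothetical scores for two cases (1) and two controls (0).
roc, pr = curve_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(roc)
print(pr)
```

Both curves share the sensitivity (recall) values; the ROC curve pairs them with the false positive rate, while the Precision-Recall curve pairs them with precision.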

Syntax rules

  • Minimum syntax: The ROC ANALYSIS command always requires at least one test result variable and one actual state variable with one of its values.
  • The test result variable must be numeric, but the state variable can be any type with any format.
  • Subcommands can be specified in any order.
  • When a subcommand is duplicated, only the last one is honored, provided that none of the duplicates contains a syntax error. A syntax warning is issued.
  • Within a subcommand, if two or more exclusive or contradictory keywords are given, the latter keywords override the earlier ones. A syntax warning is issued.
  • If a keyword is duplicated within a subcommand, it is silently ignored.

Limitations

Distributional assumptions
In the CRITERIA subcommand, the user can choose the nonparametric or parametric method to estimate the standard error of the area under the curve. Currently, the bi-negative exponential distribution is the only parametric option.
Optional output
In addition to an estimate of the area under the ROC curves, the user may request its standard error, a confidence interval, and a p value under the null hypothesis that the area under the curve equals 0.5. Tables of cutoff values and coordinates used to plot the ROC curves may also be displayed.
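
The kind of optional output described above can be approximated with a short Python sketch. It uses the Hanley and McNeil (1982) standard error, whose Q1 and Q2 terms come from a negative exponential model, together with a normal approximation for the confidence interval and the p value. Whether ROC ANALYSIS uses exactly these formulas is an assumption; the function names `hanley_mcneil_se` and `auc_inference` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC, Hanley-McNeil exponential approximation."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return sqrt(max(variance, 0.0))  # guard against rounding below zero


def auc_inference(auc, n_pos, n_neg, confidence=0.95):
    """Return (se, (lower, upper), p_value) for the null hypothesis AUC = 0.5."""
    normal = NormalDist()
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    z_crit = normal.inv_cdf(0.5 + confidence / 2)
    interval = (max(0.0, auc - z_crit * se), min(1.0, auc + z_crit * se))
    # Assumption: the test statistic uses the standard error evaluated at AUC = 0.5.
    se_null = hanley_mcneil_se(0.5, n_pos, n_neg)
    z = (auc - 0.5) / se_null
    p_value = 2 * (1 - normal.cdf(abs(z)))
    return se, interval, p_value


# Hypothetical example: an AUC of 0.875 estimated from 10 cases and 10 controls.
se, interval, p_value = auc_inference(0.875, 10, 10)
print(se, interval, p_value)
```

At an AUC of 0.5 the standard error formula reduces to sqrt((n_pos + n_neg + 1) / (12 * n_pos * n_neg)), the usual null variance of the Mann-Whitney statistic, which is why the same function is reused for the test.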