Data File Setup for Case-Control

Figure 1. Data editor with file ready for case-control analysis

The data file for a matched case-control study needs a particular setup, because the analysis of these studies is based on the differences between cases and controls.

  • Each row corresponds to a matched pair.
  • The dependent variable, Claim, takes only one value. This represents the difference between cases and controls with respect to whether a claim was filed. Since every case filed a claim and no control did, the value of this variable is constant.
  • The variables on which the cases and controls are matched, Age in years and Gender, are included for reference.
  • The value of each variable of interest is recorded separately for the case and for the control in each pair.
    Figure 2. Data editor with file ready for case-control analysis
  • The arithmetic differences between cases and controls are constructed from the variables of interest. These difference variables will be used as covariates in the model.
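The layout described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names mileage and prior_tickets, the sample values, and the constant value 1 for Claim are hypothetical and are not taken from the data file.

```python
# Hypothetical matched pairs: one entry per pair, with the matching
# variables (age, gender) and the case/control values of each
# variable of interest. Names and values are illustrative only.
pairs = [
    {"age": 34, "gender": "F",
     "case": {"mileage": 12.5, "prior_tickets": 2},
     "control": {"mileage": 9.0, "prior_tickets": 0}},
    {"age": 51, "gender": "M",
     "case": {"mileage": 7.0, "prior_tickets": 1},
     "control": {"mileage": 10.0, "prior_tickets": 1}},
]


def build_rows(pairs, variables):
    """Return one row per matched pair, including difference variables."""
    rows = []
    for pair in pairs:
        # Claim is constant across rows (assumed here to be coded as 1).
        row = {"claim": 1, "age": pair["age"], "gender": pair["gender"]}
        for name in variables:
            case_value = pair["case"][name]
            control_value = pair["control"][name]
            row[f"{name}_case"] = case_value
            row[f"{name}_control"] = control_value
            # Difference variable: case minus control; used as a covariate.
            row[f"{name}_diff"] = case_value - control_value
        rows.append(row)
    return rows


rows = build_rows(pairs, ["mileage", "prior_tickets"])
for row in rows:
    print(row)
```

Each resulting row holds the constant dependent variable, the matching variables for reference, the case and control values, and the case-minus-control differences.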

Categorical variables are also differenced, so they must first be encoded as contrasts. All of the categorical variables in this data file use an indicator (0,1) encoding, so each differenced variable can take the value -1, 0, or 1.
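As a minimal sketch of why an indicator-coded variable yields only these three values once differenced, the following Python snippet walks through every case/control combination. The function names and the category labels "yes" and "no" are hypothetical and chosen only for illustration.

```python
def indicator(value, category):
    """Indicator (0,1) encoding: 1 if value equals category, else 0."""
    return 1 if value == category else 0


def indicator_difference(case_value, control_value, category):
    """Case indicator minus control indicator for one matched pair."""
    return indicator(case_value, category) - indicator(control_value, category)


# All four possible (case, control) combinations for a two-level variable.
combinations = [("yes", "no"), ("no", "yes"), ("yes", "yes"), ("no", "no")]
diffs = [indicator_difference(case, control, "yes")
         for case, control in combinations]
print(diffs)  # [1, -1, 0, 0]
```

A difference of 1 means only the case is in the category, -1 means only the control is, and 0 means the pair agrees.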
