Can SPSS Statistics perform conditional logistic regression models?
Resolving The Problem
Conditional logistic regression models are designed for situations in which one or more "cases," who show the response of interest, are matched with one or more "controls," who do not show the response. The most common situation involves 1-1 (1 case with 1 control) matching, though 1-N (1 case with N controls) and M-N (M cases with N controls) matching are also seen. (In order to avoid confusion between the term "case" as used here and a physical case in an SPSS Statistics data file, I'll use "case" to refer to the physical case (or cases) showing the response, and use the term "physical case" for a physical case in the data file when such reference is required.)
In many software packages, the standard binary logistic regression procedures can be used to fit 1-1 matching situations by suppressing the intercept, using a constant dependent variable with a value of 1 for every physical case, and defining a physical case by taking the difference between the case and control values on the predictor variables. This will not work with the LOGISTIC REGRESSION procedure because it will only estimate a model when the dependent variable has exactly two values. However, this can be done in the NOMREG procedure, which is accessed in the menus via Analyze>Regression>Multinomial Logistic. See the example on matched case-control studies in the chapter on multinomial logistic regression in the SPSS Advanced Statistical Procedures Companion, by Marija Norusis, or the Case Study in the Help (Help>Case Studies>Regression Option>Multinomial Logistic Regression>Using Multinomial Logistic Regression to Analyze a 1-1 Matched Case-Control Study) for more details on how to use NOMREG for matched 1-1 case control studies.
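The differencing trick described above can be checked numerically. The following is a minimal sketch, not SPSS output: the function names, coefficient values, and covariate values are all made up for illustration. It shows that, for one matched pair, the conditional likelihood contribution equals the probability from a no-intercept logistic model evaluated on the case-minus-control differences with a constant response of 1.

```python
import math

def conditional_pair_prob(beta, x_case, x_control):
    """Conditional likelihood contribution of one 1-1 matched pair."""
    eta_case = sum(b * x for b, x in zip(beta, x_case))
    eta_control = sum(b * x for b, x in zip(beta, x_control))
    return math.exp(eta_case) / (math.exp(eta_case) + math.exp(eta_control))

def no_intercept_logistic_prob(beta, x_case, x_control):
    """P(response = 1) from a logistic model with no intercept,
    using case-minus-control differences as the predictors."""
    diffs = [c - k for c, k in zip(x_case, x_control)]
    eta = sum(b * d for b, d in zip(beta, diffs))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and covariate values for a single pair.
beta = [0.7, -0.3]
x_case = [2.0, 1.0]
x_control = [1.0, 4.0]

p_conditional = conditional_pair_prob(beta, x_case, x_control)
p_difference = no_intercept_logistic_prob(beta, x_case, x_control)
print(p_conditional, p_difference)
```

Because the two quantities agree for every pair and every value of the coefficients, maximizing either likelihood should give the same estimates.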
Models with multiple controls per case cannot be fit using NOMREG. However, the COXREG procedure (or the CSCOXREG procedure in the Complex Samples module) can fit such models. This approach may also be easier for 1-1 matched data, as it does not require you to compute differences between predictor variable values for cases and controls.
Suppose we have K pairs of matched cases and controls (in a 1-1 matching). The total number of physical cases in the data file will then be 2K. In order to use COXREG to do the conditional logistic regression, we need to do the following:
Code or recode the dependent variable so that it has a value of 1 for the cases and 2 for the controls. We'll call this variable DV. (Technically, the only requirement is that the case in each set has a positive value that is smaller than that for its control.)
Create a copy of this variable with another name. We'll call this variable STATUS. (Technically, all that's needed here is for all cases to share some property not shared by the controls.)
If it does not already exist, create a variable that denotes each pair (this will have K different values in our example). We'll call this variable PAIR.
In the menus, click on Analyze>Survival>Cox Regression. Move DV into the Time slot. Move STATUS into the Status slot, click on the Define Event button, and define the value 1 as the single value denoting an event. Move the PAIR variable into the Strata slot. Then specify all desired predictors, choose any desired variable selection methods, and define any appropriate covariates as categorical, just as you would in LOGISTIC REGRESSION. The Variables in the Equation output for COXREG looks exactly like that in LOGISTIC REGRESSION (without an intercept). You can get confidence intervals for the Exp(B) values, or odds ratios, in the Options dialog.
In command syntax, the basic structure would be:
COXREG dv WITH covlist
  /STATUS=status(1) /STRATA=pair.
The reason that this method works properly is that the conditional partial likelihood maximized by the COXREG procedure is the same one that results from the conditional logistic regression situation. The likelihood is a function of the probabilities of those physical cases that are cases being the ones to respond as opposed to those that are controls within the matched pairs. This can theoretically be extended to the 1-N and M-N matching cases, where pairs are larger sets, but COXREG should generally be used only for the 1-1 and 1-N cases.
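The equivalence claimed above can be illustrated with a small numeric check. This is a sketch under stated assumptions: a single covariate, made-up data, and a hand-written stratified partial log-likelihood rather than anything taken from COXREG itself. The dictionary keys mirror the DV, STATUS, and PAIR variables described earlier; the function names are hypothetical.

```python
import math
from collections import defaultdict

def cox_partial_loglik(rows, beta, event_value=1):
    """Stratified Cox partial log-likelihood for one covariate.
    Assumes at most one event per stratum, so no tie handling is needed."""
    strata = defaultdict(list)
    for row in rows:
        strata[row["pair"]].append(row)
    loglik = 0.0
    for members in strata.values():
        for row in members:
            if row["status"] != event_value:
                continue  # controls are treated as censored
            # Risk set: everyone in the stratum with time >= the event time.
            risk_set = [m for m in members if m["dv"] >= row["dv"]]
            denom = sum(math.exp(beta * m["x"]) for m in risk_set)
            loglik += beta * row["x"] - math.log(denom)
    return loglik

def conditional_logistic_loglik(pairs, beta):
    """Conditional logistic log-likelihood for 1-1 matched pairs."""
    loglik = 0.0
    for x_case, x_control in pairs:
        denom = math.exp(beta * x_case) + math.exp(beta * x_control)
        loglik += beta * x_case - math.log(denom)
    return loglik

# Hypothetical pairs: (covariate for the case, covariate for the control).
pairs = [(2.0, 1.0), (0.5, 1.5), (3.0, 2.5)]

# Lay the data out as described above: DV = 1 for cases, 2 for controls,
# STATUS is a copy of DV, and PAIR identifies the matched set.
rows = []
for pair_id, (x_case, x_control) in enumerate(pairs, start=1):
    rows.append({"pair": pair_id, "dv": 1, "status": 1, "x": x_case})
    rows.append({"pair": pair_id, "dv": 2, "status": 2, "x": x_control})

beta = 0.8  # arbitrary trial value
ll_cox = cox_partial_loglik(rows, beta)
ll_clr = conditional_logistic_loglik(pairs, beta)
print(ll_cox, ll_clr)
```

Because the case has the smallest "time" in its stratum, the risk set at the event is the whole matched pair, which is why the two log-likelihoods coincide for any trial value of beta.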
If there are multiple controls for each case, you can easily extend the COXREG method by simply having more than one control in each set. In this case, the stratification variable might be called SET or something more descriptive than PAIR, but this isn't necessary. Thus, fitting the conditional logistic model for the 1-N matched situation is easy. There can also be different numbers of controls for different cases.
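To make the 1-N case concrete, here is a minimal sketch that maximizes the conditional log-likelihood directly with Newton-Raphson for a single covariate. It is not how COXREG is implemented; the data, the function names, and the single-covariate restriction are assumptions made for illustration. Note that the sets contain different numbers of controls.

```python
import math

def score_and_information(beta, sets):
    """First derivative and negative second derivative of the
    conditional log-likelihood for 1-N matched sets (one covariate)."""
    score, info = 0.0, 0.0
    for x_case, x_controls in sets:
        xs = [x_case] + list(x_controls)
        weights = [math.exp(beta * x) for x in xs]
        total = sum(weights)
        mean = sum(w * x for w, x in zip(weights, xs)) / total
        mean_sq = sum(w * x * x for w, x in zip(weights, xs)) / total
        score += x_case - mean          # observed minus expected
        info += mean_sq - mean * mean   # weighted variance within the set
    return score, info

def fit_conditional_logistic(sets, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations starting from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, sets)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical matched sets: (case covariate, [control covariates]).
sets = [
    (2.0, [1.0, 1.5]),
    (0.5, [1.0]),
    (3.0, [2.0, 2.5, 1.0]),
    (1.0, [2.0, 0.0]),
]

beta_hat = fit_conditional_logistic(sets)
final_score, final_info = score_and_information(beta_hat, sets)
print(beta_hat, math.exp(beta_hat))  # coefficient and its odds ratio
```

The exponentiated coefficient plays the same role as the Exp(B) value reported in the Variables in the Equation table.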
Using the COXREG method for the M-N matching situation (where M>1) is not recommended. The reason for this is that the COXREG procedure offers only Breslow's approximate method for dealing with tied event times within a stratum. This approximation is good only when the number of ties at each event time is small relative to the number of physical cases at risk at that event time. Since in this application we are defining strata as sets, we will have M tied event times out of M+N physical cases at risk, which will generally be a substantial proportion. The estimates one gets from COXREG in this situation are thus likely to be inadequate approximations to the true maximum likelihood values based on the discrete time likelihood.
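The effect of the approximation can be seen in a toy comparison. The sketch below assumes a single covariate and made-up 2-2 matched sets, and uses a simple ternary search rather than any SPSS algorithm; all names are hypothetical. It compares the exact conditional (discrete time) log-likelihood, which sums over every way of choosing M members of a set, with the Breslow form, which uses the full-set denominator once per tied event.

```python
import math
from itertools import combinations

def exact_loglik(beta, sets):
    """Exact conditional log-likelihood for M-N matched sets."""
    loglik = 0.0
    for x_cases, x_controls in sets:
        xs = list(x_cases) + list(x_controls)
        m = len(x_cases)
        denom = sum(math.exp(beta * sum(sub)) for sub in combinations(xs, m))
        loglik += beta * sum(x_cases) - math.log(denom)
    return loglik

def breslow_loglik(beta, sets):
    """Breslow approximation: same risk-set sum for each tied event."""
    loglik = 0.0
    for x_cases, x_controls in sets:
        xs = list(x_cases) + list(x_controls)
        m = len(x_cases)
        denom = sum(math.exp(beta * x) for x in xs)
        loglik += beta * sum(x_cases) - m * math.log(denom)
    return loglik

def argmax_concave(f, lo=-5.0, hi=5.0, iters=200):
    """Ternary search for the maximizer of a concave function."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# Hypothetical 2-2 matched sets: ([case covariates], [control covariates]).
mn_sets = [
    ([2.0, 1.5], [1.0, 0.5]),
    ([1.0, 3.0], [2.0, 2.5]),
    ([0.5, 2.5], [1.5, 1.0]),
]
beta_exact = argmax_concave(lambda b: exact_loglik(b, mn_sets))
beta_breslow = argmax_concave(lambda b: breslow_loglik(b, mn_sets))
print(beta_exact, beta_breslow)

# With a single case per set (M = 1) the two forms are identical.
one_n_sets = [([2.0], [1.0, 0.5]), ([1.0], [2.0, 2.5])]
gap_1n = abs(exact_loglik(0.7, one_n_sets) - breslow_loglik(0.7, one_n_sets))
```

In this toy data the Breslow estimate is noticeably closer to zero than the exact one, while for M = 1 the two log-likelihoods agree, which is consistent with restricting the COXREG approach to 1-1 and 1-N matching.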
16 April 2020