Complex Samples Cox Regression
The Complex Samples Cox Regression procedure performs survival analysis for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.
Examples. A government law enforcement agency is concerned about recidivism rates in its area of jurisdiction. One measure of recidivism is the time until an offender's second arrest. The agency would like to model time to rearrest using Cox regression but is worried that the proportional hazards assumption does not hold across age categories.
Medical researchers are investigating survival times for patients exiting a rehabilitation program after an ischemic stroke. A single subject may contribute multiple cases, since patient histories change as significant nondeath events are noted and the times of these events are recorded. The sample is also left-truncated: the onset of risk starts at the time of the ischemic stroke, but only patients who survive past the rehabilitation program are in the sample, so the observed survival times are "inflated" by the length of rehabilitation.
Complex Samples Cox Regression Data Considerations
Survival Time. The procedure applies Cox regression to the analysis of survival times, that is, the length of time before the occurrence of an event. There are two ways to specify the survival time, depending upon the start time of the interval:
- Time=0. Commonly, you will have complete information on the start of the interval for each subject and will simply have a variable containing end times (or create a single variable with end times from Date & Time variables; see below).
- Varies by subject. This is appropriate when you have left-truncation, also called delayed entry. For example, if you are analyzing survival times for patients exiting a rehabilitation program post-stroke, you might consider that their onset of risk starts at the time of the stroke. However, if your sample only includes patients who have survived the rehabilitation program, then your sample is left-truncated in the sense that the observed survival times are "inflated" by the length of rehabilitation. You can account for this by specifying the time at which each patient exited rehabilitation as the time of entry into the study.
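Delayed entry matters because a subject can only contribute to the risk set between entry and exit. The following is a minimal Python sketch of that idea, not how the procedure is implemented internally. It assumes a hypothetical list of (subject, entry, exit) tuples with times measured from the onset of risk (for example, days since the stroke), and uses the common (entry, exit] convention for the observation window.

```python
from typing import List, Tuple

# Hypothetical record layout: (subject_id, entry_time, exit_time),
# all times measured from the onset of risk.
Record = Tuple[str, float, float]


def at_risk(records: List[Record], t: float) -> List[str]:
    """Return the subjects whose observation window (entry, exit] contains t."""
    return [sid for sid, entry, exit_ in records if entry < t <= exit_]


patients = [
    ("A", 30.0, 90.0),  # left rehabilitation on day 30
    ("B", 45.0, 60.0),  # left rehabilitation on day 45
    ("C", 0.0, 20.0),   # observed from the onset of risk
]

print(at_risk(patients, 40.0))
```

If every entry time were set to 0, patient B would wrongly be counted as at risk on day 40, even though B could not yet have been observed in the sample at that point.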
Date & Time Variables. Date & Time variables cannot be used to directly define the start and end of the interval; if you have Date & Time variables, you should use them to create variables containing survival times. If there is no left-truncation, simply create a variable containing end times based upon the difference between the date of entry into the study and the observation date. If there is left-truncation, create a variable containing start times, based upon the difference between the date of the start of the study and the date of entry, and a variable containing end times, based upon the difference between the date of the start of the study and the date of observation. See the topic Date and Time Wizard for more information.
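The date arithmetic described above can be sketched in Python. This is an illustration only: it assumes days as the time unit and uses made-up dates and variable names. Within SPSS Statistics you would create the equivalent variables yourself (for example, with the Date and Time Wizard).

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Whole days from start to end (negative if end precedes start)."""
    return (end - start).days


# Hypothetical dates for one subject.
study_start = date(2023, 1, 1)
entry = date(2023, 3, 1)
observed = date(2023, 6, 9)

# No left-truncation: a single end-time variable,
# measured from the subject's own date of entry.
end_time = days_between(entry, observed)

# Left-truncation: start and end times on a common clock
# that begins at the start of the study.
start_time = days_between(study_start, entry)
end_time_lt = days_between(study_start, observed)

print(end_time, start_time, end_time_lt)
```

In the left-truncated version, the length of the observed interval (end minus start) equals the single end time from the first version; what changes is that each subject's interval is placed on the study clock.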
Event Status. You need a variable that records whether the subject experienced the event of interest within the interval. Subjects for whom the event has not occurred are right-censored.
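Conceptually, the event status variable is reduced to an indicator: 1 when the recorded value is one of the event values you define, 0 otherwise (right-censored). A minimal Python sketch, where the status labels are hypothetical:

```python
from typing import Set


def event_indicator(status: str, event_values: Set[str]) -> int:
    """1 if the recorded status is a defined event value, else 0.

    A 0 means the case is treated as right-censored at its end time.
    """
    return 1 if status in event_values else 0


event_values = {"Died"}  # hypothetical event value
statuses = ["Died", "Lost to follow-up", "Alive at study end"]
print([event_indicator(s, event_values) for s in statuses])
```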
Subject Identifier. You can easily incorporate piecewise-constant, time-dependent predictors by splitting the observations for a single subject across multiple cases. For example, if you are analyzing survival times for patients post-stroke, variables representing their medical history are likely to be useful as predictors. Over time, patients may experience major medical events that alter their medical history. The following table shows how to structure such a dataset: Patient ID is the subject identifier, End time defines the observed intervals, Status records major medical events, and Prior history of heart attack and Prior history of hemorrhaging are piecewise-constant, time-dependent predictors.
Patient ID | End time | Status | Prior history of heart attack | Prior history of hemorrhaging |
---|---|---|---|---|
1 | 5 | Heart Attack | No | No |
1 | 7 | Hemorrhaging | Yes | No |
1 | 8 | Died | Yes | Yes |
2 | 24 | Died | No | No |
3 | 8 | Heart Attack | No | No |
3 | 15 | Died | Yes | No |
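One way to read the table is as a sequence of consecutive intervals per subject. The Python sketch below expands it into (patient, start, end, event) intervals. It makes several assumptions: each subject's first interval starts at time 0, each later interval starts where the previous case for that subject ended, the cases are sorted by subject and end time, and "Died" is the event value. Within each interval the two history predictors stay constant, which is what makes them piecewise-constant.

```python
from typing import Dict, List, NamedTuple, Tuple


class Case(NamedTuple):
    patient_id: int
    end_time: int
    status: str
    prior_heart_attack: bool
    prior_hemorrhaging: bool


rows = [
    Case(1, 5, "Heart Attack", False, False),
    Case(1, 7, "Hemorrhaging", True, False),
    Case(1, 8, "Died", True, True),
    Case(2, 24, "Died", False, False),
    Case(3, 8, "Heart Attack", False, False),
    Case(3, 15, "Died", True, False),
]


def to_intervals(
    cases: List[Case], event_value: str = "Died"
) -> List[Tuple[int, int, int, int]]:
    """Expand end times into (id, start, end, event) intervals.

    Assumes cases are sorted by subject and end time, each subject's
    first interval starts at 0, and each later interval starts where
    the previous one for that subject ended.
    """
    last_end: Dict[int, int] = {}
    out = []
    for c in cases:
        start = last_end.get(c.patient_id, 0)
        out.append((c.patient_id, start, c.end_time, int(c.status == event_value)))
        last_end[c.patient_id] = c.end_time
    return out


for interval in to_intervals(rows):
    print(interval)
```

Under these assumptions, nonfatal events such as a heart attack end an interval without counting as the event of interest. The subject simply continues into the next interval with updated predictor values.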
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.
Typically, Cox regression models assume proportional hazards; that is, the ratio of the hazards for any two cases does not vary over time. If this assumption does not hold, you may need to add time-dependent predictors to the model.
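Under the standard Cox model the hazard is h(t | x) = h0(t) exp(βx). The baseline hazard h0(t) cancels in any ratio, so the ratio for two cases is exp(β(x1 - x2)) at every t. The Python sketch below checks this numerically and contrasts it with a model that adds an x·t interaction, which is one form of time-dependent predictor. The baseline hazard and the coefficients are made up for illustration.

```python
import math


def cox_hazard(t: float, x: float, beta: float) -> float:
    """Hazard under a Cox model with a hypothetical baseline h0(t) = 0.02 * t."""
    baseline = 0.02 * t
    return baseline * math.exp(beta * x)


def time_varying_hazard(t: float, x: float, beta: float, gamma: float) -> float:
    """Same baseline plus an x * t interaction (a time-dependent effect)."""
    baseline = 0.02 * t
    return baseline * math.exp(beta * x + gamma * x * t)


beta, gamma = 0.7, -0.05  # hypothetical coefficients
for t in (1.0, 5.0, 10.0):  # t > 0, so the baseline hazard is nonzero
    ph = cox_hazard(t, 1, beta) / cox_hazard(t, 0, beta)
    nph = time_varying_hazard(t, 1, beta, gamma) / time_varying_hazard(t, 0, beta, gamma)
    print(f"t={t:>4}: PH ratio={ph:.3f}  non-PH ratio={nph:.3f}")
```

The first ratio is the same at every time point. The second shrinks as t grows, which is the kind of pattern the recidivism example worries about across age categories.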
Kaplan-Meier Analysis. If you do not select any predictors (or do not enter any selected predictors into the model) and choose the product limit method for computing the baseline survival curve on the Options tab, the procedure performs a Kaplan-Meier type of survival analysis.
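With no predictors, the product-limit baseline survival curve reduces to the Kaplan-Meier estimate S(t) = ∏ (1 - d_j / n_j) over event times t_j ≤ t. Here d_j is the number of events at t_j and n_j is the number still at risk just before t_j. The Python sketch below computes that point estimate. The optional weights are an assumption added for illustration: they produce a weighted point estimate only and do not reproduce the design-based variance estimation of a complex samples analysis.

```python
from typing import List, Optional, Tuple


def product_limit(
    times: List[float],
    events: List[int],
    weights: Optional[List[float]] = None,
) -> List[Tuple[float, float]]:
    """Return (event_time, S(t)) pairs from the product-limit estimator.

    Cases censored at an event time are kept in that time's risk set.
    With weights, the numbers at risk and of events are weighted sums.
    """
    if weights is None:
        weights = [1.0] * len(times)
    data = list(zip(times, events, weights))
    event_times = sorted({ti for ti, e, _ in data if e == 1})
    surv = 1.0
    curve = []
    for t in event_times:
        n_at_risk = sum(w for ti, _, w in data if ti >= t)
        n_events = sum(w for ti, e, w in data if ti == t and e == 1)
        surv *= 1.0 - n_events / n_at_risk
        curve.append((t, surv))
    return curve


times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 1]  # 0 = right-censored
for t, s in product_limit(times, events):
    print(t, round(s, 4))
```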
To Obtain Complex Samples Cox Regression
This feature requires the Complex Samples option.
- From the menus choose:
- Select a plan file. Optionally, select a custom joint probabilities file.
- Click Continue.
- Specify the survival time by selecting the entry and exit times from the study.
- Select an event status variable.
- Click Define Event and define at least one event value.
Optionally, you can select a subject identifier.
This procedure pastes CSCOXREG command syntax.