Modeling the recurrence probability by period
A problem with the model as it stands is that it ignores the information gathered at the first examination, namely that many patients did not experience a recurrence in the first six months. A "better" approach would model a binary response that records whether or not the event occurred during each interval. Fitting this model requires a restructured version of the original dataset, which can be found in ulcer_recurrence_recoded.sav. This file contains two additional variables:
- Period, which records whether the case corresponds to the first examination period or the second.
- Result by period, which records whether there was a recurrence for the given patient during the given period.
Each original case (patient) contributes one case per interval in which it remains in the risk set. Thus, for example, patient 1 contributes two cases: one for the first examination period, in which no recurrence occurred, and one for the second examination period, in which a recurrence was recorded. Patient 10, on the other hand, contributes a single case because a recurrence was recorded in the first period. Patients 16, 28, and 34 dropped out of the study after six months, and thus contribute only a single case each to the new dataset.
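The restructuring described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used to build ulcer_recurrence_recoded.sav: the `Patient` layout, its field names, and the 0/1 coding are assumptions made for the example, and the three rows simply encode what the text says about patients 1, 10, and 16.

```python
from typing import NamedTuple


class Patient(NamedTuple):
    # Hypothetical layout; the real .sav file uses its own field names and codes.
    patient_id: int
    last_period: int  # last examination period observed (1 or 2)
    recurred: bool    # True if a recurrence was recorded in last_period


def expand_to_periods(patients):
    """Return one (patient_id, period, result) row per period at risk."""
    rows = []
    for p in patients:
        for period in range(1, p.last_period + 1):
            # The event can only occur in the last period the patient was observed.
            event = 1 if (p.recurred and period == p.last_period) else 0
            rows.append((p.patient_id, period, event))
    return rows


# Rows reflecting the three cases described in the text (coding is assumed).
patients = [
    Patient(1, 2, True),    # no recurrence in period 1, recurrence in period 2
    Patient(10, 1, True),   # recurrence in period 1
    Patient(16, 1, False),  # dropped out after six months, no recurrence seen
]

for row in expand_to_periods(patients):
    print(row)
```

Under these assumptions, patient 1 yields two rows while patients 10 and 16 yield one row each, which matches the description of the risk set above.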
- Add a Statistics File source node pointing to ulcer_recurrence_recoded.sav in the Demos folder.
Figure 1. Sample stream to predict ulcer recurrence
- On the Filter tab of the source node, filter out id, time, and result.
Figure 2. Filter unwanted fields
- On the Types tab of the source node, set the role for the result2 field to Target and set its measurement level to Flag. All other fields should have their role set to Input.
Figure 3. Setting field role
- Add a Field Reorder node and specify period, duration, treatment, and age as the order of inputs. Making period the first input (and not including the intercept term in the model) will allow you to fit a full set of dummy variables to capture the period effects.
Figure 4. Reordering fields so they are entered into the model as desired
- On the GenLin node, click the Model tab.
Figure 5. Choosing model options
- Select First (Lowest) as the reference category for the target. This indicates that the second category is the event of interest; this selection affects how the parameter estimates are interpreted.
- Deselect Include intercept in model.
- Click the Expert tab and select Expert to activate the expert modeling options.
Figure 6. Choosing expert options
- Select Binomial as the distribution and Complementary log-log as the link function.
- Select Fixed value as the method for estimating the scale parameter and leave the default value of 1.0.
- Select Descending as the category order for factors. This indicates that the first category of each factor will be its reference category; the effect of this selection on the model is in the interpretation of parameter estimates.
- Run the stream to create the model nugget, which is added to the stream canvas, and also to the Models palette in the upper right corner. To view the model details, right-click the nugget and choose Edit or Browse.
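To make the model settings above more concrete, the following Python sketch shows the standard complementary log-log link, log(-log(1 - p)), and how a full set of period indicators with no intercept column feeds into it. It is not the GenLin node's implementation: the helper names are hypothetical, the coefficient values are made up for illustration rather than taken from the model nugget, and the inputs are treated as plain numeric covariates for simplicity.

```python
import math


def cloglog(p):
    """Complementary log-log link: maps a probability in (0, 1) to the real line."""
    return math.log(-math.log(1.0 - p))


def inv_cloglog(eta):
    """Inverse link: maps a linear predictor back to a probability."""
    return 1.0 - math.exp(-math.exp(eta))


def design_row(period, covariates, n_periods=2):
    """One indicator per period (no intercept column), followed by the covariates."""
    dummies = [1.0 if period == k else 0.0 for k in range(1, n_periods + 1)]
    return dummies + list(covariates)


def recurrence_probability(row, coefficients):
    """Probability of the event for one design row under the cloglog link."""
    eta = sum(x * b for x, b in zip(row, coefficients))
    return inv_cloglog(eta)


# Made-up coefficients: [period 1, period 2, one covariate]. Not fitted values.
coefficients = [-1.0, -0.5, 0.2]

for period in (1, 2):
    row = design_row(period, [1.0])
    print(period, row, round(recurrence_probability(row, coefficients), 3))
```

Because the intercept is omitted, each period indicator carries its own baseline level, which is why the stream places period first and deselects the intercept.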