Generalized Estimating Equations

The Generalized Estimating Equations procedure extends the generalized linear model to allow for analysis of repeated measurements or other correlated observations, such as clustered data.

Example. Public health officials can use generalized estimating equations to fit a repeated measures logistic regression to study effects of air pollution on children.

Generalized Estimating Equations Data Considerations

Data. The response can be scale, counts, binary, or events-in-trials. Factors are assumed to be categorical. The covariates, scale weight, and offset are assumed to be scale. Variables used to define subjects or within-subject repeated measurements cannot be used to define the response but can serve other roles in the model.

Assumptions. Cases are assumed to be dependent within subjects and independent between subjects. The correlation matrix that represents the within-subject dependencies is estimated as part of the model.

Obtaining Generalized Estimating Equations

This feature requires the Advanced Statistics option.

From the menus choose:

Analyze > Generalized Linear Models > Generalized Estimating Equations...

  1. Select one or more subject variables (see below for further options).

    The combination of values of the specified variables should uniquely define subjects within the dataset. For example, a single Patient ID variable should be sufficient to define subjects in a single hospital, but the combination of Hospital ID and Patient ID may be necessary if patient identification numbers are not unique across hospitals. In a repeated measures setting, multiple observations are recorded for each subject, so each subject may occupy multiple cases in the dataset.

  2. On the Type of Model tab, specify a distribution and link function.
  3. On the Response tab, select a dependent variable.
  4. On the Predictors tab, select factors and covariates for use in predicting the dependent variable.
  5. On the Model tab, specify model effects using the selected factors and covariates.

Optionally, on the Repeated tab you can specify:

Within-subject variables. The combination of values of the within-subject variables defines the ordering of measurements within subjects; thus, the combination of within-subject and subject variables uniquely defines each measurement. For example, the combination of Period, Hospital ID, and Patient ID defines, for each case, a particular office visit for a particular patient within a particular hospital.

If the dataset is already sorted so that each subject's repeated measurements occur in a contiguous block of cases and in the proper order, it is not strictly necessary to specify a within-subject variable. In that case you can deselect Sort cases by subject and within-subject variables and save the processing time required to perform the (temporary) sort. Generally, it's a good idea to make use of within-subject variables to ensure proper ordering of measurements.
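Before deselecting the sort, it can help to confirm that the cases really are laid out as described. The following is a minimal Python sketch of such a check, not part of the procedure itself; the function name `check_repeated_layout` and the variable names `hospital`, `patient`, and `period` are hypothetical and only mirror the example above.

```python
def check_repeated_layout(cases, subject_keys, within_keys):
    """Return a list of layout problems (empty if the cases look usable as-is).

    cases        -- list of dicts, one per case, in dataset order
    subject_keys -- names of the variables that together define a subject
    within_keys  -- names of the variables that order measurements in a subject
    """
    problems = []
    seen_subjects = set()       # subjects whose block has already started
    seen_measurements = set()   # (subject, within) combinations seen so far
    previous_subject = None
    previous_within = None
    for row_number, case in enumerate(cases, start=1):
        subject = tuple(case[k] for k in subject_keys)
        within = tuple(case[k] for k in within_keys)
        # Subject plus within-subject values should identify one measurement.
        if (subject, within) in seen_measurements:
            problems.append(f"case {row_number}: duplicate measurement {subject + within}")
        seen_measurements.add((subject, within))
        if subject != previous_subject:
            # A subject that reappears later is not in one contiguous block.
            if subject in seen_subjects:
                problems.append(f"case {row_number}: subject {subject} is not contiguous")
            seen_subjects.add(subject)
        elif within < previous_within:
            problems.append(f"case {row_number}: measurements out of order for {subject}")
        previous_subject = subject
        previous_within = within
    return problems


cases = [
    {"hospital": 1, "patient": 7, "period": 1},
    {"hospital": 1, "patient": 7, "period": 2},
    {"hospital": 2, "patient": 7, "period": 1},
    {"hospital": 2, "patient": 7, "period": 2},
]
ok = check_repeated_layout(cases, ["hospital", "patient"], ["period"])
bad = check_repeated_layout(cases, ["patient"], ["period"])
```

In this made-up data, patient number 7 appears in two hospitals, so using `patient` alone as the subject variable reports duplicate and out-of-order measurements, while the combination of `hospital` and `patient` reports none.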

Subject and within-subject variables cannot be used to define the response, but they can perform other functions in the model. For example, Hospital ID could be used as a factor in the model.

Covariance Matrix. The model-based estimator is the negative of the generalized inverse of the Hessian matrix. The robust estimator (also called the Huber/White/sandwich estimator) is a "corrected" model-based estimator that provides a consistent estimate of the covariance, even when the working correlation matrix is misspecified. This specification applies to the parameters in the linear model part of the generalized estimating equations, while the specification on the Estimation tab applies only to the initial generalized linear model.
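To illustrate the difference between the two estimators, here is a minimal Python sketch for the simplest case we could think of: a linear model with identity link, a single covariate, no intercept, and an independent working correlation. Under those assumptions both estimators reduce to scalars. The function name and the data are hypothetical, and this is not a description of how the procedure computes its results.

```python
def slope_with_covariances(clusters):
    """Fit y = beta * x and return (beta, model_based_var, robust_var).

    clusters -- dict mapping a subject id to a list of (x, y) pairs.
    Assumes identity link, no intercept, independent working correlation.
    """
    pairs = [pair for rows in clusters.values() for pair in rows]
    sxx = sum(x * x for x, _ in pairs)
    beta = sum(x * y for x, y in pairs) / sxx

    # Model-based variance: estimated scale divided by the information.
    rss = sum((y - beta * x) ** 2 for x, y in pairs)
    scale = rss / (len(pairs) - 1)          # one estimated parameter
    model_based = scale / sxx

    # Robust (sandwich) variance: the "meat" sums squared per-subject scores,
    # so residuals that move together within a subject are accounted for.
    meat = sum(
        sum(x * (y - beta * x) for x, y in rows) ** 2
        for rows in clusters.values()
    )
    robust = meat / sxx ** 2
    return beta, model_based, robust


clusters = {
    "subject_1": [(1.0, 2.0), (2.0, 4.0)],
    "subject_2": [(1.0, 1.0), (2.0, 3.0)],
}
beta, model_based, robust = slope_with_covariances(clusters)
```

In this toy data the residuals share a sign within each subject, which the independent working correlation ignores; the robust variance comes out larger than the model-based one as a result.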

Working Correlation Matrix. This correlation matrix represents the within-subject dependencies. Its size is determined by the number of measurements and thus the combination of values of within-subject variables. You can specify one of the following structures:

  • Independent. Repeated measurements are uncorrelated.
  • AR(1). Repeated measurements have a first-order autoregressive relationship. The correlation between any two elements is equal to ρ for adjacent elements, ρ² for elements that are separated by a third, and so on. ρ is constrained so that −1 < ρ < 1.
  • Exchangeable. This structure has homogeneous correlations between elements. It is also known as a compound symmetry structure.
  • M-dependent. Consecutive measurements have a common correlation coefficient, pairs of measurements separated by a third have a common correlation coefficient, and so on, through pairs of measurements separated by m−1 other measurements. For example, suppose you give students standardized tests each year from 3rd through 7th grade. This structure assumes that the 3rd and 4th, 4th and 5th, 5th and 6th, and 6th and 7th grade scores will have the same correlation; 3rd and 5th, 4th and 6th, and 5th and 7th will have the same correlation; 3rd and 6th and 4th and 7th will have the same correlation. Measurements with separation greater than m are assumed to be uncorrelated. When choosing this structure, specify a value of m less than the order of the working correlation matrix.
  • Unstructured. This is a completely general correlation matrix.
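The parameterized structures in the list above can be written out explicitly. The following Python sketch builds each one as a plain list of rows; the function name `working_correlation` and its arguments are hypothetical and are not part of the procedure. The unstructured case is omitted because every off-diagonal element is then its own parameter.

```python
def working_correlation(structure, size, rho=0.0, lag_rhos=()):
    """Build a size x size working correlation matrix as a list of rows.

    structure -- "independent", "ar1", "exchangeable", or "m_dependent"
    rho       -- common parameter for "ar1" and "exchangeable"
    lag_rhos  -- for "m_dependent": lag_rhos[k - 1] is the correlation shared
                 by all pairs k steps apart, so m = len(lag_rhos)
    """
    if structure == "ar1" and not -1 < rho < 1:
        raise ValueError("AR(1) requires -1 < rho < 1")
    if structure == "m_dependent" and len(lag_rhos) >= size:
        raise ValueError("m must be less than the order of the matrix")

    def entry(i, j):
        lag = abs(i - j)
        if lag == 0:
            return 1.0
        if structure == "independent":
            return 0.0
        if structure == "ar1":
            return rho ** lag
        if structure == "exchangeable":
            return rho
        if structure == "m_dependent":
            # Pairs further apart than m are treated as uncorrelated.
            return lag_rhos[lag - 1] if lag <= len(lag_rhos) else 0.0
        raise ValueError(f"unknown structure: {structure}")

    return [[entry(i, j) for j in range(size)] for i in range(size)]


# Five yearly test scores (3rd through 7th grade) with m = 3.
# The correlation values are made up for illustration.
grades = working_correlation("m_dependent", 5, lag_rhos=(0.6, 0.4, 0.2))
ar1 = working_correlation("ar1", 3, rho=0.5)
```

With m = 3, `grades[0][4]` (3rd versus 7th grade, four steps apart) is 0, while `grades[0][3]` and `grades[1][4]` share the lag-3 value.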

By default, the procedure will adjust the correlation estimates by the number of nonredundant parameters. Removing this adjustment may be desirable if you want the estimates to be invariant to subject-level replication changes in the data.

  • Maximum iterations. The maximum number of iterations the generalized estimating equations algorithm will execute. Specify a non-negative integer. This specification applies to the parameters in the linear model part of the generalized estimating equations, while the specification on the Estimation tab applies only to the initial generalized linear model.
  • Update matrix. Elements in the working correlation matrix are estimated based on the parameter estimates, which are updated in each iteration of the algorithm. If the working correlation matrix is not updated at all, the initial working correlation matrix is used throughout the estimation process. If the matrix is updated, you can specify the iteration interval at which to update working correlation matrix elements. Specifying a value greater than 1 may reduce processing time.

Convergence criteria. These specifications apply to the parameters in the linear model part of the generalized estimating equations, while the specification on the Estimation tab applies only to the initial generalized linear model.

  • Parameter convergence. When selected, the algorithm stops after an iteration in which the absolute or relative change in the parameter estimates is less than the value specified, which must be positive.
  • Hessian convergence. Convergence is assumed if a statistic based on the Hessian is less than the value specified, which must be positive.
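As a rough illustration of the parameter convergence rule, the Python sketch below compares two successive parameter vectors. The function name is hypothetical, and the exact definition of relative change used by the procedure is not stated here; dividing by the previous value plus a small constant is an assumption made only to avoid division by zero.

```python
def parameters_converged(old, new, tolerance, relative=False):
    """Return True if the largest change in any parameter is below tolerance.

    old, new  -- parameter estimates from two successive iterations
    tolerance -- must be positive
    relative  -- if True, scale each change by the size of the old estimate
                 (assumed definition; the guard constant is arbitrary)
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    largest = 0.0
    for before, after in zip(old, new):
        change = abs(after - before)
        if relative:
            change /= abs(before) + 1e-6
        largest = max(largest, change)
    return largest < tolerance


# A change of 0.001 in an estimate near 100 is small in relative terms
# but not in absolute terms at a tolerance of 1e-4.
absolute_result = parameters_converged([100.0], [100.001], 1e-4)
relative_result = parameters_converged([100.0], [100.001], 1e-4, relative=True)
```

The choice between absolute and relative change matters most when parameter estimates differ greatly in magnitude.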

This procedure pastes GENLIN command syntax.