Generalized linear mixed models

Generalized linear mixed models extend the linear model so that:

  • The target is linearly related to the factors and covariates via a specified link function.
  • The target can have a non-normal distribution.
  • The observations can be correlated.

Generalized linear mixed models cover a wide variety of models, from simple linear regression to complex multilevel models for non-normal longitudinal data.

Examples
The district school board can use a generalized linear mixed model to determine whether an experimental teaching method is effective at improving math scores. Scores of students from the same classroom are likely to be correlated because the students are taught by the same teacher, and classrooms within the same school may also be correlated, so random effects at the school and class levels can be included to account for these different sources of variability.
Medical researchers can use a generalized linear mixed model to determine whether a new anticonvulsant drug can reduce a patient's rate of epileptic seizures. Repeated measurements from the same patient are typically positively correlated, so a mixed model with some random effects should be appropriate. The target field, the number of seizures, takes nonnegative integer values, so a generalized linear mixed model with a Poisson distribution and log link may be appropriate.
Executives at a cable provider of television, phone, and internet services can use a generalized linear mixed model to learn more about potential customers. Since the possible answers to the service usage questions have nominal measurement levels, the company analyst uses a generalized logit mixed model with a random intercept to capture the correlation between answers across service types (TV, phone, internet) within a given survey respondent's answers.
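
To make the seizure example more concrete, the following Python sketch shows how a log link turns a linear predictor with a patient-level random intercept into an expected count. It is an illustration only, not the procedure's estimation algorithm; the function name, the coefficient values, and the patient effect are all hypothetical.

```python
import math

def expected_seizure_count(intercept, drug_effect, on_drug, patient_effect):
    """Expected count under a log link: mu = exp(linear predictor)."""
    # Linear predictor: fixed effects plus a patient-specific random intercept.
    eta = intercept + drug_effect * on_drug + patient_effect
    return math.exp(eta)

# Hypothetical values chosen only for illustration.
baseline = expected_seizure_count(1.0, -0.5, on_drug=0, patient_effect=0.2)
treated = expected_seizure_count(1.0, -0.5, on_drug=1, patient_effect=0.2)

# With a log link the drug acts multiplicatively on the expected count,
# so the ratio equals exp(drug_effect) regardless of the patient effect.
rate_ratio = treated / baseline
```

Because the random intercept cancels in the ratio, the drug effect in this sketch is read as a rate ratio that applies within each patient.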

The Data Structure tab allows you to specify the structural relationships between records in your dataset when observations are correlated. If the records in the dataset represent independent observations, you do not need to specify anything on this tab.

Effects options

Subjects
The combination of values of the specified categorical fields should uniquely define subjects within the dataset. For example, a single Patient ID field should be sufficient to define subjects in a single hospital, but the combination of Hospital ID and Patient ID may be necessary if patient identification numbers are not unique across hospitals. In a repeated measures setting, multiple observations are recorded for each subject, so each subject may occupy multiple records in the dataset.
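
The effect of combining fields to define subjects can be sketched in Python as follows. The records, field order, and helper name are hypothetical; the point is only that Patient ID alone merges two different patients who happen to share a number across hospitals, whereas the combination of Hospital ID and Patient ID keeps them apart.

```python
from collections import defaultdict

# Hypothetical records: (hospital_id, patient_id, week, reading)
records = [
    ("H1", 101, 1, 120),
    ("H1", 101, 2, 118),
    ("H2", 101, 1, 135),  # same patient number, different hospital
    ("H2", 102, 1, 128),
]

def group_by_subject(rows, key_positions):
    """Group rows by the tuple of values found at the given positions."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        groups[key].append(row)
    return dict(groups)

# Patient ID alone: patient 101 at H1 and patient 101 at H2 collapse together.
by_patient_only = group_by_subject(records, [1])

# Hospital ID and Patient ID together: each real patient is a separate subject.
by_hospital_and_patient = group_by_subject(records, [0, 1])
```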

A subject is an observational unit that can be considered independent of other subjects. For example, the blood pressure readings from a patient in a medical study can be considered independent of the readings from other patients. Defining subjects becomes particularly important when there are repeated measurements per subject and you want to model the correlation between these observations. For example, you might expect that blood pressure readings from a single patient during consecutive visits to the doctor are correlated.

All of the fields specified as Subjects on the Variables dialog are used to define subjects for the residual covariance structure, and provide the list of possible fields for defining subjects for random-effects covariance structures on the Random Effect Block.

Repeated Measures
The fields specified here are used to identify repeated observations. For example, a single variable Week might identify the 10 weeks of observations in a medical study, or Month and Day might be used together to identify daily observations over the course of a year.

Covariance options

Define covariance groups by
The categorical fields specified here define independent sets of repeated effects covariance parameters, one for each category defined by the cross-classification of the grouping fields. All subjects have the same covariance type; subjects within the same covariance grouping will have the same values for the parameters.
Repeated covariance type
This specifies the covariance structure for the residuals. Different covariance options are available based on the selected Repeated covariance type. The available structures are:
  • First-order autoregressive (AR1)
  • Direct product AR1 (UN_AR1)
  • Direct product unstructured (UN_UN)
  • Direct product compound symmetry (UN_CS)
  • Heterogeneous compound symmetry (CSH)
  • Heterogeneous autoregressive (ARH1)
  • Autoregressive moving average (1,1) (ARMA11)
  • Compound symmetry
  • Diagonal
  • Scaled identity
  • Toeplitz
  • Unstructured
  • Variance components
  • Spatial: Power
  • Spatial: Exponential
  • Spatial: Gaussian
  • Spatial: Linear
  • Spatial: Linear-log
  • Spatial: Spherical
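
As an illustration of what two of these structures look like, the Python sketch below builds a first-order autoregressive matrix and a compound symmetry matrix for a small number of repeated measurements. It uses common textbook parameterizations, which are an assumption here and may differ in detail from the parameterization the procedure reports; the function names are hypothetical.

```python
def ar1_covariance(n, variance, rho):
    """AR(1): covariance decays as rho ** |i - j| with the lag between occasions."""
    return [[variance * rho ** abs(i - j) for j in range(n)] for i in range(n)]

def compound_symmetry(n, variance, covariance):
    """Compound symmetry: one common variance and one common covariance."""
    return [[variance if i == j else covariance for j in range(n)]
            for i in range(n)]

# Three repeated measurements per subject, illustrative parameter values.
ar1 = ar1_covariance(3, variance=2.0, rho=0.5)
cs = compound_symmetry(3, variance=2.0, covariance=0.8)
```

In the AR(1) matrix, measurements that are further apart are less strongly related; under compound symmetry every pair of measurements shares the same covariance.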
Kronecker Measures
Select variables that specify the subject structure for Kronecker covariance measurements and determine how the measurement errors are correlated. The field is available only when one of the following repeated covariance types is selected:
  • Direct product AR1 (UN_AR1)
  • Direct product unstructured (UN_UN)
  • Direct product compound symmetry (UN_CS)
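
A direct product structure combines two smaller covariance matrices through a Kronecker product. The Python sketch below illustrates the idea for a UN_AR1-like case, assuming an unstructured matrix over two measures and an AR(1) correlation matrix over three time points. The values, the helper name, and the ordering of rows (all time points for the first measure, then all time points for the second) are assumptions made for illustration.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]

# Unstructured covariance between two measures (illustrative values).
unstructured = [[4.0, 1.0],
                [1.0, 9.0]]

# AR(1) correlation across three time points with rho = 0.5.
ar1_corr = [[1.0, 0.5, 0.25],
            [0.5, 1.0, 0.5],
            [0.25, 0.5, 1.0]]

# 6 x 6 covariance for (measure, time) pairs.
combined = kronecker(unstructured, ar1_corr)
```

Each 3 x 3 block of the result is the AR(1) matrix scaled by one element of the unstructured matrix, so the covariance between two observations factors into a between-measure part and an across-time part.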
Spatial covariance coordinates
The variables in this list specify the coordinates of the repeated observations when one of the spatial covariance types is selected for the repeated covariance type.
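
The role of the coordinates can be sketched in Python as follows, using an exponential decay of covariance with Euclidean distance. This is a common textbook form of a spatial exponential structure and is an assumption here; the function name, the range parameter, and the coordinate values are hypothetical.

```python
import math

def spatial_exponential(coords, variance, range_param):
    """Covariance = variance * exp(-distance / range_param) for each pair of points."""
    return [
        [variance * math.exp(-math.dist(p, q) / range_param) for q in coords]
        for p in coords
    ]

# Hypothetical (x, y) coordinates of three repeated observations.
coords = [(0.0, 0.0), (3.0, 4.0), (0.0, 5.0)]
cov = spatial_exponential(coords, variance=2.0, range_param=5.0)
```

Observations at the same location get the full variance, and the covariance shrinks as the distance between the coordinates grows.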

See the topic Covariance Structures for more information.

Pseudo-R² measures

Pseudo-R² measures and the intra-class correlation coefficient are included in GLMM output (when appropriate). The pseudo-R² measures are based entirely on the final estimates and are produced after estimation has been completed. The coefficient of determination R² is a commonly reported statistic, because it represents the proportion of variance explained by a linear model. The intra-class correlation coefficient (ICC) is a related statistic that quantifies the proportion of variance explained by a grouping (random) factor in multilevel/hierarchical data.
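
For intuition, the ICC can be sketched in Python for the simplest case: a random-intercept model with a normally distributed target and an identity link, where the total variance is the sum of the between-group variance and the residual variance. This is a simplifying assumption; the source does not state the exact formulas the procedure uses, and other distributions and links require a different treatment of the residual variance. The function name and values are hypothetical.

```python
def intraclass_correlation(between_group_variance, residual_variance):
    """Share of total variance attributable to the grouping (random) factor."""
    total_variance = between_group_variance + residual_variance
    return between_group_variance / total_variance

# Illustrative variance estimates: 2.0 between groups, 6.0 within groups.
icc = intraclass_correlation(2.0, 6.0)
```

With these illustrative values, one quarter of the total variance is attributable to differences between groups.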

This procedure pastes GENLINMIXED command syntax.