Overview (GENLIN command)

The GENLIN procedure fits the generalized linear model and generalized estimating equations.

The generalized linear model includes one dependent variable and usually one or more independent effects. Subjects are assumed to be independent. The generalized linear model covers not only widely used statistical models such as linear regression for normally distributed responses, logistic models for binary data, and loglinear models for count data, but also many other statistical models via its very general model formulation. However, the independence assumption prohibits the model from being applied to correlated data.

Generalized estimating equations extend the generalized linear model to correlated longitudinal data and clustered data. Specifically, generalized estimating equations model correlations within subjects. Data across subjects are still assumed to be independent.

Options

Independence Assumption. The GENLIN procedure fits either the generalized linear model assuming independence across subjects, or generalized estimating equations assuming correlated measurements within subjects but independence across subjects.

Events/Trials Specification for Binomial Distribution. The typical dependent variable specification will be a single variable, but for the binomial distribution the dependent variable may be specified using a number-of-events variable and a number-of-trials variable. Alternatively, if the number of trials is the same across all subjects, then trials may be specified using a fixed number instead of a variable.

Probability Distribution of Dependent Variable. The probability distribution of the dependent variable may be specified as normal, binomial, gamma, inverse Gaussian, multinomial, negative binomial, Poisson, or Tweedie.

Link Function. GENLIN offers the following link functions: identity, complementary log-log, log, log-complement, logit, negative binomial, negative log-log, odds power, power, and probit. For the multinomial distribution, the following link functions are available: cumulative Cauchit, cumulative complementary log-log, cumulative logit, cumulative negative log-log, and cumulative probit.
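
To make the idea of a link function concrete, the following Python sketch pairs a few of the links named above with their inverses. It is an illustration only: the formulas are the usual textbook definitions, and it is an assumption that GENLIN implements them in exactly this form. The names LINKS and round_trip are hypothetical and are not part of GENLIN.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, used by the probit link

# Hypothetical lookup table: link name -> (link, inverse link).
# Textbook definitions; assumed, not taken from the GENLIN documentation.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "complementary log-log": (lambda mu: math.log(-math.log(1.0 - mu)),
                              lambda eta: 1.0 - math.exp(-math.exp(eta))),
    "negative log-log": (lambda mu: -math.log(-math.log(mu)),
                         lambda eta: math.exp(-math.exp(-eta))),
}


def round_trip(name, mu):
    """Map a mean to the linear-predictor scale and back again."""
    link, inverse = LINKS[name]
    return inverse(link(mu))


if __name__ == "__main__":
    for name in LINKS:
        link, _ = LINKS[name]
        print(f"{name}: link(0.3) = {link(0.3):.4f}")
```

Each link maps the mean of the response to the scale of the linear predictor, and the inverse maps a fitted linear predictor back to the response scale.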

Correlation Structure for Generalized Estimating Equations. When measurements within subjects are assumed correlated, the correlation structure may be specified as independent, AR(1), exchangeable, fixed, m-dependent, or unstructured.
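
As a rough illustration of what some of these structures mean, the Python sketch below builds working correlation matrices for n repeated measurements. It uses the standard definitions of the independent, exchangeable, AR(1), and m-dependent structures; the function name and its parameters (rho, lag_rhos) are hypothetical, and how GENLIN estimates the correlation parameters is not shown. The fixed and unstructured types are omitted because their entries are supplied by the user or estimated freely rather than generated from a pattern.

```python
def working_correlation(structure, n, rho=0.0, lag_rhos=()):
    """Return an n x n working correlation matrix as a list of lists.

    rho      -- common correlation (exchangeable) or lag-1 correlation (AR(1))
    lag_rhos -- correlations for lags 1..m (m-dependent); m = len(lag_rhos)
    """
    known = ("independent", "exchangeable", "ar1", "m-dependent")
    if structure not in known:
        raise ValueError(f"unsupported structure: {structure}")

    def entry(i, j):
        lag = abs(i - j)
        if lag == 0:
            return 1.0  # every measurement is perfectly correlated with itself
        if structure == "independent":
            return 0.0
        if structure == "exchangeable":
            return rho  # same correlation for every pair
        if structure == "ar1":
            return rho ** lag  # correlation decays with the lag
        # m-dependent: one parameter per lag up to m, zero beyond that
        return lag_rhos[lag - 1] if lag <= len(lag_rhos) else 0.0

    return [[entry(i, j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in working_correlation("ar1", 3, rho=0.5):
        print(row)
```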

Estimated Marginal Means. Estimated marginal means may be computed for one or more crossed factors and may be based on either the response or the linear predictor.

Basic Specification

  • The basic specification is a MODEL subcommand with one or more model effects and a variable list identifying the dependent variable, the factors (if any), and the covariates (if any).
  • If the MODEL subcommand is not specified, or is specified with no model effects, then the default model is the intercept-only model using the normal distribution and identity link.
  • If the REPEATED subcommand is not specified, then subjects are assumed to be independent.
  • If the REPEATED subcommand is specified, then generalized estimating equations, which model correlations within subjects, are fit. By default, generalized estimating equations use the independent correlation structure.
  • The basic specification displays default output, including a case processing summary table, variable information, model information, goodness of fit statistics, model summary statistics, and parameter estimates and related statistics.

Syntax Rules

  • A dependent variable or an events/trials specification is required. All other variables and subcommands are optional.
  • It is invalid to specify a dependent variable and an events/trials specification in the same GENLIN command.
  • Multiple EMMEANS subcommands may be specified; each is treated independently. All other subcommands may be specified only once.
  • The EMMEANS subcommand may be specified without options. All other subcommands must be specified with options.
  • Each keyword may be specified only once within a subcommand.
  • The command name, all subcommand names, and all keywords must be spelled in full.
  • Subcommands may be specified in any order.
  • Within subcommands, keyword settings may be specified in any order.
  • The following variables, if specified, must be numeric: events and trials variables, covariates, OFFSET variable, and SCALEWEIGHT variable. The following, if specified, may be numeric or string variables: the dependent variable, factors, SUBJECT variables, and WITHINSUBJECT variables.
  • Each variable may appear only once within and across the following variables or variable lists: the dependent variable, events variable, trials variable, factor list, covariate list, OFFSET variable, and SCALEWEIGHT variable.
  • The dependent variable, events variable, trials variable, and covariates may not be specified as SUBJECT or WITHINSUBJECT variables.
  • SUBJECT variables may not be specified as WITHINSUBJECT variables.
  • The minimum syntax is a dependent variable. This specification fits an intercept-only model.
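
The variable rules above can be summarized as a small checker. The Python sketch below is not part of GENLIN: the function check_genlin_variables and the dictionary layout it expects are hypothetical, chosen only to restate the rules on required, unique, and SUBJECT/WITHINSUBJECT variables in executable form. It compares names exactly as given.

```python
def check_genlin_variables(spec):
    """Return a list of rule violations for a hypothetical specification dict."""
    problems = []
    dependent = spec.get("dependent")
    events = spec.get("events")
    trials = spec.get("trials")
    # A fixed number of trials is not a variable, so keep only string names.
    trials_var = trials if isinstance(trials, str) else None

    # A dependent variable or events/trials is required, but not both.
    if dependent is not None and events is not None:
        problems.append("both a dependent variable and events/trials are specified")
    if dependent is None and events is None:
        problems.append("neither a dependent variable nor events/trials is specified")

    # Variables must be unique within and across these roles.
    singles = [dependent, events, trials_var,
               spec.get("offset"), spec.get("scaleweight")]
    names = [v for v in singles if v is not None]
    names += spec.get("factors", []) + spec.get("covariates", [])
    seen = set()
    for name in names:
        if name in seen:
            problems.append(f"{name} is specified more than once")
        seen.add(name)

    # Restrictions on SUBJECT and WITHINSUBJECT variables.
    subject = spec.get("subject", [])
    within = spec.get("withinsubject", [])
    barred = {v for v in (dependent, events, trials_var) if v is not None}
    barred.update(spec.get("covariates", []))
    for name in subject + within:
        if name in barred:
            problems.append(f"{name} may not be a SUBJECT or WITHINSUBJECT variable")
    for name in within:
        if name in subject:
            problems.append(f"{name} is both a SUBJECT and a WITHINSUBJECT variable")
    return problems


if __name__ == "__main__":
    print(check_genlin_variables({"dependent": "y", "covariates": ["x"],
                                  "subject": ["x"]}))
```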

Case Frequency

  • If a WEIGHT variable is specified, then its values are used as frequency weights by the GENLIN procedure.
  • Weight values are rounded to the nearest whole number before use. For example, 0.5 is rounded to 1, and 2.4 is rounded to 2.
  • The WEIGHT variable may not be specified on any subcommand in the GENLIN procedure.
  • Cases with missing weights or weights less than 0.5 are not used in the analyses.
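
The weight handling described above can be sketched in a few lines of Python. The function name effective_frequency is hypothetical. The sketch assumes that halves always round up, which matches the 0.5 example but is otherwise an assumption. Python's built-in round() is avoided because it rounds halves to the nearest even number, so round(0.5) gives 0.

```python
import math


def effective_frequency(weight):
    """Return the whole-number frequency for a case, or None if it is dropped."""
    # Cases with missing weights or weights below 0.5 are not used.
    if weight is None or math.isnan(weight) or weight < 0.5:
        return None
    # Round to the nearest whole number, with halves rounded up (assumed).
    return math.floor(weight + 0.5)


if __name__ == "__main__":
    for w in (0.5, 2.4, 0.49, None):
        print(w, "->", effective_frequency(w))
```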

Limitations

  • The TO keyword is not supported on variable lists in this procedure.