REPEATED Subcommand (GENLIN command)

The REPEATED subcommand specifies the correlation structure used by generalized estimating equations to model correlations within subjects and controls statistical criteria in the nonlikelihood-based iterative fitting algorithm. If the REPEATED subcommand is not specified, then the GENLIN procedure fits a generalized linear model assuming independence.

Initial Values and Generalized Estimating Equations

Generalized estimating equations require initial values for the parameter estimates in the linear model. Initial values are not needed for the working correlation matrix because matrix elements are based on the parameter estimates.

The GENLIN procedure automatically supplies initial values to the generalized estimating equations algorithm. The default initial values are the final parameter estimates from the ordinary generalized linear model, assuming independence, that is fit based on the MODEL and CRITERIA subcommand specifications.

Recall that if the REPEATED subcommand is specified, then the CRITERIA subcommand SCALE keyword is directly applicable only to the initial generalized linear model. For the linear model part of the generalized estimating equations, the scale parameter is treated as follows.

  • If SCALE = MLE, then the scale parameter estimate from the initial generalized linear model is passed to the generalized estimating equations, where it is updated by the Pearson chi-square divided by its degrees of freedom. Pearson chi-square is used because generalized estimating equations do not have the concept of likelihood, and hence the scale estimate cannot be updated by methods related to maximum likelihood estimation.
  • If SCALE = DEVIANCE or PEARSON, then the scale parameter estimate from the initial generalized linear model is passed to the generalized estimating equations, where it is treated as a fixed number.
  • If SCALE is specified with a fixed number, then the scale parameter is also held fixed in the generalized estimating equations.
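
As a sketch of the second case, the following syntax reuses the variable names from the other examples in this section and requests the Pearson chi-square scale estimate in the initial generalized linear model. Under the rules above, that estimate would then be held fixed while the generalized estimating equations are fit. The choice of distribution and link here is only illustrative.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = NORMAL
     LINK = IDENTITY
  /CRITERIA
     SCALE = PEARSON
  /REPEATED
     SUBJECT=idvar.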

It is possible to bypass fitting the generalized linear model and directly input initial values to the generalized estimating equations algorithm. To do this, specify the linear model as usual on the MODEL subcommand. Then, on the CRITERIA subcommand, specify initial values for the linear model on the INITIAL keyword and set MAXITERATIONS = 0.

For example, suppose factor A has three levels. The INITIAL keyword supplies initial value 1 for the intercept, 1.5 for the first level of factor A, 2.5 for the second level, 0 for the last level, and 3 for the covariate X. Because MAXITERATIONS = 0, no iterations are performed for the generalized linear model and the specified initial values are passed directly to the generalized estimating equations algorithm.

GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /CRITERIA
    INITIAL = 1 1.5 2.5 0 3
    MAXITERATIONS = 0
  /REPEATED
    SUBJECT=idvar.

It is also possible to use a maximum likelihood estimate of the scale parameter as the initial value and to fix the scale parameter at this initial value for the generalized estimating equations. That is, we can override the default updating by the Pearson chi-square divided by its degrees of freedom. To do this, first fit a generalized linear model, estimating the scale parameter via maximum likelihood, and save the final parameter estimates in an external file (using the OUTFILE subcommand CORB or COVB option). Next, open this external file and copy the scale parameter estimate in full precision. Finally, fit the generalized estimating equations, using the final parameter estimates from the generalized linear model as the initial values, with MAXITERATIONS = 0 on the CRITERIA subcommand and SCALE fixed at the scale parameter estimate on the CRITERIA subcommand. The following example syntax assumes that the maximum likelihood estimate of the scale parameter is 0.1234567890123456.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = NORMAL
     LINK = LOG
  /CRITERIA
     SCALE = MLE
  /OUTFILE
     COVB = '/work/estimates.sav'.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = NORMAL
     LINK = LOG
  /CRITERIA
     INITIAL = '/work/estimates.sav'
     MAXITERATIONS = 0
     SCALE = 0.1234567890123456
  /REPEATED
     SUBJECT=idvar.

When the negative binomial distribution is used (/MODEL DISTRIBUTION = NEGBIN), the distribution’s ancillary parameter is treated as follows:

  1. If the ancillary parameter is specified as a number, then it is fixed at that number for the initial generalized linear model and the generalized estimating equations.
  2. If the ancillary parameter is estimated using maximum likelihood in the initial generalized linear model, then the estimate is passed to the generalized estimating equations, where it is treated as a fixed number.
  3. If NEGBIN(MLE) is specified but the initial generalized linear model is bypassed and initial values are directly input to the generalized estimating equations, then the initial value of the ancillary parameter is passed to the generalized estimating equations, where it is treated as a fixed number.
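
A minimal sketch of the second case, assuming that depvar is a count response and reusing the variable names from the earlier examples: the ancillary parameter is estimated by maximum likelihood in the initial generalized linear model, and that estimate is then held fixed in the generalized estimating equations.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = NEGBIN(MLE)
     LINK = LOG
  /REPEATED
     SUBJECT=idvar.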

SUBJECT Keyword

The SUBJECT keyword identifies subjects in the active dataset. Complete independence is assumed across subjects, but responses within subjects are assumed to be correlated.

  • Specify a single variable or a list of variables connected by asterisks (*) or the keyword BY.
  • Variables may be numeric or string variables.
  • The number of subjects equals the number of distinct combinations of values of the variables.
  • If the active dataset is sorted by the subject variables, then all records with equal values on the subject variables are contiguous and define the measurements for one subject.
  • In contrast, if the active dataset is not sorted, then the GENLIN procedure reads the data record by record. Each block of equal values on the subject variables defines a new subject. This approach may produce invalid results if the records for a subject are not all contiguous in the active dataset.
  • By default, the GENLIN procedure automatically sorts the active dataset by subject and any within-subject variables before performing analyses. See the SORT keyword below for more information.
  • All specified variables must be unique.
  • The dependent, events, trials, and WITHINSUBJECT variables may not be specified as SUBJECT variables.
  • The SUBJECT keyword is required if the REPEATED subcommand is used.
  • Cases with missing values for any of the subject variables are not used in the analysis.
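
For illustration, the following sketch identifies subjects by the combination of two variables. The names clinic and patient are hypothetical; each distinct combination of their values would be treated as one subject.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = BINOMIAL
     LINK = LOGIT
  /REPEATED
     SUBJECT=clinic*patient.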

WITHINSUBJECT Keyword

The WITHINSUBJECT keyword gives the within-subject or time effect. This effect defines the ordering of measurements within subjects. If some measurements do not appear in the data for some subjects, then the existing measurements are ordered and the omitted measurements are treated as missing values. If WITHINSUBJECT is not used, then measurements may be improperly ordered and missing values assumed for the last measurements within subjects.

  • Specify a single variable or a list of variables connected by asterisks (*) or the keyword BY.
  • Variables may be numeric or string variables.
  • The WITHINSUBJECT keyword is honored only if the default SORT = YES is in effect. The number of measurements within a subject equals the number of distinct combinations of values of the WITHINSUBJECT variables.
  • The WITHINSUBJECT keyword is ignored and a warning is issued if SORT = NO is in effect. In this case, the GENLIN procedure reads the records for a subject in the order given in the active dataset.
  • By default, the GENLIN procedure automatically sorts the active dataset by subject and any within-subject variables before performing analyses. See the SORT keyword below for more information.
  • All specified variables must be unique.
  • The dependent, events, trials, and SUBJECT variables may not be specified as WITHINSUBJECT variables.
  • The WITHINSUBJECT keyword is not required if the data are properly ordered within each subject.
  • Cases with missing values for any of the within-subject variables are not used in the analysis.
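
As a sketch, the following syntax orders the measurements within each subject by a hypothetical variable named visit. The default SORT = YES is left in effect so that WITHINSUBJECT is honored. The AR(1) structure is chosen only because the ordering of measurements matters for it.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = BINOMIAL
     LINK = LOGIT
  /REPEATED
     SUBJECT=idvar
     WITHINSUBJECT=visit
     CORRTYPE=AR(1).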

SORT Keyword

The SORT keyword indicates whether to sort cases in the active dataset by the subject effect and the within-subject effect.

YES. Sort cases by subject and any within-subject variables. The GENLIN procedure sorts the active dataset before performing analyses. The subject and any within-subject variables are sorted based on the ascending sort order of their data values. If any of the variables are strings, then their sort order is locale-dependent. This is the default. This sort is temporary—it is in effect only for the duration of the GENLIN procedure.

NO. Do not sort cases by subject and any within-subject variables. If SORT = NO is specified, then the GENLIN procedure does not sort the active dataset before performing analyses.
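
One way to use SORT = NO safely is to order the data yourself first, as in the following sketch. The variable visit is hypothetical. Unlike the temporary sort performed by the GENLIN procedure, SORT CASES reorders the active dataset itself. WITHINSUBJECT is omitted because it would be ignored when SORT = NO is in effect.

SORT CASES BY idvar visit.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = BINOMIAL
     LINK = LOGIT
  /REPEATED
     SUBJECT=idvar
     SORT=NO.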

CORRTYPE Keyword

The CORRTYPE keyword specifies the working correlation matrix structure.

INDEPENDENT. Independent working correlation matrix. This is the default working correlation matrix structure.

AR(1). AR(1) working correlation matrix.

EXCHANGEABLE. Exchangeable working correlation matrix.

FIXED(list). Fixed working correlation matrix. Specify a list of numbers, with each number separated by a space character or a comma.

The list of numbers must define a valid working correlation matrix. The number of rows and the number of columns must equal the dimension of the working correlation matrix. This dimension depends on the subject effect, the within-subject effect, whether the active dataset is sorted, and the data. The simplest way to determine the working correlation matrix dimension is to run the GENLIN procedure first for the model using the default working correlation matrix structure (instead of the FIXED structure) and examine the PRINT MODELINFO output for the working correlation matrix dimension. Then, rerun the procedure with the FIXED specification.

Specify only the lower triangular portion of the matrix. Matrix elements must be specified row-by-row. All elements must be between 0 and 1 inclusive.

For example, if there are three measurements per subject, then the following specification defines a 3 × 3 working correlation matrix.

CORRTYPE = FIXED(0.84 0.65 0.75)

1.00 0.84 0.65
0.84 1.00 0.75
0.65 0.75 1.00

There is no default value for the fixed working correlation matrix.
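
The two-step approach described above might look like the following sketch. The within-subject variable visit is hypothetical, and the correlation values are those of the example above, which assumes three measurements per subject. The first run uses the default working correlation structure and requests the model information output; the second run supplies the fixed matrix once its dimension has been confirmed.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = BINOMIAL
     LINK = LOGIT
  /REPEATED
     SUBJECT=idvar
     WITHINSUBJECT=visit
  /PRINT MODELINFO.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = BINOMIAL
     LINK = LOGIT
  /REPEATED
     SUBJECT=idvar
     WITHINSUBJECT=visit
     CORRTYPE=FIXED(0.84 0.65 0.75).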

MDEPENDENT(integer). m-dependent working correlation matrix. Specify the value of m in parentheses as an integer greater than or equal to 0. The specified m should be less than the number of row or column levels in the working correlation matrix. If the specified m is greater than or equal to the dimension of the working correlation matrix, then m is set equal to the number of row or column levels minus 1. For example, if the dimension of the working correlation matrix is 4, then m should be 3 or less. In this case, if you specify m > 3, then m will be set equal to 3. There is no default value.
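
As an illustrative sketch, the following syntax requests a 1-dependent structure, so that only measurements one step apart within a subject are modeled as correlated. The within-subject variable visit is hypothetical.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = BINOMIAL
     LINK = LOGIT
  /REPEATED
     SUBJECT=idvar
     WITHINSUBJECT=visit
     CORRTYPE=MDEPENDENT(1).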

UNSTRUCTURED. Unstructured working correlation matrix.

ADJUSTCORR Keyword

The ADJUSTCORR keyword indicates whether to adjust the working correlation matrix estimator by the number of nonredundant parameters.

YES. Adjust the working correlation matrix estimator. This is the default.

NO. Compute the working correlation matrix estimator without the adjustment.

COVB Keyword

The COVB keyword specifies whether to use the robust or the model-based estimator of the parameter estimate covariance matrix for generalized estimating equations.

ROBUST. Robust estimator of the parameter estimate covariance matrix. This is the default.

MODEL. Model-based estimator of the parameter estimate covariance matrix.

HCONVERGE Keyword

The HCONVERGE keyword specifies the Hessian convergence criterion for the generalized estimating equations algorithm. For generalized estimating equations, the Hessian convergence criterion is always absolute.

  • Specify a number greater than or equal to 0. If HCONVERGE = 0, the Hessian convergence criterion is checked against the value 1E-4 after any specified convergence criteria have been satisfied. If it is not met, a warning is displayed. The default value is 0.
  • At least one of the REPEATED subcommand keywords HCONVERGE and PCONVERGE must specify a nonzero number.

MAXITERATIONS Keyword

The MAXITERATIONS keyword specifies the maximum number of iterations for the generalized estimating equations algorithm.

  • Specify an integer greater than or equal to 0. The default value is 100.

PCONVERGE Keyword

The PCONVERGE keyword specifies the parameter convergence criterion for the generalized estimating equations algorithm.

  • Specify a number greater than or equal to 0, and the ABSOLUTE or RELATIVE keyword in parentheses to define the type of convergence. The number and keyword may be separated by a space character or a comma. The parameter convergence criterion is not used if the number is 0. The default value is 1E-6 (ABSOLUTE).
  • At least one of the REPEATED subcommand keywords HCONVERGE and PCONVERGE must specify a nonzero number.
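
The following sketch sets all three iteration controls on the REPEATED subcommand. The specific values are arbitrary and chosen only to show the form of each specification; they are not recommendations.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = BINOMIAL
     LINK = LOGIT
  /REPEATED
     SUBJECT=idvar
     MAXITERATIONS=200
     PCONVERGE=1E-8(RELATIVE)
     HCONVERGE=1E-6.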

UPDATECORR Keyword

The UPDATECORR keyword specifies the number of iterations between updates of the working correlation matrix. Elements in the working correlation matrix are based on the parameter estimates, which are updated in each iteration of the algorithm. The UPDATECORR keyword specifies the iteration interval at which to update working correlation matrix elements. Specifying a value greater than 1 may reduce processing time.

  • Specify an integer greater than or equal to 0.
  • The working correlation matrix is not updated at all if the value is 0. In this case, the initial working correlation matrix is used throughout the estimation process.
  • The default value is 1. By default, the working correlation matrix is updated after every iteration, beginning with the first.
  • The UPDATECORR value must be less than or equal to the REPEATED MAXITERATIONS value.
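
As a sketch, the following syntax updates the working correlation matrix only at every fifth iteration, which may reduce processing time when the matrix is expensive to estimate. The interval of 5 is arbitrary; it satisfies the requirement of not exceeding the MAXITERATIONS value, which is written out here at its default of 100.

GENLIN depvar BY a WITH x
  /MODEL a x
     DISTRIBUTION = BINOMIAL
     LINK = LOGIT
  /REPEATED
     SUBJECT=idvar
     CORRTYPE=EXCHANGEABLE
     MAXITERATIONS=100
     UPDATECORR=5.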