REPEATED Subcommand (GENLIN command)
The REPEATED subcommand specifies the correlation
structure used by generalized estimating equations to model correlations
within subjects and controls statistical criteria in the nonlikelihood-based
iterative fitting algorithm. If the REPEATED subcommand
is not specified, then the GENLIN procedure fits
a generalized linear model assuming independence.
Initial Values and Generalized Estimating Equations
Generalized estimating equations require initial values for the parameter estimates in the linear model. Initial values are not needed for the working correlation matrix because matrix elements are based on the parameter estimates.
The GENLIN procedure automatically supplies initial
values to the generalized estimating equations algorithm. The default
initial values are the final parameter estimates from the ordinary
generalized linear model, assuming independence, that is fit based
on the MODEL and CRITERIA subcommand
specifications.
Recall that if the REPEATED subcommand is specified,
then the CRITERIA subcommand SCALE keyword
is directly applicable only to the initial generalized linear model.
For the linear model part of the generalized estimating equations,
the scale parameter is treated as follows.
- If SCALE = MLE, then the scale parameter estimate from the initial generalized linear model is passed to the generalized estimating equations, where it is updated by the Pearson chi-square divided by its degrees of freedom. Pearson chi-square is used because generalized estimating equations do not have the concept of likelihood, and hence the scale estimate cannot be updated by methods related to maximum likelihood estimation.
- If SCALE = DEVIANCE or PEARSON, then the scale parameter estimate from the initial generalized linear model is passed to the generalized estimating equations, where it is treated as a fixed number.
- If SCALE is specified with a fixed number, then the scale parameter is also held fixed in the generalized estimating equations.
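As a rough illustration of the second case, the following sketch requests the Pearson chi-square scale estimate on the CRITERIA subcommand. Under the rules listed above, that estimate would be computed in the initial generalized linear model and then held fixed in the generalized estimating equations. The variable names (depvar, a, x, idvar) are placeholders, and the distribution and link are arbitrary choices for the example.

```spss
* Sketch only: depvar, a, x, and idvar are placeholder variable names.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = NORMAL
    LINK = LOG
  /CRITERIA
    SCALE = PEARSON
  /REPEATED
    SUBJECT = idvar.
```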
It is possible to bypass fitting the generalized linear model and
directly input initial values to the generalized estimating equations
algorithm. To do this, specify the linear model as usual on the MODEL subcommand.
Then, on the CRITERIA subcommand, specify initial
values for the linear model on the INITIAL keyword
and set MAXITERATIONS = 0.
For example, suppose factor A has three levels. The INITIAL keyword
supplies initial value 1 for the intercept, 1.5 for the first level
of factor A, 2.5 for the second level, 0 for the last level, and 3
for the covariate X. Because MAXITERATIONS = 0, no
iterations are performed for the generalized linear model and the
specified initial values are passed directly to the generalized estimating
equations algorithm.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /CRITERIA
    INITIAL = 1 1.5 2.5 0 3
    MAXITERATIONS = 0
  /REPEATED
    SUBJECT=idvar.
It is also possible to use a maximum likelihood estimate of the
scale parameter as the initial value and to fix the scale parameter
at this initial value for the generalized estimating equations. That
is, we can override the default updating by the Pearson chi-square
divided by its degrees of freedom. To do this, first fit a generalized
linear model, estimating the scale parameter via maximum likelihood,
and save the final parameter estimates in an external file (using
the OUTFILE subcommand CORB or COVB option).
Next, open this external file and copy the scale parameter estimate
in full precision. Finally, fit the generalized estimating equations,
using the final parameter estimates from the generalized linear model
as the initial values, with MAXITERATIONS = 0 on
the CRITERIA subcommand and SCALE fixed
at the scale parameter estimate on the CRITERIA subcommand.
The following example syntax assumes that the maximum likelihood estimate
of the scale parameter is 0.1234567890123456.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = NORMAL
    LINK = LOG
  /CRITERIA
    SCALE = MLE
  /OUTFILE
    COVB = '/work/estimates.sav'.

GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = NORMAL
    LINK = LOG
  /CRITERIA
    INITIAL = '/work/estimates.sav'
    MAXITERATIONS = 0
    SCALE = 0.1234567890123456
  /REPEATED
    SUBJECT=idvar.
When the negative binomial distribution is used (/MODEL
DISTRIBUTION = NEGBIN), the distribution’s ancillary parameter
is treated as follows:
- If the ancillary parameter is specified as a number, then it is fixed at that number for the initial generalized linear model and the generalized estimating equations.
- If the ancillary parameter is estimated using maximum likelihood in the initial generalized linear model, then the estimate is passed to the generalized estimating equations, where it is treated as a fixed number.
- If NEGBIN(MLE) is specified but the initial generalized linear model is bypassed and initial values are directly input to the generalized estimating equations, then the initial value of the ancillary parameter is passed to the generalized estimating equations, where it is treated as a fixed number.
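The first two cases above might look like the following sketch. The variable names are placeholders, and the fixed value 1 in the first command is an arbitrary choice for illustration. In the second command, the ancillary parameter would be estimated in the initial generalized linear model and then carried into the generalized estimating equations as a fixed number.

```spss
* Sketch only: ancillary parameter fixed at an illustrative value of 1.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = NEGBIN(1)
    LINK = LOG
  /REPEATED
    SUBJECT = idvar.

* Sketch only: ancillary parameter estimated by maximum likelihood in the initial model.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = NEGBIN(MLE)
    LINK = LOG
  /REPEATED
    SUBJECT = idvar.
```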
SUBJECT Keyword
The SUBJECT keyword identifies subjects in the
active dataset. Complete independence is assumed across subjects,
but responses within subjects are assumed to be correlated.
- Specify a single variable or a list of variables connected by asterisks (*) or the keyword BY.
- Variables may be numeric or string variables.
- The number of subjects equals the number of distinct combinations of values of the variables.
- If the active dataset is sorted by the subject variables, then all records with equal values on the subject variables are contiguous and define the measurements for one subject.
- In contrast, if the active dataset is not sorted, then the GENLIN procedure reads the data record by record. Each block of equal values on the subject variables defines a new subject. This approach may produce invalid results if the records for a subject are not all contiguous in the active dataset.
- By default, the GENLIN procedure automatically sorts the active dataset by subject and any within-subject variables before performing analyses. See the SORT keyword below for more information.
- All specified variables must be unique.
- The dependent, events, trials, and WITHINSUBJECT variables may not be specified as SUBJECT variables.
- The SUBJECT keyword is required if the REPEATED subcommand is used.
- Cases with missing values for any of the subject variables are not used in the analysis.
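As a sketch of a subject effect built from more than one variable, the following command assumes two hypothetical identifier variables, clinic and patient. Each distinct combination of their values would define one subject.

```spss
* Sketch only: clinic and patient are hypothetical identifier variables.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /REPEATED
    SUBJECT = clinic*patient.
```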
WITHINSUBJECT Keyword
The WITHINSUBJECT keyword gives the within-subject
or time effect. This effect defines the ordering of measurements within
subjects. If some measurements do not appear in the data for some
subjects, then the existing measurements are ordered and the omitted
measurements are treated as missing values. If WITHINSUBJECT is
not used, then measurements may be improperly ordered and missing
values assumed for the last measurements within subjects.
- Specify a single variable or a list of variables connected by asterisks (*) or the keyword BY.
- Variables may be numeric or string variables.
- The WITHINSUBJECT keyword is honored only if the default SORT = YES is in effect. The number of measurements within a subject equals the number of distinct combinations of values of the WITHINSUBJECT variables.
- The WITHINSUBJECT keyword is ignored and a warning is issued if SORT = NO is in effect. In this case, the GENLIN procedure reads the records for a subject in the order given in the active dataset.
- By default, the GENLIN procedure automatically sorts the active dataset by subject and any within-subject variables before performing analyses. See the SORT keyword below for more information.
- All specified variables must be unique.
- The dependent, events, trials, and SUBJECT variables may not be specified as WITHINSUBJECT variables.
- The WITHINSUBJECT keyword is not required if the data are properly ordered within each subject.
- Cases with missing values for any of the within-subject variables are not used in the analysis.
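A minimal sketch of a within-subject effect follows. It assumes a hypothetical variable, visit, that numbers the measurement occasions for each subject. With the default SORT = YES, the records would be ordered by idvar and then by visit before the analysis.

```spss
* Sketch only: visit is a hypothetical variable that orders measurements within a subject.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /REPEATED
    SUBJECT = idvar
    WITHINSUBJECT = visit.
```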
SORT Keyword
The SORT keyword indicates whether to sort cases
in the active dataset by the subject effect and the within-subject
effect.
YES. Sort cases by subject and any within-subject variables. The GENLIN procedure
sorts the active dataset before performing analyses. The subject and
any within-subject variables are sorted based on the ascending sort
order of their data values. If any of the variables are strings, then
their sort order is locale-dependent. This is the default. This sort
is temporary—it is in effect only for the duration of the GENLIN procedure.
NO. Do not sort cases by subject and any within-subject
variables. If SORT = NO is specified, then the GENLIN procedure
does not sort the active dataset before performing analyses.
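If the data are ordered in advance, the internal sort can be skipped, as in the sketch below. It uses the SORT CASES command to order the active dataset by a placeholder subject variable (idvar) and a hypothetical occasion variable (visit). WITHINSUBJECT is left off the REPEATED subcommand because it is ignored when SORT = NO is in effect.

```spss
* Sketch only: order the records so that each subject's measurements are contiguous.
SORT CASES BY idvar visit.

* GENLIN then reads the records in the order given in the active dataset.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /REPEATED
    SUBJECT = idvar
    SORT = NO.
```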
CORRTYPE Keyword
The CORRTYPE keyword specifies the working correlation
matrix structure.
INDEPENDENT. Independent working correlation matrix. This is the default working correlation matrix structure.
AR(1). AR(1) working correlation matrix.
EXCHANGEABLE. Exchangeable working correlation matrix.
FIXED(list). Fixed working correlation matrix. Specify a list of numbers, with each number separated by a space character or a comma.
The list of numbers must define a valid working correlation matrix.
The number of rows and the number of columns must equal the dimension
of the working correlation matrix. This dimension depends on the subject
effect, the within-subject effect, whether the active dataset is sorted,
and the data. The simplest way to determine the working correlation
matrix dimension is to run the GENLIN procedure first
for the model using the default working correlation matrix structure
(instead of the FIXED structure) and examine the PRINT
MODELINFO output for the working correlation matrix dimension.
Then, rerun the procedure with the FIXED specification.
Specify only the lower triangular portion of the matrix. Matrix elements must be specified row-by-row. All elements must be between 0 and 1 inclusive.
For example, if there are three measurements per subject, then the following specification defines the 3 × 3 working correlation matrix shown below it.
CORRTYPE = FIXED(0.84 0.65 0.75)
1.00 0.84 0.65
0.84 1.00 0.75
0.65 0.75 1.00
There is no default value for the fixed working correlation matrix.
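The two-run approach for determining the matrix dimension could be sketched as follows. The variable names are placeholders, and the FIXED values simply reuse the three-measurement example. The first run uses the default working correlation structure and requests the model information output. The second run supplies the lower triangular elements once the dimension is known.

```spss
* Run 1 (sketch): default structure, used only to read the matrix dimension from the output.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /REPEATED
    SUBJECT = idvar
    WITHINSUBJECT = visit
  /PRINT MODELINFO.

* Run 2 (sketch): fixed structure for a dimension of 3, lower triangle given row by row.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /REPEATED
    SUBJECT = idvar
    WITHINSUBJECT = visit
    CORRTYPE = FIXED(0.84 0.65 0.75).
```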
MDEPENDENT(integer). m-dependent working correlation matrix. Specify the value of m in parentheses as an integer greater than or equal to 0. The specified m should be less than the number of row or column levels in the working correlation matrix. If the specified m is greater than or equal to the dimension of the working correlation matrix, then m is set equal to the number of row or column levels minus 1. For example, if the dimension of the working correlation matrix is 4, then m should be 3 or less. In this case, if you specify m > 3, then m will be set equal to 3. There is no default value.
UNSTRUCTURED. Unstructured working correlation matrix.
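As an illustration of selecting a non-default structure, the sketch below requests an m-dependent working correlation matrix with m = 2. It assumes a hypothetical occasion variable, visit, with at least three levels, so that m is less than the matrix dimension.

```spss
* Sketch only: m = 2 is an illustrative choice and visit is a hypothetical variable.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /REPEATED
    SUBJECT = idvar
    WITHINSUBJECT = visit
    CORRTYPE = MDEPENDENT(2).
```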
ADJUSTCORR Keyword
The ADJUSTCORR keyword indicates whether to adjust
the working correlation matrix estimator by the number of nonredundant
parameters.
YES. Adjust the working correlation matrix estimator. This is the default.
NO. Compute the working correlation matrix estimator without the adjustment.
COVB Keyword
The COVB keyword specifies whether to use the
robust or the model-based estimator of the parameter estimate covariance
matrix for generalized estimating equations.
ROBUST. Robust estimator of the parameter estimate covariance matrix. This is the default.
MODEL. Model-based estimator of the parameter estimate covariance matrix.
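The ADJUSTCORR and COVB keywords can be combined with a correlation structure, as in this sketch that turns off the adjustment and requests the model-based covariance estimator. The variable names are placeholders, and the exchangeable structure is an arbitrary choice.

```spss
* Sketch only: overrides the defaults ADJUSTCORR = YES and COVB = ROBUST.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /REPEATED
    SUBJECT = idvar
    CORRTYPE = EXCHANGEABLE
    ADJUSTCORR = NO
    COVB = MODEL.
```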
HCONVERGE Keyword
The HCONVERGE keyword specifies the Hessian convergence
criterion for the generalized estimating equations algorithm. For
generalized estimating equations, the Hessian convergence criterion
is always absolute.
- Specify a number greater than or equal to 0. If HCONVERGE = 0, the Hessian convergence criterion is checked against the value 1E-4 after any specified convergence criteria have been satisfied. If it is not met, a warning is displayed. The default value is 0.
- At least one of the REPEATED subcommand keywords HCONVERGE and PCONVERGE must specify a nonzero number.
MAXITERATIONS Keyword
The MAXITERATIONS keyword specifies the maximum
number of iterations for the generalized estimating equations algorithm.
- Specify an integer greater than or equal to 0. The default value is 100.
PCONVERGE Keyword
The PCONVERGE keyword specifies the parameter
convergence criterion for the generalized estimating equations algorithm.
- Specify a number greater than or equal to 0, and the ABSOLUTE or RELATIVE keyword in parentheses to define the type of convergence. The number and keyword may be separated by a space character or a comma. The parameter convergence criterion is not used if the number is 0. The default value is 1E-6 (ABSOLUTE).
- At least one of the REPEATED subcommand keywords HCONVERGE and PCONVERGE must specify a nonzero number.
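The iteration controls for the generalized estimating equations algorithm might be set together as in the following sketch. The numeric values are illustrative only, and the variable names are placeholders. Because PCONVERGE is nonzero, the requirement that HCONVERGE or PCONVERGE specify a nonzero number is met even with HCONVERGE left at 0.

```spss
* Sketch only: tighter relative parameter convergence and a higher iteration limit.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /REPEATED
    SUBJECT = idvar
    CORRTYPE = EXCHANGEABLE
    MAXITERATIONS = 200
    PCONVERGE = 1E-8 (RELATIVE)
    HCONVERGE = 0.
```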
UPDATECORR Keyword
The UPDATECORR keyword specifies the number of
iterations between updates of the working correlation matrix. Elements
in the working correlation matrix are based on the parameter estimates,
which are updated in each iteration of the algorithm. The UPDATECORR keyword
specifies the iteration interval at which to update working correlation
matrix elements. Specifying a value greater than 1 may reduce processing
time.
- Specify an integer greater than or equal to 0.
- The working correlation matrix is not updated at all if the value is 0. In this case, the initial working correlation matrix is used throughout the estimation process.
- The default value is 1. By default, the working correlation matrix is updated after every iteration, beginning with the first.
- The UPDATECORR value must be less than or equal to the REPEATED MAXITERATIONS value.
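For instance, the sketch below would update the working correlation matrix every fifth iteration instead of every iteration. The interval of 5 is an arbitrary illustrative value that respects the limit set by MAXITERATIONS, and the variable names are placeholders.

```spss
* Sketch only: update the working correlation matrix every 5 iterations.
GENLIN depvar BY a WITH x
  /MODEL a x
    DISTRIBUTION = BINOMIAL
    LINK = LOGIT
  /REPEATED
    SUBJECT = idvar
    CORRTYPE = EXCHANGEABLE
    MAXITERATIONS = 100
    UPDATECORR = 5.
```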