MODEL Subcommand (GENLIN command)
The MODEL subcommand is used to specify model effects, an offset or scale weight variable if either exists, the probability distribution, and the link function.
- The effect list includes all effects to be included in the model except for the intercept, which is specified using the INTERCEPT keyword. Effects must be separated by spaces or commas.
- If the multinomial distribution is used (DISTRIBUTION = MULTINOMIAL), then the intercept is inapplicable. The multinomial model always includes threshold parameters. The number of threshold parameters is one less than the number of dependent variable categories.
- To include a term for the main effect of a factor, enter the variable name of the factor.
- To include a term for an interaction between factors, use the keyword BY or an asterisk (*) to join the factors involved in the interaction. For example, A*B means a two-way interaction effect of A and B, where A and B are factors. A*A is not allowed because factors in an interaction effect must be distinct.
- To include a term for nesting one effect within another, use a pair of parentheses. For example, A(B) means that A is nested within B.
- Multiple nesting is allowed. For example, A(B(C)) means that B is nested within C, and A is nested within B(C). When more than one pair of parentheses is present, each pair of parentheses must be enclosed or nested within another pair of parentheses. Thus, A(B)(C) is not valid.
- Interactions between nested effects are not valid. For example, neither A(C)*B(C) nor A(C)*B(D) is valid.
- To include a covariate term in the design, enter the variable name of the covariate.
- Covariates can be connected, but not nested, through the * operator to form another covariate effect. Interactions among covariates such as X1*X1 and X1*X2 are valid, but X1(X2) is not.
- Factor and covariate effects can be connected only by the * operator. Suppose A and B are factors, and X1 and X2 are covariates. Examples of valid factor-by-covariate interaction effects are A*X1, A*B*X1, X1*A(B), A*X1*X1, and B*X1*X2.
- If the MODEL subcommand is not specified, or if it is specified with no model effects, then the GENLIN procedure fits the intercept-only model (unless the intercept is excluded on the INTERCEPT keyword). If the multinomial distribution is being used, then the GENLIN procedure fits the thresholds-only model.
INTERCEPT Keyword
The INTERCEPT keyword controls whether an intercept term is included in the model.
- If the multinomial distribution is in effect (DISTRIBUTION = MULTINOMIAL), then the INTERCEPT keyword is silently ignored.
YES. The intercept is included in the model. This is the default.
NO. The intercept is not included in the model. If no model effects are defined and INTERCEPT = NO is specified, then a null model is fit.
OFFSET Keyword
The OFFSET keyword specifies an offset variable or fixes the offset at a number.
- The offset variable, if specified, must be numeric.
- The offset variable may not be a dependent variable, events or trials variable, factor, covariate, SCALEWEIGHT, SUBJECT, or WITHINSUBJECT variable.
- Cases with missing values on the OFFSET variable are not used in the analysis.
- Specifying a number when INTERCEPT = YES is equivalent to adding a constant to the intercept.
- Specifying a number when INTERCEPT = NO is equivalent to fixing the intercept at the specified number.
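The last two points follow from the way a fixed offset enters the linear predictor: it is added with a coefficient fixed at 1. The following is a minimal sketch in plain Python (not GENLIN syntax), assuming an intercept-only model with the identity link. The helper names `fit_intercept_only` and `linear_predictor` are hypothetical and exist only for this illustration.

```python
from statistics import mean

def fit_intercept_only(y, offset=0.0):
    """Least-squares intercept for y = offset + b0 (identity link).

    Hypothetical helper for illustration; it is not part of GENLIN.
    """
    # The offset has a fixed coefficient of 1, so it is removed from
    # the response before the intercept is estimated.
    return mean(value - offset for value in y)

def linear_predictor(intercept, offset):
    """Linear predictor of an intercept-only model with a fixed offset."""
    return intercept + offset

y = [4.0, 5.0, 6.0]

# With an intercept: the offset shifts the estimated intercept by a
# constant, and the fitted linear predictor is unchanged.
b0_plain = fit_intercept_only(y)
b0_offset = fit_intercept_only(y, offset=2.0)
same_fit = linear_predictor(b0_plain, 0.0) == linear_predictor(b0_offset, 2.0)

# Without an intercept: the linear predictor is simply the offset,
# which acts as an intercept fixed at that number.
fixed = linear_predictor(0.0, 2.0)
```

In this sketch the estimated intercept drops from 5.0 to 3.0 when the offset is 2.0, while the fitted linear predictor stays the same.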
SCALEWEIGHT Keyword
The SCALEWEIGHT keyword specifies a variable that contains omega weight values for the scale parameter.
- The scale weight variable must be numeric.
- The scale weight variable may not be a dependent variable, events or trials variable, factor, covariate, OFFSET, SUBJECT, or WITHINSUBJECT variable.
- Cases with scale weight values that are less than or equal to 0, or missing, are not used in the analysis.
DISTRIBUTION Keyword
The DISTRIBUTION keyword specifies the probability distribution of the dependent variable.
- The default probability distribution depends on the specification format of the dependent variable. If an events/trials specification is used, then the default distribution is BINOMIAL. If a single variable specification is used, then the default distribution is NORMAL.
- Caution must be exercised when the dependent variable has events/trials format, and the LINK but not the DISTRIBUTION keyword is used. In this condition, depending on the LINK specification, the default DISTRIBUTION = BINOMIAL may yield an improper combination of DISTRIBUTION and LINK settings.
- Also, caution must be exercised when the dependent variable has single variable format, and the LINK but not the DISTRIBUTION keyword is used. In this condition, if the dependent variable is a string then an error will result because a string variable cannot have a normal probability distribution. Moreover, depending on the LINK specification, the default DISTRIBUTION = NORMAL may yield an improper combination of DISTRIBUTION and LINK settings.
- The discussion of the LINK keyword below gives details about proper and improper combinations of DISTRIBUTION and LINK settings.
BINOMIAL. Binomial probability distribution. If the dependent variable is specified as a single variable, then it may be numeric or string and there may be only two distinct valid data values.
If the events and trials options are specified on the GENLIN command, then the procedure automatically computes the ratio of the events variable over the trials variable or number. The events variable—and the trials variable if specified—must be numeric. Data values for the events variable must be integers greater than or equal to zero. Data values for the trials variable must be integers greater than zero. For each case, the trials value must be greater than or equal to the events value. If an events value is noninteger, less than zero, or missing, then the corresponding case is not used in the analysis. If a trials value is noninteger, less than or equal to zero, less than the events value, or missing, then the corresponding case is not used in the analysis.
If the trials option specifies a number, then it must be a positive integer, and it must be greater than or equal to the events value for each case. Cases with invalid values are not used in the analysis.
This is the default probability distribution if the dependent variable is specified using events/trials format.
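The case-screening rules for the events/trials format can be summarized in a few lines. This is only a sketch of the rules as stated above, written in plain Python. The function name `case_is_usable` and the use of `None` to stand for a missing value are assumptions of the example, not part of GENLIN.

```python
def case_is_usable(events, trials):
    """Return True if an events/trials pair satisfies the stated rules.

    None stands in for a missing value (an assumption of this sketch).
    """
    if events is None or trials is None:
        return False
    # Events value: an integer greater than or equal to zero.
    if not float(events).is_integer() or events < 0:
        return False
    # Trials value: an integer greater than zero and not less than events.
    if not float(trials).is_integer() or trials <= 0 or trials < events:
        return False
    return True

# Example cases: (events, trials)
cases = [(3, 10), (0, 5), (2.5, 10), (-1, 10), (6, 5), (0, 0), (None, 5)]
usable = [case_is_usable(e, t) for e, t in cases]
```

The same check applies when the trials option is a fixed number: pass that number as `trials` for every case.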
GAMMA. Gamma probability distribution. The dependent variable must be numeric, with data values greater than zero. If a data value is less than or equal to zero, or missing, then the corresponding case is not used in the analysis.
IGAUSS. Inverse Gaussian probability distribution. The dependent variable must be numeric, with data values greater than zero. If a data value is less than or equal to zero, or missing, then the corresponding case is not used in the analysis.
MULTINOMIAL. Multinomial probability distribution. The dependent variable must be specified as a single variable, it may be numeric or string, and it must have at least two distinct valid data values. The dependent variable is assumed to be ordinal with values having an intrinsic ordering.
NEGBIN(number | MLE). Negative binomial probability distribution. The dependent variable must be numeric, with data values that are integers greater than or equal to zero. If a data value is noninteger, less than zero, or missing, then the corresponding case is not used in the analysis.
The option specifies the negative binomial distribution’s ancillary parameter. Specify a number greater than or equal to zero to fix the parameter at the number. Specify MLE to use the maximum likelihood estimate of the parameter. The default value is 1.
If the REPEATED subcommand is specified, then the ancillary parameter is treated as follows. (1) If the ancillary parameter is specified as a number, then it is fixed at that number for the initial generalized linear model and the generalized estimating equations. (2) If the ancillary parameter is estimated using maximum likelihood in the initial generalized linear model, then the estimate is passed to the generalized estimating equations, where it is treated as a fixed number. (3) If NEGBIN(MLE) is specified but the initial generalized linear model is bypassed and initial values are directly input to the generalized estimating equations (see the discussion of Initial Values and Generalized Estimating Equations in REPEATED Subcommand (GENLIN command)), then the initial value of the ancillary parameter is passed to the generalized estimating equations, where it is treated as a fixed number.
NORMAL. Normal probability distribution. The dependent variable must be numeric. This is the default probability distribution if the dependent variable is specified using single-variable format.
POISSON. Poisson probability distribution. The dependent variable must be numeric, with data values that are integers greater than or equal to zero. If a data value is noninteger, less than zero, or missing, then the corresponding case is not used in the analysis.
TWEEDIE(number). Tweedie probability distribution. The dependent variable must be numeric, with data values greater than or equal to zero. If a data value is less than zero or missing, then the corresponding case is not used in the analysis.
The required number specification is the fixed value of the Tweedie distribution’s parameter. Specify a number greater than one and less than two. There is no default value.
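The data-value restrictions for the distributions with a numeric dependent variable can be collected into one check. The sketch below is plain Python and covers only GAMMA, IGAUSS, NEGBIN, POISSON, and TWEEDIE. The other distributions are left out because their rules concern the variable type or the number of distinct values rather than single data values. All function names are hypothetical, and `None` stands in for a missing value.

```python
def value_is_usable(distribution, value):
    """Return True if a dependent-variable value is used in the analysis."""
    if value is None:                     # missing value (sketch convention)
        return False
    if distribution in ("GAMMA", "IGAUSS"):
        return value > 0                  # strictly positive
    if distribution in ("NEGBIN", "POISSON"):
        return float(value).is_integer() and value >= 0  # nonnegative integer
    if distribution == "TWEEDIE":
        return value >= 0                 # nonnegative
    raise ValueError(f"distribution not covered by this sketch: {distribution}")

def tweedie_parameter_ok(p):
    """TWEEDIE(number): greater than one and less than two."""
    return 1 < p < 2

def negbin_parameter_ok(k):
    """NEGBIN(number): greater than or equal to zero."""
    return k >= 0
```

For example, `value_is_usable("POISSON", 2.5)` is false because the value is not an integer, while `value_is_usable("TWEEDIE", 0)` is true because zero is allowed for the Tweedie distribution.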
LINK Keyword
The LINK keyword specifies the link function. The following link functions are available.
- If the multinomial distribution is in effect (DISTRIBUTION = MULTINOMIAL), then only the cumulative link functions are available. Keywords corresponding to these functions have prefix CUM. If a non-cumulative link function is specified for the multinomial distribution, then an error is issued.
IDENTITY. Identity link function. f(x)=x
CLOGLOG. Complementary log-log link function. f(x)=ln(−ln(1−x))
LOG. Log link function. f(x)=ln(x)
LOGC. Log complement link function. f(x)=ln(1−x)
LOGIT. Logit link function. f(x)=ln(x / (1−x))
NEGBIN. Negative binomial link function. f(x)=ln(x / (x+k^(−1))), where k is the ancillary parameter of the negative binomial distribution.
NLOGLOG. Negative log-log link function. f(x)=−ln(−ln(x))
ODDSPOWER(number). Odds power link function. f(x)=[(x/(1−x))^α−1]/α, if α≠0. f(x)=ln(x/(1−x)), if α=0. α is the required number specification and must be a real number. There is no default value.
POWER(number). Power link function. f(x)=x^α, if α≠0. f(x)=ln(x), if α=0. α is the required number specification and must be a real number. If |α| < 2.2e-16, α is treated as 0. There is no default value.
PROBIT. Probit link function. f(x)=Φ^(−1)(x), where Φ^(−1) is the inverse standard normal cumulative distribution function.
CUMCAUCHIT. Cumulative Cauchit link function. f(x)=tan(π(x−0.5)). May be specified only if DISTRIBUTION = MULTINOMIAL.
CUMCLOGLOG. Cumulative complementary log-log link function. f(x)=ln(−ln(1−x)). May be specified only if DISTRIBUTION = MULTINOMIAL.
CUMLOGIT. Cumulative logit link function. f(x)=ln(x / (1−x)). May be specified only if DISTRIBUTION = MULTINOMIAL.
CUMNLOGLOG. Cumulative negative log-log link function. f(x)=−ln(−ln(x)). May be specified only if DISTRIBUTION = MULTINOMIAL.
CUMPROBIT. Cumulative probit link function. f(x)=Φ^(−1)(x), where Φ^(−1) is the inverse standard normal cumulative distribution function. May be specified only if DISTRIBUTION = MULTINOMIAL.
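Several of the link functions listed above can be evaluated directly with the Python standard library. The sketch below covers a subset and is meant only to make the formulas concrete. The function names are hypothetical. The NEGBIN link is read as ln(x / (x + 1/k)) with k the ancillary parameter, and the cumulative links apply the same formulas as their non-cumulative counterparts.

```python
import math
from statistics import NormalDist

def logit(x):
    return math.log(x / (1 - x))

def cloglog(x):
    return math.log(-math.log(1 - x))

def nloglog(x):
    return -math.log(-math.log(x))

def probit(x):
    # Inverse of the standard normal cumulative distribution function.
    return NormalDist().inv_cdf(x)

def cauchit(x):
    return math.tan(math.pi * (x - 0.5))

def negbin(x, k):
    # Read here as ln(x / (x + 1/k)); k is the ancillary parameter.
    return math.log(x / (x + 1 / k))

def power(x, alpha):
    # Exponents smaller in magnitude than 2.2e-16 are treated as 0,
    # in which case the power link reduces to the log link.
    if abs(alpha) < 2.2e-16:
        return math.log(x)
    return x ** alpha

# Each of these probability-scale links maps 0.5 to its natural center.
centers = {name: f(0.5) for name, f in
           [("logit", logit), ("probit", probit), ("cauchit", cauchit)]}
```

Note that `power(x, -1)` is the reciprocal and `power(x, 0)` is the natural log, which matches the treatment of a zero exponent described for POWER.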
- If neither the DISTRIBUTION nor the LINK keyword is specified, then the default link function is IDENTITY.
- If DISTRIBUTION is specified but LINK is not, then the default setting for LINK depends on the DISTRIBUTION setting as shown in the following table.
Table 1. Default link function for each distribution
DISTRIBUTION Setting | Default LINK Setting |
---|---|
NORMAL | IDENTITY |
BINOMIAL | LOGIT |
GAMMA | POWER(−1) |
IGAUSS | POWER(−2) |
MULTINOMIAL | CUMLOGIT |
NEGBIN | LOG |
POISSON | LOG |
TWEEDIE | POWER(1−p), where p is the Tweedie distribution's parameter |
- The GENLIN procedure will fit a model if a permissible combination of LINK and DISTRIBUTION specifications is given. The table below indicates the permissible LINK and DISTRIBUTION combinations. Specifying an improper combination will yield an error message.
- Note that the default setting for DISTRIBUTION is NORMAL irrespective of the LINK specification, and that not all LINK specifications are valid for DISTRIBUTION = NORMAL. Thus, if LINK is specified but DISTRIBUTION is not, then the default DISTRIBUTION = NORMAL may yield an improper combination of DISTRIBUTION and LINK settings.
Link | NORMAL | BINOMIAL | GAMMA | IGAUSS | NEGBIN | POISSON | TWEEDIE |
---|---|---|---|---|---|---|---|
IDENTITY | X | X | X | X | X | X | X |
CLOGLOG | | X | | | | | |
LOG | X | X | X | X | X | X | X |
LOGC | | X | | | | | |
LOGIT | | X | | | | | |
NEGBIN | | | | | X | | |
NLOGLOG | | X | | | | | |
ODDSPOWER | | X | | | | | |
PROBIT | | X | | | | | |
POWER | X | X | X | X | X | X | X |
Note: The NEGBIN link function is not available if DISTRIBUTION = NEGBIN(0) is specified.
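The default-link and permissible-combination rules can be expressed as a small lookup, which may help when checking a specification before running it. This is a sketch in plain Python, not GENLIN behavior. It assumes that the single-X rows of the combinations table belong to the BINOMIAL column (and the NEGBIN link to the NEGBIN column), and all names are hypothetical.

```python
DEFAULT_LINK = {
    "NORMAL": "IDENTITY", "BINOMIAL": "LOGIT", "GAMMA": "POWER(-1)",
    "IGAUSS": "POWER(-2)", "MULTINOMIAL": "CUMLOGIT",
    "NEGBIN": "LOG", "POISSON": "LOG",
}
ANY_DISTRIBUTION = {"IDENTITY", "LOG", "POWER"}
BINOMIAL_ONLY = {"CLOGLOG", "LOGC", "LOGIT", "NLOGLOG", "ODDSPOWER", "PROBIT"}
CUMULATIVE = {"CUMCAUCHIT", "CUMCLOGLOG", "CUMLOGIT", "CUMNLOGLOG", "CUMPROBIT"}

def default_link(distribution, tweedie_p=None):
    """Default LINK setting when only DISTRIBUTION is specified."""
    if distribution == "TWEEDIE":
        if tweedie_p is None or not 1 < tweedie_p < 2:
            raise ValueError("TWEEDIE requires a parameter between 1 and 2")
        return f"POWER({1 - tweedie_p:g})"
    return DEFAULT_LINK[distribution]

def combination_is_valid(distribution, link, negbin_k=None):
    """Check a DISTRIBUTION/LINK pair (link keyword without its argument)."""
    if distribution == "MULTINOMIAL":
        return link in CUMULATIVE         # only cumulative links allowed
    if link in CUMULATIVE:
        return False                      # cumulative links need MULTINOMIAL
    if link in ANY_DISTRIBUTION:
        return True
    if link in BINOMIAL_ONLY:
        return distribution == "BINOMIAL"
    if link == "NEGBIN":
        # Not available with NEGBIN(0).
        return distribution == "NEGBIN" and negbin_k != 0
    raise ValueError(f"unknown link: {link}")
```

For instance, `combination_is_valid("NORMAL", "LOGIT")` is false, which is the situation that arises when LINK = LOGIT is specified without DISTRIBUTION for a single-variable dependent variable.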