Target (generalized linear mixed models)
These settings define the target, its distribution, and its relationship to the predictors through the link function.
Target. The target is required. It can have any measurement level, and the measurement level of the target restricts which distributions and link functions are appropriate.
- Use number of trials as denominator. When the target response is a number of events occurring in a set of trials, the target field contains the number of events and you can select an additional field containing the number of trials. For example, when testing a new pesticide you might expose samples of ants to different concentrations of the pesticide and then record the number of ants killed and the number of ants in each sample. In this case, the field recording the number of ants killed should be specified as the target (events) field, and the field recording the number of ants in each sample should be specified as the trials field. If the number of ants is the same for each sample, then the number of trials may be specified using a fixed value.
The number of trials should be greater than or equal to the number of events for each record. Events should be non-negative integers, and trials should be positive integers.
- Customize reference category. For a categorical target, you can choose the reference category. This can affect certain output, such as parameter estimates, but it should not change the model fit. For example, if your target takes values 0, 1, and 2, by default, the procedure makes the last (highest-valued) category, or 2, the reference category. In this situation, parameter estimates should be interpreted as relating to the likelihood of category 0 or 1 relative to the likelihood of category 2. If you specify a custom category and your target has defined labels, you can set the reference category by choosing a value from the list. This can be convenient when, in the middle of specifying a model, you don't remember exactly how a particular field was coded.
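The two target options above can be illustrated with a short sketch. The first part checks events/trials records against the stated rules (non-negative integer events, positive integer trials, events not exceeding trials) and returns the observed proportion per record, supporting either a trials field or a fixed number of trials. The second part assumes a nominal multinomial (generalized logit) model with one predictor and shows that switching the reference category changes the coefficients but not the fitted category probabilities. All function names and the coefficient values are hypothetical and are not part of the product.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


# --- Number of trials as denominator ---------------------------------------
def observed_proportions(events: Sequence[int],
                         trials: Optional[Sequence[int]] = None,
                         fixed_trials: Optional[int] = None) -> List[float]:
    """Check events/trials records and return events / trials for each one.

    Pass either a trials field (one value per record) or a single fixed
    number of trials shared by every record.
    """
    if (trials is None) == (fixed_trials is None):
        raise ValueError("give either a trials field or a fixed number of trials")
    if trials is None:
        trials = [fixed_trials] * len(events)
    if len(trials) != len(events):
        raise ValueError("events and trials must have the same length")
    result = []
    for e, n in zip(events, trials):
        if not isinstance(e, int) or e < 0:
            raise ValueError(f"events must be non-negative integers: {e!r}")
        if not isinstance(n, int) or n <= 0:
            raise ValueError(f"trials must be positive integers: {n!r}")
        if e > n:
            raise ValueError(f"events ({e}) exceed trials ({n})")
        result.append(e / n)
    return result


# --- Reference category in a nominal (generalized logit) model -------------
Coefs = Dict[int, Tuple[float, float]]  # category -> (intercept, slope)


def category_probabilities(coefs: Coefs, reference: int, x: float) -> Dict[int, float]:
    """Category probabilities for predictor value x.

    The reference category's linear predictor is fixed at 0, so each
    coefficient pair describes log-odds relative to the reference.
    """
    eta = {k: b0 + b1 * x for k, (b0, b1) in coefs.items()}
    eta[reference] = 0.0
    denom = sum(math.exp(v) for v in eta.values())
    return {k: math.exp(v) / denom for k, v in eta.items()}


def change_reference(coefs: Coefs, old_ref: int, new_ref: int) -> Coefs:
    """Re-express the same fitted model relative to a different reference category."""
    full = dict(coefs)
    full[old_ref] = (0.0, 0.0)
    b0_ref, b1_ref = full[new_ref]
    return {k: (b0 - b0_ref, b1 - b1_ref)
            for k, (b0, b1) in full.items() if k != new_ref}


if __name__ == "__main__":
    # Hypothetical ants data: number killed / number in each sample.
    print(observed_proportions([3, 7, 10], trials=[10, 20, 10]))  # [0.3, 0.35, 1.0]
    ref2 = {0: (0.5, -1.0), 1: (-0.2, 0.3)}  # hypothetical estimates, reference = 2
    ref0 = change_reference(ref2, old_ref=2, new_ref=0)
    print(category_probabilities(ref2, 2, 1.0))
    print(category_probabilities(ref0, 0, 1.0))  # same probabilities, different coefficients
```

Because the generalized logit probabilities are unchanged when a constant is subtracted from every category's linear predictor, re-basing on category 0 only shifts the coefficients; this is why the reference category affects parameter estimates but not the model fit.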
Target Distribution and Relationship (Link) with the Linear Model. Given the values of the predictors, the model expects the distribution of values of the target to follow the specified shape, and the expected value of the target, transformed by the specified link function, to be linearly related to the predictors. Shortcuts for several common models are provided; alternatively, choose a Custom setting if you want to fit a particular distribution and link function combination that is not on the short list.
- Linear model. Specifies a normal distribution with an identity link, which is useful when the target can be predicted using a linear regression or ANOVA model.
- Gamma regression. Specifies a Gamma distribution with a log link, which should be used when the target contains all positive values and is skewed towards larger values.
- Loglinear. Specifies a Poisson distribution with a log link, which should be used when the target represents a count of occurrences in a fixed period of time.
- Negative binomial regression. Specifies a negative binomial distribution with a log link, which should be used when the target and denominator represent the number of trials required to observe k successes.
- Multinomial logistic regression. Specifies a multinomial distribution, which should be used when the target is a multi-category response. It uses either a cumulative logit link (ordinal outcomes) or a generalized logit link (multi-category nominal responses).
- Binary logistic regression. Specifies a binomial distribution with a logit link, which should be used when the target is a binary response predicted by a logistic regression model.
- Binary probit. Specifies a binomial distribution with a probit link, which should be used when the target is a binary response with an underlying normal distribution.
- Interval censored survival. Specifies a binomial distribution with a complementary log-log link, which is useful in survival analysis when some observations have no termination event.
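The shortcuts above are fixed distribution and link pairs, so they can be summarized as a lookup table. This is a sketch: the string labels are chosen for illustration and are not identifiers used by the product, and the multinomial shortcut is split in two because its link depends on whether the target is ordinal or nominal. The hypothetical helper `needs_custom` reports whether a combination falls outside the shortcuts and would therefore require the Custom setting.

```python
from typing import Dict, Tuple

# Shortcut -> (distribution, link), transcribed from the list above.
# Labels are illustrative plain strings, not product identifiers.
SHORTCUTS: Dict[str, Tuple[str, str]] = {
    "Linear model": ("normal", "identity"),
    "Gamma regression": ("gamma", "log"),
    "Loglinear": ("poisson", "log"),
    "Negative binomial regression": ("negative binomial", "log"),
    "Multinomial logistic regression (ordinal)": ("multinomial", "cumulative logit"),
    "Multinomial logistic regression (nominal)": ("multinomial", "generalized logit"),
    "Binary logistic regression": ("binomial", "logit"),
    "Binary probit": ("binomial", "probit"),
    "Interval censored survival": ("binomial", "complementary log-log"),
}


def needs_custom(distribution: str, link: str) -> bool:
    """True if no shortcut covers this distribution-link combination."""
    return (distribution, link) not in SHORTCUTS.values()
```

For example, a Poisson distribution with an identity link is not on the short list, so `needs_custom("poisson", "identity")` is true.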
Distribution
This selection specifies the distribution of the target. The ability to specify a non-normal distribution and a non-identity link function is the essential improvement of the generalized linear mixed model over the linear mixed model. There are many possible distribution-link function combinations, and several may be appropriate for any given dataset, so your choice can be guided by a priori theoretical considerations or by which combination appears to fit best.
- Binomial. This distribution is appropriate only for a target that represents a binary response or number of events.
- Gamma. This distribution is appropriate for a target with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis.
- Inverse Gaussian. This distribution is appropriate for a target with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis.
- Multinomial. This distribution is appropriate for a target that represents a multi-category response. The form of the model will depend on the measurement level of the target.
A nominal target will result in a nominal multinomial model in which a separate set of model parameters is estimated for each category of the target (except the reference category). The parameter estimates for a given predictor show the relationship between that predictor and the likelihood of each category of the target, relative to the reference category.
An ordinal target will result in an ordinal multinomial model in which the traditional intercept term is replaced with a set of threshold parameters that relate to the cumulative probability of the target categories.
- Negative binomial. Negative binomial regression uses a negative binomial distribution with a log link, which should be used when the target represents a count of occurrences with high variance.
- Normal. This is appropriate for a continuous target whose values take a symmetric, bell-shaped distribution about a central (mean) value.
- Poisson. This distribution can be thought of as the number of occurrences of an event of interest in a fixed period of time and is appropriate for variables with non-negative integer values. If a data value is non-integer, less than 0, or missing, then the corresponding case is not used in the analysis.
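The case-exclusion rules stated above for the gamma, inverse Gaussian, and Poisson distributions can be expressed as a small filter. This is a sketch under two assumptions: missing values are represented as `None` or NaN, and the hypothetical function `usable_cases` covers only those three distributions, because no exclusion rules are stated here for the others.

```python
import math
from typing import List, Optional


def usable_cases(values: List[Optional[float]], distribution: str) -> List[int]:
    """Return the indices of cases that remain in the analysis."""

    def is_missing(v: Optional[float]) -> bool:
        return v is None or (isinstance(v, float) and math.isnan(v))

    if distribution in ("gamma", "inverse gaussian"):
        def keep(v: float) -> bool:
            # Values less than or equal to 0 are not used.
            return v > 0
    elif distribution == "poisson":
        def keep(v: float) -> bool:
            # Non-integer or negative values are not used.
            return v >= 0 and float(v).is_integer()
    else:
        raise ValueError(f"no exclusion rule stated for {distribution!r}")

    return [i for i, v in enumerate(values) if not is_missing(v) and keep(v)]
```

With the values `[2.5, 0.0, None, -1.0, 3.0, float("nan")]`, the gamma rule keeps only 2.5 and 3.0, while the Poisson rule keeps 0.0 and 3.0 and drops 2.5 as non-integer.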
Link Functions
The link function is a transformation of the target that allows estimation of the model. The following functions are available:
- Identity. f(x)=x. The target is not transformed. This link can be used with any distribution, except the multinomial.
- Complementary log-log. f(x)=log(−log(1−x)). This is appropriate only with the binomial or multinomial distribution.
- Cauchit. f(x) = tan(π (x − 0.5)). This is appropriate only with the binomial or multinomial distribution.
- Log. f(x)=log(x). This link can be used with any distribution, except the multinomial.
- Log complement. f(x)=log(1−x). This is appropriate only with the binomial distribution.
- Logit. f(x)=log(x / (1−x)). This is appropriate only with the binomial or multinomial distribution.
- Negative log-log. f(x)=−log(−log(x)). This is appropriate only with the binomial or multinomial distribution.
- Probit. f(x)=Φ⁻¹(x), where Φ⁻¹ is the inverse standard normal cumulative distribution function. This is appropriate only with the binomial or multinomial distribution.
- Power. f(x)=x^α if α ≠ 0, and f(x)=log(x) if α=0. α is the required number specification and must be a real number. This link can be used with any distribution, except the multinomial.
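The link formulas and their distribution restrictions listed above can be collected into one sketch. It assumes the argument x is the expected value of the target (a probability for the binomial-only links), which is how a link is applied in a generalized linear mixed model. The dictionary keys, `power_link`, and `link_allowed` are hypothetical names for illustration; the probit link uses the inverse normal CDF from Python's standard `statistics.NormalDist`.

```python
import math
from statistics import NormalDist
from typing import Callable, Dict

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

# Link name -> f(x), following the formulas listed above.
LINKS: Dict[str, Callable[[float], float]] = {
    "identity": lambda x: x,
    "complementary log-log": lambda x: math.log(-math.log(1 - x)),
    "cauchit": lambda x: math.tan(math.pi * (x - 0.5)),
    "log": lambda x: math.log(x),
    "log complement": lambda x: math.log(1 - x),
    "logit": lambda x: math.log(x / (1 - x)),
    "negative log-log": lambda x: -math.log(-math.log(x)),
    "probit": _STD_NORMAL.inv_cdf,  # inverse standard normal CDF
}


def power_link(x: float, alpha: float) -> float:
    """Power link: x**alpha if alpha != 0, log(x) if alpha == 0."""
    return math.log(x) if alpha == 0 else x ** alpha


def link_allowed(link: str, distribution: str) -> bool:
    """Apply the distribution restrictions stated for each link."""
    if link in ("identity", "log", "power"):
        return distribution != "multinomial"
    if link == "log complement":
        return distribution == "binomial"
    if link in LINKS:
        return distribution in ("binomial", "multinomial")
    raise ValueError(f"unknown link: {link!r}")
```

For instance, the logit, probit, and cauchit links all map a probability of 0.5 to 0, while `link_allowed("identity", "multinomial")` is false because the identity link cannot be used with the multinomial distribution.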