Target (GLE models)
These settings define the target, its distribution, and its relationship to the predictors through the link function.
Target The target is required. It can have any measurement level, and the measurement level of the target affects which distributions and link functions are appropriate.
- Use predefined target To use the target settings from an upstream Type node (or the Types tab of an upstream source node), select this option.
- Use custom target To manually assign a target, select this option.
- Use number of trials as denominator When the target response is a number of events occurring in a set of trials, the target field contains the number of events and you can select an additional field containing the number of trials. For example, when testing a new pesticide you might expose samples of ants to different concentrations of the pesticide and then record the number of ants killed and the number of ants in each sample. In this case, the field recording the number of ants killed should be specified as the target (events) field, and the field recording the number of ants in each sample should be specified as the trials field. If the number of ants is the same for each sample, then the number of trials may be specified using a fixed value.
The number of trials should be greater than or equal to the number of events for each record. Events should be non-negative integers, and trials should be positive integers.
- Customize reference category For a categorical target, you can choose the reference category. This can affect certain output, such as parameter estimates, but it should not change the model fit. For example, if your target takes values 0, 1, and 2, by default, the procedure makes the last (highest-valued) category, or 2, the reference category. In this situation, parameter estimates should be interpreted as relating to the likelihood of category 0 or 1 relative to the likelihood of category 2. If you specify a custom category and your target has defined labels, you can set the reference category by choosing a value from the list. This can be convenient when, in the middle of specifying a model, you don't remember exactly how a particular field was coded.
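The rules for events and trials can be read as a per-record check. The following Python sketch is illustrative only: `TrialRecord`, `check_record`, `records_with_fixed_trials`, and `observed_proportion` are hypothetical names, not part of the node, and the sketch assumes the events and trials fields have already been read as Python integers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrialRecord:
    """One hypothetical sample from the pesticide example."""
    events: int  # target (events) field: number of ants killed
    trials: int  # trials field: number of ants in the sample


def check_record(rec: TrialRecord) -> Optional[str]:
    """Return the first rule the record violates, or None if it satisfies all of them."""
    # Events must be non-negative integers.
    if not isinstance(rec.events, int) or rec.events < 0:
        return "events must be a non-negative integer"
    # Trials must be positive integers.
    if not isinstance(rec.trials, int) or rec.trials <= 0:
        return "trials must be a positive integer"
    # Each record needs at least as many trials as events.
    if rec.trials < rec.events:
        return "trials must be greater than or equal to events"
    return None


def records_with_fixed_trials(events: List[int], n_trials: int) -> List[TrialRecord]:
    """Mimic the fixed-value option: every sample shares the same number of trials."""
    return [TrialRecord(events=e, trials=n_trials) for e in events]


def observed_proportion(rec: TrialRecord) -> float:
    """Observed proportion of events per trial for one record."""
    return rec.events / rec.trials


if __name__ == "__main__":
    samples = [TrialRecord(3, 20), TrialRecord(25, 20)] + records_with_fixed_trials([4, 7], 20)
    for s in samples:
        print(s, check_record(s))
```

Here the second sample is flagged because it records more ants killed (25) than ants in the sample (20).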
Target Distribution and Relationship (Link) with the Linear Model Given the values of the predictors, the model expects the distribution of values of the target to follow the specified shape, and the mean of the target, transformed by the specified link function, to be linearly related to the predictors. Shortcuts for several common models are provided; alternatively, choose a Custom setting if there is a particular distribution and link function combination you want to fit that is not on the short list.
- Linear model Specifies a normal distribution with an identity link, which is useful when the target can be predicted using a linear regression or ANOVA model.
- Gamma regression Specifies a Gamma distribution with a log link, which should be used when the target contains all positive values and is skewed towards larger values.
- Loglinear Specifies a Poisson distribution with a log link, which should be used when the target represents a count of occurrences in a fixed period of time.
- Negative binomial regression Specifies a negative binomial distribution with a log link, which should be used when the target and denominator represent the number of trials required to observe k successes.
- Tweedie regression Specifies a Tweedie distribution with an identity, log, or power link function, which is useful for modeling responses that are a mixture of zeros and positive real values. These Tweedie distributions are also called compound Poisson, compound gamma, and Poisson-gamma distributions.
- Multinomial logistic regression Specifies a multinomial distribution, which should be used when the target is a multi-category response. It uses either a cumulative logit link (ordinal outcomes) or a generalized logit link (multi-category nominal responses).
- Binary logistic regression Specifies a binomial distribution with a logit link, which should be used when the target is a binary response predicted by a logistic regression model.
- Binary probit Specifies a binomial distribution with a probit link, which should be used when the target is a binary response with an underlying normal distribution.
- Interval censored survival Specifies a binomial distribution with a complementary log-log link, which is useful in survival analysis when some observations have no termination event.
- Custom Specify your own combination of distribution and link function.
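The shortcuts above amount to a lookup from a model type to a distribution and link. This Python sketch simply restates the list as a table; the dictionary `SHORTCUTS` and the helper `describe` are hypothetical, and the lowercase strings are illustrative labels rather than identifiers the node uses. For the multinomial entry, the cumulative logit applies to ordinal targets and the generalized logit to nominal targets.

```python
from typing import Dict, Tuple

# Hypothetical table: shortcut name -> (distribution, link(s) the shortcut may use).
SHORTCUTS: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "Linear model": ("normal", ("identity",)),
    "Gamma regression": ("gamma", ("log",)),
    "Loglinear": ("poisson", ("log",)),
    "Negative binomial regression": ("negative binomial", ("log",)),
    "Tweedie regression": ("tweedie", ("identity", "log", "power")),
    # Cumulative logit for ordinal targets, generalized logit for nominal targets.
    "Multinomial logistic regression": ("multinomial", ("cumulative logit", "generalized logit")),
    "Binary logistic regression": ("binomial", ("logit",)),
    "Binary probit": ("binomial", ("probit",)),
    "Interval censored survival": ("binomial", ("complementary log-log",)),
}


def describe(shortcut: str) -> str:
    """Summarize the distribution and link(s) behind a shortcut."""
    distribution, links = SHORTCUTS[shortcut]
    return f"{shortcut}: {distribution} distribution, link = {' or '.join(links)}"


if __name__ == "__main__":
    for name in SHORTCUTS:
        print(describe(name))
```

Any combination that does not appear in this table corresponds to the Custom setting.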
Distribution
This selection specifies the distribution of the target. The ability to specify a non-normal distribution and non-identity link function is the essential improvement of the generalized linear model over the linear model. There are many possible distribution-link function combinations, and several may be appropriate for any given dataset, so your choice can be guided by a priori theoretical considerations or by which combination seems to fit best.
- Automatic If you are unsure which distribution to use, select this option; the node analyzes your data to estimate and apply the best distribution method.
- Binomial This distribution is appropriate only for a target that represents a binary response or number of events.
- Gamma This distribution is appropriate for a target with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis.
- Inverse Gaussian This distribution is appropriate for a target with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis.
- Multinomial This distribution is appropriate for a target that represents a multi-category response. The form of the model will depend on the measurement level of the target.
A nominal target will result in a nominal multinomial model in which a separate set of model parameters is estimated for each category of the target (except the reference category). The parameter estimates for a given predictor show the relationship between that predictor and the likelihood of each category of the target, relative to the reference category.
An ordinal target will result in an ordinal multinomial model in which the traditional intercept term is replaced with a set of threshold parameters that relate to the cumulative probability of the target categories.
- Negative binomial Negative binomial regression uses a negative binomial distribution with a log link, which should be used when the target represents a count of occurrences with high variance.
- Normal This is appropriate for a continuous target whose values take a symmetric, bell-shaped distribution about a central (mean) value.
- Poisson This distribution can be thought of as the number of occurrences of an event of interest in a fixed period of time and is appropriate for variables with non-negative integer values. If a data value is non-integer, less than 0, or missing, then the corresponding case is not used in the analysis.
- Tweedie This distribution is appropriate for variables that can be represented by Poisson mixtures of gamma distributions; the distribution is "mixed" in the sense that it combines properties of continuous (takes non-negative real values) and discrete distributions (positive probability mass at a single value, 0). The dependent variable must be numeric, with data values greater than or equal to zero. If a data value is less than zero or missing, then the corresponding case is not used in the analysis. The fixed value of the Tweedie distribution's parameter can be any number greater than one and less than two.
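Several of the distributions above exclude cases whose target values fall outside the distribution's support. This Python sketch encodes only the exclusion rules stated for Gamma, Inverse Gaussian, Poisson, and Tweedie; `RULES`, `is_usable`, and `usable_cases` are hypothetical names, and missing values are assumed to arrive as `None` or NaN.

```python
import math
from typing import Callable, Dict, List, Optional

# Support checks stated in the Distribution list; keys are illustrative labels.
RULES: Dict[str, Callable[[float], bool]] = {
    "gamma": lambda v: v > 0,
    "inverse gaussian": lambda v: v > 0,
    "poisson": lambda v: v >= 0 and float(v).is_integer(),
    "tweedie": lambda v: v >= 0,
}


def is_usable(value: Optional[float], distribution: str) -> bool:
    """Decide whether a case with this target value is used in the analysis."""
    rule = RULES.get(distribution)
    if rule is None:
        # This sketch encodes no exclusion rule for other distributions.
        return True
    if value is None or math.isnan(value):
        return False  # missing values are excluded for the four distributions above
    return rule(value)


def usable_cases(values: List[Optional[float]], distribution: str) -> List[Optional[float]]:
    """Keep only the target values that this sketch treats as usable."""
    return [v for v in values if is_usable(v, distribution)]


if __name__ == "__main__":
    target = [0.0, 1.5, -2.0, None, 3.0, float("nan")]
    for dist in ("gamma", "poisson", "tweedie"):
        print(dist, usable_cases(target, dist))
```

With the sample values, Gamma keeps only 1.5 and 3.0, Poisson keeps 0.0 and 3.0 (1.5 is not an integer), and Tweedie keeps 0.0, 1.5, and 3.0.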
Link functions
The Link function transforms the expected value of the target so that it can be modeled as a linear combination of the predictors. The following functions are available:
- Automatic If you are unsure which link to use, select this option; the node analyzes your data to estimate and apply the best link function.
- Identity f(x)=x. The target is not transformed. This link can be used with any distribution, except the multinomial.
- Complementary log-log f(x)=log(−log(1−x)). This is appropriate only with the binomial or multinomial distribution.
- Cauchit f(x) = tan(π (x − 0.5)). This is appropriate only with the binomial or multinomial distribution.
- Log f(x)=log(x). This link can be used with any distribution, except the multinomial.
- Log complement f(x)=log(1−x). This is appropriate only with the binomial distribution.
- Logit f(x)=log(x / (1−x)). This is appropriate only with the binomial or multinomial distribution.
- Negative log-log f(x)=−log(−log(x)). This is appropriate only with the binomial or multinomial distribution.
- Probit f(x)=Φ⁻¹(x), where Φ⁻¹ is the inverse standard normal cumulative distribution function. This is appropriate only with the binomial or multinomial distribution.
- Power f(x)=x^α, if α ≠ 0. f(x)=log(x), if α=0. α is the required number specification and must be a real number. This link can be used with any distribution, except the multinomial.
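The link formulas above translate directly into code. This Python sketch implements each f(x) as written; the function names are hypothetical, and the probit link uses the standard library's `statistics.NormalDist` (Python 3.8+) for the inverse standard normal CDF. The binomial-type links expect 0 < x < 1, and the log and power links expect x > 0.

```python
import math
from statistics import NormalDist


def identity(x: float) -> float:
    return x


def log_link(x: float) -> float:
    return math.log(x)


def power(x: float, alpha: float) -> float:
    # x**alpha when alpha != 0; falls back to log(x) when alpha == 0.
    return math.log(x) if alpha == 0 else x ** alpha


def logit(x: float) -> float:
    return math.log(x / (1 - x))


def probit(x: float) -> float:
    # Inverse of the standard normal cumulative distribution function.
    return NormalDist().inv_cdf(x)


def cloglog(x: float) -> float:
    # Complementary log-log.
    return math.log(-math.log(1 - x))


def neg_loglog(x: float) -> float:
    # Negative log-log.
    return -math.log(-math.log(x))


def cauchit(x: float) -> float:
    return math.tan(math.pi * (x - 0.5))


def log_complement(x: float) -> float:
    return math.log(1 - x)


if __name__ == "__main__":
    for f in (logit, probit, cloglog, neg_loglog, cauchit, log_complement):
        print(f.__name__, f(0.5))
```

For example, logit(0.5), probit(0.5), and cauchit(0.5) are all 0, because a probability of one half sits at the center of those symmetric link scales, while the complementary log-log and negative log-log links are asymmetric and are not 0 at x = 0.5.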
Parameter for Tweedie Only available if you have selected either the Tweedie regression radio button or Tweedie as the Distribution method. Select a value greater than 1 and less than 2.
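To see what this parameter controls, the following Python sketch uses the standard compound Poisson-gamma form of the Tweedie distribution, which is a textbook result and not taken from the node's documentation: variance φμ^p and probability of an exact zero exp(−λ), with λ = μ^(2−p) / (φ(2−p)). The dispersion φ is an assumed input here, and the function names are hypothetical; how the node estimates φ is not covered by this sketch.

```python
import math


def check_tweedie_power(p: float) -> float:
    """Reject values outside the open interval (1, 2) accepted by the setting."""
    if not 1 < p < 2:
        raise ValueError("Tweedie parameter must be greater than 1 and less than 2")
    return p


def tweedie_variance(mu: float, phi: float, p: float) -> float:
    """Tweedie variance function: phi * mu**p."""
    check_tweedie_power(p)
    return phi * mu ** p


def tweedie_zero_probability(mu: float, phi: float, p: float) -> float:
    """P(Y = 0) = exp(-lam), where lam = mu**(2 - p) / (phi * (2 - p))
    is the Poisson mean of the number of gamma-distributed summands."""
    check_tweedie_power(p)
    lam = mu ** (2 - p) / (phi * (2 - p))
    return math.exp(-lam)


if __name__ == "__main__":
    print(tweedie_variance(2.0, 1.0, 1.5))
    print(tweedie_zero_probability(1.0, 1.0, 1.5))
```

With μ = 1, φ = 1, and p = 1.5, λ = 2, so the sketch gives a zero probability of exp(−2), about 0.135. This point mass at 0 is the "discrete" part of the mixed distribution described under Tweedie above.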