Target (GLE models)

These settings define the target, its distribution, and its relationship to the predictors through the link function.

Target The target is required. It can have any measurement level, and the measurement level of the target affects which distributions and link functions are appropriate.

Target Distribution and Relationship (Link) with the Linear Model Given the values of the predictors, the model expects the values of the target to follow the specified distribution and to be linearly related to the predictors through the specified link function. Shortcuts for several common models are provided; choose a Custom setting if you want to fit a particular distribution and link function combination that is not on the short list.
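In symbols, the relationship described above is usually written as shown below. This is the standard generalized linear model notation rather than anything specific to this node; the symbols g (link function), Y (target), x_1 to x_k (predictors), and β_0 to β_k (model parameters) are introduced here for illustration only.

```latex
g\bigl(\mathrm{E}[Y \mid x_1, \dots, x_k]\bigr) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
```

With a normal distribution and the identity link g(μ)=μ, this reduces to the ordinary linear model.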

Distribution

This selection specifies the Distribution of the target. The ability to specify a non-normal distribution and non-identity link function is the essential improvement of the generalized linear model over the linear model. There are many possible distribution-link function combinations, and several may be appropriate for any given dataset, so your choice can be guided by a priori theoretical considerations or which combination seems to fit best.

  • Automatic If you are unsure which distribution to use, select this option; the node analyzes your data to determine and apply the most suitable distribution.
  • Binomial This distribution is appropriate only for a target that represents a binary response or number of events.
  • Gamma This distribution is appropriate for a target with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis.
  • Inverse Gaussian This distribution is appropriate for a target with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis.
  • Multinomial This distribution is appropriate for a target that represents a multi-category response. The form of the model will depend on the measurement level of the target.

    A nominal target will result in a nominal multinomial model in which a separate set of model parameters is estimated for each category of the target (except the reference category). The parameter estimates for a given predictor show the relationship between that predictor and the likelihood of each category of the target, relative to the reference category.

    An ordinal target will result in an ordinal multinomial model in which the traditional intercept term is replaced with a set of threshold parameters that relate to the cumulative probability of the target categories.

  • Negative binomial This distribution is appropriate when the target represents a count of occurrences with high variance. Negative binomial regression uses a negative binomial distribution with a log link.
  • Normal This is appropriate for a continuous target whose values take a symmetric, bell-shaped distribution about a central (mean) value.
  • Poisson This distribution can be thought of as describing the number of occurrences of an event of interest in a fixed period of time and is appropriate for variables with non-negative integer values. If a data value is non-integer, less than 0, or missing, then the corresponding case is not used in the analysis.
  • Tweedie This distribution is appropriate for variables that can be represented by Poisson mixtures of gamma distributions; the distribution is "mixed" in the sense that it combines properties of continuous (takes non-negative real values) and discrete distributions (positive probability mass at a single value, 0). The dependent variable must be numeric, with data values greater than or equal to zero. If a data value is less than zero or missing, then the corresponding case is not used in the analysis. The fixed value of the Tweedie distribution's parameter can be any number greater than one and less than two.
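The case-exclusion rules stated in the list above for the Gamma, Inverse Gaussian, Poisson, and Tweedie distributions can be sketched as a small check. This is a minimal illustration, not the node's implementation: the function name `case_is_used` and the lowercase distribution keys are hypothetical, and missing values are assumed to arrive as `None` or NaN.

```python
import math


def case_is_used(value, distribution):
    """Return True if a target value is kept under the rules in the list above.

    Only distributions with an explicit exclusion rule in the text are covered;
    any other name raises ValueError rather than guessing a rule.
    """
    # Missing values (assumed here to be None or NaN) are never used.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return False

    if distribution in ("gamma", "inverse_gaussian"):
        # Values less than or equal to 0 are excluded.
        return value > 0
    if distribution == "poisson":
        # Negative or non-integer values are excluded.
        return value >= 0 and float(value).is_integer()
    if distribution == "tweedie":
        # Values less than zero are excluded; zero itself is allowed.
        return value >= 0

    raise ValueError(f"no exclusion rule stated for {distribution!r}")
```

For example, `case_is_used(0, "tweedie")` keeps the case, while `case_is_used(0, "gamma")` drops it, which reflects the point mass at 0 that the Tweedie distribution allows.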

Link functions

The Link function is a transformation of the target that allows estimation of the model. The following functions are available:

  • Automatic If you are unsure which link to use, select this option; the node analyzes your data to determine and apply the most suitable link function.
  • Identity f(x)=x. The target is not transformed. This link can be used with any distribution, except the multinomial.
  • Complementary log-log f(x)=log(−log(1−x)). This is appropriate only with the binomial or multinomial distribution.
  • Cauchit f(x) = tan(π (x − 0.5)). This is appropriate only with the binomial or multinomial distribution.
  • Log f(x)=log(x). This link can be used with any distribution, except the multinomial.
  • Log complement f(x)=log(1−x). This is appropriate only with the binomial distribution.
  • Logit f(x)=log(x / (1−x)). This is appropriate only with the binomial or multinomial distribution.
  • Negative log-log f(x)=−log(−log(x)). This is appropriate only with the binomial or multinomial distribution.
  • Probit f(x)=Φ⁻¹(x), where Φ⁻¹ is the inverse standard normal cumulative distribution function. This is appropriate only with the binomial or multinomial distribution.
  • Power f(x)=x^α, if α ≠ 0. f(x)=log(x), if α=0. α is the required number specification and must be a real number. This link can be used with any distribution, except the multinomial.
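The formulas and distribution restrictions listed above can be written out as a short sketch. It uses only the Python standard library; the dictionary `LINKS`, the helpers `power_link` and `link_allowed`, and the lowercase key names are hypothetical names chosen for illustration and are not part of the node. The Automatic option is omitted because the text gives no formula for it.

```python
import math
from statistics import NormalDist

# Link functions as listed above; x is a probability or mean in the
# domain where each formula is defined.
LINKS = {
    "identity": lambda x: x,
    "cloglog": lambda x: math.log(-math.log(1 - x)),
    "cauchit": lambda x: math.tan(math.pi * (x - 0.5)),
    "log": math.log,
    "log_complement": lambda x: math.log(1 - x),
    "logit": lambda x: math.log(x / (1 - x)),
    "negative_log_log": lambda x: -math.log(-math.log(x)),
    "probit": NormalDist().inv_cdf,  # inverse standard normal CDF
}


def power_link(alpha):
    """Return the power link for a given alpha (log when alpha is 0)."""
    if alpha == 0:
        return math.log
    return lambda x: x ** alpha


def link_allowed(link, distribution):
    """Check a link/distribution pair against the restrictions in the list."""
    if link in ("identity", "log", "power"):
        # Any distribution except the multinomial.
        return distribution != "multinomial"
    if link == "log_complement":
        # Binomial only.
        return distribution == "binomial"
    if link in LINKS:
        # Remaining links: binomial or multinomial only.
        return distribution in ("binomial", "multinomial")
    raise ValueError(f"unknown link {link!r}")
```

For instance, `LINKS["logit"](0.5)` evaluates log(0.5 / 0.5), and `link_allowed("log", "multinomial")` is false because the log link is not available for the multinomial distribution.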

Parameter for Tweedie This setting is available only if you have selected either the Tweedie regression radio button or Tweedie as the Distribution method. Select a value greater than 1 and less than 2.
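The allowed range for this parameter can be checked with a small helper. The sketch below is illustrative only: the function names are hypothetical, and the variance relationship Var(Y) = dispersion × μ^p is the standard property of the Tweedie family rather than something stated in this topic.

```python
def validate_tweedie_parameter(p):
    """Return p as a float if it is greater than 1 and less than 2."""
    p = float(p)
    if not (1.0 < p < 2.0):
        raise ValueError("Tweedie parameter must be greater than 1 and less than 2")
    return p


def tweedie_variance(mu, p, dispersion=1.0):
    """Variance implied by mean mu under the standard Tweedie variance function.

    Assumption: Var(Y) = dispersion * mu ** p, the usual textbook form.
    """
    return dispersion * mu ** validate_tweedie_parameter(p)
```

Values of the parameter near 1 give variance behavior close to the Poisson case, and values near 2 give behavior close to the gamma case.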