GenLin Node Expert Options
If you have detailed knowledge of generalized linear models, expert options allow you to fine-tune the training process. To access expert options, set Mode to Expert on the Expert tab.
Target Field Distribution and Link Function
Distribution.
This selection specifies the distribution of the dependent variable. The ability to specify a non-normal distribution and non-identity link function is the essential improvement of the generalized linear model over the general linear model. There are many possible distribution-link function combinations, and several may be appropriate for any given dataset, so your choice can be guided by a priori theoretical considerations or by which combination seems to fit best.
 Binomial. This distribution is appropriate only for variables that represent a binary response or number of events.
 Gamma. This distribution is appropriate for variables with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis.
 Inverse Gaussian. This distribution is appropriate for variables with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis.
 Negative binomial. This distribution can be thought of as the number of trials required to observe k successes and is appropriate for variables with nonnegative integer values. If a data value is noninteger, less than 0, or missing, then the corresponding case is not used in the analysis. The fixed value of the negative binomial distribution's ancillary parameter can be any number greater than or equal to 0. When the ancillary parameter is set to 0, using this distribution is equivalent to using the Poisson distribution.
 Normal. This is appropriate for scale variables whose values take a symmetric, bell-shaped distribution about a central (mean) value. The dependent variable must be numeric.
 Poisson. This distribution can be thought of as the number of occurrences of an event of interest in a fixed period of time and is appropriate for variables with nonnegative integer values. If a data value is noninteger, less than 0, or missing, then the corresponding case is not used in the analysis.
 Tweedie. This distribution is appropriate for variables that can be represented by Poisson mixtures of gamma distributions; the distribution is "mixed" in the sense that it combines properties of continuous (takes nonnegative real values) and discrete distributions (positive probability mass at a single value, 0). The dependent variable must be numeric, with data values greater than or equal to zero. If a data value is less than zero or missing, then the corresponding case is not used in the analysis. The fixed value of the Tweedie distribution's parameter can be any number greater than one and less than two.
 Multinomial. This distribution is appropriate for variables that represent an ordinal response. The dependent variable can be numeric or string, and it must have at least two distinct valid data values.
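The case-exclusion rules stated above can be summarized in a short Python sketch. It is only an illustration of the rules as written, not the node's actual implementation; the function name `is_case_used` and the lowercase distribution labels are hypothetical, and only the distributions whose exclusion rules are spelled out above are covered.

```python
import math

def is_case_used(distribution, value):
    """Return True if a target value would be kept under the stated rules.

    Hypothetical helper: the name and the distribution labels are
    illustrative and are not part of the product.
    """
    # A missing value (None or NaN) excludes the case for every distribution here.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return False
    if distribution in ("gamma", "inverse_gaussian"):
        # Values less than or equal to 0 are not used.
        return value > 0
    if distribution in ("negative_binomial", "poisson"):
        # Non-integer or negative values are not used.
        return value >= 0 and float(value).is_integer()
    if distribution == "tweedie":
        # Values less than zero are not used; zero itself is valid.
        return value >= 0
    raise ValueError(f"no exclusion rule sketched for {distribution!r}")

# Example: count how many target values survive for a Poisson model.
values = [0, 3, 2.5, -1, None, 7]
kept = [v for v in values if is_case_used("poisson", v)]
print(len(kept), "of", len(values), "cases used")
```

Note that zero is a valid value for the Poisson, negative binomial, and Tweedie distributions but not for the gamma or inverse Gaussian distributions.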
Link Functions.
The link function is a transformation of the dependent variable that allows estimation of the model. The following functions are available:
 Identity. f(x)=x. The dependent variable is not transformed. This link can be used with any distribution.
 Complementary log-log. f(x)=log(−log(1−x)). This is appropriate only with the binomial distribution.
 Cumulative Cauchit. f(x) = tan(π (x – 0.5)), applied to the cumulative probability of each category of the response. This is appropriate only with the multinomial distribution.
 Cumulative complementary log-log. f(x)=ln(−ln(1−x)), applied to the cumulative probability of each category of the response. This is appropriate only with the multinomial distribution.
 Cumulative logit. f(x)=ln(x / (1−x)), applied to the cumulative probability of each category of the response. This is appropriate only with the multinomial distribution.
 Cumulative negative log-log. f(x)=−ln(−ln(x)), applied to the cumulative probability of each category of the response. This is appropriate only with the multinomial distribution.
 Cumulative probit. f(x)=Φ^{−1}(x), applied to the cumulative probability of each category of the response, where Φ^{−1} is the inverse standard normal cumulative distribution function. This is appropriate only with the multinomial distribution.
 Log. f(x)=log(x). This link can be used with any distribution.
 Log complement. f(x)=log(1−x). This is appropriate only with the binomial distribution.
 Logit. f(x)=log(x / (1−x)). This is appropriate only with the binomial distribution.
 Negative binomial. f(x)=log(x / (x+k^{−1})), where k is the ancillary parameter of the negative binomial distribution. This is appropriate only with the negative binomial distribution.
 Negative log-log. f(x)=−log(−log(x)). This is appropriate only with the binomial distribution.
 Odds power. f(x)=[(x/(1−x))^{α}−1]/α, if α ≠ 0. f(x)=log(x/(1−x)), if α=0 (the limit of the first expression as α approaches 0). α is the required number specification and must be a real number. This is appropriate only with the binomial distribution.
 Probit. f(x)=Φ^{−1}(x), where Φ^{−1} is the inverse standard normal cumulative distribution function. This is appropriate only with the binomial distribution.
 Power. f(x)=x^{α}, if α ≠ 0. f(x)=log(x), if α=0. α is the required number specification and must be a real number. This link can be used with any distribution.
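Several of the link functions listed above can be written directly in Python with the standard library. The following is a minimal sketch for experimenting with the formulas; the function names are hypothetical and do not correspond to any product API, and no input validation is done (the arguments must lie in the domain of each formula, for example strictly between 0 and 1 for the binomial links).

```python
import math
from statistics import NormalDist

def identity(x):
    return x

def log_link(x):
    return math.log(x)

def logit(x):
    return math.log(x / (1 - x))

def probit(x):
    # Inverse of the standard normal cumulative distribution function.
    return NormalDist().inv_cdf(x)

def complementary_log_log(x):
    return math.log(-math.log(1 - x))

def negative_log_log(x):
    return -math.log(-math.log(x))

def log_complement(x):
    return math.log(1 - x)

def cauchit(x):
    return math.tan(math.pi * (x - 0.5))

def power(x, alpha):
    # The power link falls back to the log link when alpha is 0.
    return math.log(x) if alpha == 0 else x ** alpha

def negative_binomial_link(x, k):
    # k is the ancillary parameter; this form assumes k > 0.
    return math.log(x / (x + 1 / k))

# Compare how a few binomial links transform the same probabilities.
for p in (0.1, 0.5, 0.9):
    print(p, logit(p), probit(p), complementary_log_log(p))
```

The logit, probit, and Cauchit links are symmetric about 0.5 and map it to 0, whereas the complementary log-log and negative log-log links are asymmetric.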
Parameters. The controls in this group allow you to specify parameter values when certain distribution options are chosen.
 Parameter for negative binomial. For the negative binomial distribution, choose either to specify a value or to allow the system to provide an estimated value.
 Parameter for Tweedie. For the Tweedie distribution, specify a number between 1.0 and 2.0 for the fixed value.
Parameter Estimation. The controls in this group allow you to specify estimation methods and to provide initial values for the parameter estimates.
 Method. You can select a parameter estimation method. Choose between Newton-Raphson, Fisher scoring, or a hybrid method in which Fisher scoring iterations are performed before switching to the Newton-Raphson method. If convergence is achieved during the Fisher scoring phase of the hybrid method before the maximum number of Fisher iterations is reached, the algorithm continues with the Newton-Raphson method.
 Scale Parameter Method. You can select the scale parameter estimation method. Maximum-likelihood jointly estimates the scale parameter with the model effects; note that this option is not valid if the response has a negative binomial, Poisson, or binomial distribution. The deviance and Pearson chi-square options estimate the scale parameter from the value of those statistics. Alternatively, you can specify a fixed value for the scale parameter.
 Covariance matrix. The model-based estimator is the negative of the generalized inverse of the Hessian matrix. The robust (also called the Huber/White/sandwich) estimator is a "corrected" model-based estimator that provides a consistent estimate of the covariance, even when the specification of the variance and link functions is incorrect.
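The hybrid estimation method described above can be illustrated on a deliberately tiny problem. The sketch below is not the node's algorithm; it assumes a Poisson model with an identity link and a single intercept parameter (so the mean is simply beta), a case where Fisher scoring and Newton-Raphson differ because the link is not canonical. The function name, the step-size convergence criterion, and the default settings are all hypothetical, and the data are assumed to have a positive sum.

```python
def fit_poisson_mean(y, start, max_fisher=1, max_iter=50, tol=1e-10):
    """Estimate beta in mu = beta (Poisson, identity link, intercept only).

    Illustrative only: runs up to max_fisher Fisher scoring iterations,
    then continues with Newton-Raphson until the step size falls below tol.
    """
    n, total = len(y), sum(y)
    beta = start
    history = []
    use_fisher = max_fisher > 0
    fisher_done = 0
    for _ in range(max_iter):
        score = total / beta - n              # first derivative of the log-likelihood
        if use_fisher:
            information = n / beta            # expected (Fisher) information
            fisher_done += 1
        else:
            information = total / beta ** 2   # observed information (negative Hessian)
        step = score / information
        beta += step
        history.append(("fisher" if use_fisher else "newton", beta))
        converged = abs(step) < tol
        if use_fisher:
            # Leave the Fisher phase when it converges or its budget is spent;
            # in both cases Newton-Raphson iterations follow.
            if converged or fisher_done >= max_fisher:
                use_fisher = False
        elif converged:
            break
    return beta, history

beta, history = fit_poisson_mean([2, 3, 4, 7], start=1.0)
for method, value in history:
    print(method, value)
```

In this toy model a single Fisher scoring step lands on the sample mean, so the Newton-Raphson phase only confirms convergence. Setting max_fisher=0 runs pure Newton-Raphson, which takes more iterations from the same starting value.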
Iterations. These options allow you to control the parameters for model convergence. See the topic Generalized Linear Models Iterations for more information.
Output. These options allow you to request additional statistics that will be displayed in the advanced output of the model nugget built by the node. See the topic Generalized Linear Models Advanced Output for more information.
Singularity Tolerance. Singular (or non-invertible) matrices have linearly dependent columns, which can cause serious problems for the estimation algorithm. Even near-singular matrices can lead to poor results, so the procedure will treat a matrix whose determinant is less than the tolerance as singular. Specify a positive value.
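The idea behind the singularity tolerance can be sketched in a few lines of Python. This is only an illustration of comparing a determinant with a tolerance, not the procedure's actual check; the function names and the default tolerance of 1e-12 are hypothetical, and the absolute value of the determinant is used here as an assumption so that the check also works for matrices with a negative determinant.

```python
def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Choose the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det

def is_treated_as_singular(matrix, tolerance=1e-12):
    """Return True if the determinant's magnitude is below the tolerance."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return abs(determinant(matrix)) < tolerance

# The second column is exactly twice the first, so the matrix is singular.
print(is_treated_as_singular([[1, 2], [2, 4]]))
# Nearly dependent columns: invertible in exact arithmetic, but below the tolerance.
print(is_treated_as_singular([[1, 1], [1, 1 + 1e-14]]))
```

A larger tolerance flags more matrices as singular, which trades some models that could technically be estimated for more stable results.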