Statistical models use mathematical equations to encode information extracted
from the data. In some cases, statistical modeling techniques can provide adequate models very
quickly. Even for problems in which more flexible machine-learning techniques (such as neural
networks) can ultimately give better results, you can use some statistical models as baseline
predictive models to judge the performance of more advanced techniques.
The following statistical modeling nodes are available.

Linear regression models predict a continuous target based on linear relationships
between the target and one or more predictors.
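
As an illustration only, the idea of fitting a linear relationship can be sketched in a few lines
of Python. This is a minimal ordinary least squares fit for a single predictor, not the node's
actual implementation; the function name fit_simple_linear and the data values are hypothetical.

```python
from statistics import mean

def fit_simple_linear(xs, ys):
    """Ordinary least squares for one predictor: y ~ intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear(xs, ys)
prediction = intercept + slope * 5.0  # predicted target for a new record with x = 5
```

With more than one predictor the same principle applies, but the coefficients are found by
solving a system of equations rather than by a single division.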


Logistic regression is a statistical technique for classifying records based on values of
input fields. It is analogous to linear regression but takes a categorical target field instead of a
numeric one.
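
The following sketch shows one way a binary logistic regression could be fitted, assuming a single
numeric input field and a 0/1 target. It uses plain gradient ascent on the log-likelihood, which is
a simplification chosen for readability and is not necessarily the estimation method the node uses.
The names fit_logistic, lr, and epochs, and the data, are hypothetical.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted probability
            g0 += err
            g1 += err * x
        # Step along the average gradient of the log-likelihood.
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical records: the two classes overlap, so the estimates stay finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
p_low = sigmoid(b0 + b1 * 0.5)   # estimated probability of class 1 at x = 0.5
p_high = sigmoid(b0 + b1 * 4.0)  # estimated probability of class 1 at x = 4.0
```

A record is then classified by comparing its predicted probability with a cutoff such as 0.5.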


The PCA/Factor node provides powerful data-reduction techniques to reduce the complexity of
your data. Principal components analysis (PCA) finds linear combinations of the input fields that do
the best job of capturing the variance in the entire set of fields, where the components are
orthogonal (perpendicular) to each other. Factor analysis attempts to identify underlying factors
that explain the pattern of correlations within a set of observed fields. For both approaches, the
goal is to find a small number of derived fields that effectively summarize the information in the
original set of fields.
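
To make the idea of orthogonal components concrete, here is a minimal sketch of PCA for exactly
two input fields, where the eigenvalues and eigenvectors of the covariance matrix have a closed
form. The function name pca_2d and the data are hypothetical, and a real analysis with many fields
would rely on a numerical library rather than this two-field shortcut.

```python
import math
from statistics import mean

def pca_2d(points):
    """Return ((var1, var2), (axis1, axis2)) for two fields, largest variance first."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    var1, var2 = half_trace + spread, half_trace - spread
    # Direction of the first component; the second is perpendicular to it.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axis1 = (math.cos(theta), math.sin(theta))
    axis2 = (-math.sin(theta), math.cos(theta))
    return (var1, var2), (axis1, axis2)

# Hypothetical records with two strongly correlated fields.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
(var1, var2), (axis1, axis2) = pca_2d(points)
explained = var1 / (var1 + var2)  # share of total variance captured by component 1
```

Because the two fields move together, the first component captures almost all of the variance,
so a single derived field could stand in for both original fields.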


Discriminant analysis makes more stringent assumptions than logistic regression but can be a
valuable alternative or supplement to a logistic regression analysis when those assumptions are met.


The Generalized Linear model expands the general linear model so that the dependent variable
is linearly related to the factors and covariates through a specified link function. Moreover, the
model allows for the dependent variable to have a non-normal distribution. It covers the
functionality of a large number of statistical models, including linear regression, logistic
regression, loglinear models for count data, and interval-censored survival models.
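
The role of the link function can be illustrated with a short sketch. The same linear predictor
is turned into a predicted mean by applying the inverse of the chosen link; the identity, logit,
and log links correspond to linear regression, logistic regression, and loglinear count models
respectively. The names INVERSE_LINKS and predict_mean and the coefficient values are
hypothetical, and the sketch shows prediction only, not model fitting.

```python
import math

# Inverse link functions: they map the linear predictor back to the mean of the target.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                        # linear regression
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logistic regression
    "log": lambda eta: math.exp(eta),                   # loglinear model for counts
}

def predict_mean(intercept, slope, x, link):
    """Predicted mean of the target for one covariate value x under the given link."""
    eta = intercept + slope * x  # the linear predictor
    return INVERSE_LINKS[link](eta)

# Hypothetical coefficients, reused with each link to show the different scales.
mean_identity = predict_mean(0.5, 0.25, 2.0, "identity")  # any real value
mean_logit = predict_mean(0.5, 0.25, 2.0, "logit")        # a probability in (0, 1)
mean_log = predict_mean(0.5, 0.25, 2.0, "log")            # a positive expected count
```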


A generalized linear mixed model (GLMM) extends the linear model so that the target is linearly
related to the factors and covariates via a specified link function, the target can have a
non-normal distribution, and the observations can be correlated. Generalized linear mixed models
cover a wide variety of models, from simple linear regression to complex multilevel models for
non-normal longitudinal data.

The Cox regression node enables you to build a survival model for time-to-event data in the
presence of censored records. The model produces a survival function that predicts the probability
that the event of interest has not yet occurred at a given time (t) for given values of the input
variables.
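
As a rough sketch of how a proportional hazards model uses its inputs, the survival function for a
record can be written as S(t | x) = S0(t) ** exp(beta . x), where S0 is a baseline survival
function and beta holds the model coefficients. The code below only evaluates that relationship;
it does not estimate anything from data. The exponential baseline, the coefficient value, and the
name cox_survival are all assumptions made for illustration.

```python
import math

def cox_survival(t, x, beta, baseline_survival):
    """Evaluate S(t | x) = S0(t) ** exp(beta . x) for one record."""
    # Relative risk of this record compared with the baseline (all inputs zero).
    relative_risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return baseline_survival(t) ** relative_risk

def baseline(t):
    """Hypothetical baseline: constant hazard of 0.1 per time unit."""
    return math.exp(-0.1 * t)

beta = [0.7]  # hypothetical coefficient for a single input variable
s_reference = cox_survival(5.0, [0.0], beta, baseline)  # record at the baseline
s_exposed = cox_survival(5.0, [1.0], beta, baseline)    # record with higher risk
```

A positive coefficient raises the relative risk, so the second record has a lower survival
probability at every time point than the first.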
