Statistical models use mathematical equations to encode information
extracted from the data. In some cases, statistical modeling techniques
can provide adequate models very quickly. Even for problems in which
more flexible machine-learning techniques (such as neural networks)
can ultimately give better results, you can use some statistical models
as baseline predictive models to judge the performance of more advanced
techniques.
The following statistical modeling nodes are available.

Linear regression models predict a continuous target based on linear relationships
between the target and one or more predictors.
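
As a minimal sketch of the idea (not the node's implementation), the following plain-Python code fits ordinary least squares for a single predictor, using the closed-form slope cov(x, y) / var(x). The function names and the data are hypothetical, and the sketch covers only one predictor.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted value of the continuous target for a new predictor value."""
    return intercept + slope * x


# Hypothetical data lying close to the line y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(f"intercept={b0:.3f} slope={b1:.3f} prediction at x=5: {predict(b0, b1, 5.0):.3f}")
```

The closed form is used here only because it keeps the sketch short; with several predictors the same least-squares criterion is solved with matrix algebra.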

Logistic regression is a statistical technique for classifying
records based on the values of input fields. It is analogous to linear
regression but takes a categorical target field instead of a continuous
numeric one.
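
A minimal sketch of the technique, assuming one numeric input field and a two-category target coded 0/1: the code fits P(target = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on the log loss, then classifies at a 0.5 threshold. The function names, learning rate, iteration count, and data are hypothetical choices for illustration and do not reflect the node's estimation method.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(xs, labels, learning_rate=0.5, iterations=2000):
    """Fit P(label = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on the mean log loss.

    The predictor is centered internally so that plain gradient descent converges quickly;
    the returned coefficients are on the original scale of x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    centered = [x - mean_x for x in xs]
    a0, b1 = 0.0, 0.0  # intercept on the centered scale, and slope
    for _ in range(iterations):
        g0 = g1 = 0.0
        for c, y in zip(centered, labels):
            error = sigmoid(a0 + b1 * c) - y  # derivative of the log loss w.r.t. the score
            g0 += error
            g1 += error * c
        a0 -= learning_rate * g0 / n
        b1 -= learning_rate * g1 / n
    return a0 - b1 * mean_x, b1


def classify(b0, b1, x, threshold=0.5):
    """Assign category 1 when the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(b0 + b1 * x) >= threshold else 0


# Hypothetical two-category data; the categories overlap around x = 4 and x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic_regression(xs, labels)
print(b0, b1, [classify(b0, b1, x) for x in xs])
```

Unlike the linear regression sketch, the output here is a probability for the category, which is then turned into a class by the threshold.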

The PCA/Factor node provides powerful data-reduction techniques
for reducing the complexity of your data. Principal components analysis
(PCA) finds linear combinations of the input fields that capture as much
of the variance in the entire set of fields as possible: the first
component captures the most variance, and each subsequent component
captures the most remaining variance while staying orthogonal
(perpendicular) to the previous ones. Factor analysis
attempts to identify underlying factors that explain the pattern of
correlations within a set of observed fields. For both approaches,
the goal is to find a small number of derived fields that effectively
summarize the information in the original set of fields.
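
To make the PCA part concrete, here is a minimal sketch restricted to two input fields, where the eigen-decomposition of the 2x2 sample covariance matrix has a closed form. It covers PCA only, not the factor analysis methods, and the function names and data are hypothetical.

```python
import math


def principal_components_2d(points):
    """Principal components of two fields, largest variance first.

    Returns (means, [(variance, (wx, wy)), (variance, (wx, wy))]) using the closed-form
    eigen-decomposition of the symmetric 2x2 sample covariance matrix [[a, b], [b, c]].
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    half_trace = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam1, lam2 = half_trace + radius, half_trace - radius  # eigenvalues = component variances
    if b != 0.0:
        wx, wy = b, lam1 - a  # solves (a - lam1) * wx + b * wy = 0
    elif a >= c:
        wx, wy = 1.0, 0.0
    else:
        wx, wy = 0.0, 1.0
    norm = math.hypot(wx, wy)
    w1 = (wx / norm, wy / norm)
    w2 = (-w1[1], w1[0])  # perpendicular to w1; the eigenvector for lam2
    return (mx, my), [(lam1, w1), (lam2, w2)]


def component_scores(points, means, weights):
    """Derived field: projection of each centered record onto one component."""
    return [weights[0] * (x - means[0]) + weights[1] * (y - means[1]) for x, y in points]


# Hypothetical, strongly correlated fields
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0), (2.3, 2.7)]
means, components = principal_components_2d(data)
(var1, w1), (var2, w2) = components
print("share of variance on first component:", var1 / (var1 + var2))
print("first derived field:", component_scores(data, means, w1))
```

When the two fields are highly correlated, the first component carries most of the variance, so its scores alone can stand in for both original fields.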
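
Discriminant analysis makes more stringent assumptions than
logistic regression (for example, that the predictors are normally
distributed within each category of the target and share a common
covariance matrix) but can be a valuable alternative or supplement
to a logistic regression analysis when those assumptions are met.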
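
The role of the shared-covariance assumption shows up directly in a two-category linear discriminant. This minimal sketch assumes two numeric predictors and equal prior probabilities for the categories; it computes weights w = inverse(pooled covariance) times (mean1 - mean0) and classifies by comparing w . x with the midpoint between the group means. The function names and data are hypothetical, and the node's own options (such as priors and stepwise entry) are not modeled.

```python
def mean_vector(rows):
    """Mean of each of the two predictors in a group of (x1, x2) records."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def pooled_covariance(group0, group1):
    """Pooled within-group 2x2 covariance matrix (the common covariance LDA assumes)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group in (group0, group1):
        m = mean_vector(group)
        for r in group:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group0) + len(group1) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def fit_lda(group0, group1):
    """Two-category linear discriminant with equal priors.

    Returns weights w = inverse(pooled covariance) @ (mean1 - mean0) and the threshold
    w . (mean0 + mean1) / 2; a record is assigned to category 1 when w . x exceeds it.
    """
    m0, m1 = mean_vector(group0), mean_vector(group1)
    (a, b), (_, c) = pooled_covariance(group0, group1)
    det = a * c - b * b
    if det <= 0:
        raise ValueError("pooled covariance matrix is singular")
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # Closed-form inverse of the symmetric 2x2 matrix [[a, b], [b, c]] applied to diff
    w = ((c * diff[0] - b * diff[1]) / det, (-b * diff[0] + a * diff[1]) / det)
    threshold = w[0] * (m0[0] + m1[0]) / 2 + w[1] * (m0[1] + m1[1]) / 2
    return w, threshold


def lda_classify(w, threshold, record):
    """Assign category 1 if the discriminant score exceeds the threshold, else 0."""
    return 1 if w[0] * record[0] + w[1] * record[1] > threshold else 0


# Hypothetical training records for the two categories
group0 = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.5)]
group1 = [(5.0, 6.0), (6.0, 5.0), (5.5, 5.5), (6.0, 6.5)]
w, t = fit_lda(group0, group1)
print(w, t, lda_classify(w, t, (4.5, 4.0)))
```

Because a single pooled covariance matrix is used for both categories, the decision boundary is a straight line; if the categories had very different spreads, that assumption would be violated and logistic regression might be the safer choice.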

The Generalized Linear model expands the general linear model
so that the expected value of the dependent variable, transformed by a
specified link function, is a linear function of the factors and
covariates. Moreover, the model allows the dependent variable to have a
non-normal distribution. It covers the functionality of a wide range of
statistical models, including linear regression, logistic regression,
loglinear models for count data, and interval-censored survival models.
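
One of the models listed above, a loglinear (Poisson) model for count data, illustrates the link-function idea: log(E[y]) = b0 + b1 * x. The sketch below fits it with iteratively reweighted least squares (IRLS), a standard fitting method for generalized linear models; it is a minimal single-predictor illustration, and the function name, starting values, and data are assumptions rather than the node's settings.

```python
import math


def fit_poisson_glm(xs, ys, iterations=25, tol=1e-10):
    """Poisson regression with a log link, fitted by iteratively reweighted least squares.

    Model: log(E[y]) = b0 + b1 * x, so the link (log) of the mean is linear in x.
    """
    # Common starting values: mu taken from the data, eta = log(mu)
    mu = [y + 0.5 for y in ys]
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Working response and weights for a log link with Poisson variance (weight = mu)
        z = [e + (y - m) / m for e, y, m in zip(eta, ys, mu)]
        w = mu
        sw = sum(w)
        swx = sum(wi * x for wi, x in zip(w, xs))
        swxx = sum(wi * x * x for wi, x in zip(w, xs))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * x * zi for wi, x, zi in zip(w, xs, z))
        # Solve the 2x2 weighted least-squares normal equations by Cramer's rule
        det = sw * swxx - swx * swx
        new_b0 = (swz * swxx - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * x for x in xs]
        mu = [math.exp(e) for e in eta]
        if converged:
            break
    return b0, b1


# Hypothetical event counts observed at x = 0..9
counts = [2, 3, 6, 7, 8, 9, 10, 12, 15, 20]
b0, b1 = fit_poisson_glm(list(range(10)), counts)
print(f"b0={b0:.3f} b1={b1:.3f} rate ratio per unit of x={math.exp(b1):.3f}")
```

Swapping the working response and weights (for example, to the logit link with binomial variance) turns the same loop into logistic regression, which is why one framework can cover the models listed above.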

A generalized linear mixed model (GLMM) extends the linear model in
three ways: the target can have a non-normal distribution, the target is
linearly related to the factors and covariates via a specified link
function, and the observations can be correlated. Generalized linear
mixed models cover a wide variety of models, from simple linear regression
to complex multilevel models for non-normal longitudinal data.
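
As a minimal sketch of the model structure only (estimation is not shown), the code below evaluates a random-intercept logistic GLMM. Each record's linear predictor combines fixed effects shared by all records with a random intercept shared by every record in the same group, and that shared term is what makes records in a group correlated. The latent-scale intraclass correlation sigma_u^2 / (sigma_u^2 + pi^2 / 3) quantifies this. The coefficients, group names, and random-effect values are hypothetical, and they are assumed to have been estimated already.

```python
import math


def inverse_logit(eta):
    """Inverse of the logit link: maps the linear predictor to a probability."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def conditional_probability(beta0, beta1, x, group_effect):
    """P(y = 1 | x, group) in a random-intercept logistic GLMM.

    eta = beta0 + beta1 * x + u_group: fixed effects shared by all records plus a
    random intercept shared by every record in the same group (cluster).
    """
    return inverse_logit(beta0 + beta1 * x + group_effect)


def latent_icc(random_intercept_variance):
    """Intraclass correlation on the latent (logit) scale.

    pi^2 / 3 is the variance of the standard logistic distribution.
    """
    return random_intercept_variance / (random_intercept_variance + math.pi ** 2 / 3.0)


# Hypothetical fitted values: fixed effects and per-group random intercepts
beta0, beta1 = -1.0, 0.5
group_effects = {"group_A": 0.8, "group_B": -0.4}
for name, u in group_effects.items():
    print(name, conditional_probability(beta0, beta1, 2.0, u))
print("latent-scale ICC for sigma_u^2 = 0.5:", latent_icc(0.5))
```

Setting the random-intercept variance to zero removes the within-group correlation, and the model reduces to an ordinary logistic regression.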

The Cox regression node enables you to build a survival model
for time-to-event data in the presence of censored records. The model
produces a survival function that gives the probability that the
event of interest has not yet occurred by a given time (t) for given
values of the input variables; one minus this value is the probability
that the event has occurred by time t.
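
Under the proportional hazards assumption of Cox regression, the survival function for a record is the baseline survival raised to the power exp(beta . x): S(t | x) = S0(t) ** exp(beta . x). The minimal sketch below applies that formula. It assumes the baseline survival value at time t and the coefficients have already been estimated, and that S0(t) refers to inputs equal to zero. The function names and numbers are hypothetical.

```python
import math


def cox_survival_probability(baseline_survival_at_t, coefficients, inputs):
    """S(t | x) = S0(t) ** exp(beta . x): probability the event has NOT occurred by time t."""
    if not 0.0 < baseline_survival_at_t <= 1.0:
        raise ValueError("baseline survival must be in (0, 1]")
    linear_predictor = sum(b * x for b, x in zip(coefficients, inputs))
    return baseline_survival_at_t ** math.exp(linear_predictor)


def cox_event_probability(baseline_survival_at_t, coefficients, inputs):
    """Probability that the event HAS occurred by time t: 1 - S(t | x)."""
    return 1.0 - cox_survival_probability(baseline_survival_at_t, coefficients, inputs)


# Hypothetical estimates: baseline survival at t = 12 and two coefficients
s0_at_12 = 0.85
coefficients = [0.7, -0.02]
record = [1.0, 40.0]
print("survival by t=12:", cox_survival_probability(s0_at_12, coefficients, record))
print("event by t=12:", cox_event_probability(s0_at_12, coefficients, record))
```

A positive linear predictor raises the hazard, which lowers the survival probability relative to the baseline, and a negative one does the opposite.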