Glossary
A
AICC
A measure for selecting and comparing mixed models based on the -2 (Restricted) log
likelihood. Smaller values indicate better models. The AICC "corrects" the AIC for small sample
sizes. As the sample size increases, the AICC converges to the AIC.
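As an illustration only (not the software's own code), the correction can be sketched in
Python; aicc, minus_2ll, k, and n are hypothetical names for the criterion, the -2 log
likelihood, the number of estimated parameters, and the sample size:

    def aicc(minus_2ll, k, n):
        # AIC = -2 log likelihood + 2k; the extra correction term
        # shrinks toward zero as n grows, so AICC converges to AIC
        aic = minus_2ll + 2 * k
        return aic + (2 * k * (k + 1)) / (n - k - 1)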
B
Bayesian Information Criterion (BIC)
A measure for selecting and comparing models based on the -2 log likelihood. Smaller
values indicate better models. The BIC also "penalizes" overparameterized models (complex models
with a large number of inputs, for example), but more strictly than the AIC.
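A minimal Python sketch of the criterion, assuming minus_2ll is the -2 log likelihood,
k the number of parameters, and n the number of cases:

    import math

    def bic(minus_2ll, k, n):
        # The penalty k * ln(n) grows with sample size, so extra
        # parameters are penalized more strictly than AIC's fixed 2k
        return minus_2ll + k * math.log(n)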
Box's M test
A test for the equality of the group covariance matrices. For sufficiently large
samples, a nonsignificant p value means there is insufficient evidence that the matrices differ. The
test is sensitive to departures from multivariate normality.
C
Cases
Codes for actual group, predicted group, posterior probabilities, and discriminant
scores are displayed for each case.
Classification Results
The number of cases correctly and incorrectly assigned to each of the groups based on
the discriminant analysis. Sometimes called the "Confusion Matrix."
Combined-Groups Plots
Creates an all-groups scatterplot of the first two discriminant function values. If
there is only one function, a histogram is displayed instead.
Covariance
An unstandardized measure of association between two variables, equal to the
cross-product deviation divided by N-1.
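For example, a minimal Python sketch of the definition (sample_covariance is an
illustrative name):

    def sample_covariance(x, y):
        # Sum of cross-product deviations from the two means,
        # divided by N - 1
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        return sum((a - mean_x) * (b - mean_y)
                   for a, b in zip(x, y)) / (n - 1)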
F
Fisher's
Displays Fisher's classification function coefficients that can be used directly for
classification. A separate set of classification function coefficients is obtained for each group,
and a case is assigned to the group for which it has the largest discriminant score (classification
function value).
H
Hazard Plot
Displays the cumulative hazard function on a linear scale.
K
Kurtosis
A measure of the extent to which there are outliers. For a normal distribution, the
value of the kurtosis statistic is zero. Positive kurtosis indicates that the data exhibit more
extreme outliers than a normal distribution; negative kurtosis indicates that the data exhibit fewer
extreme outliers than a normal distribution.
L
Leave-one-out Classification
Each case in the analysis is classified by the functions derived from all cases other
than that case. It is also known as the "U-method."
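A sketch of the idea using scikit-learn (a third-party library, not the software
documented here), which automates fitting the classification functions on all cases but one:

    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    X, y = load_iris(return_X_y=True)
    # Each case is scored by a model fit to every other case; the mean
    # of the scores is the leave-one-out classification accuracy
    scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
    print(scores.mean())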
M
MAE
Mean absolute error. Measures how much the series varies from its model-predicted
level. MAE is reported in the original series units. A computational sketch of MAE and the related
error measures follows the MaxAPE entry.
Mahalanobis Distance
A measure of how much a case's values on the independent variables differ from the
average of all cases. A large Mahalanobis distance identifies a case as having extreme values on one
or more of the independent variables.
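A minimal Python sketch of the squared distance, assuming data holds one row per case
(mahalanobis_sq is an illustrative name):

    import numpy as np

    def mahalanobis_sq(case, data):
        # Distance of one case from the centroid of all cases,
        # scaled by the inverse of the covariance matrix
        mu = data.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
        diff = case - mu
        return float(diff @ inv_cov @ diff)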
MAPE
Mean Absolute Percentage Error. A measure of how much a dependent series varies from
its model-predicted level. It is independent of the units used and can therefore be used to compare
series with different units.
MaxAE
Maximum Absolute Error. The largest forecasted error, expressed in the same units as
the dependent series. Like MaxAPE, it is useful for imagining the worst-case scenario for your
forecasts. Maximum absolute error and maximum absolute percentage error may occur at different
series points; for example, when the absolute error for a large series value is slightly larger
than the absolute error for a small series value, the maximum absolute error occurs at the larger
series value and the maximum absolute percentage error at the smaller one.
MaxAPE
Maximum Absolute Percentage Error. The largest forecasted error, expressed as a
percentage. This measure is useful for imagining a worst-case scenario for your forecasts.
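The sketch below (illustrative only, not the software's own code) computes MAE, MaxAE,
MAPE, and MaxAPE for a series and its model predictions:

    import numpy as np

    def fit_error_measures(actual, predicted):
        # Absolute errors are in series units; percentage errors are
        # unit-free, so MAPE and MaxAPE can be compared across series
        actual = np.asarray(actual, dtype=float)
        abs_err = np.abs(actual - np.asarray(predicted, dtype=float))
        pct_err = 100 * abs_err / np.abs(actual)
        return {"MAE": abs_err.mean(), "MaxAE": abs_err.max(),
                "MAPE": pct_err.mean(), "MaxAPE": pct_err.max()}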
Maximizing the Smallest F Ratio Method of Entry
A method of variable selection in stepwise analysis based on maximizing an F ratio
computed from the Mahalanobis distance between groups.
Maximum
The largest value of a numeric variable.
Mean
A measure of central tendency. The arithmetic average, the sum divided by the number of
cases.
Means
Displays total and group means, as well as standard deviations for the independent
variables.
Median
The value above and below which half of the cases fall, the 50th percentile. If there
is an even number of cases, the median is the average of the two middle cases when they are sorted
in ascending or descending order. The median is a measure of central tendency not sensitive to
outlying values (unlike the mean, which can be affected by a few extremely high or low values).
Minimize Wilks' Lambda
A variable selection method for stepwise discriminant analysis that chooses variables
for entry into the equation on the basis of how much they lower Wilks' lambda. At each step, the
variable that minimizes the overall Wilks' lambda is entered.
Minimum
The smallest value of a numeric variable.
Mode
The most frequently occurring value. If several values share the greatest frequency of
occurrence, each of them is a mode.
N
Normalized BIC
Normalized Bayesian Information Criterion. A general measure of the overall fit of a
model that attempts to account for model complexity. It is a score based upon the mean square error
and includes a penalty for the number of parameters in the model and the length of the series. The
penalty removes the advantage of models with more parameters, making the statistic easy to compare
across different models for the same series.
O
One Minus Survival
Plots one minus the survival function on a linear scale.
R
Range
The difference between the largest and smallest values of a numeric variable, the
maximum minus the minimum.
Rao's V (Discriminant Analysis)
A measure of the differences between group means. Also called the Lawley-Hotelling
trace. At each step, the variable that maximizes the increase in Rao's V is entered. After selecting
this option, enter the minimum value a variable must have to enter the analysis.
RMSE
Root Mean Square Error. The square root of mean square error. A measure of how much a
dependent series varies from its model-predicted level, expressed in the same units as the dependent
series.
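A minimal Python sketch of the definition (rmse is an illustrative name):

    import numpy as np

    def rmse(actual, predicted):
        # Square root of the mean squared error, in series units
        err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
        return float(np.sqrt(np.mean(err ** 2)))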
R-Squared
Goodness-of-fit measure of a linear model, sometimes called the coefficient of
determination. It is the proportion of variation in the dependent variable explained by the
regression model. It ranges in value from 0 to 1. Small values indicate that the model does not fit
the data well.
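A minimal Python sketch of the definition as one minus the ratio of the residual to the
total sum of squares:

    def r_squared(actual, predicted):
        # Proportion of variation in the dependent variable
        # explained by the regression model
        mean_y = sum(actual) / len(actual)
        ss_tot = sum((y - mean_y) ** 2 for y in actual)
        ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
        return 1 - ss_res / ss_tot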
S
Separate-Groups
Separate-groups covariance matrices are used for classification. Because classification
is based on the discriminant functions (not based on the original variables), this option is not
always equivalent to quadratic discrimination.
Separate-Groups Covariance
Displays separate covariance matrices for each group.
Separate-Groups Plots
Creates separate-group scatterplots of the first two discriminant function values. If
there is only one function, histograms are displayed instead.
Sequential Bonferroni
This is a sequentially rejective step-down Bonferroni procedure that is much less
conservative in terms of rejecting individual hypotheses than the simple Bonferroni correction but
maintains the same overall significance level.
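One common sequentially rejective Bonferroni procedure is Holm's step-down method; the
sketch below assumes that method and is illustrative only:

    def sequential_bonferroni(p_values, alpha=0.05):
        # Test p-values from smallest to largest; the divisor shrinks
        # by one at each step, so individual hypotheses are rejected
        # more readily while the familywise level is maintained
        m = len(p_values)
        order = sorted(range(m), key=lambda i: p_values[i])
        reject = [False] * m
        for step, i in enumerate(order):
            if p_values[i] <= alpha / (m - step):
                reject[i] = True
            else:
                break  # all remaining (larger) p-values also fail
        return reject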
Sequential Sidak
This is a sequentially rejective step-down Sidak procedure that is much less
conservative in terms of rejecting individual hypotheses than the simple Sidak correction but
maintains the same overall significance level.
Skewness
A measure of the asymmetry of a distribution. The normal distribution is symmetric and
has a skewness value of 0. A distribution with a significant positive skewness has a long right
tail. A distribution with a significant negative skewness has a long left tail. As a guideline, a
skewness value more than twice its standard error is taken to indicate a departure from
symmetry.
Standard Deviation
A measure of dispersion around the mean, equal to the square root of the variance and
measured in the same units as the original variable. In a normal distribution, 68% of cases fall
within one standard deviation of the mean and 95% of cases fall within two standard deviations. For
example, if the mean age is 45, with a standard deviation of 10, 95% of the cases would be between
25 and 65 in a normal distribution.
Standard Error
A measure of how much the value of a test statistic varies from sample to sample. It is
the standard deviation of the sampling distribution for a statistic. For example, the standard error
of the mean is the standard deviation of the sample means.
Standard Error of Kurtosis
The ratio of kurtosis to its standard error can be used as a test of normality (that
is, you can reject normality if the ratio is less than -2 or greater than +2). A large positive
value for kurtosis indicates that the tails of the distribution are longer than those of a normal
distribution; a negative value for kurtosis indicates shorter tails (becoming like those of a
box-shaped uniform distribution).
Standard Error of Mean
A measure of how much the value of the mean may vary from sample to sample taken from
the same distribution. It can be used to roughly compare the observed mean to a hypothesized value
(that is, you can conclude the two values are different if the ratio of the difference to the
standard error is less than -2 or greater than +2).
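A minimal Python sketch, assuming the usual formula of the sample standard deviation
divided by the square root of n:

    import math

    def standard_error_of_mean(values):
        n = len(values)
        mean = sum(values) / n
        # Sample standard deviation over the square root of n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        return sd / math.sqrt(n)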
Standard Error of Skewness
The ratio of skewness to its standard error can be used as a test of normality (that
is, you can reject normality if the ratio is less than -2 or greater than +2). A large positive
value for skewness indicates a long right tail; an extreme negative value indicates a long left
tail.
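As an illustration of the ratio test, a Python sketch assuming the usual exact
standard-error formula for skewness (the formula a given package uses may differ):

    import math
    from scipy.stats import skew

    def skewness_ratio(values):
        # Skewness divided by its standard error; ratios below -2 or
        # above +2 suggest a departure from symmetry
        n = len(values)
        se = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
        return skew(values, bias=False) / se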
Stationary R-squared
A measure that compares the stationary part of the model to a simple mean model. This
measure is preferable to ordinary R-squared when there is a trend or seasonal pattern. Stationary
R-squared can be negative, ranging from negative infinity to 1. Negative values mean that the model
under consideration is worse than the baseline model; positive values mean that it is better.
Sum
The sum or total of the values, across all cases with nonmissing values.
Survival Plot
Displays the cumulative survival function on a linear scale.
T
Territorial Map
A plot of the boundaries used to classify cases into groups based on function values.
The numbers correspond to groups into which cases are classified. The mean for each group is
indicated by an asterisk within its boundaries. The map is not displayed if there is only one
discriminant function.
Total Covariance
Displays a covariance matrix from all cases as if they were from a single sample.
U
Unexplained Variance
At each step, the variable that minimizes the sum of the unexplained variation between
groups is entered.
Unique
Evaluates all effects simultaneously, adjusting each effect for all other effects of
any type.
Univariate ANOVAs
Performs a one-way analysis-of-variance test for equality of group means for each
independent variable.
Unstandardized
Displays the unstandardized discriminant function coefficients.
Use F Value
A variable is entered into the model if its F value is greater than the Entry value and
is removed if the F value is less than the Removal value. Entry must be greater than Removal, and
both values must be positive. To enter more variables into the model, lower the Entry value. To
remove more variables from the model, increase the Removal value.
Use Probability of F
A variable is entered into the model if the significance level of its F value is less
than the Entry value and is removed if the significance level is greater than the Removal value.
Entry must be less than Removal, and both values must be positive. To enter more variables into the
model, increase the Entry value. To remove more variables from the model, lower the Removal
value.
V
Valid
Cases having neither the system-missing value nor a value defined as user-missing.
Variance
A measure of dispersion around the mean, equal to the sum of squared deviations from
the mean divided by one less than the number of cases. The variance is measured in units that are
the square of those of the variable itself.
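For example, a minimal Python sketch of the definition:

    def sample_variance(values):
        # Sum of squared deviations from the mean, divided by
        # one less than the number of cases
        n = len(values)
        mean = sum(values) / n
        return sum((v - mean) ** 2 for v in values) / (n - 1)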
W
Within-Groups
The pooled within-groups covariance matrix is used to classify cases.
Within-Groups Correlation
Displays a pooled within-groups correlation matrix that is obtained by averaging the
separate covariance matrices for all groups before computing the correlations.
Within-Groups Covariance
Displays a pooled within-groups covariance matrix, which may differ from the total
covariance matrix. The matrix is obtained by averaging the separate covariance matrices for all
groups.