# Using Rao's V as the criterion in stepwise Discriminant Analysis in SPSS Statistics

## Question

I am using the Discriminant procedure in SPSS Statistics. I have chosen the Stepwise Method and clicked the Method button to choose a criterion for variable selection. In the Stepwise Method dialog, one of the choices under Method is Rao's V, which has a box for typing the "V to enter" if that method is chosen. The default "V to enter" is 0, which presumably allows any nonnegative value of V to qualify. Please provide some background information on Rao's V and a strategy for choosing a reasonable critical value of V.

## Answer

Rao's V is a measure of the distance between the group centroids, i.e., the group means on the set of predictors used in the discriminant function. In a stepwise analysis, predictors are selected one at a time to build a discriminant function that separates the groups well, that is, one that produces a large distance between the groups on the predictors used. A predictor that does not help discrimination is usually not worth including, so a stopping rule ends the stepwise process when none of the remaining predictors is useful.

For Rao's V, the "V to enter" value sets the minimum change in V required for a predictor to be entered. At a given step, if the largest change in V among the remaining predictors (i.e., those not yet entered) is smaller than the "V to enter" value, variable selection stops and the discriminant function is based on the predictors already selected.
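
For reference, the distance measure described above is commonly written in textbooks in the following form. The notation is an assumption chosen for illustration and is not quoted from the SPSS documentation: n is the total sample size, g the number of groups, q the number of predictors in the function, n_k the size of group k, x̄_ik the mean of predictor i in group k, x̄_i its grand mean, and w^{ij} an element of the inverse of the within-groups sums-of-squares and cross-products matrix.

```latex
V = (n - g) \sum_{i=1}^{q} \sum_{j=1}^{q} w^{ij}
    \sum_{k=1}^{g} n_k \, (\bar{x}_{ik} - \bar{x}_{i})(\bar{x}_{jk} - \bar{x}_{j})
```

With a single predictor (q = 1), this expression reduces to (g − 1) times the one-way ANOVA F statistic for that predictor, which is one way to see why V grows as the group means move apart.
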

So, what is a reasonable value to enter? The SPSS Statistics algorithms documentation states:

"When n−g is large, V, under the null hypothesis, is approximately distributed as χ² with q(g−1) degrees of freedom. When an additional variable is entered, the change in V, if positive, has approximately a χ² distribution with g−1 degrees of freedom."

Here n is the sample size, g is the number of groups, and q is the number of variables. You may want to require that the change in V when a variable is entered be statistically significant at p<.05. With 4 groups, for example, you would then need the value of chi-square with 4−1=3 degrees of freedom (DF) that has an upper-tail probability of .05.

You can find that value in a table of critical chi-square values in a statistics textbook, or you can use the SPSS IDF.CHISQ() function in a COMPUTE command to create a variable whose value is the critical value of a chi-square variable with 3 DF.

IDF.CHISQ (the inverse distribution function) takes two arguments: a cumulative probability and the degrees of freedom. The cumulative probability is the proportion of the specified chi-square distribution that lies below some value X, and the function returns that X. To obtain a critical value, set the first argument to 1 minus the desired significance level (e.g., .95 for a significance level of .05). Because COMPUTE is a transformation, it needs an active dataset, and the new variable holds the same value in every case. The following COMPUTE command returns the critical chi-square at p<.05 for 3 DF.

COMPUTE sig_V_4gp=IDF.CHISQ(.95,3).
EXECUTE.
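
The same function can tabulate the critical value for other numbers of groups. The following is a minimal sketch, not part of the original answer: it reads a few inline values into a scratch dataset, so run it separately from your analysis data. The variable names `g` (number of groups) and `v_enter_05` are hypothetical.

```spss
* Hypothetical scratch dataset with one case per number of groups.
DATA LIST FREE / g.
BEGIN DATA
2 3 4 5 6
END DATA.
* Critical chi-square with g-1 DF at the .05 significance level.
COMPUTE v_enter_05 = IDF.CHISQ(.95, g - 1).
FORMATS g (F2.0) v_enter_05 (F8.4).
LIST.
```

The case with g = 4 corresponds to the single COMPUTE command above.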

The result is 7.8147. So, if you enter 7.815 as "V to enter", a variable is entered only when its change in Rao's V reaches the critical value for significance at p<.05.
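
If you run the procedure from command syntax rather than the dialog, the Rao's V method and the "V to enter" value correspond to the METHOD=RAO and VIN subcommands of DISCRIMINANT. The following is a minimal sketch under assumed names: a grouping variable `group` coded 1 to 4 and predictors `x1` to `x5` are hypothetical. Pasting from the dialog remains the most reliable way to obtain the exact syntax for your version.

```spss
* Hypothetical variables: group is coded 1 to 4 and x1 to x5 are the predictors.
DISCRIMINANT
  /GROUPS=group(1 4)
  /VARIABLES=x1 x2 x3 x4 x5
  /ANALYSIS ALL
  /METHOD=RAO
  /VIN=7.815.
```

All other criteria are left at their defaults in this sketch.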

Rao's V is discussed in the following sources:

Klecka, W.R. (1980). Discriminant Analysis. Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-019. Beverly Hills and London: Sage Publications.

Morrison, D.F. (1976). Multivariate Statistical Methods (2nd ed.). New York: McGraw-Hill.

Rao, C.R. (1952). Advanced Statistical Methods in Biometric Research. New York: Wiley.


Modified date:
16 June 2018

swg21968175