Question & Answer
Please indicate the key similarities and differences between AnswerTree and the Decision Tree procedure in IBM SPSS Statistics.
Similarities
1. Both the AnswerTree (AT) and Decision Tree (DT) procedures allow you to produce a decision tree (or classification tree) to segment a data set into a collection of terminal nodes that are distinct in the distribution of a target variable, with each case in the data set belonging to one and only one terminal node.
2. Both AT and DT offer a set of 4 methods, or algorithms, for producing the tree: Chi-Square Automatic Interaction Detection (CHAID); Exhaustive CHAID; Classification & Regression Trees (called C&RT in AT and CRT in DT); QUEST.
3. Neither AT nor DT honors SPSS Statistics Split File designations.
4. In both AT and DT, noninteger (fractional) frequency weights are rounded to the nearest integer for each case. See points 4 and 5 under "Differences" below.
Differences
1. AT is interactive, letting you prune a branch and replace a predictor at a particular split at any level of the tree (Tree->Select Predictor) and manually separate or merge nodes at a split (Tree->Define Split). DT lets you choose the predictor for the very first split but is not interactive. DT lets you specify a pruning rule for the CRT and QUEST methods, but will not let you alter the tree once it has grown.
2. DT lets you save predicted values and probabilities directly to the active data file. AT requires a little more work. You can paste the rules generated by AT into an SPSS syntax file and run them in SPSS, with the data set active in SPSS, to save the same set of variables.
3. DT requires a categorical target variable to have value labels before it can produce certain outputs (e.g. misclassification costs); AT produces these outputs without value labels. DT also requires value labels for the target variable to produce Gains charts and tables.
4. Frequency weights are assigned by a Frequency weight box in AT. In DT, the weight variable defined by Data->Weight cases is used as the frequency weight.
5. "Case weights" in AT are called "Influence Variables" in DT. See Technote 1478254 for more detail on the distinction between case/influence and frequency weight variables.
6. Missing values for ordinal predictors are handled differently in AT CHAID vs. DT CHAID. For all growth methods in both products, a case is not used in the analysis if its dependent variable is missing; if all of its predictor variables are missing; if its case weight is missing, zero, or negative; or if its frequency weight is missing, zero, or negative. Otherwise, missing values are treated as a predictor category.
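The case-exclusion rule in point 6 can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not code from either product: the function names, the use of None/NaN to represent a missing value, and the default weight of 1.0 when no weight variable is defined are all assumptions made for the example.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def case_is_used(target, predictors, case_weight=1.0, freq_weight=1.0):
    """Return True if the case would enter the analysis under the rule above.

    Hypothetical helper: the default weights of 1.0 stand in for
    'no weight variable defined'.
    """
    # Excluded if the dependent (target) variable is missing.
    if is_missing(target):
        return False
    # Excluded if every predictor value is missing.
    if all(is_missing(p) for p in predictors):
        return False
    # Excluded if either weight is missing, zero, or negative.
    for weight in (case_weight, freq_weight):
        if is_missing(weight) or weight <= 0:
            return False
    return True
```

A case with some, but not all, predictors missing is kept; its missing values are then handled as a predictor category, as described above.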
The following description of how Decision Tree CHAID handles missing values in ordinal predictors can be found in the algorithms documentation under Help->Algorithms. In the tree on the left of the Algorithms page, expand the Algorithms link, then the Trees Algorithms link, and then the "CHAID and Exhaustive CHAID Algorithms" link:
"For ordinal predictors, the algorithm first generates the “best” set of categories using all non-missing information from the data. Next the algorithm identifies the category that is most similar to the missing category. Finally, the algorithm decides whether to merge the missing category with its most similar category or to keep the missing category as a separate category. Two p-values are calculated, one for the set of categories formed by merging the missing category with its most similar category, and the other for the set of categories formed by adding the missing category as a separate category. Take the action that gives the smallest p-value."
AnswerTree, on the other hand, only splits off the missing value as a separate child node if it is significantly different from all of the nonmissing nodes.
For nominal predictors, both AT and DT treat the missing category the same as other categories of the predictor.
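The contrast between the two rules for ordinal predictors can be sketched as follows. This is a rough illustration under stated assumptions, not either product's implementation: the p-values are taken as already computed inputs, the function names are hypothetical, ties in the DT rule are resolved in favor of merging (the quoted text does not say), and the AT rule is assumed to compare pairwise p-values against a significance level alpha, which the description above does not specify.

```python
def dt_missing_action(p_merged, p_separate):
    """DT CHAID rule as quoted: take the action with the smallest p-value.

    p_merged   -- p-value with the missing category merged into its
                  most similar category
    p_separate -- p-value with the missing category kept separate
    Ties go to 'merge' here; that tie-break is an assumption.
    """
    return "merge" if p_merged <= p_separate else "separate"

def at_missing_is_separate(pairwise_p_values, alpha=0.05):
    """AT rule as described: split off the missing category only if it
    differs significantly from every nonmissing node.

    pairwise_p_values -- hypothetical p-values comparing the missing
                         category with each nonmissing node
    alpha             -- assumed significance level
    """
    return all(p < alpha for p in pairwise_p_values)
```

For example, dt_missing_action(0.03, 0.004) returns "separate", while at_missing_is_separate([0.01, 0.2]) returns False because the missing category is not significantly different from one of the nonmissing nodes.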
16 June 2018