CHAID Node - Advanced

The advanced options enable you to fine-tune the tree-building process.

Significance level for splitting. Specifies the significance level (alpha) for splitting nodes. The value must be between 0 and 1. Lower values make the test for a split stricter, so they tend to produce trees with fewer nodes.

Significance level for merging. Specifies the significance level (alpha) for merging categories. The value must be greater than 0 and less than or equal to 1. To prevent any merging of categories, specify a value of 1. For continuous predictors, this means the number of categories for the variable in the final tree matches the specified number of intervals. This option is not available for Exhaustive CHAID.
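The two significance levels work in opposite directions, which the following minimal sketch illustrates. It assumes the usual CHAID convention: a pair of categories is merged when its p-value exceeds the merging alpha, and a node is split when the predictor's p-value is at or below the splitting alpha. The function names are hypothetical and are not part of the product.

```python
def should_merge(pair_p_value, alpha_merge):
    """Merge two categories when they are NOT significantly different."""
    # Range taken from the option text: greater than 0, at most 1.
    if not 0 < alpha_merge <= 1:
        raise ValueError("alpha_merge must be in (0, 1]")
    # A p-value can never exceed 1, so alpha_merge = 1 disables merging.
    return pair_p_value > alpha_merge


def should_split(p_value, alpha_split):
    """Split a node when the predictor is significantly related to the target."""
    # Assumption: "between 0 and 1" is read here as excluding both endpoints.
    if not 0 < alpha_split < 1:
        raise ValueError("alpha_split must be in (0, 1)")
    return p_value <= alpha_split
```

For example, should_merge(0.2, 0.05) returns True, while should_merge(0.9999, 1.0) returns False, which is why a merging alpha of 1 prevents any merging.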

Adjust significance values using Bonferroni method. Adjusts significance values when testing the various category combinations of a predictor. Values are adjusted based on the number of tests, which directly relates to the number of categories and measurement level of a predictor. This is generally desirable because it better controls the false-positive error rate. Disabling this option will increase the power of your analysis to find true differences, but at the cost of an increased false-positive rate. In particular, disabling this option may be recommended for small samples.
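To show why the adjustment depends on the number of categories and the measurement level, here is a rough sketch. It assumes the multipliers commonly given in the CHAID literature: the number of ways to reduce I original categories to r merged categories, where an ordinal predictor may merge only adjacent categories and a nominal predictor may merge any categories. The text above does not state the product's exact computation, and the function names are hypothetical.

```python
from math import comb, factorial


def bonferroni_multiplier(original_categories, merged_categories, ordinal):
    """Count the ways to reduce I original categories to r merged ones."""
    i, r = original_categories, merged_categories
    if not 1 <= r <= i:
        raise ValueError("need 1 <= merged_categories <= original_categories")
    if ordinal:
        # Only adjacent categories can merge: pick r - 1 cut points from i - 1.
        return comb(i - 1, r - 1)
    # Nominal: any grouping is allowed (Stirling number of the second kind).
    total = sum((-1) ** v * comb(r, v) * (r - v) ** i for v in range(r))
    return total // factorial(r)


def adjust_p_value(p_value, multiplier):
    """Bonferroni adjustment: scale the p-value by the number of tests, capped at 1."""
    return min(1.0, p_value * multiplier)
```

With four original categories merged into two, the multiplier is 3 for an ordinal predictor and 7 for a nominal one, so the same unadjusted p-value is penalized more heavily for the nominal predictor.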

Allow resplitting of merged categories within a node. The CHAID algorithm attempts to merge categories in order to produce the simplest tree that describes the model. If selected, this option enables merged categories to be resplit if that results in a better solution.

Chi-square for categorical targets. For categorical targets, you can specify the method used to calculate the chi-square statistic.

Pearson. This method provides faster calculations but should be used with caution on small samples.
Likelihood ratio. This method is more robust than Pearson but takes longer to calculate. For small samples, this is the preferred method. For continuous targets, this method is always used.
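The two statistics can be compared on a small contingency table. The sketch below uses the textbook formulas with closed-form expected counts under independence and no case weights; it is an illustration only and not the product's implementation. It assumes every row and column total is positive.

```python
from math import log


def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def likelihood_ratio_chi_square(table):
    """2 * sum of observed * ln(observed / expected); empty cells contribute 0."""
    expected = expected_counts(table)
    return 2.0 * sum(
        o * log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )
```

For the table [[10, 20], [20, 10]], the Pearson statistic is about 6.667 and the likelihood ratio statistic is about 6.796. The likelihood ratio version needs a logarithm for every cell, which is consistent with it being the slower of the two.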

Minimum change in expected cell frequencies. When estimating cell frequencies (for both the nominal model and the row effects ordinal model), an iterative procedure is used to converge on the optimal estimate used in the chi-square test for a specific split. This value (epsilon) determines how much change must occur for iterations to continue; if the change from the last iteration is smaller than the specified value, iterations stop. If the algorithm fails to converge, you can increase this value or increase the maximum number of iterations until convergence occurs.

Maximum iterations for convergence. Specifies the maximum number of iterations before stopping, whether convergence has taken place or not.
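The interaction between the minimum change (epsilon) and the maximum number of iterations can be sketched with a generic stopping rule. The update step here is a stand-in (a square-root iteration) chosen only because it converges quickly; it is not the cell-frequency estimation procedure, and the function name is hypothetical.

```python
def iterate_until_converged(update, initial, epsilon, max_iterations):
    """Repeat `update` until the change drops below epsilon or the cap is hit.

    Returns (estimate, iterations_used, converged).
    """
    estimate = initial
    for iteration in range(1, max_iterations + 1):
        new_estimate = update(estimate)
        change = abs(new_estimate - estimate)
        estimate = new_estimate
        if change < epsilon:
            # The change is smaller than epsilon, so iterations stop.
            return estimate, iteration, True
    # The cap was reached without meeting the epsilon criterion.
    return estimate, max_iterations, False


def sqrt2_step(x):
    """Stand-in update: one step of the Babylonian method for sqrt(2)."""
    return 0.5 * (x + 2.0 / x)
```

Starting from 1.0, this stand-in converges in 4 iterations with epsilon set to 0.001 and in 3 iterations with epsilon set to 0.01. With a cap of 2 iterations it stops without converging. This mirrors the two remedies described above: raise epsilon or raise the iteration cap.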

Overfit prevention set. (This option is only available when using the interactive tree builder.) The algorithm internally separates records into a model building set and an overfit prevention set, which is an independent set of data records used to track errors during training in order to prevent the method from modeling chance variation in the data. Specify a percentage of records. The default is 30.
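The idea of holding back a percentage of records can be sketched as a simple random partition. The text above does not describe how the product actually samples records, so the shuffle-based approach and the function name below are assumptions for illustration.

```python
import random


def split_records(records, overfit_percent=30.0, seed=None):
    """Randomly partition records into (model_building, overfit_prevention)."""
    if not 0 < overfit_percent < 100:
        raise ValueError("overfit_percent must be between 0 and 100")
    rng = random.Random(seed)  # local generator; does not touch global state
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_overfit = round(len(shuffled) * overfit_percent / 100)
    # The first n_overfit shuffled records are held out; the rest build the model.
    return shuffled[n_overfit:], shuffled[:n_overfit]
```

With 100 records and the default of 30 percent, 70 records go to the model building set and 30 to the overfit prevention set.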

Replicate results. Setting a random seed enables you to replicate analyses. Specify an integer or click Generate, which will create a pseudo-random integer between 1 and 2147483647, inclusive.
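The effect of a fixed seed can be shown with Python's standard random module. This only illustrates the general principle; the product's own random number generator is not described here, and the function names are hypothetical.

```python
import random

MAX_SEED = 2147483647  # upper bound stated above; equals 2**31 - 1


def generate_seed():
    """Pick a pseudo-random integer seed between 1 and MAX_SEED, inclusive."""
    return random.randint(1, MAX_SEED)


def draws_for_seed(seed, count=5):
    """Return the first `count` pseudo-random draws produced by a given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]
```

Calling draws_for_seed twice with the same seed returns identical lists, which is what makes a seeded analysis repeatable.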