Regression tree

A regression tree is a type of decision tree. It uses sum of squares and regression analysis to predict values of the target field. The predictions are based on combinations of values in the input fields.

A regression tree calculates a predicted mean value for each node in the tree. This type of tree is generated when the target field is continuous.

The algorithmic details are too complicated to describe in full here. You can see some of the statistics in the tooltips for the nodes in the decision tree visualization. Each node is split into two or more child nodes to reduce the sum of squares for the node. Sum of squares is a function that penalizes target values that are distant from the mean. Both the mean and the standard deviation are displayed for each node. Sum of squares is directly related to the standard deviation and the number of data rows that correspond to the node. For each node, the predictor that reduces the sum of squares the most is selected for splitting the node. Child nodes that correspond to given predictor categories are merged when the resulting increase in sum of squares is within a specified limit.
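The relationship between sum of squares, standard deviation, and row count, and the choice of the splitting predictor, can be illustrated with a minimal sketch. It is not the product's implementation. The function names, the sample rows, and the field names (region, tier, sales) are hypothetical. The sketch assumes categorical predictors with one child node per category, assumes the population standard deviation, and leaves out the merging of categories.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def sum_of_squares(values):
    """Sum of squared deviations of the target values from their mean."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def split_reduction(rows, predictor, target):
    """Reduction in sum of squares when rows are split by predictor category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[predictor]].append(row[target])
    parent_ss = sum_of_squares([row[target] for row in rows])
    children_ss = sum(sum_of_squares(group) for group in groups.values())
    return parent_ss - children_ss


def best_predictor(rows, predictors, target):
    """Pick the predictor whose split reduces the sum of squares the most."""
    return max(predictors, key=lambda p: split_reduction(rows, p, target))


# Hypothetical sample data: two categorical predictors, one continuous target.
rows = [
    {"region": "north", "tier": "a", "sales": 10.0},
    {"region": "north", "tier": "b", "sales": 12.0},
    {"region": "south", "tier": "a", "sales": 30.0},
    {"region": "south", "tier": "b", "sales": 32.0},
]
values = [row["sales"] for row in rows]

# Assumption: with the population standard deviation,
# sum of squares = number of rows * standard deviation squared.
ss_from_std = len(values) * pstdev(values) ** 2

chosen = best_predictor(rows, ["region", "tier"], "sales")
```

In this sample, splitting on region leaves almost no variation inside each child node, so region gives the larger reduction and is the predictor selected.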

The process of building a decision tree starts with the root node, which corresponds to all the rows in the data. Each node is split into child nodes until no further improvement in sum of squares is possible or the number of rows that correspond to the node becomes too small. The process also stops if the number of nodes in the decision tree becomes too large.
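The building process and its stopping conditions can be sketched as follows. This is an illustration under stated assumptions, not the actual algorithm. The names build_tree, make_node, min_rows, and max_nodes, their default values, and the sample data are hypothetical. The sketch assumes categorical predictors, creates one child node per category, and does not merge categories.

```python
from collections import defaultdict, deque
from statistics import fmean


def sum_of_squares(values):
    """Sum of squared deviations of the values from their mean."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def make_node(rows, target):
    """Create a node that stores its rows, predicted mean, and sum of squares."""
    values = [row[target] for row in rows]
    return {"rows": rows, "mean": fmean(values), "ss": sum_of_squares(values),
            "split_on": None, "children": {}}


def build_tree(rows, predictors, target, min_rows=3, max_nodes=25):
    """Grow a tree from the root, level by level (hypothetical limits)."""
    root = make_node(rows, target)
    pending = deque([root])
    node_count = 1
    while pending:
        node = pending.popleft()
        if len(node["rows"]) < min_rows:
            continue  # stop: too few rows in this node
        best = None  # (reduction, predictor, groups)
        for predictor in predictors:
            groups = defaultdict(list)
            for row in node["rows"]:
                groups[row[predictor]].append(row)
            if len(groups) < 2:
                continue  # a single category cannot split the node
            child_ss = sum(sum_of_squares([r[target] for r in group])
                           for group in groups.values())
            reduction = node["ss"] - child_ss
            if best is None or reduction > best[0]:
                best = (reduction, predictor, groups)
        if best is None or best[0] <= 1e-12:
            continue  # stop: no further improvement in sum of squares
        _, predictor, groups = best
        if node_count + len(groups) > max_nodes:
            break  # stop: the tree would become too large
        node["split_on"] = predictor
        for category, group in groups.items():
            child = make_node(group, target)
            node["children"][category] = child
            pending.append(child)
        node_count += len(groups)
    return root


# Hypothetical sample data.
rows = [
    {"region": "north", "tier": "a", "sales": 10.0},
    {"region": "north", "tier": "b", "sales": 12.0},
    {"region": "south", "tier": "a", "sales": 30.0},
    {"region": "south", "tier": "b", "sales": 32.0},
]
tree = build_tree(rows, ["region", "tier"], "sales", min_rows=3)
```

With min_rows set to 3, the root is split on region and the two child nodes, each holding two rows, are left as leaves. The mean stored on a leaf is the value predicted for any row that reaches it.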

R² is used to estimate the predictive strength of a regression tree. A regression tree is reported as reliably predictive when its predictive strength is greater than the 10% threshold.
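As a rough illustration of the threshold check, the following sketch computes the standard coefficient of determination and compares it with 10%. The function name, the sample values, and the variable names are hypothetical. This text does not say which rows the predictive strength is computed on, so the data here is illustrative only.

```python
from statistics import fmean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = fmean(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    if ss_total == 0:
        raise ValueError("R squared is undefined when the target is constant")
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical target values and the node means predicted for the same rows.
actual = [10.0, 12.0, 30.0, 32.0]
predicted = [11.0, 11.0, 31.0, 31.0]

strength = r_squared(actual, predicted)
is_reliable = strength > 0.10  # the 10% threshold named in the text
```

A tree that always predicted the overall mean would have a residual sum of squares equal to the total, which gives a strength of 0 and fails the threshold.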