
Random Forest node

Random Forest© is an advanced implementation of a bagging (bootstrap aggregating) algorithm that uses a decision tree as the base model.
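The bagging idea can be illustrated with a short, self-contained sketch. It is not the node's implementation. To keep it brief, it assumes a one-level decision tree (a decision stump) as the base model instead of a full tree, and it combines predictions by majority vote. The names `fit_stump`, `bagging_fit`, and `bagging_predict` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-level decision tree (assumed base model, for brevity):
    the single (feature, threshold) split with the fewest misclassifications,
    predicting the majority label on each side."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]   # never empty: t is a value in X
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = sum(l != left_lab for l in left) + sum(r != right_lab for r in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lab, right_lab)
    _, feat, thr, lo_lab, hi_lab = best
    return lambda row: lo_lab if row[feat] <= thr else hi_lab


def bagging_fit(X, y, n_models=25, seed=0):
    """Train n_models base models, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Aggregate the base models' predictions by majority vote."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [0, 0, 0, 1, 1, 1]
    models = bagging_fit(X, y)
    print(bagging_predict(models, [0.0]), bagging_predict(models, [5.0]))
```

Each base model sees a slightly different resampled training set, so their errors differ, and the vote smooths them out.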

In a random forest, each tree in the ensemble is built from a sample drawn with replacement from the training set (that is, a bootstrap sample). When a node is split during tree construction, the chosen split is not the best split among all features, as it would be in a standard decision tree. Instead, it is the best split among a random subset of the features. This randomness usually increases the bias of the forest slightly (relative to the bias of a single non-random tree), but averaging over many trees decreases its variance, usually by more than enough to compensate for the added bias, which yields a better model overall [1].
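What separates a random forest from plain bagging is the split-selection step. The sketch below is a minimal illustration, not the node's implementation. It scores candidate splits with Gini impurity (an assumption; other criteria exist) and, at each node, draws `max_features` features without replacement before searching for the best split. The parameter name `max_features` mirrors the analogous scikit-learn parameter; the function names are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X, y, features):
    """Return (weighted_gini, feature, threshold) for the best split,
    searching only the given feature indices. Returns None if no
    feature in the subset can split the node."""
    best = None
    n = len(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:
                continue  # threshold at the maximum value does not split the node
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def random_forest_split(X, y, max_features, rng):
    """Random-forest-style split: the best split among a random subset
    of max_features features, drawn anew for every node."""
    features = rng.sample(range(len(X[0])), max_features)
    return best_split(X, y, features)


if __name__ == "__main__":
    # Feature 0 separates the classes; feature 1 is noise.
    X = [[0, 0], [1, 1], [2, 0], [3, 1], [4, 0], [5, 1]]
    y = [0, 0, 0, 1, 1, 1]
    rng = random.Random(0)
    print(random_forest_split(X, y, max_features=1, rng=rng))
```

With `max_features` equal to the number of features, the search reduces to the ordinary best split of a non-random tree. Smaller values sometimes force a node to split on a weaker feature, which is the source of the extra bias and of the reduced correlation between trees described above.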

The Random Forest node in Cloud Pak for Data is implemented in Python. The nodes palette contains this node and other Python nodes.

For more information about random forest algorithms, see Forests of randomized trees in the scikit-learn user guide.

[1] L. Breiman, "Random Forests," Machine Learning, 45(1), 5-32, 2001.