Support of data sets that contain missing values

When you create regression tree models, you can use data sets that contain missing values. Be aware, however, that missing values affect operations that rely on the availability of attribute values.

If the values of one or more attributes are missing for some records in a data set, the following operations cannot be performed in the usual way:

  1. Class distribution calculation, because class counts are needed for each node-attribute-value combination
  2. Split evaluation, because class impurity is calculated for each node-attribute-value combination
  3. Split application, because the data is split based on equality conditions or inequality conditions on attribute values
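The third point can be illustrated with a minimal sketch. It is not the product's implementation; the function `apply_split`, the attribute names, and the sample records are hypothetical and serve only to show that a record with a missing attribute value satisfies neither side of an inequality condition.

```python
import math


def is_missing(value):
    """Treat None and floating-point NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def apply_split(records, attribute, threshold):
    """Partition records with the inequality condition record[attribute] <= threshold.

    Records whose attribute value is missing cannot be tested against the
    condition, so they are collected separately instead of being assigned
    to a branch.
    """
    left, right, unassigned = [], [], []
    for record in records:
        value = record.get(attribute)
        if is_missing(value):
            unassigned.append(record)
        elif value <= threshold:
            left.append(record)
        else:
            right.append(record)
    return left, right, unassigned


# Hypothetical sample data: two of the four records have no usable "age" value.
records = [
    {"age": 25.0, "income": 30.0},
    {"age": None, "income": 42.0},
    {"age": 61.0, "income": 55.0},
    {"age": float("nan"), "income": 38.0},
]

left, right, unassigned = apply_split(records, "age", 40.0)

# A NaN value fails both sides of the inequality, so a naive
# "if value <= threshold ... else ..." test would silently send it to the else branch.
nan = float("nan")
nan_fails_both_sides = not (nan <= 40.0) and not (nan > 40.0)
```

In this sketch, half of the records end up in `unassigned`, which is why the counts that the class distribution calculation and the split evaluation rely on are incomplete as well.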

You can nevertheless use data sets with missing values both to create regression tree models and to make predictions. Handling the missing values increases the computational expense, but the impact of missing values on model quality and prediction quality is kept as low as possible.
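The text does not state which technique is used to handle missing values. As an illustration only, the following sketch shows one widely used approach, fractional weighting, in which a record with a missing attribute value is sent to both branches with weights proportional to the share of known records in each branch. The function names, the weighting rule, and the sample data are assumptions of this sketch, not a description of the product's algorithm.

```python
def split_with_fractional_weights(records, attribute, threshold):
    """Split weighted records on record[attribute] <= threshold.

    Each element of `records` is a (row, weight) pair. Rows with a missing
    (None) attribute value are added to both branches, with their weight
    divided according to the weight of the known rows in each branch.
    This is an assumed strategy for illustration only.
    """
    left, right, missing = [], [], []
    for row, weight in records:
        value = row.get(attribute)
        if value is None:
            missing.append((row, weight))
        elif value <= threshold:
            left.append((row, weight))
        else:
            right.append((row, weight))

    left_weight = sum(weight for _, weight in left)
    right_weight = sum(weight for _, weight in right)
    known_weight = left_weight + right_weight
    # If no row has a known value, fall back to an even split.
    left_share = left_weight / known_weight if known_weight > 0 else 0.5

    for row, weight in missing:
        if left_share > 0:
            left.append((row, weight * left_share))
        if left_share < 1:
            right.append((row, weight * (1.0 - left_share)))
    return left, right


def weighted_mean(records, target):
    """Weighted mean of the target value, as a leaf prediction would use it."""
    total = sum(weight for _, weight in records)
    return sum(row[target] * weight for row, weight in records) / total


# Hypothetical sample data: the last row has no "age" value.
rows = [
    ({"age": 25.0, "y": 10.0}, 1.0),
    ({"age": 30.0, "y": 14.0}, 1.0),
    ({"age": 61.0, "y": 40.0}, 1.0),
    ({"age": None, "y": 22.0}, 1.0),
]

left, right = split_with_fractional_weights(rows, "age", 40.0)
total_weight = sum(w for _, w in left) + sum(w for _, w in right)
left_prediction = weighted_mean(left, "y")
```

Under this assumed strategy, the row with the missing value appears in both branches, so the total weight is preserved while the number of rows to process grows. That growth is one plausible source of the additional computational expense mentioned above.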