Handling missing values

Input records might contain one or more values that are NULL. These values are known as missing values. The handling of missing values in Intelligent Miner® depends on the algorithm that you are using.
Characteristics
  • Fields defined in a PMML model can explicitly define a validity range. All values outside this range are treated as missing values.
  • Fields defined in a PMML model can define a missing value replacement (PMML 2.0 and higher). In this case, all missing values are replaced by a valid value indicated in the model.
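The two characteristics above can be illustrated with a minimal Python sketch. The function name normalize_value and its parameters are hypothetical and are not part of Intelligent Miner or PMML. The sketch assumes that an out-of-range value receives the same replacement as a NULL value.

```python
from typing import Optional


def normalize_value(value: Optional[float], low: float, high: float,
                    replacement: Optional[float] = None) -> Optional[float]:
    """Return the value to score with, or None if the value stays missing.

    Hypothetical helper: low/high model the validity range of a field,
    replacement models an optional missing value replacement.
    """
    # A NULL input, or a value outside the validity range, counts as missing.
    if value is None or not (low <= value <= high):
        # Use the replacement if the model defines one; otherwise stay missing.
        return replacement
    return value
```

For example, with a validity range of 0 to 10 and no replacement, the value 50 stays missing (None). With a replacement of 3.0, both 50 and NULL are scored as 3.0.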
Classification
  • Neural Classification: In IBM® models, if no output neuron has an activation above a certain threshold, DM_getPredClass returns NULL. Other models always predict a value. DM_getConfidence always returns a value.
  • Tree Classification: The handling of missing values depends on whether the model is generated by an IBM product or by a non-IBM product.
    Models generated by Intelligent Miner
    IBM models do not stop scoring at a missing value. If a tree node tests a field whose value is missing, the record being scored is passed to both child nodes of that node (the tree is binary). This process continues until the record reaches leaf nodes, so a record can be assigned to more than one leaf node. Tree Classification aggregates all of these leaf nodes, and DM_getPredClass returns the value assigned to the aggregated node.
    Models generated by a non-IBM product
    If a handling strategy for missing values is defined in the PMML model, missing values are handled accordingly. If no strategy is defined, the scoring process stops at the first tree node that tests a field whose value is missing, and DM_getPredClass returns the value assigned to this (non-leaf) node.
  • Logistic Regression: If a substitute for a missing value is defined in the PMML model, it is used. Otherwise, no prediction is possible.
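The two Tree Classification behaviors described above can be sketched in Python as follows. This is an illustration only, not the Intelligent Miner implementation. The Node class and both functions are hypothetical. The sketch also assumes that each node stores class counts and that leaf nodes are aggregated by summing these counts; the source does not state how the aggregation is done.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    counts: Dict[str, int]          # assumed: class counts stored at this node
    field: Optional[str] = None     # field tested here; None marks a leaf node
    threshold: float = 0.0
    left: Optional["Node"] = None   # child taken when value <= threshold
    right: Optional["Node"] = None  # child taken when value > threshold


def reachable_leaves(node: Node, record: Dict[str, Optional[float]]) -> List[Node]:
    """Collect every leaf the record reaches, following both children
    whenever the tested field is missing."""
    if node.field is None:
        return [node]
    value = record.get(node.field)
    if value is None:
        return reachable_leaves(node.left, record) + reachable_leaves(node.right, record)
    child = node.left if value <= node.threshold else node.right
    return reachable_leaves(child, record)


def predict_aggregated(root: Node, record: Dict[str, Optional[float]]) -> str:
    """Both-children behavior: aggregate the reached leaves (assumed: sum
    of class counts) and return the majority class."""
    total: Counter = Counter()
    for leaf in reachable_leaves(root, record):
        total.update(leaf.counts)
    return total.most_common(1)[0][0]


def predict_stop_at_missing(root: Node, record: Dict[str, Optional[float]]) -> str:
    """No strategy defined: stop at the first node whose tested field is
    missing and return the majority class of that node."""
    node = root
    while node.field is not None:
        value = record.get(node.field)
        if value is None:
            break
        node = node.left if value <= node.threshold else node.right
    return max(node.counts, key=node.counts.get)


# Small example tree: one split on "age" with two leaves.
young = Node(counts={"yes": 5, "no": 1})
old = Node(counts={"yes": 1, "no": 4})
root = Node(counts={"yes": 6, "no": 5}, field="age", threshold=30.0,
            left=young, right=old)
```

With this example tree, a record whose age is missing reaches both leaves in predict_aggregated, whereas predict_stop_at_missing returns the class stored at the root node.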
Clustering
  • Distribution-based Clustering: Missing values are ignored and the corresponding field is not included in the scoring process. If all values of the record are missing, NULL is returned.
  • Center-based Clustering: If all the values of the record are missing, NULL is returned.
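The idea of leaving missing fields out of the scoring process can be sketched in Python as follows. This is a simplified illustration. The function nearest_cluster is hypothetical, and the squared distance to a cluster center is only a stand-in for the actual scoring measure, which the source does not describe.

```python
from typing import Dict, Optional, Sequence


def nearest_cluster(record: Dict[str, Optional[float]],
                    centers: Sequence[Dict[str, float]]) -> Optional[int]:
    """Return the index of the closest cluster center, or None (NULL)
    if every value of the record is missing."""
    # Keep only the fields that have a value; missing fields are ignored.
    present = {f: v for f, v in record.items() if v is not None}
    if not present:
        return None

    def distance(center: Dict[str, float]) -> float:
        # Stand-in measure: squared distance over the present fields only.
        return sum((v - center[f]) ** 2 for f, v in present.items())

    return min(range(len(centers)), key=lambda i: distance(centers[i]))
```

A record in which only some fields are missing is still assigned to a cluster, based on the remaining fields. A record in which all fields are missing yields None.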
Regression
  • Transform Regression: Transform Regression models can handle missing values, so a numeric prediction value is always returned.
  • Linear Regression and Polynomial Regression:
    • Numeric variables: If a missing value replacement (PMML 2.0 or higher) is present, it is used. Otherwise, if a mean value is given in the PMML model, the mean value is used. If neither is present, no prediction is given.
    • Categorical variables: If a missing value replacement (PMML 2.0 or higher) is present, it is used. Otherwise, no prediction is given.
    • If all input variables are missing, no prediction is given, and the function DM_getPredValue returns NULL.
  • Neural Regression: If all values of the record are missing, NULL is returned.
  • RBF Prediction: Missing values are ignored, and the corresponding field is not included in the scoring process. If all values of the record are missing, DM_getPredValue and DM_getRBFRegionID return NULL values.
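The rules for numeric variables in Linear Regression, described above, can be sketched in Python as follows. The NumericTerm class and the predict_linear function are hypothetical names used for illustration only. The sketch assumes that a missing value replacement takes precedence over a mean value when both are present.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NumericTerm:
    field: str
    coefficient: float
    replacement: Optional[float] = None  # missing value replacement, if defined
    mean: Optional[float] = None         # mean value from the model, if given


def predict_linear(intercept: float, terms: List[NumericTerm],
                   record: Dict[str, Optional[float]]) -> Optional[float]:
    """Return the predicted value, or None (NULL) if no prediction is possible."""
    # If all input variables are missing, no prediction is given.
    if all(record.get(t.field) is None for t in terms):
        return None
    total = intercept
    for t in terms:
        value = record.get(t.field)
        if value is None:
            # Assumed precedence: replacement first, then the mean value.
            value = t.replacement if t.replacement is not None else t.mean
        if value is None:
            # Neither a replacement nor a mean is available for this field.
            return None
        total += t.coefficient * value
    return total
```

In this sketch, a single missing field without a replacement or a mean value is enough to prevent a prediction, even if all other fields have values.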

