Input records might contain one or more values that are NULL. These
values are known as missing values. The handling of missing values
in Intelligent Miner® depends on the algorithm that you are using.
- Characteristics
- Fields defined in a PMML model can explicitly define a validity
range. All values outside this range are treated as missing values.
- Fields defined in a PMML model can define a missing value replacement
(PMML 2.0 and higher). In this case, all missing values are replaced
by a valid value indicated in the model.
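
The two characteristics above can be illustrated with a minimal Python
sketch. It is not the Intelligent Miner implementation: the function
name `normalize_value` and its parameters are hypothetical, and the
assumption that an out-of-range value is first marked as missing and
then replaced is made here only for illustration.

```python
from typing import Optional


def normalize_value(value: Optional[float],
                    low: Optional[float] = None,
                    high: Optional[float] = None,
                    replacement: Optional[float] = None) -> Optional[float]:
    """Apply a validity range, then a missing value replacement.

    None stands for NULL. All names are hypothetical and only
    illustrate the behavior described in the text.
    """
    # A value outside the validity range is treated as missing.
    if value is not None:
        below = low is not None and value < low
        above = high is not None and value > high
        if below or above:
            value = None

    # Assumption: a value marked as missing by the range check is
    # replaced in the same way as a value that was NULL on input.
    if value is None and replacement is not None:
        return replacement
    return value
```

For example, `normalize_value(50.0, 0.0, 10.0)` yields `None` because the
value lies outside the range, while `normalize_value(None, 0.0, 10.0, 3.0)`
yields the replacement `3.0`.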
- Classification
- Neural Classification: In IBM® models,
if none of the output neuron activations exceeds a certain
threshold, DM_getPredClass returns NULL. Non-IBM
models always predict a value. DM_getConfidence always
returns a value.
- Tree Classification: The handling of missing values depends
on whether the model is generated by an IBM product
or by a non-IBM product.
- Models generated by Intelligent Miner
- With IBM models, a record with a missing value is followed down
both branches of the tree. If a missing value occurs, the record being
scored is passed to both child nodes (binary tree) of the tree node
that requires the missing value. This process continues until the
record reaches a leaf node. As a result, a record can be assigned to
more than one leaf node. Tree Classification aggregates all these leaf
nodes, and DM_getPredClass returns the value assigned to this
aggregated node.
- Models generated by a non-IBM product
- If a handling strategy for missing values is defined in the PMML
model, missing values are handled accordingly. If the handling of
missing values is not defined, the scoring process stops at the first
tree node requiring a missing value, and DM_getPredClass returns
the value assigned to this (non-leaf) node.
- Logistic Regression: If a substitute for a missing value
is defined in the PMML model, it is used. Otherwise, no prediction
is possible.
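
The two Tree Classification strategies described above can be compared
in a small Python sketch. It is an illustration only, not the
Intelligent Miner algorithm: the `Node` class, the function names, and
the example tree are hypothetical, and the text does not specify how
leaf nodes are aggregated, so summing the class counts of the reached
leaves is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    predicted: str                  # class assigned to this node
    counts: Dict[str, int]          # training records per class
    field: Optional[str] = None     # split field; None for a leaf
    threshold: float = 0.0          # go left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def reached_leaves(node: Node, record: dict) -> List[Node]:
    """Collect every leaf reached when a missing value follows both children."""
    if node.field is None:
        return [node]
    value = record.get(node.field)
    if value is None:
        # Missing value: feed the record into both child nodes.
        return reached_leaves(node.left, record) + reached_leaves(node.right, record)
    child = node.left if value <= node.threshold else node.right
    return reached_leaves(child, record)


def predict_both_children(root: Node, record: dict) -> str:
    """Aggregate the reached leaves (assumption: sum their class counts)."""
    total = Counter()
    for leaf in reached_leaves(root, record):
        total.update(leaf.counts)
    return total.most_common(1)[0][0]


def predict_stop_at_node(root: Node, record: dict) -> str:
    """Stop at the first node that requires a missing value."""
    node = root
    while node.field is not None:
        value = record.get(node.field)
        if value is None:
            return node.predicted   # value of the (non-leaf) node
        node = node.left if value <= node.threshold else node.right
    return node.predicted


# Hypothetical example tree: split on age, then on income.
root = Node("no", {"no": 60, "yes": 40}, "age", 30.0,
            left=Node("yes", {"yes": 30, "no": 10}, "income", 50.0,
                      left=Node("no", {"no": 8, "yes": 5}),
                      right=Node("yes", {"yes": 25, "no": 2})),
            right=Node("no", {"no": 50, "yes": 10}, "income", 50.0,
                       left=Node("no", {"no": 45, "yes": 2}),
                       right=Node("yes", {"yes": 8, "no": 5})))
```

With the record `{"age": None, "income": 80}`, the first strategy reaches
two leaves and can still use the known income value, whereas the second
strategy stops at the root node and returns the class assigned there.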
- Clustering
- Distribution-based Clustering: Missing values are ignored
and the corresponding field is not included in the scoring process.
If all values of the record are missing, NULL is returned.
- Center-based Clustering: If all the values of the record
are missing, NULL is returned.
- Regression
- Transform Regression: Transform Regression models
handle missing values so that a numeric prediction value is always
returned.
- Linear Regression and Polynomial Regression:
- Numeric variables: If a missing value replacement (PMML 2.0 or
higher) is present, it is used. Otherwise, if a mean value is given in
the PMML model, the mean is used. Otherwise, no prediction is given.
- Categorical variables: If a missing value replacement (PMML 2.0
or higher) is present, it is used. Otherwise, no prediction
is given.
- If all input variables are missing, no prediction is
given and the function DM_getPredValue returns NULL.
- Neural Regression: If all values of the record are missing,
NULL is returned.
- RBF Prediction: Missing values are ignored, and the corresponding
field is not included in the scoring process. If all values of the
record are missing, DM_getPredValue and DM_getRBFRegionID return
NULL values.
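
The fallback order described for numeric variables in Linear Regression
and Polynomial Regression can be sketched in Python as follows. This is
a simplified illustration under stated assumptions, not the product
implementation: it covers a purely linear model with numeric inputs
only, the names `resolve_numeric` and `predict_linear` are hypothetical,
and `None` stands for NULL.

```python
from typing import Dict, Optional


def resolve_numeric(value: Optional[float],
                    replacement: Optional[float] = None,
                    mean: Optional[float] = None) -> Optional[float]:
    """Return the value to score with, or None if no prediction is possible."""
    if value is not None:
        return value
    if replacement is not None:     # missing value replacement comes first
        return replacement
    if mean is not None:            # then the mean value, if given
        return mean
    return None


def predict_linear(record: Dict[str, Optional[float]],
                   intercept: float,
                   coefs: Dict[str, float],
                   replacements: Dict[str, float],
                   means: Dict[str, float]) -> Optional[float]:
    """Score a linear model; None corresponds to a NULL prediction."""
    # If all input variables are missing, no prediction is given.
    if all(record.get(name) is None for name in coefs):
        return None
    total = intercept
    for name, coef in coefs.items():
        x = resolve_numeric(record.get(name),
                            replacements.get(name),
                            means.get(name))
        if x is None:
            return None             # no substitute available
        total += coef * x
    return total
```

In this sketch a single input without a replacement or a mean is enough
to suppress the prediction, which mirrors the "otherwise no prediction
is given" rule for individual variables.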