Data preparation for numeric fields

A field is treated as numeric whenever it contains numeric information and its usage property is set to measure.

Overview

Because numeric data can vary widely in its distribution, IBM® Cognos Analytics transforms non-target numeric fields into ordinal bins. This reduces the dependence of the analytic algorithms on the format of the numeric data.

Algorithms

The basic algorithm is equal-frequency binning. Numeric data is divided into a fixed number of bins, with the aim of placing an equal number of rows of data in each bin. Missing values are placed in their own bin. Cognos Analytics attempts to use knowledge about missing values in predictor fields to build a better model. For example, if a field records when an item was tested, Cognos Analytics uses missing values (which might indicate that the item was never tested) to help predict the values of other fields.
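The following is a minimal Python sketch of equal-frequency binning, written only to illustrate the idea. It assumes that missing values are represented as None and go into a dedicated bin labeled -1. The function name `equal_frequency_bins`, the `MISSING_BIN` label, and the rule of placing cut points at the i/k quantiles of the non-missing values are illustrative assumptions, not Cognos Analytics internals.

```python
from bisect import bisect_right
from typing import Optional, Sequence

# Hypothetical label for the dedicated missing-value bin (an assumption for illustration).
MISSING_BIN = -1


def equal_frequency_bins(values: Sequence[Optional[float]], n_bins: int = 5) -> list[int]:
    """Assign each value an ordinal bin index in 0..n_bins-1; None goes to MISSING_BIN."""
    # Only non-missing values determine the cut points, so missing values
    # do not affect how the other rows are binned.
    present = sorted(v for v in values if v is not None)
    if not present:
        return [MISSING_BIN] * len(values)
    n = len(present)
    # Cut points at the 1/k, 2/k, ... quantiles of the sorted non-missing data.
    cuts = [present[(i * n) // n_bins] for i in range(1, n_bins)]
    result = []
    for v in values:
        if v is None:
            result.append(MISSING_BIN)
        else:
            # bisect_right keeps equal values together in the same bin; with
            # heavily tied data some bins can therefore end up empty.
            result.append(bisect_right(cuts, v))
    return result
```

For example, `equal_frequency_bins([None, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])` returns `[-1, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4]`: each of the five bins receives two rows and the missing value gets its own bin. Bins are only approximately equal when values repeat, which matches the idea that the procedure *attempts* to balance row counts.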

Details

Certain field exclusion criteria apply to numeric fields. A numeric field is excluded from further analysis if it has only a single value, where the missing value counts as a value. If a field has no more than 10 unique numeric values, binning is not attempted and each unique value is given its own category. Otherwise, the field is binned, and the default number of bins is 5. If zero occurs in more than 40% of rows, it is always given a separate category. Missing values are placed in their own bin and do not affect the binning procedure.
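The rules above can be combined into one preparation step, sketched below in Python under several stated assumptions. Missing values are represented as None. "A single value" is read as one distinct value when missing counts as a value. The 40% share of zeros is computed over all rows, including missing ones. When zero gets its own category, the remaining non-zero values are still split into the default number of bins. The function name `prepare_numeric_field` and the category labels are hypothetical and are not taken from Cognos Analytics.

```python
from bisect import bisect_right
from typing import Optional, Sequence

DEFAULT_BINS = 5             # default number of bins (from the text)
MAX_UNIQUE_UNBINNED = 10     # up to this many unique values: one category per value
ZERO_SHARE_THRESHOLD = 0.40  # zero gets its own category above this share of rows


def prepare_numeric_field(
    values: Sequence[Optional[float]], n_bins: int = DEFAULT_BINS
) -> Optional[list[str]]:
    """Return one category label per row, or None if the field is excluded.

    Hypothetical sketch: None stands for a missing value.
    """
    # Exclusion: only one distinct value, counting missing (None) as a value.
    if len(set(values)) <= 1:
        return None

    present = [v for v in values if v is not None]

    # Few unique values: skip binning and give each value its own category.
    if len(set(present)) <= MAX_UNIQUE_UNBINNED:
        return ["missing" if v is None else f"value={v}" for v in values]

    # Assumption: the share of zeros is measured over all rows, including missing ones.
    zero_count = sum(1 for v in values if v == 0)
    zero_separate = zero_count > ZERO_SHARE_THRESHOLD * len(values)

    # Equal-frequency cut points over non-missing values (and non-zero values
    # when zero is split off), so missing rows never influence the bins.
    to_bin = sorted(v for v in present if not (zero_separate and v == 0))
    n = len(to_bin)
    cuts = [to_bin[(i * n) // n_bins] for i in range(1, n_bins)]

    labels = []
    for v in values:
        if v is None:
            labels.append("missing")
        elif zero_separate and v == 0:
            labels.append("zero")
        else:
            labels.append(f"bin{bisect_right(cuts, v)}")
    return labels
```

As a usage example, `prepare_numeric_field([7, 7, 7])` returns None (the field is excluded), and `prepare_numeric_field([1, 2, None, 2])` returns `["value=1", "value=2", "missing", "value=2"]` because the field has fewer than 11 unique values. In the sketch the missing-value and zero checks come before the bin lookup, which keeps those rows out of the equal-frequency cut points.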