Data preparation

Data preparation is a pre-analysis step that is used by most data analytic algorithms to ensure that the data is suitable for analytic use.

Overview

Data preparation is critical in IBM® Cognos Analytics. Only prepared data is entered into analysis for key drivers, decision trees, and relationships that are displayed in the advanced analytics visualizations: Spiral, Driver analysis, Decision tree, Sunburst, and Explore relationships. Data is not automatically prepared for other visualizations and their corresponding insights.

Algorithms

All applied algorithms are based on the values of a single field at a time. Missing values are removed or handled for each field, and all numeric predictor driver fields are binned. All categorical fields are adjusted for a large number of categories, and outliers are handled in the target field. Although all data preparation influences the analysis results, the corresponding data preparation summaries are not currently reported to you.
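
The following Python sketch illustrates what single-field preparation of this kind can look like. It is not the Cognos Analytics implementation: this topic does not state which missing-value strategy, binning method, or category threshold the product uses. The function names, the equal-width binning, and the default values of `n_bins` and `max_categories` are assumptions chosen only for illustration.

```python
from collections import Counter


def drop_missing(values):
    """Remove missing entries (represented here as None) from one field."""
    return [v for v in values if v is not None]


def bin_numeric(values, n_bins=5):
    """Equal-width binning (assumed method): map each number to a bin index."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant field falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def collapse_categories(values, max_categories=3, other="Other"):
    """Keep the most frequent categories and merge the rest into one label."""
    keep = {c for c, _ in Counter(values).most_common(max_categories)}
    return [v if v in keep else other for v in values]
```

Each function takes the values of one field and returns the prepared values for that field, which mirrors the one-field-at-a-time approach described above.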

Details

Data preparation and the subsequent key drivers, decision trees, and relationships are based on a data sample of approximately 10,000 rows when the original data is larger. Bernoulli random sampling (equal-probability random sampling without replacement) is applied to uploaded data and to any connected data sources that support random sampling. Otherwise, systematic sampling is used.
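
The two sampling schemes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's code: the function names, the `seed` parameter, and the way the inclusion probability and step size are derived from the 10,000-row target are hypothetical.

```python
import random

TARGET_ROWS = 10_000  # approximate sample size stated in the text


def bernoulli_sample(rows, target=TARGET_ROWS, seed=None):
    """Keep each row independently with probability target / len(rows)."""
    rows = list(rows)
    if len(rows) <= target:
        return rows  # data is not larger than the target, so no sampling
    rng = random.Random(seed)
    p = target / len(rows)
    # Every row has the same inclusion probability and is kept at most once,
    # so the result has about `target` rows rather than exactly `target`.
    return [row for row in rows if rng.random() < p]


def systematic_sample(rows, target=TARGET_ROWS):
    """Keep rows at evenly spaced positions so that `target` rows remain."""
    rows = list(rows)
    if len(rows) <= target:
        return rows
    step = len(rows) / target
    return [rows[int(i * step)] for i in range(target)]
```

Bernoulli sampling yields a sample whose size varies slightly around the target, which is consistent with the "approximately 10,000 rows" wording. Systematic sampling needs only sequential access to the rows, which makes it a natural fallback for sources that cannot sample at random.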