Data Audit node

The Data Audit node provides a comprehensive first look at the data you bring into IBM® SPSS® Modeler, presented in an easy-to-read matrix that can be sorted and used to generate full-size graphs and a variety of data preparation nodes.

  • The Audit tab displays a report that provides summary statistics, histograms, and distribution graphs that may be useful in gaining a preliminary understanding of the data. The report also displays the storage icon before the field name.
  • The Quality tab in the audit report displays information about outliers, extremes, and missing values, and offers tools for handling these values.
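
To make the Quality tab's measures concrete, the following minimal sketch computes comparable figures for a single numeric field in plain Python. It is an illustration only, not the node's implementation. The function name audit_field, the use of None to mark a missing value, and the cutoffs of 3 and 5 standard deviations from the mean are all assumptions made for this example.

```python
import statistics

def audit_field(values, outlier_sd=3.0, extreme_sd=5.0):
    """Summarize one numeric field; None marks a missing value (assumption)."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    pct_complete = 100.0 * len(present) / len(values) if values else 0.0
    if not present:
        return {"valid": 0, "missing": missing, "pct_complete": pct_complete,
                "mean": None, "outliers": 0, "extremes": 0}

    mean = statistics.mean(present)
    sd = statistics.pstdev(present)  # population standard deviation

    outliers = extremes = 0
    for v in present:
        # Distance from the mean in standard deviations (0 if no spread).
        distance = abs(v - mean) / sd if sd else 0.0
        if distance > extreme_sd:
            extremes += 1
        elif distance > outlier_sd:
            outliers += 1

    return {"valid": len(present), "missing": missing,
            "pct_complete": pct_complete, "mean": mean,
            "outliers": outliers, "extremes": extremes}

# Example: twenty typical readings, two missing values, one unusual reading.
report = audit_field([10] * 20 + [None, None] + [100])
```

In this sketch a value counts as either an outlier or an extreme, never both, so the two counts can be added to get the total number of unusual values.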

Using the Data Audit node

The Data Audit node can be attached directly to a source node or downstream from an instantiated Type node. You can also generate a number of data preparation nodes based on the results. For example, you can generate a Filter node that excludes fields with too many missing values to be useful in modeling, and generate a SuperNode that imputes missing values for any or all of the fields that remain. This is where the real power of the audit comes in, enabling you not only to assess the current state of your data, but to take action based on the assessment.
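
The effect of the generated Filter node and missing-value SuperNode can be approximated outside the product as follows. This is a hedged sketch in plain Python, not the logic that SPSS Modeler generates. The helper names fields_to_keep and impute_mean, the 50 percent completeness threshold, the choice of the mean as the imputation value, and the sample records are all hypothetical.

```python
def fields_to_keep(records, min_pct_complete=50.0):
    """Return the field names whose share of non-missing values meets the threshold."""
    if not records:
        return []
    keep = []
    for name in records[0]:
        valid = sum(1 for row in records if row.get(name) is not None)
        if 100.0 * valid / len(records) >= min_pct_complete:
            keep.append(name)
    return keep

def impute_mean(records, name):
    """Return copies of the records with missing values of one field set to its mean."""
    present = [row[name] for row in records if row.get(name) is not None]
    if not present:
        return [dict(row) for row in records]
    fill = sum(present) / len(present)
    return [dict(row, **{name: fill if row.get(name) is None else row[name]})
            for row in records]

# Hypothetical data: None marks a missing value.
records = [
    {"age": 30, "income": None, "bp": 120},
    {"age": None, "income": None, "bp": 130},
    {"age": 50, "income": 40000, "bp": 110},
    {"age": 40, "income": None, "bp": 140},
]

kept = fields_to_keep(records)  # drops "income", which is only 25% complete
cleaned = [{name: row[name] for name in kept} for row in records]
for name in kept:
    cleaned = impute_mean(cleaned, name)
```

Filtering before imputing mirrors the order described above: fields that are too sparse are removed first, and only the fields that remain have their missing values filled in.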

Screening or sampling the data. An initial audit is particularly valuable when dealing with big data. To reduce processing time during this initial exploration, you can use a Sample node to select only a subset of records. The Data Audit node can also be used in combination with nodes such as Feature Selection and Anomaly Detection in the exploratory stages of analysis.
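
As a rough illustration of how sampling reduces the number of records to be audited, the sketch below shows two simple strategies in plain Python: keeping every nth record and keeping a random percentage. It does not reproduce the Sample node itself. The function names and the seed parameter are assumptions made for this example.

```python
import random

def sample_one_in_n(records, n):
    """Keep every nth record, starting with the first."""
    return records[::n]

def sample_random_pct(records, pct, seed=None):
    """Keep each record with a probability of pct percent.

    Passing a fixed seed makes the selection repeatable.
    """
    rng = random.Random(seed)
    return [record for record in records if rng.random() * 100.0 < pct]

rows = list(range(1000))
every_tenth = sample_one_in_n(rows, 10)               # exactly 100 records
about_a_tenth = sample_random_pct(rows, 10, seed=1)   # roughly 100 records
```

A random percentage gives an approximate rather than an exact sample size, so the audited record count will vary slightly from run to run unless a seed is fixed.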