Data Audit Node Settings Tab

The Settings tab enables you to specify basic parameters for the audit.

Default. You can simply attach the node to your stream and click Run to generate an audit report for all fields based on default settings, as follows:

Use custom fields. Select this option to manually select fields. Use the field chooser button on the right to select fields individually or by type.

Overlay field. The overlay field is used to draw the thumbnail graphs shown in the audit report. For a continuous (numeric range) field, bivariate statistics (covariance and correlation) are also calculated. If the Type node settings define a single Target field, it is used as the default overlay field. Alternatively, you can select Use custom fields to specify an overlay.
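
As a rough illustration of the bivariate statistics named above, the following Python sketch computes the sample covariance and the Pearson correlation between two continuous fields, for example an audited field and the overlay field. The function name and the data are hypothetical, and the sketch assumes the usual n-1 sample formulas; it is not the node's actual implementation.

```python
import math

def covariance_and_correlation(xs, ys):
    """Return (sample covariance, Pearson correlation) for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a field is constant")
    cov = sxy / (n - 1)                # assumption: sample (n-1) covariance
    corr = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return cov, corr

# Hypothetical values for an audited field and an overlay field
age = [23, 35, 47, 51, 62]
income = [21000, 34000, 52000, 58000, 61000]
cov, corr = covariance_and_correlation(age, income)
print(f"covariance={cov:.1f}, correlation={corr:.3f}")
```

The correlation is simply the covariance rescaled by the two standard deviations, so it always falls between -1 and 1, whereas the covariance depends on the units of both fields.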

Display. Enables you to specify whether graphs are available in the output, and to choose the statistics displayed by default.

Median and mode. Calculates the median and mode for all fields in the report. With large datasets, these statistics may increase processing time, since they take longer to compute than the others. For the median only, the reported value may in some cases be based on a sample of 2000 records rather than the full dataset. This sampling is done on a per-field basis when memory limits would otherwise be exceeded. When sampling is in effect, the result is labeled as such in the output (Sample Median rather than Median). All statistics other than the median are always computed using the full dataset.
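
The labeling behavior described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the memory_limited flag is a hypothetical stand-in for the node's internal memory check, and simple random sampling is assumed because the text does not say how the 2000 records are chosen.

```python
import random
import statistics

SAMPLE_SIZE = 2000  # sample size stated in the text above

def audit_median(values, memory_limited=False, seed=0):
    """Return (label, median) for one field.

    memory_limited is a hypothetical flag standing in for the node's
    per-field memory check; random sampling is an assumption.
    """
    if memory_limited and len(values) > SAMPLE_SIZE:
        rng = random.Random(seed)
        sample = rng.sample(values, SAMPLE_SIZE)
        return "Sample Median", statistics.median(sample)
    return "Median", statistics.median(values)

def audit_mode(values):
    """The mode is always computed from the full data, never from a sample."""
    return statistics.mode(values)

# Hypothetical field with 10000 records
field = list(range(10000))
print(audit_median(field))                       # full-data median
print(audit_median(field, memory_limited=True))  # labeled as a sample median
```

Because the sample median is an estimate, it can differ slightly from the full-data median, which is why the label matters when comparing reports.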

Empty or typeless fields. When used with instantiated data, typeless fields are not included in the audit report. To include typeless fields (including empty fields), select Clear All Values in any upstream Type nodes. This ensures that data are not instantiated, causing all fields to be included in the report. For example, this may be useful if you want to obtain a complete list of all fields or generate a Filter node that will exclude those that are empty. See the topic Filtering Fields with Missing Data for more information.
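
To illustrate the idea of listing empty fields so that they can be filtered out, here is a minimal Python sketch that works outside the product on records held as dictionaries. The function name, the record layout, and the choice of None and the empty string as missing values are all assumptions made for the example; the sketch does not reproduce how the node or the generated Filter node works.

```python
def empty_fields(records, field_names, missing=(None, "")):
    """Return the names of fields that have no non-missing value in any record."""
    return [
        name
        for name in field_names
        # rec.get() yields None for an absent key, which counts as missing
        if all(rec.get(name) in missing for rec in records)
    ]

# Hypothetical records: "note" and "region" never hold a value
records = [
    {"id": 1, "note": "", "region": None},
    {"id": 2, "note": None},
]
fields = ["id", "note", "region"]

to_exclude = empty_fields(records, fields)
kept = [name for name in fields if name not in to_exclude]
print("exclude:", to_exclude)
print("keep:", kept)
```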