Data Preparation - IBM SPSS Statistics

What SPSS Data Preparation can do for your business

IBM® SPSS® Data Preparation provides advanced techniques that streamline the data preparation stage, so you get faster, more accurate analysis results. Choose an automated data preparation procedure for fast results, or select other methods to prepare more challenging data sets. Easily identify suspicious or invalid cases, variables and data values. View patterns of missing data, summarize variable distributions, and work more accurately with algorithms designed for nominal attributes.

This module is included in the SPSS Professional edition for on-premises deployments and in the Base edition for subscription plans.

Feature spotlights

Variables tab

Use the Validate Data dialog to check your data for problems. The Variables tab lists the variables in your file. Start by selecting the variables you want to validate and moving them to the Analysis Variables list.

Basic checks

You can specify basic checks to apply to variables and cases in your file. For example, you can obtain reports that identify variables with a high percentage of missing values, or cases that are empty.
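To make the idea of a basic check concrete, here is a minimal Python sketch of the two checks mentioned above. It is not SPSS syntax and not how the Validate Data procedure is implemented. The function names, the sample data, and the 70% threshold are all hypothetical, and `None` is assumed to mark a missing value.

```python
# Illustrative sketch only: rows are dicts, and None marks a missing value.
# All names and the default threshold are assumptions, not SPSS settings.

def missing_percent(rows, variables):
    """Return {variable: percentage of cases in which the value is missing}."""
    total = len(rows)
    if total == 0:
        return {var: 0.0 for var in variables}
    return {
        var: 100.0 * sum(1 for row in rows if row.get(var) is None) / total
        for var in variables
    }

def empty_cases(rows, variables):
    """Return the indexes of cases in which every listed variable is missing."""
    return [
        i for i, row in enumerate(rows)
        if all(row.get(var) is None for var in variables)
    ]

def basic_checks(rows, variables, max_missing_percent=70.0):
    """Flag variables above the missing-value threshold and list empty cases."""
    percents = missing_percent(rows, variables)
    flagged = sorted(v for v, p in percents.items() if p > max_missing_percent)
    return {
        "missing_percent": percents,
        "flagged_variables": flagged,
        "empty_cases": empty_cases(rows, variables),
    }

SAMPLE = [
    {"age": 34, "income": 52000, "region": None},
    {"age": None, "income": None, "region": None},
    {"age": 51, "income": None, "region": None},
    {"age": 29, "income": 48000, "region": "north"},
]

report = basic_checks(SAMPLE, ["age", "income", "region"])
print(report["flagged_variables"])  # variables above the threshold
print(report["empty_cases"])        # indexes of fully empty cases
```

In this sample, "region" is missing in three of four cases and the second case has no values at all, so both checks report something.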

Standard and custom rules

Apply rules to individual variables to identify invalid values, such as missing values or values outside a valid range. You can apply predefined rules, create your own single-variable rules or define cross-variable rules.
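The difference between a single-variable rule and a cross-variable rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's rule engine: the variable names, the valid ranges, and the "retired implies adult" rule are invented for the example.

```python
# Illustrative sketch only: rule names, ranges and variables are hypothetical.

def in_range(low, high):
    """Build a single-variable rule: the value must be present and in [low, high]."""
    def rule(value):
        return value is not None and low <= value <= high
    return rule

# Single-variable rules look at one value at a time.
SINGLE_VARIABLE_RULES = {
    "age": in_range(0, 120),
    "satisfaction": in_range(1, 5),
}

def retired_implies_adult(case):
    """Cross-variable rule: a case marked as retired must have an age of 18 or more."""
    if not case.get("retired"):
        return True
    return case.get("age") is not None and case["age"] >= 18

# Cross-variable rules look at the whole case.
CROSS_VARIABLE_RULES = {
    "retired_implies_adult": retired_implies_adult,
}

def validate(cases):
    """Return (case_index, rule_name) for every rule a case violates."""
    violations = []
    for i, case in enumerate(cases):
        for variable, rule in SINGLE_VARIABLE_RULES.items():
            if not rule(case.get(variable)):
                violations.append((i, variable))
        for name, rule in CROSS_VARIABLE_RULES.items():
            if not rule(case):
                violations.append((i, name))
    return violations

CASES = [
    {"age": 45, "satisfaction": 4, "retired": False},
    {"age": 130, "satisfaction": 3, "retired": False},
    {"age": 12, "satisfaction": None, "retired": True},
]

violations = validate(CASES)
for index, rule_name in violations:
    print(f"case {index} violates {rule_name}")
```

Note that the third case passes the age range rule on its own; only the cross-variable rule catches the inconsistency between age and retirement status.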

Recommendations

Automated data preparation delivers recommendations and lets users drill down to examine each one.

Prepare data in a single step — automatically

Manual data preparation is a complex and time-consuming process. When you need results quickly, the automated data preparation (ADP) procedure helps you detect and correct data quality errors and impute missing values in one efficient step. ADP provides an easy-to-understand report with comprehensive recommendations and visualizations to help you determine the right data to use in your analysis.

Additional options for data preparation

Use the Validate Data procedure to perform automatic data checks and help eliminate tedious, time-consuming manual checks. This procedure lets you apply rules that check data based on each variable's measurement level, whether categorical or continuous. You can then determine data validity and remove or correct suspicious cases at your discretion before analysis.

Access to a range of features

SPSS Data Preparation includes features like data validation, automated data preparation, optimal binning and identification of unusual cases.


Bin or set cut points for scale variables

With the optimal binning procedure, you can more accurately use algorithms designed for nominal attributes, such as Naive Bayes and logit models. Optimal binning bins scale variables, that is, it sets cut points that divide their values into intervals.

Select from three types of optimal binning

Choose one of these types of optimal binning for preprocessing data prior to model building:

1) Unsupervised: Create bins with equal counts.
2) Supervised: Take the target variable into account to determine cut points. This method is more accurate than unsupervised; however, it is also more computationally intensive.
3) Hybrid approach: Combine the unsupervised and supervised approaches. This method is particularly useful if you have a large number of distinct values.
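The contrast between the first two approaches can be sketched in plain Python. This is a simplified illustration, not the algorithm SPSS uses: the unsupervised function places cut points so that bins hold roughly equal counts, and the supervised function picks a single cut point that minimizes the weighted entropy of the target, which is one common stand-in for target-aware binning. All function names and the sample data are hypothetical, and the hybrid approach is not shown.

```python
import math
from collections import Counter

# Illustrative sketch only: simplified stand-ins, not the SPSS procedure.

def equal_count_cut_points(values, n_bins):
    """Unsupervised: cut points that put roughly equal counts in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    if n_bins < 1 or n_bins > n:
        raise ValueError("n_bins must be between 1 and the number of values")
    cuts = []
    for k in range(1, n_bins):
        idx = k * n // n_bins
        # Place the cut midway between the two neighbouring sorted values.
        cut = (ordered[idx - 1] + ordered[idx]) / 2
        if not cuts or cut > cuts[-1]:  # skip duplicate cuts caused by ties
            cuts.append(cut)
    return cuts

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def best_supervised_cut(values, targets):
    """Supervised: the single cut point that most reduces target entropy."""
    pairs = sorted(zip(values, targets), key=lambda pair: pair[0])
    n = len(pairs)
    best_cut = None
    best_score = entropy([t for _, t in pairs])  # entropy with no split
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two identical values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut

scores = [1, 2, 3, 4, 5, 6, 7, 8]
churned = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

unsupervised_cuts = equal_count_cut_points(scores, n_bins=4)
supervised_cut = best_supervised_cut(scores, churned)
print(unsupervised_cuts)
print(supervised_cut)
```

The unsupervised cut points ignore the target entirely, while the supervised cut lands at the boundary where the target changes. The supervised search also evaluates every candidate split, which hints at why that approach costs more to compute.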

Technical details

How to buy SPSS Data Preparation

For on-premises deployments: Purchase the Professional edition
For subscription plans: Purchase the Base edition

See a complete list of software requirements

Hardware requirements

Processor: 2 GHz or faster
Display: 1024 × 768 or higher
Memory: 4 GB of RAM required, 8 GB of RAM or more recommended
Disk space: 2 GB or more

See a complete list of hardware requirements
