Introduction to Data Preparation
As computing systems grow more powerful, the appetite for information grows with them, leading to more and more data collection: more cases, more variables, and more data entry errors. These errors undermine the predictive model forecasts that are the ultimate goal of data warehousing, so you need to keep the data "clean." However, the amount of warehoused data has grown so far beyond what can be verified manually that automated processes for validating data are vital.
Data Preparation lets you identify unusual cases, as well as invalid cases, variables, and data values, in your active dataset, and prepare the data for modeling.
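
As a rough illustration of what automated validation means in practice, the following Python sketch checks each case against a set of per-variable rules and reports the values that fail. It is not how Data Preparation itself works: the rule set, the thresholds, the sample cases, and the function name validate_cases are all assumptions made up for this example.

```python
# Hypothetical validation rules (assumed for illustration): each maps a
# variable name to a predicate that returns True when the value is acceptable.
RULES = {
    "age": lambda v: isinstance(v, (int, float)) and 0 <= v <= 120,
    "gender": lambda v: v in {"f", "m"},
}


def validate_cases(cases, rules):
    """Return (case_index, variable, value) for every rule violation."""
    violations = []
    for index, case in enumerate(cases):
        for variable, is_valid in rules.items():
            value = case.get(variable)  # a missing variable yields None
            if not is_valid(value):
                violations.append((index, variable, value))
    return violations


# Made-up sample data containing one out-of-range value and one missing value.
cases = [
    {"age": 34, "gender": "f"},
    {"age": 340, "gender": "m"},  # likely a data entry error
    {"age": 51, "gender": None},  # missing value
]

violations = validate_cases(cases, RULES)
for index, variable, value in violations:
    print(f"case {index}: invalid {variable!r} value {value!r}")
```

Keeping the rules in a table separate from the checking loop means the same loop can be rerun unchanged as new variables or stricter limits are added, which is the point of automating the check rather than inspecting cases by hand.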