Analyzing source data

You use the Investigate stage to analyze the quality of the source data. The Investigate stage helps you determine the business rules that you can use in designing your data cleansing project.

The Investigate stage indicates the degree of processing needed to create the target cleansed data. Investigating data identifies errors and validates the contents of fields in a data file. This investigation lets you identify and correct data problems before they propagate into new systems.

The Investigate stage analyzes data by determining the number and frequency of unique values, and classifying or assigning a business meaning to each occurrence of a value within a column. The Investigate stage has the following capabilities:

The Investigation reports, which you can generate from the IBM® InfoSphere Information Server Web console from the data processed by the investigation job, can help you evaluate your data and develop better business practices.
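The frequency analysis that the Investigate stage performs counts each unique value in a column and classifies each occurrence. The following plain-Python sketch illustrates that idea on a single column. It is not the QualityStage API. The function names `type_pattern` and `investigate_column` are hypothetical, and so is the symbol convention ('a' for a letter, 'n' for a digit, 'b' for a blank, any other character kept as-is). Grouping values by pattern shows malformed entries, such as a four-digit postal code, even when each individual value is unique.

```python
from collections import Counter


def type_pattern(value: str) -> str:
    """Map each character to a class symbol (hypothetical convention):
    'a' = letter, 'n' = digit, 'b' = blank (space); other characters are kept."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("a")
        elif ch.isdigit():
            out.append("n")
        elif ch == " ":
            out.append("b")
        else:
            out.append(ch)
    return "".join(out)


def investigate_column(values):
    """Return (value frequencies, pattern frequencies) for one column."""
    value_freq = Counter(values)
    pattern_freq = Counter(type_pattern(v) for v in values)
    return value_freq, pattern_freq


if __name__ == "__main__":
    postal_codes = ["12345", "12345", "1234", "AB123", "12345-6789"]
    values, patterns = investigate_column(postal_codes)
    # Print the most frequent patterns first, the way a frequency report might.
    for pattern, count in patterns.most_common():
        print(f"{pattern:<12} {count}")
```

In this sample, the pattern counts show that most values fit `nnnnn`, and `nnnn` and `aannn` stand out as candidates for cleansing rules.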