Introduction to data cleansing

IBM® InfoSphere® QualityStage® provides a methodology and development environment for cleansing and improving data quality for any domain.

Your organization's data contains information that it needs in order to conduct business, whether that means managing customers and products, managing operations, evaluating corporate performance, or providing business intelligence. InfoSphere QualityStage helps you deliver and maintain data quality so that your organization can rely on its investment in corporate data.

Data is high quality when it is up-to-date, complete, accurate, and easy to use. Depending on your organizational goals, high-quality data can mean any of the following:
  • Your customer records do not include duplicate records for the same person.
  • Your inventory records do not include duplicates for the same materials.
  • Your vendor records do not include vendors you no longer use or suppliers no longer in business.
  • You can be confident that Paul Allen and Allen Paul are records for two different customers, not a single customer whose name was transposed during data entry.
  • Your employees can find the data they need when they need it. Because they are confident that they are working with high-quality data, they do not need to create their own individual versions of a database.

Whether your organization is transitioning from one or more information systems to another, upgrading its organization and processes, or integrating and leveraging information across the enterprise, your goal is to determine the requirements and structure of the data that will meet your organizational objectives. Data that is restructured to conform to these new requirements is called cleansed data, and the restructuring process is called data cleansing (sometimes referred to more generally as data reengineering).