Data rules analysis
The goal of data rules analysis is to build data rules that test and evaluate data for specific conditions.
The data rule analysis function within IBM® InfoSphere® Information Analyzer is the component that you use to develop free-form tests of data. Collectively, data rules can be used to measure all of the important data quality conditions that need to be analyzed. You can also establish a benchmark for data rule results, against which the system compares the actual results and determines a variance. Data rule analysis can be used on a one-time basis, but it is often used periodically to track trends and identify significant changes in the overall data quality condition. To design effective data rules, start by answering some common questions:
- What data is involved?
- Are there multiple parts or conditions to the validation?
- Are there known ‘qualities’ about the data to consider?
- What are the sources of data (for example, external files)?
- Are there specific data classes (for example, dates, quantities, and other data classes) to evaluate?
- Are there aspects to the ‘rule’ that involve the statistics from the validation?
- Are there aspects to the ‘rule’ that involve understanding what happened previously?
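The last two questions, together with the benchmark and variance described in the overview, can be illustrated with a minimal Python sketch. It is not Information Analyzer rule syntax: the names `run_rule`, `variance_from_benchmark`, and `valid_order`, the sample records, and the benchmark and prior-run values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

Record = Mapping[str, object]


@dataclass
class RuleResult:
    total: int
    passed: int

    @property
    def pass_rate(self) -> float:
        # Fraction of records that met the rule; 0.0 for an empty input.
        return self.passed / self.total if self.total else 0.0


def run_rule(records: Iterable[Record], rule: Callable[[Record], bool]) -> RuleResult:
    """Apply a free-form test to every record and count the outcomes."""
    total = passed = 0
    for record in records:
        total += 1
        if rule(record):
            passed += 1
    return RuleResult(total=total, passed=passed)


def variance_from_benchmark(result: RuleResult, benchmark: float) -> float:
    """Actual pass rate minus the benchmark (negative means below benchmark)."""
    return result.pass_rate - benchmark


def valid_order(record: Record) -> bool:
    # Hypothetical rule with two conditions: a positive integer quantity
    # and a non-empty order date.
    quantity = record.get("quantity")
    return isinstance(quantity, int) and quantity > 0 and bool(record.get("order_date"))


# Hypothetical sample data.
orders = [
    {"quantity": 3, "order_date": "2024-01-05"},
    {"quantity": 0, "order_date": "2024-01-06"},
    {"quantity": 2, "order_date": ""},
    {"quantity": 1, "order_date": "2024-01-07"},
]

current = run_rule(orders, valid_order)

# Statistics from the validation: compare the pass rate with an assumed benchmark.
variance = variance_from_benchmark(current, benchmark=0.75)

# What happened previously: compare with an assumed pass rate from an earlier run.
previous_pass_rate = 0.75
change_since_last_run = current.pass_rate - previous_pass_rate
```

Keeping the rule itself (`valid_order`) separate from the statistics makes it possible to change the benchmark or the comparison period without touching the test logic.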
As you address these questions, it is also critical to follow some basic guidelines.
- Know your goals
- Data quality assessment and monitoring is a process, and not every part of it happens within a product. The business uses data for specific purposes, and ensuring high-quality data is part of the business process for meeting business goals. Understanding which goals are most critical, or which can be addressed most easily, should guide you in identifying starting points.
- Keep to a well-defined scope
- Projects that try to do everything in one pass generally fail. It is reasonable and acceptable to develop data rules incrementally to deliver ongoing value. Key business elements (sometimes called KBEs or critical data elements) are often the first targets for assessment and monitoring because they drive many business processes.
- Identify what is relevant
- Identify the business rules that pertain to the selected or targeted data elements.
- Document the potential sources of these elements that should be evaluated. These can start with selected systems or incorporate evaluations across systems.
- Test and debug the data rule against the identified sources to help ensure the quality of the data rule and, more importantly, the quality and value of its output.
- Remove the extraneous information produced. Not all information generated by a data rule is necessarily going to resolve data quality issues. Important: A field might fail a rule testing for whether it contains Null values. If it turns out that the field is optional for data entry, the lack of this information might not reflect any specific data quality problem (but might indicate a business or data provisioning problem).
- Identify what needs further exploration
- Expand the focus with new and targeted rules, rule sets, and metrics. As you create more rules and evaluate more data, the focus of data quality assessment and monitoring will expand. This expansion often suggests new rules, broader rule sets, or more specific metrics to put in place beyond the initial set of quality controls.
- Periodically refresh or update
- Use the knowledge that you have already gained. Just as business processes and data sources change, so should the rules, rule sets, and metrics that you evaluate.
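The caution about Null checks on optional fields, noted under "Identify what is relevant", can be sketched in Python as follows. This is an illustration under stated assumptions, not Information Analyzer functionality: the helpers `null_failures` and `triage`, the `OPTIONAL_FIELDS` set, and the sample customer records are hypothetical.

```python
from typing import List, Mapping, Sequence, Set

Record = Mapping[str, object]


def null_failures(records: Sequence[Record], field: str) -> List[int]:
    """Return the positions of records in which the field is missing or None."""
    return [i for i, record in enumerate(records) if record.get(field) is None]


def triage(records: Sequence[Record], field: str, optional_fields: Set[str]) -> str:
    """Classify a Null-check outcome instead of treating every failure alike."""
    if not null_failures(records, field):
        return "clean"
    if field in optional_fields:
        # Nulls in an optional field may point to a business or data
        # provisioning question rather than a data quality defect.
        return "review_with_business"
    return "data_quality_issue"


# Hypothetical metadata and sample data.
OPTIONAL_FIELDS = {"middle_name"}
customers = [
    {"id": 1, "email": "a@example.com", "middle_name": None},
    {"id": 2, "email": None, "middle_name": None},
]

middle_name_status = triage(customers, "middle_name", OPTIONAL_FIELDS)
email_status = triage(customers, "email", OPTIONAL_FIELDS)
id_status = triage(customers, "id", OPTIONAL_FIELDS)
```

Separating the raw rule result from its interpretation keeps extraneous failures out of the data quality totals while still flagging them for follow-up.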