You use rule definitions to develop rule logic to analyze your data. Rule definitions follow a basic syntax where a variable, such as a word or term, is evaluated based on a specified condition or type of check.
The specified condition or check might require an additional reference value, such as another variable, a list of values, or a specified format. IBM® InfoSphere® Information Analyzer rule logic evaluates to a true or false value, and sets up pass or fail checks to evaluate the quality of your data. A rule definition represents a logical expression that can include multiple conditional expressions. For example, several conditions can be connected with IF THEN, AND, or OR clauses.
Incomplete, empty, or invalid data values affect the quality of the data in your project by interrupting data integration processes and by using up memory on source systems. You can create rule definitions to analyze data for completeness and validity to find these anomalies.
A rule definition is a true or false (Boolean) expression that contains various conditions to evaluate data records. After you transform your rule definition into a data rule, you can run the data rule against data in your project. Each record you run the data rule against either meets or does not meet the logic defined in the data rule. You can test the quality of your data based on the number of records that meet or do not meet the rule logic. Rule definitions can contain simple tests or complex nested Boolean conditions.
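The following Python sketch illustrates the idea of a Boolean rule that each record either meets or does not meet. It is not InfoSphere Information Analyzer code. The function names, the "name" field, and the sample records are hypothetical and are used only for illustration.

```python
from typing import Callable, Iterable

Record = dict[str, object]


def run_rule(rule: Callable[[Record], bool], records: Iterable[Record]) -> dict[str, int]:
    """Apply a Boolean rule to every record and count the outcomes."""
    met = 0
    not_met = 0
    for record in records:
        if rule(record):
            met += 1
        else:
            not_met += 1
    return {"met": met, "not_met": not_met}


def name_is_complete(record: Record) -> bool:
    """Hypothetical completeness check: "name" must be a non-blank string."""
    value = record.get("name")
    return isinstance(value, str) and value.strip() != ""


# Hypothetical sample data: two complete names, one empty, one missing.
sample_records = [{"name": "Ada"}, {"name": ""}, {"name": None}, {"name": "Grace"}]
summary = run_rule(name_is_complete, sample_records)
```

In this sketch, the counts in summary play the role of the number of records that meet or do not meet the rule logic.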
To create the rule logic, define a source value and select a condition to evaluate. Rule definitions can be built from various components. You can include physical sources, such as standard metadata sources that are imported by using InfoSphere Information Analyzer, or logical sources, such as glossary terms or words that you create to map to physical sources. You can choose any words you want to describe the source or reference values in your rule definitions. For example, if you have a date of birth value in your source data or reference data fields, you could type a common word or abbreviation such as "DOB", "Date_of_Birth", or "Birthdate". You could also use a specific data source field or column name such as "SystemA_DATE_BIRTH".
In the rule definition state, the rules are logical representations. Eventually you turn the rule definitions into data rules, and bind the logical representations with real data from your physical data sources. Data rules are objects that you run to produce specific output. When you build data rule definitions, you create representational rule logic to map data that you want to analyze.
If you use physical data sources when building your rule logic, then a logical variable representing the physical source is created, and is the default binding for this variable when you transform the data rule definition into a data rule. When you transform the data rule definition into a data rule, you can either keep the default binding, or you can modify the setting and bind the logical variable to a different physical data source.
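The relationship between a logical variable, its default binding, and a modified binding can be pictured with the following Python sketch. The classes, the to_data_rule method, and the "SystemB_DOB" column name are hypothetical. They model the concept only and do not correspond to any InfoSphere Information Analyzer API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataRule:
    """Rule logic whose logical variables are bound to physical sources."""
    logic: str
    bindings: dict[str, str]


@dataclass
class RuleDefinition:
    """Logical rule with optional default bindings for its variables."""
    logic: str
    default_bindings: dict[str, str] = field(default_factory=dict)

    def to_data_rule(self, overrides: Optional[dict[str, str]] = None) -> DataRule:
        # Start from the default bindings, then apply any modified bindings.
        bindings = dict(self.default_bindings)
        bindings.update(overrides or {})
        return DataRule(self.logic, bindings)


definition = RuleDefinition(
    logic="DateOfBirth IS_DATE",
    default_bindings={"DateOfBirth": "SystemA_DATE_BIRTH"},
)

kept = definition.to_data_rule()  # keeps the default binding
rebound = definition.to_data_rule({"DateOfBirth": "SystemB_DOB"})  # hypothetical column
```

The rule logic stays the same in both data rules. Only the physical source behind the logical variable changes.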
IF Age < 18 THEN Can_Vote = 'false'
In this example, "Age" and "Can_Vote" might be fields in one of the data sources in the project that you plan to use when you turn the rule logic into a data rule. You can write this data rule definition in any terms that you want to use. For example, you can type "Age" or "voter_age", or you can create a logical variable for "Age" by highlighting an "Age" column in one of your data sources and clicking Add to Logic. When you create the data rule definition, you can type the components of the rule logic in any way that you prefer. When you generate a data rule, you transform the logical representations into elements that are bound to real data in your data sources.
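As a rough illustration, the following Python sketch evaluates the Age and Can_Vote example against a few records. It assumes that a record that does not satisfy the IF condition is counted as meeting the rule. That assumption, the function name, and the sample records are not taken from this topic.

```python
def age_rule(record: dict) -> bool:
    """Sketch of: IF Age < 18 THEN Can_Vote = 'false'."""
    if record["Age"] < 18:
        # The THEN clause is tested only when the IF condition holds.
        return record["Can_Vote"] == "false"
    # Assumption: the rule does not apply to this record, so it is met.
    return True


# Hypothetical records bound to the "Age" and "Can_Vote" variables.
records = [
    {"Age": 16, "Can_Vote": "false"},  # condition holds, check passes
    {"Age": 17, "Can_Vote": "true"},   # condition holds, check fails
    {"Age": 42, "Can_Vote": "true"},   # condition does not hold
]
results = [age_rule(r) for r in records]
```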
DateOfBirth IS_DATE
This condition indicates that a variable called DateOfBirth must be in a recognized date format.
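A format check of this kind can be sketched in Python as follows. The set of formats that IS_DATE recognizes is not listed in this topic, so the three formats in ASSUMED_FORMATS are an assumption made only for illustration, and is_date is a hypothetical helper.

```python
from datetime import datetime

# Assumption: these formats stand in for the recognized date formats.
ASSUMED_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def is_date(value: object) -> bool:
    """Return True if value is a string that parses in an assumed date format."""
    if not isinstance(value, str):
        return False
    text = value.strip()
    for fmt in ASSUMED_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            # Wrong layout or an impossible calendar date; try the next format.
            continue
    return False
```

Note that a value such as 1985-02-30 has the right layout but is not a real calendar date, so this sketch rejects it.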
IF DateOfBirth EXISTS
AND DateOfBirth > dateValue('1900-01-01')
AND DateOfBirth < date()
THEN CustomerType = 'P'
In this example, a conditional statement checks whether the variable DateOfBirth exists and falls within a set range: later than January 1, 1900 and earlier than the current date. If those conditions are met, another variable, CustomerType, is tested to see whether it is equal to the value P.
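The compound condition can be sketched in Python as shown below. The sketch treats EXISTS as a check for a non-null value and treats records that fail the IF condition as meeting the rule. Both points are assumptions. The function name and the today parameter, which makes the current date repeatable for testing, are hypothetical.

```python
from datetime import date
from typing import Optional


def customer_type_rule(record: dict, today: Optional[date] = None) -> bool:
    """Sketch of the DateOfBirth and CustomerType rule."""
    today = today or date.today()
    dob = record.get("DateOfBirth")

    # IF DateOfBirth EXISTS AND DateOfBirth > 1900-01-01 AND DateOfBirth < today
    condition = dob is not None and date(1900, 1, 1) < dob < today

    if condition:
        # THEN CustomerType = 'P'
        return record.get("CustomerType") == "P"
    # Assumption: the rule does not apply to this record, so it is met.
    return True


fixed_today = date(2024, 1, 1)  # fixed date so that the results are repeatable
in_range_p = customer_type_rule(
    {"DateOfBirth": date(1980, 5, 17), "CustomerType": "P"}, fixed_today)
in_range_b = customer_type_rule(
    {"DateOfBirth": date(1980, 5, 17), "CustomerType": "B"}, fixed_today)
missing_dob = customer_type_rule(
    {"DateOfBirth": None, "CustomerType": "B"}, fixed_today)
```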