Automation rules

Use automation rules to automate the process of governing your data.

Automation rules help you automate some of the tasks that you run on your data to maintain high data quality. These tasks include applying data rule definitions and data quality dimensions, assigning terms or stewards to assets, analyzing assets, and more.

Automation rules are if ... then rules that you build by dragging block elements. If the specified conditions are met, the specified action is taken. For example, a rule might read: if the asset is classified by the data class Account number, then bind the data rule definition FieldIsNumeric.

Automation rules run automatically as part of the discovery and data quality analysis processes.

Prerequisites
  • InfoSphere® Information Analyzer must be installed.
  • To view automation rules, you need the Information Governance Catalog User role or higher.
  • To create, edit, and delete automation rules, and to generate suggested automation rules, you need the Information Analyzer Project Administrator role and a role that provides access to Information Governance Catalog.
  • To run automation rules, configure the project where you want to run them. Go to Quality > project name > Settings. In the Data quality tab, select the Enable automation rules option.

Automation rules status

Automation rules can have the following status values:
Accepted
This status indicates that the rule is active and that the assets that are specified in the rule logic are affected. Set this status when you want the rule to be active.
You can set this status when you create or edit a rule.
Candidate
This status indicates that the rule isn't active, and the rule logic doesn't have any impact on the specified assets. Set this status when the rule isn't ready to be used yet.
You can set this status when you create or edit a rule.
Error
This status indicates that the rule is invalid. For example, an asset that was used in the rule logic was deleted from the catalog.
This status is set automatically when invalid logic is detected. You can see this status on the details page of a rule.
Important: When you change the status of an automation rule from Accepted to Candidate, all the changes that the rule applied are reverted. For example, if an automation rule bound quality rules to columns, the bindings between those quality rules and columns are removed.
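The status behavior described above can be modeled in a few lines of plain Python. This is a conceptual sketch only. Information Analyzer does not expose it as a Python API. The `AutomationRule` class, its `set_status` method, the column names, and the `EmailIsValid` definition are all hypothetical. The sketch assumes that reverting removes only the bindings that the rule itself added, and leaves bindings made in other ways in place.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACCEPTED = "Accepted"    # active: the rule affects the assets in its logic
    CANDIDATE = "Candidate"  # inactive: the rule has no effect on assets
    ERROR = "Error"          # invalid logic; set automatically, not by the user


@dataclass
class AutomationRule:
    """Hypothetical model of a rule and the bindings it has applied."""
    name: str
    status: Status = Status.CANDIDATE
    # (asset, data rule definition) pairs that this rule bound while Accepted
    applied: set = field(default_factory=set)

    def set_status(self, new_status: Status, catalog_bindings: set) -> None:
        """Set a user-selectable status, reverting this rule's changes if needed."""
        if new_status is Status.ERROR:
            raise ValueError("Error is set automatically when the logic is invalid")
        if self.status is Status.ACCEPTED and new_status is Status.CANDIDATE:
            # Remove only the bindings that this rule recorded as applied.
            catalog_bindings.difference_update(self.applied)
            self.applied.clear()
        self.status = new_status


# Usage: an Accepted rule had bound a definition to one column.
bindings = {("CUSTOMER.PASSPORT_NO", "PassportNumMatchesRegex"),
            ("CUSTOMER.EMAIL", "EmailIsValid")}  # second binding made manually
rule = AutomationRule("Check passport numbers", Status.ACCEPTED,
                      {("CUSTOMER.PASSPORT_NO", "PassportNumMatchesRegex")})
rule.set_status(Status.CANDIDATE, bindings)
print(sorted(bindings))  # [('CUSTOMER.EMAIL', 'EmailIsValid')]
```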

Automation rules structure

An automation rule consists of the following types of elements:
  • logic
  • conditions
  • actions
Logic
The basic logic is if condition then action. You can extend this logic by adding more conditions and joining them with the and or the or operator.
Conditions
  • the name of the asset is asset_name - the rule is applied to the selected asset.
  • the asset has the term term_name assigned - the rule is applied when the asset has the specified term assigned.
  • the asset has a term assigned with the attribute attribute_name which has the value value - the rule is applied when the asset has a term with the specified attribute set to the specified value.
  • the asset has the label label_name assigned - the rule is applied when the asset has the specified label assigned.
  • the asset is a asset_type - the rule is applied if the asset is of the specified type. You can select columns or tables.
  • the asset has the attribute attribute_name assigned which has the value value - the rule is applied when the asset has the specified attribute set to the specified value.
  • the asset is classified by the data class data_class_name - the rule is applied when the asset is classified by the specified data class.
  • the asset is contained in the host host_name - the rule is applied if the specified host contains the asset.
  • the asset is contained in the schema schema_name - the rule is applied if the specified schema contains the asset.
  • the asset is contained in the data source datasource_name - the rule is applied if the specified data source contains the asset.
  • the asset is in the project data_quality_project_name - the rule is applied if the asset is in the specified project.
  • the quality score of the asset is more/less than 1-100% - the rule is applied if the quality score of the asset is more or less than the specified value.
  • the quality score of the asset is above/below its quality threshold - the rule is applied if the quality score of the asset is above or below the asset's quality threshold.
Actions
  • bind the data rule definition data_rule_definition_name - if the conditions are met, the specified data rule definition is bound to the asset.
  • use the data quality dimension data_quality_dimension_name - if the conditions are met, the specified data quality dimension is used.
  • use all the data quality dimensions - if the conditions are met, all data quality dimensions are used.
  • set the data quality score threshold to 1-100% - if the conditions are met, the data quality score threshold is set to the specified value.
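The following plain-Python sketch makes the if ... then structure concrete. It models conditions as predicates over an asset and joins them with and or or. It then runs the actions of every rule whose condition holds. This is a conceptual model and not the product's implementation or API.

Several parts of it are assumptions:
- The asset is a hypothetical dictionary.
- Only a few of the conditions and actions listed above are modeled.
- The function names (`has_term`, `and_`, `or_`, `bind_data_rule_definition`, `run`, and so on) are illustrative.
- Binding the same definition twice is assumed to have no extra effect.

The usage at the end encodes the introductory example (classified by the data class Account number, then bind FieldIsNumeric), with an extra asset-type condition added to show the and operator.

```python
from dataclasses import dataclass
from typing import Callable

Asset = dict  # hypothetical asset record: {"name": ..., "type": ..., ...}
Condition = Callable[[Asset], bool]
Action = Callable[[Asset], None]


# Conditions (a subset of the list above)
def has_term(term: str) -> Condition:
    return lambda asset: term in asset.get("terms", ())


def is_type(asset_type: str) -> Condition:
    return lambda asset: asset.get("type") == asset_type


def classified_by(data_class: str) -> Condition:
    return lambda asset: data_class in asset.get("data_classes", ())


# Logic: join conditions with "and" / "or"
def and_(*conditions: Condition) -> Condition:
    return lambda asset: all(c(asset) for c in conditions)


def or_(*conditions: Condition) -> Condition:
    return lambda asset: any(c(asset) for c in conditions)


# Actions (a subset of the list above)
def bind_data_rule_definition(name: str) -> Action:
    def act(asset: Asset) -> None:
        bound = asset.setdefault("bound_rules", [])
        if name not in bound:  # assumed: binding twice has no extra effect
            bound.append(name)
    return act


def set_quality_threshold(percent: int) -> Action:
    if not 1 <= percent <= 100:
        raise ValueError("threshold must be between 1 and 100 percent")

    def act(asset: Asset) -> None:
        asset["threshold"] = percent
    return act


@dataclass
class Rule:
    condition: Condition  # the "if" part
    actions: list         # the "then" part: one or more Actions


def run(rules: list, assets: list) -> None:
    """Apply each rule's actions to every asset that satisfies its condition."""
    for asset in assets:
        for rule in rules:
            if rule.condition(asset):
                for action in rule.actions:
                    action(asset)


# Usage: the introductory example, plus an asset-type condition.
rule = Rule(
    condition=and_(is_type("column"), classified_by("Account number")),
    actions=[bind_data_rule_definition("FieldIsNumeric")],
)
assets = [
    {"name": "ACCT_NO", "type": "column", "data_classes": ["Account number"]},
    {"name": "NOTES", "type": "column", "data_classes": []},
]
run([rule], assets)
print(assets[0]["bound_rules"])    # ['FieldIsNumeric']
print("bound_rules" in assets[1])  # False
```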

Running automation rules

Automation rules run automatically as part of data quality analysis or discovery.
Important: Before automation rules can be applied, you must configure the project in which you run discovery or analysis:
  1. In the Quality tab, find the project for which you want to run a discovery or analysis.
  2. In the project settings, open the Data quality tab.
  3. Select the Enable automation rules option.
To run automation rules, choose one of the following ways:

Importing, exporting, and deleting automation rules from command line

You can import, export, and delete automation rules by using the IAAdmin commands. For more information, see the Commands to import, export, and delete automation rules topic.

Example scenario

Your data contains your customers' passport information. You want to make sure that the passport values are correctly formatted. You need to create a rule that analyzes all database columns that have the term Passport assigned, to make sure that the passport numbers comply with the standard passport number format. As a prerequisite, you must have a data rule definition named PassportNumMatchesRegex. Complete the following steps:
  1. Create the term Passport in Catalog > Create > Term.
  2. Use catalog search to find all columns that contain passport values. In Catalog, select Database Column as the asset type, enter passport, and press Enter. You can use advanced filters to narrow down the results list.
  3. For each column that contains passport numbers, complete the following steps:
    1. Open the details page of a column, and from the menu click Edit.
    2. In the Assigned to Terms field, find your newly created term and add it to the list.
    3. Save the changes.
  4. Create an automation rule in Catalog > Create > Automation rule.
  5. Enter the name Check passport numbers and the description If the term passport is assigned to an asset, then automatically bind the asset to the data rule definition PassportNumMatchesRegex.
  6. Set the status to Accepted to activate the rule.
  7. Specify the rule logic.
    1. Click Conditions, and select the asset has the term assigned. Drag it to the if element, and find the term Passport. Save it.
    2. Click Actions, and select bind the data rule definition. Drag it to the then element, and find the data rule definition PassportNumMatchesRegex. Save it.
    3. Save the rule. When the rule runs, the data rule definition PassportNumMatchesRegex is bound as a quality rule to all columns that have the term Passport assigned.
  8. To apply your new rule to assets, run discovery with the data quality analysis task.
    1. As a prerequisite, open the project settings and go to the Data quality tab. Select the Enable automation rules option.
    2. Run discovery in Connections > Discover assets. You must select the Analyze data quality task.
  9. To review the discovery results and see whether the bound rule PassportNumMatchesRegex found any incorrectly formatted passport numbers, go to Connections > Discovery results.
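The dependencies in this scenario can be summarized in a small plain-Python model. It covers three pieces:
- the term assignment from step 3
- the rule from step 7
- the two prerequisites from step 8: the Enable automation rules project option and the Analyze data quality task

`Project`, `Column`, `run_discovery`, and the project and column names are hypothetical and do not correspond to a product API. The sketch also assumes that the rule's status is Accepted, as set in step 6.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for a data quality project and its settings."""
    name: str
    enable_automation_rules: bool = False  # the "Enable automation rules" option


@dataclass
class Column:
    """Hypothetical database column with its terms and bound rules."""
    name: str
    terms: set = field(default_factory=set)        # assigned terms (step 3)
    bound_rules: set = field(default_factory=set)  # bound data rule definitions


# Step 7 (assumed Accepted): if the asset has the term Passport,
# then bind the data rule definition PassportNumMatchesRegex.
CHECK_PASSPORT_NUMBERS = ("Passport", "PassportNumMatchesRegex")


def run_discovery(project: Project, columns: list, analyze_data_quality: bool) -> None:
    """Step 8: the rule is applied only when both prerequisites hold."""
    if not (project.enable_automation_rules and analyze_data_quality):
        return
    term, definition = CHECK_PASSPORT_NUMBERS
    for column in columns:
        if term in column.terms:
            column.bound_rules.add(definition)


columns = [Column("CUSTOMER.PASSPORT_NO", {"Passport"}), Column("CUSTOMER.NAME")]
project = Project("Customer data")  # hypothetical project name

run_discovery(project, columns, analyze_data_quality=True)
print(columns[0].bound_rules)  # set(): the option is not enabled yet

project.enable_automation_rules = True
run_discovery(project, columns, analyze_data_quality=True)
print(columns[0].bound_rules)  # {'PassportNumMatchesRegex'}
print(columns[1].bound_rules)  # set()
```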