Filtering rows in data protection rules (Watson Knowledge Catalog)

You can select to filter rows as the action in data protection rules. You can include or exclude rows based on values in a specified column in the same asset or in a reference asset.

Tip: You must have a thorough understanding of the schemas and data values of your data assets to implement these types of rules effectively.

Row filtering methods

When you create a data protection rule, you can select Filter rows from the list of actions and then specify the filter method.

For example, suppose you have a database with tables that combine government, enterprise, and retail client data. The Billing table has data for all types of customers and a CustomerType column that indicates the type of customer. You can create a data protection rule that excludes all rows for government customers. You can either create a rule that excludes all rows if the value of the CustomerType column is government, or create a rule that includes only the rows where the value of the CustomerType column is equal to enterprise or retail.

Now suppose that your Billing table does not include a CustomerType column, but it does include a CustomerID column. You have another table, CustomerInfo, that contains both the CustomerID and the CustomerType columns. You can create a data protection rule that effectively joins the Billing and CustomerInfo tables and filters rows based on the CustomerType values.

Filter method Description Example
Include rows Include only the rows that fit the criteria of the specified values in the specified column of the asset. Include rows if the CustomerType column has values that are equal to enterprise or retail.
Exclude rows Exclude the rows that fit the criteria of the specified values in the specified column of the asset. Exclude rows if the CustomerType column has values that are equal to government.
Include rows based on reference asset Include only the rows that fit the criteria of the specified values in the specified column of the reference asset. Include rows if the CustomerID column values match values in the CustomerID column of the reference data asset for rows where the CustomerType column has values that are equal to enterprise or retail.
Exclude rows based on reference asset Exclude the rows that fit the criteria of the specified values in the specified column of the reference asset. Exclude rows if the CustomerID column values match values in the CustomerID column of the reference data asset for rows where the CustomerType column has values that are equal to government.

Row filtering conditions

The row filtering conditions specify the column and values to filter.

If choose the Include rows or Exclude rows method, you select a column name, select the equals, greater than, lesser than, greater than equals, or lesser than equals operator, and specify a column value. You can add more conditions.

If choose the Include rows based on reference asset or Exclude rows based on reference asset method, you select the reference asset and then match the column in the reference asset to the column in the target asset. Then, you specify a column name in the reference asset to filter on, select the equals, greater than, lesser than, greater than equals, or lesser than equals operator, and specify a column value. You can add more conditions.

Column name guidelines

Value guidelines

Reference asset guidelines

Limitations to filtering rows

Row filtering runs an SQL statement to enforce the rule on the target data asset. The SQL statement is run exactly as coded. The only way to know whether the rule works as expected is to examine the schema of the data assets in governed catalogs that have matching criteria. Any data assets with columns that have similar values but different column names are not affected. For example, if the rule specified a column name of CustomerID, the rule does not affect assets with the column name CustomerIdentifier. You might not receive any direct indication of a rule that did not trigger due to missing or incorrect filter criteria, however administrators can inspect the policy enforcement log to view any such situations. Your data quality must also be high. The values that your data protection rule with row filtering depend on must be consistent. For example, if government is misspelled as goverment or written as Government, that row is not filtered out.

If your filtering criteria includes a reference data asset, the data access might encounter performance delays inherent to the join. For example, the lack of proper indexing or SQL errors due to a missing join object can cause delays.

Previews and downloads might not be available

You can't preview or download an asset that is affected by data protection rules that filter rows in the following situations:

Numeric values can't be strings

You can't include a numeric value in a row filtering condition and process it as a string. For example, you might want to process a numeric postal code as a string. To process a numeric value as a string, you must create the rule with row filtering with an API call. See Create a rule.

Filtering with reference assets requires a second condition

You can't create a simple rule that filters all rows from the target asset that match rows in the reference asset. You must include a second condition. To create a rule with a reference asset and without a second condition, create the rule with an API call. See Create a rule.

Column names can't contain white spaces

The column names that you include in row filtering conditions can't include white spaces.

Learn more

Parent topic: Designing data protection rules