Masking data by using Mask at Read semantics in Data Virtualization

Configure Mask at Read semantics so that Data Virtualization masks your data before evaluating the query.

About this task

Enable the Mask at Read global configuration setting to apply masking rules to your data before Data Virtualization evaluates any part of the query. The default setting is Mask at Result, which applies masking to the result set instead.

With Mask at Read, you add an extra layer of protection to your data by controlling how column masking rules defined by IBM® Knowledge Catalog are applied.

For more information on masking tables in Data Virtualization with IBM Knowledge Catalog, see Masking virtual data in Data Virtualization.

Mask at Read affects the following clauses:
  • WHERE
  • JOIN (INNER / LEFT / RIGHT / LEFT OUTER / RIGHT OUTER / FULL OUTER)
  • HAVING
  • IN
  • GROUP BY
  • ORDER BY

Required roles

Before you begin, you must have the following role:
  • Administrator role

Use cases

Mask at Read example

With Mask at Read, data is masked with the value 0, preserving your data privacy throughout the query lifecycle.

Consider the following query that you might use to determine the salary of Holly Rhodes:
SELECT Salary FROM employee_info WHERE Name='Holly Rhodes' AND Salary=132000
If you run the query with Mask at Read enabled, the constant Salary=132,000 is evaluated against the masked salary value of 0 (instead of 132,000), and therefore returns 0 records:
SALARY
-----------

  0 record(s) selected.

The clauses HAVING, GROUP BY, ORDER BY or JOIN on the Salary column would also operate on masked values.

Mask at Read example using masked query predicates
Consider the case where you need to fetch anonymized PATIENT_ID, and then issue follow-up queries predicated on that anonymized ID, without knowing the real (unmasked) PATIENT_ID. Unlike Mask at Result semantics, with Mask at Read semantics you do not need to know and use unmasked values to issue query predicates.
For example, assume the same masking expression applies to PATIENT_ID column in all tables to the given user.
You can run this query, which returns a result set with masked PATIENT_ID:
SELECT * FROM PATIENTS WHERE GROUP=<group_id>
You can also run this query without knowing the real PATIENT_ID:
SELECT …​ FROM VISITS, RECORDS JOIN ON PATIENT_ID WHERE PATIENT_ID=<masked_patient_id>

Restrictions

  • If you enable Mask at Read, Data Virtualization evaluates row filtering rules that reference masked columns by using the unmasked values of the reference table.
  • Query predicates are not pushed down when they reference masked columns, which might cause a negative impact on query performance compared to Mask at Result (default behavior). Instead Data Virtualization does the following actions:
    1. Fetch all of the data from the source objects participating in the query.
    2. Mask all of the rows with applicable data protection rules.
    3. Apply the predicates.
  • If you reference a masked column in a query predicate (like in a WHERE clause), and apply a type casting function (such as CAST, VARCHAR, INTEGER and etc.), the Data Virtualization query optimizer does not push down that type cast to the data source. Instead, the query optimizer applies type casting after the masking function is applied. As a result, data might be retrieved from the source in a data type that is not compatible with the column data type of the virtualized table, causing an SQLCODE=-420, SQLSTATE=22018 error. To remediate this issue, ensure you specify query predicates that match the data type of the virtualized column.

    For example: If the column AGE was virtualized as CHAR/VARCHAR type, then specify AGE='75' instead of AGE=75.

Procedure

Applying the Mask at Read setting
To apply masking to data before evaluating the query, run this procedure:
CALL DVSYS.SETWKCPOLICYOPTS('MASK_SEMANTICS=READ')
Reverting to the default Mask at Result setting
To revert to the default behavior where queries are evaluated on unmasked data, while masking the result set, run this procedure:
CALL DVSYS.SETWKCPOLICYOPTS('MASK_SEMANTICS=RESULT')
Check your currently configured masking semantics
To check if you are currently configured to use Mask at Read or Mask at Result, run this query:
SELECT VALUE FROM DVSYS.INSTANCE_INFO WHERE KEY='WKC_POLICY_OPTS'
Note: An empty string is equivalent to Mask at Result (default behavior).