Masking

Masking allows a security administrator to define rules for a column that define which values a user can see. These rules are written in SQL. While the concept of masking a particular column is simple, the implications for how that mask is used within a database management system, and impact of the mask on applications, are complex. Functional details of the mask behavior affect performance, query correctness and the security provided by those masks.

In Db2 12.1.1 and earlier, only a single masking semantic, AT RESULT, is provided. While this mask semantic integrates seamlessly with applications, it can sometimes infer data. Db2 12.1.2 introduces several new options to address inference:
  • A new masking semantic called AT READ.
  • The ability to restrict how masked columns are used within SQL statements
Each option has tradeoffs in terms of ease of use within applications, protections from inference, and performance impacts.

The AT RESULT mask semantic

The default masking semantic is AT RESULT. This semantic only affects the values in the final result table.

The mask does not interfere with the operations of other clauses within a SQL statement, so masks can apply to existing application SQL, usually without any changes. Examples of clauses include the following:
  • WHERE
  • GROUP BY
  • HAVING
  • SELECT DISTINCT
  • ORDER BY
The statement runs as if the mask were not present, and only the final result set is masked. However, this ease of use also allows for the inference of data.
A simple example is a table with a name and salary, and a mask on the salary. Consider the following SQL statement:
SELECT  name, salary FROM employee WHERE salary > 1000000
The result returns data with the salary column masked, but with the inferred knowledge that while the salary is not known, the unmasked value was greater than one million.

While this inference is generally not a problem for applications with controlled SQL, it can be for users that issue arbitrary SQL. In these cases, the security administrator might want to use something like auditing to validate that such users are not attempting to infer data of masked columns.

The AT READ mask semantic

The AT READ masking semantic is counter to the AT RESULT semantic. The AT READ semantic is applied immediately within the statement, before any other operation of the statement. Depending on the nature of the mask, this feature can completely address any inference problem.

Take the previous salary example, and imagine the mask is to transform the salary to zero, for users who are not permitted to see the salary per the mask rule. The statement
SELECT  name, salary FROM employee WHERE salary > 1000000
returns zero rows, as all salary values for this user are masked to zero and this rule is applied before the evaluation of the WHERE clause.

Masks that map many or all values to a single value, such as NULL or 0, rarely have inference problems. However, when using uniqueness when preserving masks, ensure that the mask is not known to users or include the masking function inside their query.

While the previous greater-than ( > ) predicate does not work, and equality predicate ( = ) does:
SELECT name, salary FROM employee WHERE salary = MASKFUNC(1000000)

With an AT READ mask, the masking is applied before the WHERE clause being evaluated. This protection from inference, however, can negatively affect performance. The mask is applied before any other operation in the query and can result in table scans where the mask is applied to every row, to determine which rows can further participate in the query. Index on expressions that use the mask can help in these scenarios.

In general, use of a column with a MASK AT READ in a statement, outside of the select list, can expect to have poor performance. Some operations might not work as expected. For example in a zeroing mask where all values are masked to 0 for a user, issuing a join on that column to another table does not produce the expected results, as the mask happens before the join operation, so all rows have the same value. For both semantics, database operations outside the scope of query are unaffected by the mask. For example, for enforcement of referential integrity, population of materialized query tables, or population of trigger transition variables, all operate on unmasked data. Only a single mask can be applied to a column, regardless of the semantic used. Masks of differing semantics can exist on different columns within the same table. A SQL statement can refer to columns with differing masking semantics. The semantic is specified as part of the CREATE MASK statement and can not be changed without recreating the mask.

Restrictions on masks

In addition to the masking semantic, the use of the masked column can be prevented in scenarios that might lead to inference, such as MASK AT RESULT. Use of the masked column can also produce unexpected results or poor performance when MASK AT READ is used.

These uses of masking semantics result in an error, SQL20478 RC 42, if the masked column is used in a restricted operation. If RESTRICT USAGE is included in a CREATE MASK statement, including the masked column in any of the following SQL clauses generates the SQL20478 RC 42 error:
  • WHERE.
  • HAVING.
  • Join conditions.
  • IN ORDER BY.
  • OLAP functions that include ORDER BY.
  • Aggregation Functions LISTAGG, ARRAYAGG, XMLAGG.
  • Set operators UNION, EXCEPT, EXCEPT ALL, INTERSECT, INTERSECT ALL.
If we look again at the first example SQL statement, an error is generated as the masked column, salary, is used in a WHERE predicate:
SELECT name, salary FROM employee WHERE salary > 1000000
The same restrictions apply to the use of the RESTRICT USAGE ALLOW JOINS clause. Joins are allowed, however, if they meet the following conditions:
  • It is a bare reference to the column, not input to an expression.
  • The other side of the relational operator is also a bare reference to a column that has a column mask defined on it.