Masking virtual data in Data Virtualization

You can use a data protection rule to mask virtual data. When queried, masked columns return disguised data.

Tech preview This is a technology preview and is not supported for use in production environments.

Data masking applies to the result sets of the queries only. The original data in tables and columns remains untouched. Masking does not apply to query predicates or views. You can use data masking to avoid exposing sensitive data. However, data masking does not stop Data Virtualization users from connecting to the service and running queries against that data. Users can join and group data, generate the reports, perform analytics, and collect insights by using the raw data, while masking the result set only.

Data masking rules apply only to the following objects in Data Virtualization:

  • Virtualized objects
  • Native Db2® tables
  • Materialized Query Tables (MQTs)

Depending on the method of data masking, data is redacted, substituted, or obfuscated. See Masking data with data protection rules for details. A lock icon (Lock icon) next to the column name in your virtualized data indicates that the data in the column is masked by a data protection rule.

When you apply masking rules to virtual data, keep in mind the following considerations:

Data type support for masking virtual data

Data Virtualization supports data masking on columns of several data types. Masking operations behave differently based on the operation and the data type of the column.
Redact action data mapping
The following list describes which data types are mapped for Redact action in Data Virtualization.
  • VARCHAR is masked to 'XXXXXXXXXX' or 'X' * column length, whichever is shorter.
  • CHAR is masked to 'XXXXXXXXXX' or 'X' * column length, whichever is shorter.
  • Numeric is masked to 0.
  • DATE is masked to '0001-01-01'.
  • TIME is masked to '00:00:00'.
  • TIMESTAMP is masked to DATE('0001-01-01').
  • BOOLEAN is masked to 0.
Substitute action data mapping
CHAR, VARCHAR
Substitution displays a hexadecimal hash of the original value up to a 64 digits or the column length, whichever is shorter.
SMALLINT, INT, BIGINT
Substitution displays a numeric hash of the original value. The resulting hash has from two digits up to the same number of digits as the original value. When the original value is too small, substitution generates a two-digit number to prevent exposing details about the original value. If the resulting value has a leading "0", that zero will not be displayed. For example, a masked value of "02" is displayed as just "2".
DECIMAL, DOUBLE, REAL
Substitution displays a numeric hash of the original value. The resulting hash has from 2 digits up to the same number of digits as the original value. When the original value is too small, substitution generates a two-digit number to prevent exposing details about the original value. If the resulting value has a leading "0", that zero will not be displayed. For example, a masked value of "02" is displayed as just "2".
Substitution generates the same number of fractional digits as the original value for up to 2 digits after decimal point.
Other data
Falls back to the Redact action.
Obfuscate action data mapping
Obfuscation support is limited to the following data classes only:
  • Personal Information
    • Gender
    • Honorific
    • Name suffix
  • Contact Details
    • USPHN
    • STREET
    • ADDRESSLINE1
    • ADDRESSLINE2
    • ADDRESSLINE3
    • CITY
    • STATEPROVNAME
    • COUNTRY
    • Latitude
    • Longitude
  • Financial Accounts
    • American Express card
    • Diners club card
    • Discover card
    • Mastercard
    • VISA card
    • Credit card
    • Japan Credit Bureau (JCB)
  • Government Identities
    • SSN
    • SSN4
  • Organization Information
    • DUNS Number
    • CAGE CODE
    • USSIC
  • Connectivity Data
    • IP v4 address
  • Personal Demographic Information
    • RELIGION
    • ETHNICITY
    • EYECOLOR
    • HAIRCOLOR
    • MARITALSTATUS
    • HOBBY
    • EMPLOYMENTSTATUS
  • Other
    • USSCP
    • RELATIONSHIP
If obfuscation cannot be performed, the behavior falls back to Substitute.
Note: If obfuscated value is longer than the column length (value does not fit), masked data falls back to the Redact action to avoid overflowing column length.

Limitations and known issues for masking virtual data

The following limitations and known issues apply to Data Virtualization for data masking.
  • Data masking is limited to the following data types:
    • Character strings (CHAR, VARCHAR)
    • Numeric (SMALLINT, INT, BIGINT, DECIMAL, DOUBLE, REAL, DECFLOAT)
    • Date and time (DATE, TIME, TIMESTAMP)
    • Boolean
  • If a data masking rule applies to a column of an unsupported data type (for example, CLOB), you will see a Db2 error with SQLCODE=-245, SQLSTATE=428F5.
  • GROUP BY expressions on masked columns are not pushed down to data sources.
  • Aggregate functions such as SUM might cause numeric overflow in substitution masking. You can cast the column to a higher degree type as shown in the following example.
    SELECT SUM(CAST(VALUE AS DECIMAL(31,0))) FROM PROJECT1.CUSTOMERS
  • Masking is not applied to query predicates such as WHERE, GROUP BY, JOIN, HAVING, SELECT DISTINCT, and ORDER BY clauses in an SQL statement.
  • Data masking rules cannot apply to views directly, but the result sets of views are masked according to the data protection rules that apply to the objects referenced by the view. You can mask identifying details from tables that are referenced in the view definition:
    1. Ensure that the virtual tables that are referenced in a view definition are virtualized tables with data masking support:
      SELECT BSCHEMA, BNAME, BTYPE FROM SYSCAT.TABDEP WHERE TABSCHEMA = <View schema>  AND TABNAME =<View name>

      If BTYPE for all referenced objects is N or T, your view is referencing virtualized tables with data masking support.

    2. Publish the referenced tables to the catalog and define masking rules that apply to these virtual tables.
    Note: When the objects that are referenced in the views are masked, the views are masked as well.
  • For the data masking rules to apply, virtual objects must be captured in a governed catalog and use the Data Virtualization connector.
  • Masked columns cannot be used in SELECT (including views) derived from a set operation that involves an EXCEPT, EXCEPT ALL, INTERSECT, INTERSECT ALL, or UNION set operator.
  • When you preview Data Virtualization data assets in Watson services in Cloud Pak for Data (for example, Watson Knowledge Catalog, Watson Studio, and Data Refinery), and in cases when data masking applies, the preview is subject to the data protection rules and catalog or project access control only.

    To avoid double masking when you use preview in Watson services, access control in Data Virtualization is not applied when you preview a data asset (table or view) that comes from Data Virtualization. This happens only when data masking applies to the preview in Watson services. Access control does not apply in this circumstance.

For other issues and workarounds you might encounter when you use data masking in Data Virtualization, see Troubleshooting governance issues.