Masking virtual data in Watson Query

Important: IBM Cloud Pak® for Data Version 4.6 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.6 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.

You can use a data protection rule to mask virtual data. When queried, masked columns return disguised data.

Tech preview This is a technology preview and is not supported for use in production environments.

Data masking applies to the result sets of the queries only. The original data in tables and columns remains untouched. Masking does not apply to query predicates or views. You can use data masking to avoid exposing sensitive data. However, data masking does not stop Watson Query users from connecting to the service and running queries against that data. Users can join and group data, generate the reports, perform analytics, and collect insights by using the raw data, while masking the result set only.

Data masking rules apply only to the following objects in Watson Query:

  • Virtualized objects
  • Native Db2® tables
  • Materialized Query Tables (MQTs)

Depending on the method of data masking, data is redacted, substituted, or obfuscated. See Masking data with data protection rules for details. A lock icon (Lock icon) next to the column name in your virtualized data indicates that the data in the column is masked by a data protection rule.

When you apply masking rules to virtual data, keep in mind the following considerations:

Data type support for masking virtual data

Watson Query supports data masking on columns of several data types. Masking operations behave differently based on the operation and the data type of the column.
Redact action data mapping
The following list describes which data types are mapped for Redact action in Watson Query.
  • VARCHAR is masked to 'XXXXXXXXXX' or 'X' * column length, whichever is shorter.
  • CHAR is masked to 'XXXXXXXXXX' or 'X' * column length, whichever is shorter.
  • Numeric is masked to 0.
  • DATE is masked to '0001-01-01'.
  • TIME is masked to '00:00:00'.
  • TIMESTAMP is masked to DATE('0001-01-01').
  • BOOLEAN is masked to 0.
Substitute action data mapping
CHAR, VARCHAR
Substitution displays a hexadecimal hash of the original value up to a 64 digits or the column length, whichever is shorter.
SMALLINT, INT, BIGINT
Substitution displays a numeric hash of the original value. The resulting hash has from two digits up to the same number of digits as the original value. When the original value is too small, substitution generates a two-digit number to prevent exposing details about the original value. If the resulting value has a leading "0", that zero will not be displayed. For example, a masked value of "02" is displayed as just "2".
DECIMAL, DOUBLE, REAL
Substitution displays a numeric hash of the original value. The resulting hash has from 2 digits up to the same number of digits as the original value. When the original value is too small, substitution generates a two-digit number to prevent exposing details about the original value. If the resulting value has a leading "0", that zero will not be displayed. For example, a masked value of "02" is displayed as just "2".
Substitution generates the same number of fractional digits as the original value for up to 2 digits after decimal point.
Other data
Falls back to the Redact action.
Obfuscate action data mapping
Obfuscation support is limited to the following data classes only:
  • Personal Information
    • Person name
    • First name
    • Last name
    • Gender
    • Honorific
    • Name suffix
  • Contact Details
    • USPHN
    • STREET
    • ADDRESSLINE1
    • ADDRESSLINE2
    • ADDRESSLINE3
    • CITY
    • STATEPROVNAME
    • COUNTRY
    • Latitude
    • Longitude
    • Email address
  • Financial Accounts
    • American Express card
    • Diners club card
    • Discover card
    • Mastercard
    • VISA card
    • Credit card
    • Japan Credit Bureau (JCB)
  • Government Identities
    • SSN
    • SSN4
  • Organization Information
    • DUNS Number
    • CAGE CODE
    • USSIC
  • Connectivity Data
    • IP v4 address
  • Personal Demographic Information
    • RELIGION
    • ETHNICITY
    • EYECOLOR
    • HAIRCOLOR
    • MARITALSTATUS
    • HOBBY
    • EMPLOYMENTSTATUS
  • Other
    • USSCP
    • RELATIONSHIP
If obfuscation cannot be performed, the behavior falls back to Substitute.
Note: If obfuscated value is longer than the column length (value does not fit), masked data falls back to the Redact action to avoid overflowing column length.

Limitations and known issues for masking virtual data

The following limitations and known issues apply to Watson Query for data masking.
  • Data masking is limited to the following data types:
    • Character strings (CHAR, VARCHAR)
    • Numeric (SMALLINT, INT, BIGINT, DECIMAL, DOUBLE, REAL, DECFLOAT)
    • Date and time (DATE, TIME, TIMESTAMP)
    • Boolean
  • Advanced masking options are not supported.
  • If a data masking rule applies to a column of an unsupported data type (for example, CLOB), you will see a Db2 error with SQLCODE=-245, SQLSTATE=428F5.
  • GROUP BY expressions on masked columns are not pushed down to data sources.
  • Aggregate functions such as SUM might cause numeric overflow in substitution masking. You can cast the column to a higher degree type as shown in the following example.
    SELECT SUM(CAST(VALUE AS DECIMAL(31,0))) FROM PROJECT1.CUSTOMERS
  • Masking is not applied to query predicates such as WHERE, GROUP BY, JOIN, HAVING, SELECT DISTINCT, and ORDER BY clauses in an SQL statement.
  • Data masking rules cannot apply to views directly, but the result sets of views are masked according to the data protection rules that apply to the objects referenced by the view. You can mask identifying details from tables that are referenced in the view definition:
    1. Ensure that the virtual tables that are referenced in a view definition are virtualized tables with data masking support:
      SELECT BSCHEMA, BNAME, BTYPE FROM SYSCAT.TABDEP WHERE TABSCHEMA = <View schema>  AND TABNAME =<View name>

      If BTYPE for all referenced objects is N or T, your view is referencing virtualized tables with data masking support.

    2. Publish the referenced tables to the catalog and define masking rules that apply to these virtual tables.
    Note: When the objects that are referenced in the views are masked, the views are masked as well.
  • For the data masking rules to apply, virtual objects must be captured in a governed catalog and use the Watson Query connection type.
  • Masked columns cannot be used in SELECT (including views) derived from a set operation that involves an EXCEPT, EXCEPT ALL, INTERSECT, INTERSECT ALL, or UNION set operator.
  • When you preview Watson Query data assets in Watson services, other than Watson Query, in Cloud Pak for Data (for example, Watson Knowledge Catalog, Watson Studio, and Data Refinery), and in cases when data masking or row-level filtering applies, the preview is subject to the data protection rules and catalog or project access control only.

    To avoid double masking when you preview in Watson services other than Watson Query, access control in Watson Query is not applied when you preview, download, or refine a data asset (table or view) that comes from Watson Query. This happens only when data masking or row-level filtering applies to the preview in the other Watson services. Watson Query internal access controls, which are controlled by using Manage access in the Watson Query UI, do not apply in this circumstance.

For other issues and workarounds you might encounter when you use data masking in Watson Query, see Troubleshooting governance issues.