Masking virtual data in Data Virtualization
You can use a data protection rule to mask virtual data. When queried, masked columns return disguised data.
Data masking applies to the result sets of the queries only. The original data in tables and columns remains untouched. Masking does not apply to query predicates or views. You can use data masking to avoid exposing sensitive data. However, data masking does not stop Data Virtualization users from connecting to the service and running queries against that data. Users can join and group data, generate the reports, perform analytics, and collect insights by using the raw data, while masking the result set only.
Data masking rules apply only to the following objects in Data Virtualization:
- Virtualized objects
- Native Db2® tables
- Materialized Query Tables (MQTs)
Depending on the method of data masking, data is redacted, substituted, or obfuscated. See
Masking
data with data protection rules for details. A lock icon (
) next to the column name in your virtualized data indicates
that the data in the column is masked by a data protection
rule.
Data type support for masking virtual data
- Redact action data mapping
- The following list describes which data types are mapped for Redact
action in Data Virtualization.
- VARCHAR is masked to 'XXXXXXXXXX' or 'X' * column length, whichever is shorter.
- CHAR is masked to 'XXXXXXXXXX' or 'X' * column length, whichever is shorter.
- Numeric is masked to 0.
- DATE is masked to '0001-01-01'.
- TIME is masked to '00:00:00'.
- TIMESTAMP is masked to DATE('0001-01-01').
- BOOLEAN is masked to 0.
- Substitute action data mapping
-
- CHAR, VARCHAR
- Substitution displays a hexadecimal hash of the original value up to a 64 digits or the column length, whichever is shorter.
- SMALLINT, INT, BIGINT
- Substitution displays a numeric hash of the original value. The resulting hash has from two digits up to the same number of digits as the original value. When the original value is too small, substitution generates a two-digit number to prevent exposing details about the original value. If the resulting value has a leading "0", that zero will not be displayed. For example, a masked value of "02" is displayed as just "2".
- DECIMAL, DOUBLE, REAL
- Substitution displays a numeric hash of the original value. The resulting hash has from 2 digits up to the same number of digits as the original value. When the original value is too small, substitution generates a two-digit number to prevent exposing details about the original value. If the resulting value has a leading "0", that zero will not be displayed. For example, a masked value of "02" is displayed as just "2".
- Other data
- Falls back to the Redact action.
- Obfuscate action data mapping
- Obfuscation support is limited to the following data classes only:
- Personal Information
- Gender
- Honorific
- Name suffix
- Contact Details
- USPHN
- STREET
- ADDRESSLINE1
- ADDRESSLINE2
- ADDRESSLINE3
- CITY
- STATEPROVNAME
- COUNTRY
- Latitude
- Longitude
- Financial Accounts
- American Express card
- Diners club card
- Discover card
- Mastercard
- VISA card
- Credit card
- Japan Credit Bureau (JCB)
- Government Identities
- SSN
- SSN4
- Organization Information
- DUNS Number
- CAGE CODE
- USSIC
- Connectivity Data
- IP v4 address
- Personal Demographic Information
- RELIGION
- ETHNICITY
- EYECOLOR
- HAIRCOLOR
- MARITALSTATUS
- HOBBY
- EMPLOYMENTSTATUS
- Other
- USSCP
- RELATIONSHIP
If obfuscation cannot be performed, the behavior falls back to Substitute.Note: If obfuscated value is longer than the column length (value does not fit), masked data falls back to the Redact action to avoid overflowing column length. - Personal Information
Limitations and known issues for masking virtual data
- Data masking is limited to the following data types:
- Character strings (CHAR, VARCHAR)
- Numeric (SMALLINT, INT, BIGINT, DECIMAL, DOUBLE, REAL, DECFLOAT)
- Date and time (DATE, TIME, TIMESTAMP)
- Boolean
- If a data masking rule applies to a column of an unsupported data type (for example, CLOB), you will see a Db2 error with SQLCODE=-245, SQLSTATE=428F5.
- GROUP BY expressions on masked columns are not pushed down to data sources.
- Aggregate functions such as SUM might cause numeric
overflow in substitution masking. You can cast the column to a higher degree type as shown in the
following
example.
SELECT SUM(CAST(VALUE AS DECIMAL(31,0))) FROM PROJECT1.CUSTOMERS - Masking is not applied to query predicates such as
WHERE,GROUP BY,JOIN,HAVING,SELECT DISTINCT, andORDER BYclauses in an SQL statement. - Data masking rules cannot apply to views directly, but the result sets of views
are masked according to the data protection rules that apply to the objects referenced by the view.
You can mask identifying details from tables that are referenced in the view definition:
- Ensure that the virtual tables that are referenced in a view definition are virtualized tables
with data masking
support:
SELECT BSCHEMA, BNAME, BTYPE FROM SYSCAT.TABDEP WHERE TABSCHEMA = <View schema> AND TABNAME =<View name>If
BTYPEfor all referenced objects isNorT, your view is referencing virtualized tables with data masking support. - Publish the referenced tables to the catalog and define masking rules that apply to these virtual tables.
Note: When the objects that are referenced in the views are masked, the views are masked as well. - Ensure that the virtual tables that are referenced in a view definition are virtualized tables
with data masking
support:
- For the data masking rules to apply, virtual objects must be captured in a governed catalog and use the Data Virtualization connector.
- Masked columns cannot be used in
SELECT(including views) derived from a set operation that involves anEXCEPT,EXCEPT ALL,INTERSECT,INTERSECT ALL, orUNIONset operator. -
When you preview Data Virtualization data assets in Watson services in Cloud Pak for Data (for example, Watson Knowledge Catalog, Watson Studio, and Data Refinery), and in cases when data masking applies, the preview is subject to the data protection rules and catalog or project access control only.
To avoid double masking when you use preview in Watson services, access control in Data Virtualization is not applied when you preview a data asset (table or view) that comes from Data Virtualization. This happens only when data masking applies to the preview in Watson services. Access control does not apply in this circumstance.
For other issues and workarounds you might encounter when you use data masking in Data Virtualization, see Troubleshooting governance issues.