Masking data with data protection rules (Watson Knowledge Catalog)
Data masking helps you protect sensitive data, such as personally identifiable information or restricted business data, to avoid the risk of compromising confidential information. Masking is defined in data policy rules that are enforced for an asset. Depending on the masking method, data is redacted, substituted, or obfuscated with retained formatting in the asset preview.
The lock icon in the column header of the asset on the Overview tab indicates that the column contains masked data.
The schema information always reflects the total number of columns that are contained in the original asset.
Note: Asset owners can view any data within the asset even if data is masked. See Previews.
However, for virtual data, the behavior is slightly different based on the data field definition. See Masking virtual data.
When creating rules, you first define conditions in the rule builder and then decide whether to deny access to the asset or mask data according to policies.
To mask data in assets:
- Complete the conditions and select the attributes that you want to process.
- Select the action Mask data.
- In columns containing: Select the data class groups or individual data classes.
By default, this field contains the attributes that you selected when defining the condition. You can now remove or add more data classes.
- Select the method to mask data:
- Redact data values in asset columns.
This method replaces each data value with a string of exactly ten X characters to remove information that is, for example, identifying or otherwise sensitive. With redacted data, neither the format of the data nor referential integrity is retained.
- Substitute data values in asset columns.
This method replaces data with values that don’t match the original format. It preserves referential integrity (RI) to ensure that table relationships are consistent.
If a value is used several times in a column with substituted data, Substitute uses the same substitution value for identical data values.
For example, if a column contains the email address userA@example.com several times, each occurrence is replaced by the same substitution value, such as 500ddcc98133703531re3456.
- Obfuscate data values in asset columns that contain data classes with the following types of information:
- Personal information, for example, basic attributes of an individual, such as gender, honorific, and name suffix.
- Contact details, for example, email addresses, phone numbers, state, postal addresses, latitude, or longitude.
- Financial accounts, for example, credit cards, banking, or other financial account numbers.
- Government identities, for example, personal identification numbers issued by governments, such as SSN (US social security numbers) and CCN (credit card numbers).
- Personal demographic information, for example, religion, ethnicity, eye color, hair color, marital status, hobbies, or employee status.
- Connectivity data, for example, IP address or MAC address.
This method replaces the data values with similarly formatted values that match the original format. For identical data values, the same replacement value is used. For example, identical phone numbers might always be obfuscated with the replacement value 510-987-6543. If a data element cannot be obfuscated, the Substitute method is used as the fallback.
- Click Create.
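The difference between the three masking methods can be illustrated with a small Python sketch. This is not how Watson Knowledge Catalog implements masking; the function names (redact, substitute, obfuscate), the secret parameter, and the use of SHA-256 are assumptions chosen only to show the described behavior: redaction discards format and referential integrity, substitution keeps referential integrity but not format, and obfuscation keeps both.

```python
import hashlib
import random
import string


def redact(value: str) -> str:
    """Hypothetical redaction: every value becomes exactly ten X characters."""
    return "X" * 10


def substitute(value: str, secret: str = "demo-secret") -> str:
    """Hypothetical substitution: identical inputs map to the same token.

    The token does not resemble the original format. The 24-character
    length is an assumption that mirrors the example value in the text.
    """
    digest = hashlib.sha256((secret + value).encode("utf-8")).hexdigest()
    return digest[:24]


def obfuscate(value: str, secret: str = "demo-secret") -> str:
    """Hypothetical obfuscation: keep the format, change the characters.

    Digits are replaced with digits, letters with letters of the same case,
    and all other characters (dashes, dots, @) are kept. The random generator
    is seeded from the value, so identical inputs give identical outputs.
    """
    seed = int.from_bytes(
        hashlib.sha256((secret + value).encode("utf-8")).digest(), "big"
    )
    rng = random.Random(seed)
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for sample in ["userA@example.com", "510-123-4567", "userA@example.com"]:
        print(sample, "|", redact(sample), "|", substitute(sample), "|", obfuscate(sample))
```

Running the script prints the repeated email address with the same substituted and obfuscated values both times, which is the property that keeps joins between tables consistent, while the redacted column is identical for every row.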