
Masking data with data protection rules (Watson Knowledge Catalog)

Data masking helps you protect sensitive data, such as personally identifiable information or restricted business data, and reduces the risk of exposing confidential information. Masking is defined in data protection rules that are enforced for an asset. Depending on the masking method, data in the asset preview is redacted, substituted, or obfuscated with its formatting retained.

A lock icon in the column header of the asset on the Overview tab indicates that the column contains masked data.

The schema information always reflects the total number of columns that are contained in the original asset.

Note: Asset owners can view all data within the asset, even if the data is masked. See Previews. For virtual data, however, see Governing virtual data by using data protection rules.

When creating rules, you first define conditions in the rule builder and then decide whether to deny access to the asset or mask data according to policies.
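
The condition-then-action structure described above can be pictured with a minimal Python sketch. This is not the product's API or storage format: the `Rule` class, the `evaluate` function, and all field names are hypothetical and exist only to illustrate that a rule pairs a set of conditions with exactly one action (deny access or mask data).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# The three masking methods named in this topic.
MASKING_METHODS = ("redact", "substitute", "obfuscate")


@dataclass(frozen=True)
class Rule:
    """Hypothetical model of a rule: conditions plus one action."""
    name: str
    condition_attributes: FrozenSet[str]   # e.g. data classes, business terms, tags
    action: str                            # "deny" or "mask"
    masking_method: Optional[str] = None   # only meaningful when action == "mask"

    def __post_init__(self) -> None:
        if self.action not in ("deny", "mask"):
            raise ValueError("action must be 'deny' or 'mask'")
        if self.action == "mask" and self.masking_method not in MASKING_METHODS:
            raise ValueError("a mask rule needs a masking method")
        if self.action == "deny" and self.masking_method is not None:
            raise ValueError("a deny rule takes no masking method")


def evaluate(rule: Rule, column_attributes: set) -> Tuple[str, Optional[str]]:
    """Return the rule's action if any condition attribute matches, else allow."""
    if rule.condition_attributes & frozenset(column_attributes):
        return (rule.action, rule.masking_method)
    return ("allow", None)


if __name__ == "__main__":
    rule = Rule("mask-emails", frozenset({"Email Address"}), "mask", "redact")
    print(evaluate(rule, {"Email Address", "Contact"}))
    print(evaluate(rule, {"City"}))
```

Real rules can also include conditions on users and assets; the sketch matches on column attributes only to keep the example short.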

To mask data in assets:

  1. Complete the conditions and select the attributes that you want to process.
  2. Select the action Mask data.
  3. In columns containing, select the business term, data class, column name, or tag.
    By default, this field contains the attributes that you selected when you defined the condition. You can remove data classes or add more.
  4. Select the method to mask data:
    • Redact data values in asset columns.
      This method replaces each data value with a string of exactly ten X characters (XXXXXXXXXX) to remove information that is, for example, identifying or otherwise sensitive. With redacted data, neither the format of the data nor referential integrity is retained.
    • Substitute data values in asset columns.
      This method replaces data with values that don’t match the original format. It preserves referential integrity (RI) to ensure that table relationships are consistent.
      If a value is used several times in a column with substituted data, Substitute uses the same substitution value for identical data values.
      For example, if a column contains the same email address several times, each occurrence is replaced by the same substitution value, such as: 500ddcc98133703531re3456.
    • Obfuscate data values in asset columns that contain data classes with the following types of information:
      • Financial accounts, for example, credit cards, banking, or other financial account numbers.
      • Government identities, for example, personal identification numbers issued by governments, such as passport numbers, SSN (US Social Security numbers), CCN (credit card numbers), ITIN (Individual Taxpayer Identification Numbers), or SIN (social insurance numbers).
      • Contact details, for example, email addresses, postal addresses, or phone numbers.
      • Personal information, for example, basic attributes of an individual, such as person name, date of birth, or gender.

      This method replaces the data values with different values that keep the original format. It does not preserve referential integrity (RI) or data distribution. If a data element cannot be obfuscated, the Substitute method is used as the fallback.

  5. Click Create.
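
The difference between the three masking methods in step 4 can be illustrated with a short Python sketch. It is a conceptual approximation only, not the product's implementation: the function names, the use of SHA-256 with a `secret` string for substitution, and the character-by-character obfuscation are all assumptions made for illustration. The product's obfuscation is described as data-class aware, which this sketch does not attempt to reproduce.

```python
import hashlib
import random
import string


def redact(value: str) -> str:
    # Every value becomes exactly ten X characters, so neither the
    # original format nor referential integrity is retained.
    return "X" * 10


def substitute(value: str, secret: str = "demo-secret") -> str:
    # Assumption: a keyed hash stands in for the substitution algorithm.
    # Identical inputs always yield the same token, which keeps joins
    # between tables consistent; the token does not resemble the input.
    digest = hashlib.sha256((secret + value).encode("utf-8")).hexdigest()
    return digest[:24]


def obfuscate(value: str, rng: random.Random) -> str:
    # Assumption: format is kept by swapping each character for a random
    # one of the same kind. Separators such as '@', '.', and '-' stay put.
    # Output is random, so identical inputs can map to different values.
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random()
    for email in ["ann@example.com", "bob@example.com", "ann@example.com"]:
        print(redact(email), substitute(email), obfuscate(email, rng))
```

In the printed rows, the first and third inputs are identical: the substituted values match, while the obfuscated values keep the shape of an email address but generally differ from each other.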
