REDACT - Working with regex on Windows DB servers

When you use regular expressions with REDACT with Windows database servers, you need to be aware of some caveats and workarounds.

Symptoms

When you use Guardium® REDACT in your policy, the policy does not recognize the following regular expressions:
Regex Description
{n} Repeat n times. That is, match the preceding item n times.
[xNN] 2-digit hex number. Use [\xNN] instead (with backslash).
[NNN] 3-digit octal number. If possible, specify the ASCII equivalent.

Causes

Windows server does not support all regular expression patterns.
Note: REDACT policies that use regex can only scrub null-terminated data types.

Environment

Guardium collectors that receive data from Windows database servers.

Resolving the problem

Depending on the regex issue, rewrite your regex as follows:
  • Instead of {n}, specify each digit that you want to match. For example, instead of the following regex:
    [0-9]{5}
    Use the following expression:
    [0-9][0-9][0-9][0-9][0-9]
  • Use [\xNN] to signify a 2-digit hexadecimal number, such as [x41].

    Where [\xNN] is a hexadecimal number between 01 to FF (\x00 is not supported).

    For example, the hexadecimal number x41 maps to the ASCII character A. In this case, you can use either of the following regular expressions to add the letter A to the beginning of any 4-digit number:
    [\x41][0-9][0-9][0-9][0-9][0-9]
    Or:
     A[0-9][0-9][0-9][0-9][0-9]
    To use [\xNN] for redaction, specify the length of the entire string as a 2-digit hexadecimal number, and then wrap the characters to redact in parentheses.
    Note: For Microsoft SQL, each character is two bytes. For most other data sources, each character is one byte.

    For example, to redact the first 6 digits of a 10-digit number, use the following regex:

    For MS SQL data sources:

    \x14([0-9][0-9][0-9][0-9][0-9][0-9])[0-9][0-9][0-9][0-9]

    For other (non-MS SQL) data sources:

    \x0A([0-9][0-9][0-9][0-9][0-9][0-9])[0-9][0-9][0-9][0-9]

    To make sure that your regex works with most data sources, use the following regex:

    [\x0A\x14]([0-9][0-9][0-9][0-9][0-9][0-9])[0-9][0-9][0-9][0-9]

    Where, the [\x0A\x14] part of the pattern means either-or. The redaction engine first tries the \x0A pattern, and if that doesn't match, it tries the \x14 pattern.

    Note: For Microsoft SQL with VARCHAR only, you must specify the exact number (from 01 to 99) of characters to redact. Specify the number in angle brackets. For example, to redact the first 10 characters of your string:
    <10>([0-9][0-9][0-9][0-9][0-9][0-9])[0-9][0-9][0-9][0-9]
  • [NNN] signifies a 3-digit octal number such as 102. Windows S-TAP® does not support this pattern.
    No perfect alternative pattern exists, but if [NNN] matches to a specific ASCII character, you can replace the octal number with its ASCII counterpart. For example, the octal number 102 maps to the ASCII character B. In this case, you can replace the following regex:
    [102][0-9][0-9][0-9][0-9][0-9]
    With this expression:
     B[0-9][0-9][0-9][0-9][0-9]

If one of these patterns does not resolve the problem in your environment, contact IBM® Technical Support if you need further analysis.