Hash privacy provider
Use the hash privacy provider to mask source data with numeric values generated by a hash algorithm. You can use the numeric values as the basis for providing replacement data from a lookup table or an array of values.
The provider generates a numeric hash value based on the source value and uses the hash value to produce masked data. The provider can process source data types such as character strings, integers, floating point numbers, and date-time values.
The provider also supports multiple source values of the same or different data type and concatenates these values before hashing them. You can specify a separator character to create a different concatenated source value for each row when concatenated values of columns from two or more rows would otherwise be identical.
The output value might not be unique even if the input is unique. To generate repeatable masked values for the same input, use the same seed value. The masked value for the same input will vary if the seed is changed for that input.
The provider offers two hashing algorithms, a default algorithm and the SHA-256 (secure hashing algorithm 256 bits) algorithm. The default algorithm usually outperforms the SHA-256 algorithm in processing time. The SHA-256 algorithm is a cryptographic-strength hashing algorithm that offers stronger masking than the default algorithm and ensures that masked values cannot be revealed by reverse engineering. The HMAC (hash-based message authentication code) algorithm, which is based on the SHA-256 algorithm, is also available with a user-supplied exit only and provides stronger encryption than the SHA-256 algorithm alone by using an encryption key provided by the exit. The provider prevents unauthorized access to the key value by not keeping the key in memory.
To generate values for use with a lookup table or an array of values, use the HASHMAX parameter to limit the number of values.
Examples
- Literal seed
The following example uses the default algorithm and a seed value of 100 to create hashed values for a destination field, hash_dst.
pro=hash, seed="100", dstfld="hash_dst", flddef1=(name=hash_int, dt=integer), flddef2=(name=hash_str, dt=varchar_sz), flddef3=(name=hash_dst, dt=integer)
This example uses the following parameters:
- SEED="100"
- This parameter provides a seed value of 100 to influence the masking process. Use the same seed value each time the provider is used to produce the same masked values for a given source value.
- HASHMAX limit
The following example creates hashed values for a lookup table or array of values by limiting the range of hashed values. The example also uses a seed value from the hash_seed environment variable.
pro=hash, seed=@hash_seed, hashmax=2000000, dstfld="hash_dst", flddef1=(name=hash_int, dt=integer), flddef2=(name=hash_str, dt=varchar_sz), flddef3=(name=hash_dst, dt=long_long)
This example uses the following parameters:
- SEED=@hash_seed
- This parameter provides an environment variable name, hash_seed, as seed value to influence the masking process. Use the same value each time the provider is used to produce the same masked values for a given source value.
- HASHMAX=2000000
- This parameter limits the number of output values to 2000000, which equates to the number of rows in a lookup table or the range of an array of values.
- HMAC seed
The following example uses the SHA-256 algorithm and a seed value that is provided by a user exit to create hashed values for a destination field, hash_dst.
pro=hash, algo=sha256, seed=hmac, dstfld="hash_dst", flddef1=(name=hash_int, dt=integer), flddef2=(name=hash_str, dt=varchar_sz), flddef3=(name=hash_dst, dt=integer)
This example uses the following parameters:
- ALGO=SHA256
- The SHA-256 algorithm is a cryptographic-strength hashing algorithm and ensures that masked values cannot be revealed by reverse engineering.
- SEED=HMAC
- This parameter uses a seed value to influence the masking process that is provided by the user exit. Use the same value each time the provider is used to produce the same masked values for a given source value.
Syntax
The hash privacy provider uses the following syntax:
- Masking parameters
PROVIDER = HASH , [ ALGORITHM = { SHA256 | DEFAULT } ] , [ SEED = seed-value ] , [ HASHMAX = max-hash-value ] ,
- Source value parameters
[ REMOVE = “remove-chars“ ] , [ SEP = “sep-char“ ] , [ CASE = UPPER ] , [ SRC_DATE_FORMAT = “src-date-format-expression“ ] ,
- Destination value parameters
DSTFLD = “destination-field-name”,
- Hash value parameters
[ WHENNULL = special-value-when-null-fld ] , [ WHENSPACE = special-value-when-space-fld ] , [ WHENZEROLEN = special-value-when-zerolen-fld ] , [ WHENMIXED = special-value-when-mixed-fld ] ,
- Processing parameters
[ DISCARDLIMIT = discard-limit-value ] ,
- Data definition parameters
FLDDEFn = ( NAME = field-name, DATATYPE = datatype-value, [ PRECISION = field-precision-value ], [ SCALE = field-scale-value ], [ LENGTH = field-length-value ], [ CODEPAGE = codepage-value ], [ CPTYPE = { DB2ZOS |DB2LUW | ORACLE |SYBASE |ODBC | INFORMIX |NETEZZA |SQLSERVER |TERADATA |ANY |NONE } ] ) , [ CODEPAGE = codepage-value ] , [ CPTYPE = { DB2ZOS | DB2LUW | ORACLE | SYBASE | ODBC | INFORMIX | NETEZZA | SQLSERVER | TERADATA | ANY | NONE } ]
Masking parameters
Parameters that determine how to mask data.
- PROVIDER (or PRO)
- Required. Enter the provider name, HASH.Note: The PRO parameter must be first in the masking string. All other parameters can appear in any order.
- ALGORITHM (or ALGO)
- Specifies the type of hashing algorithm to use.
Enter one of the following options:
- SHA256
- Specifies to use the SHA-256 (secure hashing algorithm 256 bits) algorithm, or if a user-supplied exit is available, the HMAC (hash-based message authentication code) algorithm. Use the exit to provide a hash key for the HMAC algorithm.
- DEFAULT (or DEF)
- Specifies to use the default ODPP hash algorithm.
- SEED
- Use the SEED parameter to influence the masking process and to
produce repeatable or non-repeatable output values. Enter one of
the following options:
- "literal"
- A numeric value that is enclosed in quotes, which produces a higher degree of hash uniqueness and a repeatable output value. Enter a numeric value within the range 0 - 2,000,000,000.
- @variable
- The name of an environment variable, which is preceded by the
at sign (@), which produces a repeatable output value. The variable
provides a seed value within the range 0 - 2,000,000,000.Note: If you are using the Hash provider SEED environment variable option with the data privacy UDFs for DB2 for Linux, UNIX, and Windows, you must set the variable using the db2set DB2ENVLIST command.
- #seed-field-name
- The name of a field that contains the seed values. This field
must contain numeric values within the range 0 - 2,000,000,000.
If the field name has spaces or commas, it must be enclosed in double quotation marks. If the field name has a double quotation mark character, the character must use a backslash (\) escape character.
SEED=seed-field-name is not compatible with the parameter ALGO=SHA256.
- RANDOM
- Specifies a random seed value, which produces a non-repeatable
output value.
SEED=RANDOM is not compatible with the parameter ALGO=SHA256.
- HMAC
- Specifies to use the seed value that is provided by the user exit.
A user exit is required.
SEED=HMAC is not compatible with the parameter ALGO=DEF.
- HASHMAX
- An integer that specifies a limit for generating hash values.
The maximum is 4,294,967,295.
To generate hash values for use with a lookup table, use a HASHMAX value that is equal to the number of rows in the lookup table.
To generate hash values for use with replacement values from an array, use a HASHMAX value that is equal to the array size.
Source value parameters
Parameters for handling source values.
- REMOVE
- Specifies the characters, within enclosing double quotation marks, to remove from the source value before processing. To specify a double quotation mark character, use a backslash (\) escape character.
- SEP
- Specifies a field separator, within enclosing double quotation marks, that enables concatenation when there are multiple source fields. Only one character can be used as a separator.
- CASE
- Specifies that the provider will convert the source values to
uppercase before hashing.
Enter the following option:
- UPPER (or UP)
- Convert the source values to uppercase before hashing.
- SRC_DATE_FORMAT (or SRCDF)
- Specifies, within enclosing double quotation marks, the format
of the date string values in the source field.Use any of the following format specifiers for formatting the date-time strings. All specifiers start with a percent (%) sign. If SRCDF is not specified, the provider uses the following default date format: %DD/%MM/%YYYY %HH24:%MI:%SS:%FFFFFF.
- Year
- %YYYY
- %YY
- Month
- %MONTH
- %MMM
- %MM
- Day
- %DD
- Time
- %HH
- %MI
- %SS
- %HH24
- Fraction of a second
- %FFFFFF
- %FFFFF
- %FFFF
- %FFF
- %FF
- %F
- AM/PM
- %PM
For example, to format a date string as four-digit year, three-character month, two-digit date with dash-type (-) separators, the format string would be “%YYYY-%MMM-%DD”.
Destination value parameters
Parameters for handling destination values.
- DSTFLD
- Required. The field name that is specified in a FLDDEFn parameter that is associated with the destination field that receives the masked value. If the field name has spaces or commas, it must be enclosed in double quotation marks. If the field name has a double quotation mark character, the character must use a backslash (\) escape character.
Hash value parameters
Parameters for handling hash values.
- WHENNULL
- Specifies an output value to return when the source value is NULL. The default value is -1.
- WHENSPACE
- Specifies an output value to return when the source value is blank (consisting of spaces only). The default value is -2.
- WHENZEROLEN
- Specifies an output value to return when the source value is zero length. The default value is -3.
- WHENMIXED
- Specifies an output value to return when the source value is NULL, blank (consisting of spaces only), or zero length. The default value is -4.
Processing parameters
Parameters for managing provider processes.
- DISCARDLIMIT (or DLIM)
- Specifies the number of failed rows to discard before the provider stops processing.
Data definition parameters
Parameters for defining source and target data. For further information see, supported data types.
- FLDDEFn
- Required. Specifies the attributes of input values to use for processing. See Field definition parameter.
- CODEPAGE (or CP)
- An integer value that specifies the codepage or character-set identifier of the source fields. The default is UTF-8. The CP parameter within the FLDDEFn parameter overrides this value.
- CPTYPE (or CPT)
- The codepage type of the source fields. The CPT parameter within the FLDDEFn
parameter overrides this value.
When the origin of the data is DBMS-specific but not tied to any one DBMS, specify the value as ANY. When the origin of the data is from a non-DBMS source, specify the value as NONE. As there are no DBMS-specific code pages for Netezza®, a specification of NONE is implied when Netezza is specified.
Enter one of the following values:
Value Description DBZ (or DB2zOS) DB2® for z/OS® DB2 (or DB2LUW) DB2 for Linux®, UNIX, and Windows IFX (or INFORMIX) Informix® MSS (or SQLSERVER) Microsoft SQL Server NZ or NETEZZA Netezza ODBC ODBC ORA (or ORACLE) Oracle SYB (or SYBASE) Sybase TD or TERADATA Teradata ANY Any DBMS NONE No DBMS
Supported data types
The hash privacy provider supports the following data types for source fields:
Data type | Description |
---|---|
CHAR | Fixed size character data that is left justified and space padded. |
WCHAR | Fixed size wide character data that is left justified and space padded. |
VARCHAR | Character data starting with a short integer value that indicates the length, in bytes, of the character data to follow. |
WVARCHAR | Wide character data starting with a short integer value that indicates the length, in bytes, of the wide character data to follow. |
VARCHAR_SZ | Character data string which is terminated by a NULL character. |
WVARCHAR_SZ | Wide character data string that is terminated by a NULL character. |
ODBC_DATE | A data type that is used when the source value is a ODPP_ODBC_DATE structure of the Optim™ data masking API. |
ODBC_TIME | A data type that is used when the source value is a ODPP_ODBC_TIME structure of the Optim data masking API. |
ODBC_TIMESTAMP | A data type that is used when the source value is a ODPP_ODBC_TIMESTAMP structure of the Optim data masking API. |
DATETIME_SZ | A character data string containing date-time values and is terminated by a NULL character. To format this data type value, the caller must supply a format string using parameter ODPP_OPR_SOURCE_DATE_FORMAT of the Optim data masking API. |
DATETIME_WSZ | A character data string containing date-time values and is terminated by a NULL character. To format this data type, value the caller must supply a format string using parameter ODPP_OPR_SOURCE_DATE_FORMAT of the Optim data masking API. |
DOUBLE | Double precision floating point number in the range 1.7E +/- 308 (15 digits). |
FLOAT | Floating point number in the range 3.4E +/- 38 (7 digits). |
LONG_LONG | An 8 byte signed numeric value in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. |
U_LONG_LONG | An 8 byte unsigned numeric value in the range 0 to 18,446,744,073,709,551,615. |
INTEGER | A 4 byte signed integer value in the range -2,147,483,648 to 2,147,483,647. |
U_INTEGER | A 4 byte unsigned integer value in the range 0 to 4,294,967,295. |
SMALLINT | A 2 byte signed integer value in the range -32,768 to 32,767. |
U_SMALLINT | A 2 byte unsigned integer value in the range 0 to 65,535. |
TINYINT | A single byte signed integer value in the range -128 to 127. |
U_TINYINT | A single byte unsigned integer value in the range 0 to 255. |
DECIMAL_370 | Packed decimal encoded buffer. |
ORA_VARNUM | Oracle varnum encoded buffer. |
The hash privacy provider supports the following data types for destination fields. Note that some data types are not available if the HASHMAX parameter is not used.
Data type | Description | HASHMAX not used |
---|---|---|
CHAR | Fixed size character data that is left justified and space padded. | No |
WCHAR | Fixed size wide character data that is left justified and space padded. | No |
VARCHAR | Character data starting with a short integer value that indicates the length, in bytes, of the character data to follow. | No |
WVARCHAR | Wide character data starting with a short integer value that indicates the length, in bytes, of the wide character data to follow. | No |
VARCHAR_SZ | Character data string which is terminated by a NULL character. | No |
WVARCHAR_SZ | Wide character data string that is terminated by a NULL character. | No |
DOUBLE | Double precision floating point number in the range 1.7E +/- 308 (15 digits). | No |
FLOAT | Floating point number in the range 3.4E +/- 38 (7 digits). | No |
LONG_LONG | An 8 byte signed numeric value in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. | No |
U_LONG_LONG | An 8 byte unsigned numeric value in the range 0 to 18,446,744,073,709,551,615. | No |
INTEGER | A 4 byte signed integer value in the range -2,147,483,648 to 2,147,483,647. | Yes |
SMALLINT | A 2 byte signed integer value in the range -32,768 to 32,767. | Yes |
TINYINT | A single byte signed integer value in the range -128 to 127. | Yes |
DECIMAL_370 | Packed decimal encoded buffer. | No |
ORA_VARNUM | Oracle varnum encoded buffer. | No |