Hash privacy provider

Use the hash privacy provider to mask source data with numeric values generated by a hash algorithm. You can use the numeric values as the basis for providing replacement data from a lookup table or an array of values.

The provider generates a numeric hash value based on the source value and uses the hash value to produce masked data. The provider can process source data types such as character strings, integers, floating point numbers, and date-time values.

The provider also supports multiple source values of the same or different data type and concatenates these values before hashing them. You can specify a separator character to create a different concatenated source value for each row when concatenated values of columns from two or more rows would otherwise be identical.

The output value might not be unique even if the input is unique. To generate repeatable masked values for the same input, use the same seed value. The masked value for the same input will vary if the seed is changed for that input.

The provider offers two hashing algorithms, a default algorithm and the SHA-256 (secure hashing algorithm 256 bits) algorithm. The default algorithm usually outperforms the SHA-256 algorithm in processing time. The SHA-256 algorithm is a cryptographic-strength hashing algorithm that offers stronger masking than the default algorithm and ensures that masked values cannot be revealed by reverse engineering. The HMAC (hash-based message authentication code) algorithm, which is based on the SHA-256 algorithm, is also available with a user-supplied exit only and provides stronger encryption than the SHA-256 algorithm alone by using an encryption key provided by the exit. The provider prevents unauthorized access to the key value by not keeping the key in memory.

To generate values for use with a lookup table or an array of values, use the HASHMAX parameter to limit the number of values.

Examples

Literal seed

The following example uses the default algorithm and a seed value of 100 to create hashed values for a destination field, hash_dst.

pro=hash, seed="100", dstfld="hash_dst", flddef1=(name=hash_int, dt=integer), 
flddef2=(name=hash_str, dt=varchar_sz), flddef3=(name=hash_dst, dt=integer)

This example uses the following parameters:

SEED="100"
This parameter provides a seed value of 100 to influence the masking process. Use the same seed value each time the provider is used to produce the same masked values for a given source value.
HASHMAX limit

The following example creates hashed values for a lookup table or array of values by limiting the range of hashed values. The example also uses a seed value from the hash_seed environment variable.

pro=hash, seed=@hash_seed, hashmax=2000000, dstfld="hash_dst", 
flddef1=(name=hash_int, dt=integer), flddef2=(name=hash_str, dt=varchar_sz), 
flddef3=(name=hash_dst, dt=long_long)

This example uses the following parameters:

SEED=@hash_seed
This parameter provides an environment variable name, hash_seed, as seed value to influence the masking process. Use the same value each time the provider is used to produce the same masked values for a given source value.
HASHMAX=2000000
This parameter limits the number of output values to 2000000, which equates to the number of rows in a lookup table or the range of an array of values.
HMAC seed

The following example uses the SHA-256 algorithm and a seed value that is provided by a user exit to create hashed values for a destination field, hash_dst.

pro=hash, algo=sha256, seed=hmac, dstfld="hash_dst", 
flddef1=(name=hash_int, dt=integer), flddef2=(name=hash_str, dt=varchar_sz), 
flddef3=(name=hash_dst, dt=integer)

This example uses the following parameters:

ALGO=SHA256
The SHA-256 algorithm is a cryptographic-strength hashing algorithm and ensures that masked values cannot be revealed by reverse engineering.
SEED=HMAC
This parameter uses a seed value to influence the masking process that is provided by the user exit. Use the same value each time the provider is used to produce the same masked values for a given source value.

Syntax

The hash privacy provider uses the following syntax:

Masking parameters
PROVIDER = HASH  , 
	[ ALGORITHM = { SHA256 | DEFAULT } ] ,
	[ SEED = seed-value ]  ,
	[ HASHMAX = max-hash-value ]  ,
Source value parameters
	[ REMOVE = “remove-chars“ ]  ,      
	[ SEP = “sep-char“ ]  ,      
	[ CASE = UPPER  ]  ,
	[ SRC_DATE_FORMAT = “src-date-format-expression“ ]  ,
Destination value parameters
	DSTFLD = “destination-field-name”,
Hash value parameters
	[ WHENNULL = special-value-when-null-fld ]  ,
	[ WHENSPACE = special-value-when-space-fld ]  ,
	[ WHENZEROLEN = special-value-when-zerolen-fld ]  ,
	[ WHENMIXED = special-value-when-mixed-fld ]  ,
Processing parameters
	[ DISCARDLIMIT = discard-limit-value ] ,
Data definition parameters
	FLDDEFn = ( NAME = field-name,    
		DATATYPE = datatype-value, 
		[ PRECISION = field-precision-value ], 
		[ SCALE = field-scale-value ],    
		[ LENGTH = field-length-value ],
		[ CODEPAGE = codepage-value ],
		[ CPTYPE = { DB2ZOS |DB2LUW | ORACLE |SYBASE |ODBC | INFORMIX |NETEZZA |SQLSERVER |TERADATA |ANY |NONE } ] ) ,
	[ CODEPAGE = codepage-value ]  ,
	[ CPTYPE = { DB2ZOS | DB2LUW | ORACLE | SYBASE | ODBC | INFORMIX |
		     NETEZZA | SQLSERVER | TERADATA | ANY | NONE  } ]

Masking parameters

Parameters that determine how to mask data.

PROVIDER (or PRO)
Required. Enter the provider name, HASH.
Note: The PRO parameter must be first in the masking string. All other parameters can appear in any order.
ALGORITHM (or ALGO)
Specifies the type of hashing algorithm to use.

Enter one of the following options:

SHA256
Specifies to use the SHA-256 (secure hashing algorithm 256 bits) algorithm, or if a user-supplied exit is available, the HMAC (hash-based message authentication code) algorithm. Use the exit to provide a hash key for the HMAC algorithm.
DEFAULT (or DEF)
Specifies to use the default ODPP hash algorithm.
SEED
Use the SEED parameter to influence the masking process and to produce repeatable or non-repeatable output values. Enter one of the following options:
"literal"
A numeric value that is enclosed in quotes, which produces a higher degree of hash uniqueness and a repeatable output value. Enter a numeric value within the range 0 - 2,000,000,000.
@variable
The name of an environment variable, which is preceded by the at sign (@), which produces a repeatable output value. The variable provides a seed value within the range 0 - 2,000,000,000.
Note: If you are using the Hash provider SEED environment variable option with the data privacy UDFs for DB2 for Linux, UNIX, and Windows, you must set the variable using the db2set DB2ENVLIST command.
#seed-field-name
The name of a field that contains the seed values. This field must contain numeric values within the range 0 - 2,000,000,000.

If the field name has spaces or commas, it must be enclosed in double quotation marks. If the field name has a double quotation mark character, the character must use a backslash (\) escape character.

SEED=seed-field-name is not compatible with the parameter ALGO=SHA256.

RANDOM
Specifies a random seed value, which produces a non-repeatable output value.

SEED=RANDOM is not compatible with the parameter ALGO=SHA256.

HMAC
Specifies to use the seed value that is provided by the user exit. A user exit is required.

SEED=HMAC is not compatible with the parameter ALGO=DEF.

HASHMAX
An integer that specifies a limit for generating hash values. The maximum is 4,294,967,295.

To generate hash values for use with a lookup table, use a HASHMAX value that is equal to the number of rows in the lookup table.

To generate hash values for use with replacement values from an array, use a HASHMAX value that is equal to the array size.

Source value parameters

Parameters for handling source values.

REMOVE
Specifies the characters, within enclosing double quotation marks, to remove from the source value before processing. To specify a double quotation mark character, use a backslash (\) escape character.
SEP
Specifies a field separator, within enclosing double quotation marks, that enables concatenation when there are multiple source fields. Only one character can be used as a separator.
CASE
Specifies that the provider will convert the source values to uppercase before hashing.

Enter the following option:

UPPER (or UP)
Convert the source values to uppercase before hashing.
SRC_DATE_FORMAT (or SRCDF)
Specifies, within enclosing double quotation marks, the format of the date string values in the source field.
Use any of the following format specifiers for formatting the date-time strings. All specifiers start with a percent (%) sign. If SRCDF is not specified, the provider uses the following default date format: %DD/%MM/%YYYY %HH24:%MI:%SS:%FFFFFF.
Year
  • %YYYY
  • %YY
Month
  • %MONTH
  • %MMM
  • %MM
Day
%DD
Time
  • %HH
  • %MI
  • %SS
  • %HH24
Fraction of a second
  • %FFFFFF
  • %FFFFF
  • %FFFF
  • %FFF
  • %FF
  • %F
AM/PM
%PM

For example, to format a date string as four-digit year, three-character month, two-digit date with dash-type (-) separators, the format string would be “%YYYY-%MMM-%DD”.

Destination value parameters

Parameters for handling destination values.

DSTFLD
Required. The field name that is specified in a FLDDEFn parameter that is associated with the destination field that receives the masked value. If the field name has spaces or commas, it must be enclosed in double quotation marks. If the field name has a double quotation mark character, the character must use a backslash (\) escape character.

Hash value parameters

Parameters for handling hash values.

WHENNULL
Specifies an output value to return when the source value is NULL. The default value is -1.
WHENSPACE
Specifies an output value to return when the source value is blank (consisting of spaces only). The default value is -2.
WHENZEROLEN
Specifies an output value to return when the source value is zero length. The default value is -3.
WHENMIXED
Specifies an output value to return when the source value is NULL, blank (consisting of spaces only), or zero length. The default value is -4.

Processing parameters

Parameters for managing provider processes.

DISCARDLIMIT (or DLIM)
Specifies the number of failed rows to discard before the provider stops processing.

Data definition parameters

Parameters for defining source and target data. For further information see, supported data types.

FLDDEFn
Required. Specifies the attributes of input values to use for processing. See Field definition parameter.
CODEPAGE (or CP)
An integer value that specifies the codepage or character-set identifier of the source fields. The default is UTF-8. The CP parameter within the FLDDEFn parameter overrides this value.
CPTYPE (or CPT)
The codepage type of the source fields. The CPT parameter within the FLDDEFn parameter overrides this value.

When the origin of the data is DBMS-specific but not tied to any one DBMS, specify the value as ANY. When the origin of the data is from a non-DBMS source, specify the value as NONE. As there are no DBMS-specific code pages for Netezza®, a specification of NONE is implied when Netezza is specified.

Enter one of the following values:

Value Description
DBZ (or DB2zOS) DB2® for z/OS®
DB2 (or DB2LUW) DB2 for Linux®, UNIX, and Windows
IFX (or INFORMIX) Informix®
MSS (or SQLSERVER) Microsoft SQL Server
NZ or NETEZZA Netezza
ODBC ODBC
ORA (or ORACLE) Oracle
SYB (or SYBASE) Sybase
TD or TERADATA Teradata
ANY Any DBMS
NONE No DBMS

Supported data types

The hash privacy provider supports the following data types for source fields:

Data type Description
CHAR Fixed size character data that is left justified and space padded.
WCHAR Fixed size wide character data that is left justified and space padded.
VARCHAR Character data starting with a short integer value that indicates the length, in bytes, of the character data to follow.
WVARCHAR Wide character data starting with a short integer value that indicates the length, in bytes, of the wide character data to follow.
VARCHAR_SZ Character data string which is terminated by a NULL character.
WVARCHAR_SZ Wide character data string that is terminated by a NULL character.
ODBC_DATE A data type that is used when the source value is a ODPP_ODBC_DATE structure of the Optim™ data masking API.
ODBC_TIME A data type that is used when the source value is a ODPP_ODBC_TIME structure of the Optim data masking API.
ODBC_TIMESTAMP A data type that is used when the source value is a ODPP_ODBC_TIMESTAMP structure of the Optim data masking API.
DATETIME_SZ A character data string containing date-time values and is terminated by a NULL character. To format this data type value, the caller must supply a format string using parameter ODPP_OPR_SOURCE_DATE_FORMAT of the Optim data masking API.
DATETIME_WSZ A character data string containing date-time values and is terminated by a NULL character. To format this data type, value the caller must supply a format string using parameter ODPP_OPR_SOURCE_DATE_FORMAT of the Optim data masking API.
DOUBLE Double precision floating point number in the range 1.7E +/- 308 (15 digits).
FLOAT Floating point number in the range 3.4E +/- 38 (7 digits).
LONG_LONG An 8 byte signed numeric value in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
U_LONG_LONG An 8 byte unsigned numeric value in the range 0 to 18,446,744,073,709,551,615.
INTEGER A 4 byte signed integer value in the range -2,147,483,648 to 2,147,483,647.
U_INTEGER A 4 byte unsigned integer value in the range 0 to 4,294,967,295.
SMALLINT A 2 byte signed integer value in the range -32,768 to 32,767.
U_SMALLINT A 2 byte unsigned integer value in the range 0 to 65,535.
TINYINT A single byte signed integer value in the range -128 to 127.
U_TINYINT A single byte unsigned integer value in the range 0 to 255.
DECIMAL_370 Packed decimal encoded buffer.
ORA_VARNUM Oracle varnum encoded buffer.

The hash privacy provider supports the following data types for destination fields. Note that some data types are not available if the HASHMAX parameter is not used.

Data type Description HASHMAX not used
CHAR Fixed size character data that is left justified and space padded. No
WCHAR Fixed size wide character data that is left justified and space padded. No
VARCHAR Character data starting with a short integer value that indicates the length, in bytes, of the character data to follow. No
WVARCHAR Wide character data starting with a short integer value that indicates the length, in bytes, of the wide character data to follow. No
VARCHAR_SZ Character data string which is terminated by a NULL character. No
WVARCHAR_SZ Wide character data string that is terminated by a NULL character. No
DOUBLE Double precision floating point number in the range 1.7E +/- 308 (15 digits). No
FLOAT Floating point number in the range 3.4E +/- 38 (7 digits). No
LONG_LONG An 8 byte signed numeric value in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. No
U_LONG_LONG An 8 byte unsigned numeric value in the range 0 to 18,446,744,073,709,551,615. No
INTEGER A 4 byte signed integer value in the range -2,147,483,648 to 2,147,483,647. Yes
SMALLINT A 2 byte signed integer value in the range -32,768 to 32,767. Yes
TINYINT A single byte signed integer value in the range -128 to 127. Yes
DECIMAL_370 Packed decimal encoded buffer. No
ORA_VARNUM Oracle varnum encoded buffer. No