Hash lookup privacy provider

Use the hash lookup privacy provider to obtain the value for a destination field from a lookup table, according to a hashed value derived from a source field.

The lookup table is typically indexed and must include a key field that contains sequential number values without any gaps. The provider hashes one or more source field values to derive sequential numbers from 1 to the maximum value in the key field of the lookup table. The provider then matches the hash value with a value in the field of sequential values in the lookup table and uses replacement values from the lookup table row with the matching value.

The provider offers two hashing algorithms, a default algorithm and the SHA-256 (secure hashing algorithm 256 bits) algorithm. The default algorithm usually outperforms the SHA-256 algorithm in processing time. The SHA-256 algorithm is a cryptographic-strength hashing algorithm that offers stronger masking than the default algorithm and ensures that masked values cannot be revealed by reverse engineering. The HMAC (hash-based message authentication code) algorithm, which is based on the SHA-256 algorithm, is also available with a user-supplied exit only and provides stronger encryption than the SHA-256 algorithm alone by using an encryption key provided by the exit. The provider prevents unauthorized access to the key value by not keeping the key in memory.

The provider can use specific hash values for null, space, and zero-length values. The provider can also remove specified characters from source values and convert source values to uppercase before it hashes the values.

If a matching lookup value is not available, the provider can preserve the source value or generate an error.

The provider can maintain a cache of lookup values, which can improve performance but requires extra memory or disk space.

Examples

Default algorithm

The following example uses the default hashing algorithm to derive a hash value from data in the s1 source field and searches for a matching value in the seq field of the other.optim_us_person lookup table. The provider uses replacement values from the firstname and lastname fields in the lookup table row with the matching value to mask the firstname and lastname fields in the source data.

pro=hash_lookup, hashfld="seq",
source="s1", rep="firstname, lastname",
id=other.optim_us_person, lib=db2luw, conn=odpplkpu,
user=testuser, pass=testpass,
flddef1=(name=s1, dt=varchar_sz),
flddef2=(name=firstname, dt=varchar_sz),
flddef3=(name=lastname, dt=varchar_sz)

This example uses the following parameters:

HASHFLD="seq"
This parameter specifies the lookup table field that contains sequential values to match against the hash values generated from the source fields.
REP="firstname,lastname"
This parameter specifies the lookup table fields (firstname and lastname) that provide replacement values for thefirstname and lastname fields in the source table.
ID="other.optim_us_person"
This parameter specifies the name of the lookup table, other.optim_us_person.
LIB="db2luw"
This parameter specifies the DBMS of the lookup table, DB2® for Linux®, UNIX, and Windows.
SHA256 algorithm

The following example uses the SHA-256 hashing algorithm to derive a hash value from data in the s1 source field and searches for a matching value in the seq field of the other.optim_us_person lookup table. The provider uses replacement values from the firstname and lastname fields in the lookup table row with the matching value to mask the firstname and lastname fields in the source data. The SHA-256 hashing algorithm uses a seed value from by the HMAC key exit.

pro=hash_lookup, hashfld="seq",
source="s1", rep="firstname, lastname",
algo=sha256, seed=hmac,
id=other.optim_us_person, lib=db2luw, conn=odpplkpu,
user=testuser, pass=testpass,
flddef1=(name=s1, dt=varchar_sz),
flddef2=(name=firstname, dt=varchar_sz),
flddef3=(name=lastname, dt=varchar_sz)

This example uses the following parameters:

HASHFLD="seq"
This parameter specifies the lookup table field that contains sequential values to match against the hash values generated from the source fields.
ALGO=SHA256
The SHA-256 algorithm is a cryptographic-strength hashing algorithm and ensures that masked values cannot be revealed by reverse engineering.
SEED=HMAC
This parameter uses a seed value to influence the masking process that is provided by the user exit. Use the same value each time the provider is used to produce the same masked values for a given source value.
REP="firstname,lastname"
This parameter specifies the lookup table fields (firstname and lastname) that provides replacement values for the firstname and lastname fields in the source table.
ID="other.optim_us_person"
This parameter specifies the name of the lookup table, other.optim_us_person.
LIB="db2luw"
This parameter specifies the DBMS of the lookup table, DB2 for Linux, UNIX, and Windows.

Syntax

The hash lookup privacy provider uses the following syntax:

Masking parameters
PROVIDER = HASH_LOOKUP  , 
	HASHFLD = “hash-field_name“ ,
	[ ALGORITHM = { SHA256 | DEFAULT } ] ,
	[ SEED = { seed-value | HMAC } ] ,
Source parameters
	SOURCE = “srcfld1-name , srcfldn-name“ ,
	[ REMOVE = “remove-chars“ ]  ,  
	[ CASE = UPPER  ]  ,
	[ SRC_DATE_FORMAT = “src-date-format-expression“ ]  ,
	[ PRESERVENULL = “fld1-name , fldn-name“  ] ,
	[ PRESERVESPACES = “fld1-name , fldn-name“  ] ,
	[ PRESERVEZEROLENGTH = “fld1-name , fldn-name“  ] ,
Lookup parameters
	REPLACE = “fld1-name , fldn-name“ ,
	[ WHENNOTFOUND = { PRESERVE | ERROR } ] ,
Hash parameters
	[ WHENNULL = special-value-when-null-fld ]  ,
	[ WHENSPACE = special-value-when-space-fld ]  ,
	[ WHENZEROLEN = special-value-when-zerolen-fld ]  ,
	[ WHENMIXED = special-value-when-mixed-fld ]  ,
Cache parameters
	[ CACHE = { MEMORY | MEMFILE | N } ] ,
	[ CACHESIZE = cache-size-integer ] ,
	[ TMPDIR = temp-directory-location ] ,
Connection parameters
	ID = identifier-string  ,
	LIB = { DB2LUW | DB2ZOS | ORACLE  } ,
	USER = user-name  ,
	PASS = password  ,
	CONN = connection-string  ,
Note: Starting with release 11.3.0.5, the LIB parameter can also be set to MSS for Microsoft SQL Server lookup tables.
Processing parameters
	[ DISCARDLIMIT = discard-limit-value ] ,
Data definition parameters
	FLDDEFn = ( NAME = field-name,    
		DATATYPE = datatype-value, 
		[ PRECISION = field-precision-value ], 
		[ SCALE = field-scale-value ],    
		[ LENGTH = field-length-value ],
		[ CODEPAGE = codepage-value ],
		[ CPTYPE = { DB2ZOS |DB2LUW | ORACLE |SYBASE |ODBC | INFORMIX |NETEZZA |SQLSERVER |TERADATA |ANY |NONE } ] ) ,
	[ CODEPAGE = codepage-value ]  ,
	[ CPTYPE = { DB2ZOS | DB2LUW | ORACLE | SYBASE | ODBC | INFORMIX |
		     NETEZZA | SQLSERVER | TERADATA | ANY | NONE  } ]

Masking parameters

Parameters that determine how to mask data.

PROVIDER (or PRO)
Required. Enter the provider name, HASH_LKP (or HASH_ LOOKUP).
Note: The PRO parameter must be first in the masking string. All other parameters can appear in any order.
HASHFLD
Required. Specifies the lookup table field that contains sequential values to match against the hash values generated from the source fields.

This field does not require a FLDDEFn entry.

ALGORITHM (or ALGO)
Specifies the type of hashing algorithm to use.

Enter one of the following options:

SHA256
Specifies to use the SHA-256 (secure hashing algorithm 256 bits) algorithm, or if a user-supplied exit is available, the HMAC (hash-based message authentication code) algorithm. Use the exit to provide a hash key for the HMAC algorithm.
DEFAULT (or DEF)
Specifies to use the default ODPP hash algorithm.
SEED
A literal string that alters the behavior of the hash algorithms. To produce repeatable masked data, use the same seed value.

Enter one of the following options:

"literal"
A numeric value that is enclosed in quotes, which produces a higher degree of hash uniqueness and a repeatable output value. Enter a numeric value within the range 0 - 2,000,000,000.
HMAC
Specifies to use the seed value that is provided by the user exit. A user exit is required.

SEED=HMAC is not compatible with the parameter ALGO=DEF.

Source parameters

Parameters for managing source values.

SOURCE (or SRC)
Required. Specifies the names of the source table fields that provide input values for the hashing function.

Use commas to separate a list of source field names. If a field name has spaces or commas, enclose the name in double quotation marks. If a field name has a double quotation mark character, precede the double quotation mark character with a backslash (\) escape character.

REMOVE
Specifies the characters, within enclosing double quotation marks, to remove from the source value before processing. To specify a double quotation mark character, use a backslash (\) escape character.
CASE
Specifies that the provider will convert the source values to uppercase before hashing.

Enter the following option:

UPPER (or UP)
Convert the source values to uppercase before hashing.
SRC_DATE_FORMAT (or SRCDF)
Specifies, within enclosing double quotation marks, the format of the date string values in the source field.
Use any of the following format specifiers for formatting the date-time strings. All specifiers start with a percent (%) sign. If SRCDF is not specified, the provider uses the following default date format: %DD/%MM/%YYYY %HH24:%MI:%SS:%FFFFFF.
Year
  • %YYYY
  • %YY
Month
  • %MONTH
  • %MMM
  • %MM
Day
%DD
Time
  • %HH
  • %MI
  • %SS
  • %HH24
Fraction of a second
  • %FFFFFF
  • %FFFFF
  • %FFFF
  • %FFF
  • %FF
  • %F

When converting a timestamp or time value in a string format to a privacy provider format, use as many fractions of a second as possible. The provider will automatically handle the fractions if the source value fractions are less than what is specified by SRCDF. Formatting fractions in this manner is useful when want to use the same SRCDF for multiple values with fractions that vary in precision.

AM/PM
%PM

For example, to format a date string as four-digit year, three-character month, two-digit date with dash-type (-) separators, the format string would be “%YYYY-%MMM-%DD”.

Lua column map procedures read the date/time values as a string and follows the system defined date/time format. To use SRCDF in LUA column map procedures, set SCRDF to the system date/time format. In case the format is not clear, write a small LUA column map procedure to print the value of date/time column and use the output to prepare a SRCDF. Please note that this also makes the LUA column map procedure dependent on the system format on which they are run, so the same procedure might require a change in SRCDF if the procedure is run on a system with different date/time format.

PRESERVENULL (or PRENULL)
Specifies source fields in which NULL values are preserved and not replaced with lookup table values.

Use commas to separate a list of source field names. If a field name has spaces or commas, enclose the name in double quotation marks. If a field name has a double quotation mark character, precede the double quotation mark character with a backslash (\) escape character.

PRESERVESPACES (or PRESPACES)
Specifies source fields in which blank values are preserved and not replaced with lookup table values.

Use commas to separate a list of source field names. If a field name has spaces or commas, enclose the name in double quotation marks. If a field name has a double quotation mark character, precede the double quotation mark character with a backslash (\) escape character.

PRESERVEZEROLENGTH (or PREZL)
Specifies source fields in which zero length values are preserved and not replaced with lookup table values.

Use commas to separate a list of source field names. If a field name has spaces or commas, enclose the name in double quotation marks. If a field name has a double quotation mark character, precede the double quotation mark character with a backslash (\) escape character.

Lookup parameters

Parameters for managing lookup table values.

REPLACE (or REP)
Required. Specifies the replacement fields in the lookup table.

Use commas to separate a list of replacement field names. If a field name has spaces or commas, enclose the name in double quotation marks. If a field name has a double quotation mark character, precede the double quotation mark character with a backslash (\) escape character.

WHENNOTFOUND (or WHENNF)
Specifies how to process search field values with no matches in the lookup table. Enter one of the following options:
PRESERVE (or PRE)
If the source field values value are not found in the lookup table, the source field values are preserved.
ERROR (or ERR)
Default. If the source field values value are not found in the lookup table, an error occurs.

Hash parameters

Parameters for managing hash values.

WHENNULL
Specifies an output value to return when the source value is NULL. The default value is -1.
WHENSPACE
Specifies an output value to return when the source value is blank (consisting of spaces only). The default value is -2.
WHENZEROLEN
Specifies an output value to return when the source value is zero length. The default value is -3.
WHENMIXED
Specifies an output value to return when the source value is NULL, blank (consisting of spaces only), or zero length. The default value is -4.

Cache parameters

Parameters for managing a cache of lookup values.

CACHE
Determines whether to maintain a cache of the replacement values during processing. Using a cache is faster when the masking process retrieves a replacement value many times but requires extra memory and optionally, disk space.
Note: It is recommended to specify the length of all character search fields. The length is used to allocate internal data buffers when caching is enabled.

Enter one of the following options:

MEMFILE
Default. Specifies that the cache size is determined by the CACHESIZE value, but when the cache threshold is reached, further rows are written to a file cache. By default, this file is written to the platform temporary-type directory. On Windows, this directory is the system-configured temp-type directory. On UNIX and Linux, this directory is the /tmp directory. To specify the memory file location, use TMPDIR. The file cache file name is in the format: ODPP.Pxxxxxxxx.Tyyyyyyyy.lkp where xxxxxxxx and yyyyyyyy are hex values.
MEMORY (or MEM)
Specifies that the cache size is determined by the CACHESIZE value.
N
Specifies that caching is disabled.
CACHESIZE
Specifies the size of the in-memory cache in KB. This size represents only the in-memory size of the cache. If CACHE=MEMFILE , the cache is limited only by the operating system file limitations. Specify a value (in KB) in the range 1 - 2,147,483,647 (2 GB). The default value is 32 KB.
TMPDIR
Specifies the location of the cache file when CACHE=MEMFILE.

Connection parameters

Parameters for connecting to a lookup table.

ID
Required. Specifies the name of the lookup table.

For DB2 for Linux, UNIX, and Windows and DB2 for z/OS®, specify the schema-name and table-name. Use the format schema.table.

LIB
Specifies the name of the lookup data source (LDS) library. Enter one of the following options:
DB2LUW
DB2 for Linux, UNIX, and Windows
DB2ZOS
DB2 for z/OS
MSS
(Release 11.3.0.5 and later) Microsoft SQL Server
ORACLE
Oracle
lds-library-name
The name of a user-written LDS library.
USER
Required. The user name that is used to access the lookup table database.
PASS
Required. The password for the user name that is used to access the lookup table database.

Use double quotation marks to enclose passwords with special characters. For example, "ABC$".

CONN
Required. The connection string for the lookup table database.

Processing parameters

Parameters for managing provider processes.

DISCARDLIMIT (or DLIM)
Specifies the number of failed rows to discard before the provider stops processing.

Data definition parameters

Parameters for defining source and target data. For further information see, supported data types.

FLDDEFn
Required. Specifies the attributes of input values to use for processing. See Field definition parameter.
CODEPAGE (or CP)
An integer value that specifies the codepage or character-set identifier of the source fields. The default is UTF-8. The CP parameter within the FLDDEFn parameter overrides this value.
CPTYPE (or CPT)
The codepage type of the source fields. The CPT parameter within the FLDDEFn parameter overrides this value.

When the origin of the data is DBMS-specific but not tied to any one DBMS, specify the value as ANY. When the origin of the data is from a non-DBMS source, specify the value as NONE. As there are no DBMS-specific code pages for Netezza®, a specification of NONE is implied when Netezza is specified.

Enter one of the following values:

Value Description
DBZ (or DB2zOS) DB2 for z/OS
DB2 (or DB2LUW) DB2 for Linux, UNIX, and Windows
IFX (or INFORMIX) Informix®
MSS (or SQLSERVER) Microsoft SQL Server
NZ or NETEZZA Netezza
ODBC ODBC
ORA (or ORACLE) Oracle
SYB (or SYBASE) Sybase
TD or TERADATA Teradata
ANY Any DBMS
NONE No DBMS

Supported data types

The hash lookup privacy provider supports the following data types for source and destination fields:
Data type Description
CHAR Fixed size character data that is left justified and space padded.
WCHAR Fixed size wide character data that is left justified and space padded.
VARCHAR Character data starting with a short integer value that indicates the length, in bytes, of the character data to follow.
WVARCHAR Wide character data starting with a short integer value that indicates the length, in bytes, of the wide character data to follow.
VARCHAR_SZ Character data string which is terminated by a NULL character.
WVARCHAR_SZ Wide character data string that is terminated by a NULL character.
ODBC_DATE A data type that is used when the source value is a ODPP_ODBC_DATE structure of the Optim™ data masking API.
ODBC_TIME A data type that is used when the source value is a ODPP_ODBC_TIME structure of the Optim data masking API.
ODBC_TIMESTAMP A data type that is used when the source value is a ODPP_ODBC_TIMESTAMP structure of the Optim data masking API.
DATETIME_CHAR A fixed size character data value that is left justified with space padded and contains date-time values. To format this data type value, the caller must supply a format string using parameter ODPP_OPR_SOURCE_DATE_FORMAT of the Optim data masking API.
DATETIME_VARCHAR Character data starting with a short integer value that indicates the length, in bytes, of the character data containing date-time values to follow. To format this data type value the caller must supply a format string using parameter ODPP_OPR_SOURCE_DATE_FORMAT of the Optim data masking API.
DATETIME_WCHAR Fixed size wide character data that is left justified with space padded and contains date-time values. To format this data type value the caller must supply a format string using parameter ODPP_OPR_SOURCE_DATE_FORMAT of the Optim data masking API.
DATETIME_WVARCHAR Wide character data starting with a short integer value that indicates the length, in bytes, of the wide character data containing date-time values to follow. To format this data type value the caller must supply a format string using parameter ODPP_OPR_SOURCE_DATE_FORMAT of the Optim data masking API.
DATETIME_SZ A character data string containing date-time values and is terminated by a NULL character. To format this data type value, the caller must supply a format string using parameter ODPP_OPR_SOURCE_DATE_FORMAT of the Optim data masking API.
DATETIME_WSZ A character data string containing date-time values and is terminated by a NULL character. To format this data type, value the caller must supply a format string using parameter ODPP_OPR_SOURCE_DATE_FORMAT of the Optim data masking API.
DOUBLE Double precision floating point number in the range 1.7E +/- 308 (15 digits).
FLOAT Floating point number in the range 3.4E +/- 38 (7 digits).
LONG_LONG An 8 byte signed numeric value in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
U_LONG_LONG An 8 byte unsigned numeric value in the range 0 to 18,446,744,073,709,551,615.
INTEGER A 4 byte signed integer value in the range -2,147,483,648 to 2,147,483,647.
U_INTEGER A 4 byte unsigned integer value in the range 0 to 4,294,967,295.
SMALLINT A 2 byte signed integer value in the range -32,768 to 32,767.
U_SMALLINT A 2 byte unsigned integer value in the range 0 to 65,535.
TINYINT A single byte signed integer value in the range -128 to 127.
U_TINYINT A single byte unsigned integer value in the range 0 to 255.