Data Privacy functions provide various methods to transform or mask sensitive data. They require an Optim Data Privacy License.
The LOOKUP function obtains the value for a destination column from a DB2 table, the lookup table, according to the value in the source column. Use the LOOKUP function to translate a source value to a corresponding look-up value, which is placed in the destination.
There are two forms of the LOOKUP function, single column and multiple column. The single column form inserts a value into a single destination column. The multiple column form inserts values from multiple lookup table columns into corresponding destination columns, which are based on a single source column value. You can use the LOOKUP function alone or in combination with other Column Map functions to achieve the appropriate data masking results.
You can enter the multiple column LOOKUP function for any source column that will be replaced by a lookup table value, but you must edit the Column Map to remove the names of remaining source columns that also will be replaced.
Use the IGNORE parameter to ignore the lookup table and use a source value when a row in a specified source column contains a specified value [NULL, SPACES (for CHAR columns), or zero-length VARCHAR].
Use the PRESERVE parameter to ignore the lookup table and use a source value when a source column contains a specified value [NULL, SPACES (for CHAR columns), or zero-length VARCHAR]. If the lookup table does not contain a value for a source column, PRESERVE=NOT_FOUND inserts the source column value at the destination.
The LOOKUP function uses the following syntax:
LOOKUP( [sourcecol,...] [dest=(col1,coln,...),] lktablename(search,{value|values=(col1,coln,...) } ) [cache|nocache] [ignore=(colname(spaces,null,zero_len),...] [PRESERVE=([NOT_FOUND]|colname(spaces,null,zero_len),...)])
Use the LOOKUP function to translate the source value in a lookup table to a corresponding value in another table.
For example, assume the source column, STATE, contains state abbreviations (such as NJ) and the destination column is to contain the complete state name (in this example, New Jersey). A lookup table that is named STATE_LOOKUP contains a column (CODE) for state abbreviations or codes and a column (NAME) for the corresponding names.
To obtain the value for the destination column by using the STATE_LOOKUP table, specify:
LOOKUP(STATE,STATE_LOOKUP(CODE,NAME))
The LOOKUP function searches for a value in the CODE column of the STATE_LOOKUP table that matches the value (NJ) in the source table STATE column. When a match is found, the function inserts the corresponding value from the NAME column (New Jersey) in the destination column.
Use the LOOKUP function to insert values from columns in a lookup table row into columns in a destination table row, which is based on a value in a source column.
Here is an example. Based on a source column (SOC_SEC) that contains social security numbers, you can replace values in destination columns (FIRST_NAME and LAST_NAME) with first and last names from a lookup table. A table that is named NAME_LOOKUP contains a column (SSN) with the social security numbers from the source table. That table also contains columns (FIRST_MASK and LAST_MASK) to mask corresponding names in the destination.
To replace names in the destination table that are based on a social security number, specify:
LOOKUP(SOC_SEC,DEST=(FIRST_NAME,LAST_NAME),NAME_LOOKUP(SSN,VALUES=(FIRST_MASK, LAST_MASK)))
The LOOKUP function searches for a value in the SSN column of the NAME_LOOKUP table that matches the value in the source table SOC_SEC column. When a match is found, the function inserts the corresponding values from the lookup table FIRST_MASK and LAST_MASK columns into the corresponding destination columns.
Use the following statement to extend the Single Column Example, where you want to use the source NULL and SPACES values instead of lookup table values:
LOOKUP(STATE,STATE_LOOKUP(CODE,NAME),IGNORE=(STATE(NULL,SPACES)))
Use the following statement to extend the Single Column Example, where you do not want to maintain a table of found lookup values in memory:
LOOKUP(STATE,STATE_LOOKUP(CODE,NAME),NO_CACHE)
The Random LOOKUP function selects a value at random from a specified lookup table to insert in a destination column. The function generates a random number between 1 and the limit or number of rows in the lookup table to use as a subscript into the table. The column value or values from the row that correspond to the subscript are inserted in the destination column.
The Random LOOKUP function is not supported for DECFLOAT data type.
There are two forms of the Random LOOKUP function, single column and multiple column. The single column form inserts a value into a single destination column. The multiple column form inserts values from multiple lookup table columns into corresponding destination columns.
You can enter the multiple column Random LOOKUP function for any source column that will be replaced by a lookup table value, but you must edit the Column Map to remove the names of remaining source columns that also will be replaced.
use the IGNORE parameter to ignore the lookup table and use a source value when a row in a specified source column contains a specified value [NULL, SPACES (for CHAR columns), or zero-length VARCHAR].
Use the PRESERVE parameter to ignore the lookup table and use a source value when a source column contains a specified value [NULL, SPACES (for CHAR columns), or zero-length VARCHAR]. If the lookup table does not contain a value for a source column, PRESERVE=NOT_FOUND inserts the source column value at the destination.
The Random LOOKUP function uses the following syntax:
RAND_LOOKUP(lktablename,{columnname | dest=(col1,coln,...),values=(col1,coln,...)} [limit] [ignore=(colname(spaces,null,zero_len),...)] [PRESERVE=([NOT_FOUND]|colname(spaces,null,zero_len),...)])
A table of column values is generated in memory. The size of this table might be limited by system resources.
To select a value at random from the STATE column in the first 50 rows of a table that is named STATE_LOOKUP and insert it in the destination column, specify:
RAND_LOOKUP(STATE_LOOKUP,STATE,50)
To select values from the CITY, STATE, and ZIP columns in a random row of a table that is named STATE_LOOKUP and insert them in the corresponding destination columns, specify:
RAND_LOOKUP(STATE_LOOKUP,DEST=(CITY,STATE,ZIP),VALUES=(CITY,STATE,ZIP))
Use the following statement to extend the Single Column Example, where the source column is named STATES and you want to use the source NULL and SPACES values instead of lookup table values:
RAND_LOOKUP(STATE_LOOKUP,STATE,50,IGNORE=(STATES(NULL,SPACES)))
The Hash LOOKUP function obtains the value for a destination column from a lookup table, according to a hashed value derived from a source column. Use the Hash LOOKUP function to consistently mask data when you use the same source and lookup tables in any environment. The source column that is hashed does not need to be a column that will be replaced by lookup table values.
There are two forms of the Hash LOOKUP function, single column and multiple column. The single column form inserts a value into a single destination column. The multiple column form inserts values from multiple lookup table columns into corresponding destination columns, based on a single hash value from a source column.
You can enter the multiple column Hash LOOKUP function for any source column that will be replaced by lookup table values, but you must edit the Column Map to remove the names of remaining source columns that also will be replaced.
The lookup table must include a key column that contains sequential number values without any gaps, and the remaining columns contain replacement values. The key column must be a numeric data type. The lookup table is typically indexed. The function hashes a source column to derive sequential numbers from 1 to the maximum value in the key column of the lookup table. The hashed value from the source table is matched with the sequential numbers in the lookup table, and values from the corresponding lookup table row are inserted at the destination.
The function assigns NULL, SPACES (for CHAR columns), and zero-length VARCHAR values to the numbers -1 (for NULL), -2 (for SPACES), and -3 (for zero-length VARCHAR). The lookup table should include a row for each of these numbers, which will allow you to insert a lookup value for each of these source values. If one of these source values is found and a corresponding number is not in the lookup table, a conversion error is reported.
Use the IGNORE parameter to ignore the lookup table and use a source value when a row in a specified source column contains a specified value [NULL, SPACES (for CHAR columns), or zero-length VARCHAR)].
Use the PRESERVE parameter to ignore the lookup table and use a source value when a source column contains a specified value [NULL, SPACES (for CHAR columns), or zero-length VARCHAR]. If the lookup table does not contain a value for a source column, PRESERVE=NOT_FOUND inserts the source column value at the destination.
You can use the SEED parameter to vary the calculation that is performed by the hashing algorithm. The hashed value from the source column and the SEED value are matched with a sequential number from the lookup table to obtain the replacement value for the destination column.
Here is the syntax:
HASH_LOOKUP( [sourcecol,...] [trim=([char1char2...][\u][\r])] [dest=(col1,coln )] lktablename(search,{value |values=(col1,coln,...)}) [cache|nocache] [ignore=(colname(spaces,null,zero_len),...)] | [PRESERVE=([NOT_FOUND]|colname(spaces,null,zero_len),...)] [seed=n])
To specify a backslash, \, or a right parentheses, ), you must precede the character with a backslash escape character. For example, to specify a right parenthesis, enter: trim=(\)). You can use the escape character only with a backslash, a right parenthesis, the uppercase indicator, or as part of the indicator to remove trailing blanks (\r).
Use the Hash LOOKUP function to insert values from a column in a lookup table into a destination table column, based on a value that is hashed from a source column.
For example, assume the source column, FIRST_NAME, contains first names and the destination column includes replacement first names from the lookup table. A lookup table, NAME_LOOKUP, contains a column (FIRST) with first names and a column (SEQ) containing sequential values.
To obtain values for the destination column by using the NAME_LOOKUP table, specify:
HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ,FIRST))
The Hash LOOKUP function matches the hash values from the source column with values in the SEQ column of the NAME_LOOKUP table. When a match is found, the function inserts the corresponding value from the FIRST column into the destination column.
Use the Hash LOOKUP function to insert values from columns in a lookup table row into columns in a destination table row, based on a value that is hashed from a source column.
Here is an example. Based on values that are hashed from a source column (FIRST_NAME) that contains first names, you can replace values in destination columns (FIRST and LAST) with first and last names from a lookup table. A lookup table that is named NAME_LOOKUP contains a column (SEQ) with sequential values. That table also contains columns (FIRST_MASK and LAST_MASK) to mask values in the destination.
To replace names in the destination table based on a value hashed from a source column, specify:
HASH_LOOKUP(FIRST_NAME,DEST=(FIRST,LAST),NAME_LOOKUP(SEQ,VALUES=(FIRST_MASK,LAST_MASK)))
The Hash LOOKUP function matches the hash values from the source FIRST_NAME column with values in the SEQ column of the NAME_LOOKUP table. When a match is found, the function inserts the corresponding values from the lookup table FIRST_MASK and LAST_MASK columns into the corresponding destination columns.
Use the following statement to extend the Single Column Example, where you want to use the source NULL and SPACES values instead of lookup table values:
HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ,FIRST),IGNORE=(FIRST_NAME(NULL,SPACES)))
Use the following statement to extend the Single Column Example, where you do not want to maintain a table of found lookup values in memory:
HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ,FIRST),NO_CACHE)
Use the following statement to trim spaces and commas from the source value and convert the source value to uppercase before it is hashed:
HASH_LOOKUP(FIRST_NAME,TRIM=( ,\u),NAME_LOOKUP(SEQ,FIRST))
HASH_LOOKUP(FULL_NAME,TRIM=( ),NAME_LOOKUP(SEQ,FULLNAME))
The Move AGE function:
AGE increments dates based on various parameters. To specify parameters to age dates before inserting them at the destination, use the AGE line command. This command invokes the Aging Specifications panel, which allows you to provide values to generate the AGE function and its parameters. Assume that AGE was specified for the ORDER_SHIP_DATE column. Some Aging Specifications values for the column are included in the following figure.
Figure: Aging Specifications
.--------------------- Aging Specifications --------------------. | Command ===> | | | | Source Column ===> ORDER_SHIP_DATE > | | Destination Column : ORDER_SHIP_DATE > | | | | Input Date Format ===> | | Output Date Format ===> | | Aging Rule ===> | | Rule Table ===> | | Pivot Year ===> (00 – 99) | | | | Specify Explicit Date or Aging Amount | | Explicit Date ===> YYYY/MM/DD | | | | Aging Amount | | Years ===> (-2500 to +1581) | | Months ===> (-30000 to +30000) | | Weeks ===> (-30000 to +30000) | | Days ===> (-99999 to +99999) | | Business Rules ===> (0 to 30000) | | | | Propagate Aging ===> (Y-Yes, N-No) | '---------------------------------------------------------------'
The Input Date Format defines a two-digit year, so the Pivot Year is included to determine the century. An Output Date Format is not specified in the figure, but you might specify a different format, such as MMDDYYYY, to adjust the input to a four-digit year output.
The other values, such as aging values and rules, are not specified. These values are to be supplied for each process when the Column Map is used.
The Aging Specifications panel includes the following items:
Use an asterisk to display a selection list of formats valid for the column. If not specified, the Input Data Format is used. For more information about specifying the date format, see Date Formats.
The combined values for Years, Months, Weeks, and Dayscannot result in an aging amount greater than 1581 years. If it does exceed that amount, an error results when the aging is attempted.
You can direct Move to age values in DATE and TIMESTAMP columns by using the global specifications for the processor. However, if you want to age specific DATE and TIMESTAMP columns exclusively, age these columns to unique values, or use an exit, specify the AGE function for the column. These DB2 defined data types must have the format DB2DATE.
DATE and TIMESTAMP columns that are defined as nullable are skipped if they contain NULL. These columns are not aged and are included in the reported count of skipped columns. These skipped columns are handled the same as other skipped values according to the process specifications.
Date formats specify the format of a date column. These are encoded internally as a list of possible data types in a single format table. New formats can be added easily to the table without requiring coding for each individual format.
Each date format is specified by its name. Specify these formats in the Input Date Format and Output Date Format prompts on the Aging Specifications panel. You can also display a selection list of values appropriate for the specific column data type by entering an asterisk in these prompts.
Valid names are determined by the data type of the column. Move can support a date format of MMDDYY as a character column and a decimal column, even though these are processed differently.
The formats that are distributed with Move are formatted with the following characters to represent the date component.
Move supports a wide variety of internal date storage schemes. For instance, a YYMMDD date column might be stored in various ways:
Character formats and numeric formats, as either packed decimal or binary columns, are allowed for any format without delimiters. The following are some examples of date formats.
After you define values for the AGE function on the Aging Specifications panel, use END to return to the Column Map editor. The AGE function is identified by the string AGE(column) where column is the name of the source column to be aged. You can modify these values by using the AGE line command again to redisplay the Aging Specifications panel. However, you must use the CLR line command to remove the entire AGE function specification or the SRC line command to replace the AGE function with the source column.