Affinity privacy provider

Use the affinity privacy provider to mask data while maintaining the format and character types of the source values. For example, the provider can maintain the format of data such as account numbers or driver's license numbers while masking character data with character data and numeric data with numeric data.

If the source data is uppercase, alphabetic characters, the affinity privacy provider generates uppercase, alphabetic characters at the destination. Source data in lowercase alphabetic characters will produce lowercase alphabetic characters at the destination. Numeric data in the source generates numeric data at the destination. The provider masks alphabetic and numeric characters, but copies other characters in the source data to the destination. For example, a credit card number in the format nnnn nnnn nnnn nnnn is masked to a different number that includes spaces at the same intervals, while one formatted with dashes is masked as nnnn nnnn nnnn nnnn. Using this provider, you can generate unique values, a different value for each occurrence of the same source, and values with a length different from the source.

Note:
  • Affinity currently does not support columns that contain multibyte data (MBCS and DBCS). Attempting to use Affinity with MBCS or DBCS character data produces unpredictable results.
  • Affinity masks Latin English alphabetic characters only (A-Z, a-z). Alphabetic characters such as ñ ç or ü are copied to the destination without being changed.

The provider includes two algorithms for masking data, a default algorithm and a format preserving encryption (FPE) algorithm.

The FPE algorithm is encryption-based and offers stronger masking capability than the default. The FPE algorithm is based on the Advanced Encryption Standard 256-bit (AES-256) algorithm, can optionally use an encryption key that is supplied by the user, and can produce masked values that are unique. The same user-supplied key produces repeatable masked values. For this reason, knowledge of the key should be secured from unauthorized users to prevent reverse engineering to discover the original values.

The FPE algorithm also produces outputs that are extremely varied and without a discernible pattern. For example, two close or similar source values (such as 001 and 002) are masked with values that are not similar (such as 196 and 837), thus masking any pattern these values may have had in the source data.

The TWEAKS parameter, available with the FPE algorithm only, can mask identical strings within source values by using other strings within each source value to influence the masking process. For example, for two source values such as ABC-123 and DEF-123, the TWEAKS parameter can mask the 123 string in each value by using the preceding string to produce the masked values ABC-981 and DEF-704.

While the FPE algorithm offers an industry standard level of encryption, consider using the default algorithm when your data does not require such encryption. The most significant differences between the two algorithms are the degree of varied masking patterns, the time that is required to complete the masking effort, and the strength of the masked results. The default algorithm usually outperforms the FPE algorithm in processing time, as the FPE algorithm in most cases takes more time to complete. The complexity of the input data can add more time to the masking operation of the FPE algorithm, depending on the amount of parallel processing and I/O being performed.

Examples

Account numbers
Masking account numbers might require that the account number format is maintained while producing repeatable results for testing. To meet these requirements, you might use the following syntax:
trans pro=aff, algo=fpe, mtd=rep, key="Xyz123", rule=num, whenmatch=prefix("33"), 
flddef1=(name=acctnbr,  datatype=varchar)

This example uses the following parameters:

ALGO=FPE
The FPE algorithm offers a high level of encryption and produces masked values that are much less likely to reveal the original source value than the default.
MTD=REP
This parameter produces consistent target values when the same data is processed multiple times.
KEY="fpekey123"
This parameter provides the case-sensitive encryption key for the FPE algorithm. To produce repeatable results, use the same key value.
RULE=NUM
This parameter ensures that numeric source values produce numeric target values.
WHENMATCH=PREFIX
This parameter prevents identical source and target values by adding a prefix to target values that match the source.

This syntax produces following results:

Source values Masked values
acct12345 uaho84506
acct12345 uaho84506
acct56789 uaho66769
Driver's license numbers
Masking drivers license numbers might require that the format is maintained, as the format might be unique to the issuing agency. To meet these requirements, you might use the following syntax:
trans pro=aff, algo=FPE, mtd=rep, key="fpekey123", flddef1=(name=driver_number,dt=varchar)

This example uses the following parameters:

ALGO=FPE
The FPE algorithm offers a high level of encryption and produces masked values that are much less likely to reveal the original source value than the default.
MTD=REP
This parameter produces consistent target values when the same data is processed multiple times.
KEY="fpekey123"
This parameter provides the case-sensitive encryption key for the FPE algorithm. To produce repeatable results, use the same key value.

This syntax produces following results:

Source values Masked values
M2267 89890 34567 O0280 50902 54225
M2267 89890 34567 O0280 50902 54225
P2267 89890 34567 D9254 86713 09303
Passport numbers
Passport number data might contain identical strings that occur among multiple values. To produce masked data that does not include such strings, use the TWEAKS parameter. This data might also contain hyphens that can be removed from masked output by using the REMOVE parameter. To meet these requirements, you might use the following syntax:
trans pro=aff, algo=fpe, mtd=rep, key="fpekey123", iterations=20, tweaks=yes, 
flddef1=(name=passport_nbr,  dt=varchar)

This example uses the following parameters:

ALGO=FPE
The FPE algorithm offers a high level of encryption and produces masked values that are much less likely to reveal the original source value than the default.
MTD=REP
This parameter produces consistent target values when the same data is processed multiple times.
KEY="fpekey123"
This parameter provides the case-sensitive encryption key for the FPE algorithm. To produce repeatable results, use the same key value.
ITERATIONS=20
This parameter determines the number of times the FPE algorithm processes each value to ensure uniqueness.
TWEAKS=YES
This parameter ensures that identical strings are masked with unique strings.

This syntax produces following results:

Source values Masked values
US20139999 YT03385676
US20139999 YT03385676
CA11233388 AU96923919

Some types of data you can mask using the Affinity provider are:

Syntax

The affinity privacy provider uses the following syntax:

TRANS PROVIDER = AFFINITY,
[ ALGO = {DEFAULT | FPE } ] ,
 METHOD = {REPEATABLE | HASH }, 

These parameters are valid when ALGO=FPE is specified.

	[ KEY = { "literal" | @environment-variable-name } ]  ,
	[ ITERATIONS = iterations-value ]  ,
	[ TWEAKS = { YES | NO } ]  ,

This parameter can be used when ALGO=DEFis specified.

	[ SEED = { "literal" | @variable | RANDOM } ]  ,

The following are Data definition parameters:

FLDDEFn = ( NAME = colname,    
		DATATYPE = datatype-value, 
		[ PRECISION = field-precision-value ], 
		[ SCALE = field-scale-value ],    
		[ LENGTH = field-length-value ],
		[ CODEPAGE = codepage-value ],
		[ CPTYPE = { DB2ZOS | ODBC |ANY |NONE } ] ) , 
 [ CODEPAGE = codepage-value ]
 	, 

Use these parameters for CHAR data:

	[ LENGTH = { n | MAX } ]  ,
	[ COPY = (copy-start,copy-len [,"copy-lit"]) …   ]  , 

The following parameters are for Numeric data:

	[ RULE = { NUMERIC | BINARY | SCALED } ]  ,

Source value parameters are as follows:

	[ CASE = UPPER ] ,      
	[ REMOVE = "remove-chars" ] , 
	[ TRIM = RIGHT ] ,      	

Processing parameters:

	[ DISCARDLIMIT = discard-limit-value ],

Masking parameters

Parameters that determine how to mask data.

PROVIDER (or PRO)
Required. Enter the provider name, AFFINITY (or AFF).
Note: The PRO parameter must be first in the masking string. All other parameters can appear in any order.
ALGO
The algorithm to use. Enter one of the following options:
DEFAULT (or DEF)
Mask data by using the default algorithm.

ALGO=DEF is not compatible with the parameters KEY, ITERATIONS, and TWEAKS.

FPE
Mask data by using the FPE algorithm. Use the KEY parameter to enter the encryption key.

ALGO=FPE is not compatible with the parameters SEED and MTD=HASH.

METHOD (or MTD)
The masking method to use. This parameter is required. For small or less complex data sets, the HASH method has a better strength of masking compared to the REPEATABLE method. The HASH method might provide slightly slower processing compared to the REPEATABLE method. Enter one of the following options:
REPEATABLE (or REP)
Generates consistently repeatable and unique output values. The same masked value is generated for every instance of a given source value, and each unique source value has a unique output value.
HASH
Generates output values by using a hash algorithm. The output values might not be unique nor repeatable between masking operations.

MTD=HASH is not compatible with the parameter ALGO=FPE.

ALGO=FPE parameters

Parameters for use with ALGO=FPE only.

KEY
The encryption key. To produce repeatable masked data, use the same key value.

Enter one of the following options:

"literal"
A 1 - 32 character alphanumeric string (including special characters such as $, @, and #). A NULL value is not valid.
@environment-variable-name
Name of an environment variable that provides the encryption key. The environment variable name must be prefixed with the at sign (@) and assigned a value before invoking the masking provider.
ITERATIONS
The number of times the FPE algorithm processes each source value to avoid creating a masked value that matches the source value. Enter an integer in the range 1 - 99. The default is 12. A value less than 12 has a negligible improvement on performance.
TWEAKS
If two values contain identical strings, determines whether to mask the identical strings differently for each combination of non-identical characters.

TWEAKS is not applicable to numeric data types or with the parameters ALGO=DEF and RULE.

Enter one of the following options:

YES
Default. Use tweaks to mask identical strings in source values by using other strings within each source value to influence the masking process. For example, if TWEAKS=YES, the provider might mask values AB-1234 and CD-1234 to DX-4795 and NR-3687, rather than DX-4795 and NR-4795.
NO
Do not use tweaks.

ALGO=DEF parameters

Parameters for use with ALGO=DEF only.

SEED
Use the SEED parameter to influence the masking process and to produce repeatable or non-repeatable output values. To produce repeatable masked data, use the same seed value.

Enter one of the following options:

Parameters for producing repeatable values:
"literal"
A character string, within enclosing double quotation marks. Any special characters in the string are ignored. The string is case-sensitive. Note that there is no particular advantage to providing a lengthy string. Using a literal potentially exposes the seed value and might allow reverse engineering to discover the original values.
@variable
The name of an environment variable, preceded by the at sign (@). To prevent reverse engineering to discover the original values, the variable can be created by the Optim™ administrator or other authorized person and secured from unauthorized access.
Parameter for producing non-repeatable values:
RANDOM (or RAN)
Specifies a random seed value, which produces a non-repeatable output value. This option can be used as a test data generator by concatenating the output of multiple runs. This option might produce duplicate values each time the process is run.

CHAR parameters

Parameters for use with CHAR character types.

LENGTH (or LEN)
For character data types only. Specifies a fixed length for each destination value, regardless of the length of the source value. The source value is repeated or truncated to match the LENGTH value before processing. If this parameter is not specified, the length of the destination value is the same as the source value.

The trailing blanks of an unfilled CHAR input value are considered part of a source value, resulting in an output value that also has trailing blanks. To include only non-blank characters in the output value, also specify TRIM=RIGHT.

The destination value might not be unique when the LENGTH parameter is specified. For example, if LEN=2 and the unique values test1 and test2 are input, the provider masks only the first two characters in each value (te) and generates duplicate output values.

LEN is not compatible with the parameter RULE.

Enter one of the following options:

n
An integer value that specifies the destination value length. The value cannot exceed the length of the destination column.
MAX
Generate a value that matches the length of the destination column.
COPY
For character data types only. Specifies one or more pairs of substrings to copy to the destination value. The provider can also replace characters with a literal string. Note that this parameter might cause non-unique destination values.

If the literal string is shorter than the substring, the provider replicates the literal until the lengths match. If the literal is longer than the substring, it is truncated.

Spaces are supported within literals only (for example, X Y).

COPY is not compatible with the parameter RULE.

Enter the following options:

copy-start
The starting position of the substring to copy to the destination.
copy-len
The length of the substring to copy to the destination.
copy-lit
A literal string that replaces the source characters in the specified positions. If the literal string is shorter than the substring, the provider replicates the literal until the lengths match. If the literal is longer than the substring, it is truncated.

Numeric parameters

Parameters for use with numbers and numeric data types.

RULE
Specifies processing rules to ensure that numeric strings are properly masked as numeric values.

Enter one of the following options:

NUMERIC (or NUM)
For numeric values within character data types only, specifies that numeric values generate masked numeric values instead of other characters. Without this option, the value 12345 in a character type could produce a masked value that contains alphabetic characters. If a source value is not a number, the provider processes the value normally without an error or warning message. Use this option for the following situations:
  • To transform integers in a character data type column to match the values of a numeric data type column. This usage maintains DBMS-type foreign key integrity across differing data types.
  • To correctly transform floating point numbers in character data type columns. Without this parameter, the exponent character E (or e) is transformed into another letter.
  • To maintain leading zeros that might otherwise be removed. For example, to prevent a value such as 00015 from being masked as a two-digit value.

RULE=NUM is not compatible with the parameters COPY and LEN.

BINARY (or BIN)
Specifies that source values of a numeric data type are transformed into values that conform to the provider data type restrictions. The output value might differ from the source value in order of magnitude and sign. Without this parameter, source values in numeric data type columns are converted to character strings and then converted back to the original data type.

Without RULE=BIN, there is a discrepancy between the number of digits that are required to represent a numeric value in character format and the maximum value that a given number of digits can express. For example, an unsigned short integer needs five-digit positions to hold the maximum unsigned value of 65535, but five-digit positions can express a maximum string value of 99999, which when converted back to an unsigned short integer results in an overflow. Without RULE=BIN, boundary source values in numeric data type columns might be transformed into values that use this discrepancy and do not conform to the storage constraints.

RULE=BIN is not compatible with the parameters COPY, LEN, TRIM, REMOVE, and CASE.

SCALED (or SCA)
Specifies that the output value has the same sign and approximately the same magnitude as the source value of the numeric data type. For example, when masking data such as salary figures, use this option to ensure that a value such as 12,345.67 does not produce a value such as -3.1.

Source value parameters

Parameters for handling source values.

CASE
Specifies that the provider converts the source values to uppercase before processing.

CASE is not compatible with the parameters RULE=BIN and RULE=SCALED.

Enter the following option:

UPPER (or UP)
Convert the source values to uppercase before processing.
REMOVE
Specifies the characters, within enclosing double quotation marks, to remove from the source value before processing.

REMOVE is not compatible with the parameters RULE=BIN and RULE=SCALED.

TRIM
Specifies to remove trailing spaces from the source value before processing.

TRIM is not compatible with the parameters RULE=BIN and RULE=SCALED.

Enter the following option:

RIGHT
Right-trim the source value to remove trailing spaces before processing.

Processing parameters

Parameters for managing provider processes.

DISCARDLIMIT (or DLIM)
Specifies the number of failed rows to discard before the provider stops processing.

Data definition parameters

Parameters for defining source and destination data.

FLDDEFn
Required. Describes a source column to be processed. See Field definition parameter.
CODEPAGE (or CP)
An integer value that specifies the codepage or character-set identifier of the source column. The default codepage in z/OS is derived from the operating system default value, for example, 037. Codepage 037 is the basic EBCDIC codepage in the U.S. The CP parameter within the FLDDEFn parameter overrides this value.
CPTYPE (or CPT)
The codepage type of the source column. The CPT parameter within the FLDDEFn parameter overrides this value.

When the origin of the data is DBMS-specific but not tied to any one DBMS, specify the value as ANY. When the origin of the data is from a non-DBMS source, specify the value as NONE.

Enter one of the following values:

Value Description
DBZ (or DB2zOS) DB2® for z/OS®
ODBC ODBC
ANY Any DBMS
NONE No DBMS

Affinity provider - Supported data types

The Affinity provider supports the following data types. Note that some data types are not compatible with RULE=BIN and RULE=SCALED.

DB2 data type ODPP equivalent Description RULE=BIN and RULE=SCALED
CHAR CHAR Fixed size character data that is left justified and space padded. No
INT INTEGER A 4 byte signed integer value in the range -2,147,483,648 to 2,147,483,647. Yes
VARCHAR VARCHAR Character data starting with a short integer value that indicates the length, in bytes, of the character data to follow.  
SMALLINT SMALLINT A 2 byte signed integer value in the range -32,768 to 32,767. Yes
DECIMAL DECIMAL_370 Packed decimal encoded buffer. No
FLOAT FLOAT or DOUBLE based on precision Double precision floating point number in the range 1.7E +/- 308 (15 digits). Floating point number in the range 3.4E +/- 38 (7 digits). Yes
BIGINT LONG_LONG or U_LONG_LONG An 8 byte signed numeric value in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. An 8 byte unsigned numeric value in the range 0 to 18,446,744,073,709,551,615. Yes