Use the affinity privacy provider to mask data while maintaining
the format and character types of the source values. For example,
the provider can maintain the format of data such as account numbers
or driver's license numbers while masking character data with character
data and numeric data with numeric data.
If the source data is uppercase, alphabetic characters,
the affinity privacy provider generates uppercase, alphabetic characters
at the destination. Source data in lowercase alphabetic characters
will produce lowercase alphabetic characters at the destination. Numeric
data in the source generates numeric data at the destination. The
provider masks alphabetic and numeric characters, but copies other
characters in the source data to the destination. For example, a credit
card number in the format nnnn nnnn nnnn nnnn is
masked to a different number that includes spaces at the same intervals,
while one formatted with dashes is masked as nnnn nnnn nnnn
nnnn. Using this provider, you can generate unique values,
a different value for each occurrence of the same source, and values
with a length different from the source.
Note:
- Affinity currently does not support columns that contain multibyte
data (MBCS and DBCS). Attempting to use Affinity with MBCS or DBCS
character data produces unpredictable results.
- Affinity masks Latin English alphabetic characters only (A-Z,
a-z). Alphabetic characters such as ñ ç or ü are copied to the destination
without being changed.
The provider includes two algorithms for masking data,
a default algorithm and a format preserving encryption (FPE) algorithm.
The
FPE algorithm is encryption-based and offers stronger masking capability
than the default. The FPE algorithm is based on the Advanced Encryption
Standard 256-bit (AES-256) algorithm, can optionally use an encryption
key that is supplied by the user, and can produce masked values that
are unique. The same user-supplied key produces repeatable masked
values. For this reason, knowledge of the key should be secured from
unauthorized users to prevent reverse engineering to discover the
original values.
The FPE algorithm also produces outputs that
are extremely varied and without a discernible pattern. For example,
two close or similar source values (such as 001 and 002) are masked
with values that are not similar (such as 196 and 837), thus masking
any pattern these values may have had in the source data.
The
TWEAKS parameter, available with the FPE algorithm only, can mask
identical strings within source values by using other strings within
each source value to influence the masking process. For example,
for two source values such as ABC-123 and DEF-123, the TWEAKS parameter
can mask the 123 string in each value by using the preceding string
to produce the masked values ABC-981 and DEF-704.
While the
FPE algorithm offers an industry standard level of encryption, consider
using the default algorithm when your data does not require such encryption.
The most significant differences between the two algorithms are the
degree of varied masking patterns, the time that is required to complete
the masking effort, and the strength of the masked results. The default
algorithm usually outperforms the FPE algorithm in processing time,
as the FPE algorithm in most cases takes more time to complete. The
complexity of the input data can add more time to the masking operation
of the FPE algorithm, depending on the amount of parallel processing
and I/O being performed.
Examples
- Account numbers
- Masking account numbers might require that the account number
format is maintained while producing repeatable results for testing.
To meet these requirements, you might use the following syntax:
trans pro=aff, algo=fpe, mtd=rep, key="Xyz123", rule=num, whenmatch=prefix("33"),
flddef1=(name=acctnbr, datatype=varchar)
This example
uses the following parameters:
- ALGO=FPE
- The FPE algorithm offers a high level of encryption and produces
masked values that are much less likely to reveal the original source
value than the default.
- MTD=REP
- This parameter produces consistent target values when the same
data is processed multiple times.
- KEY="fpekey123"
- This parameter provides the case-sensitive encryption key for
the FPE algorithm. To produce repeatable results, use the same key
value.
- RULE=NUM
- This parameter ensures that numeric source values produce numeric
target values.
- WHENMATCH=PREFIX
- This parameter prevents identical source and target values by
adding a prefix to target values that match the source.
This syntax produces following results:
| Source values |
Masked values |
| acct12345 |
uaho84506 |
| acct12345 |
uaho84506 |
| acct56789 |
uaho66769 |
- Driver's license numbers
- Masking drivers license numbers might require that the format
is maintained, as the format might be unique to the issuing agency.
To meet these requirements, you might use the following syntax:
trans pro=aff, algo=FPE, mtd=rep, key="fpekey123", flddef1=(name=driver_number,dt=varchar)
This
example uses the following parameters:
- ALGO=FPE
- The FPE algorithm offers a high level of encryption and produces
masked values that are much less likely to reveal the original source
value than the default.
- MTD=REP
- This parameter produces consistent target values when the same
data is processed multiple times.
- KEY="fpekey123"
- This parameter provides the case-sensitive encryption key for
the FPE algorithm. To produce repeatable results, use the same key
value.
This syntax produces following results:
| Source values |
Masked values |
| M2267 89890 34567 |
O0280 50902 54225 |
| M2267 89890 34567 |
O0280 50902 54225 |
| P2267 89890 34567 |
D9254 86713 09303 |
- Passport numbers
- Passport number data might contain identical strings that occur
among multiple values. To produce masked data that does not include
such strings, use the TWEAKS parameter. This data might also contain
hyphens that can be removed from masked output by using the REMOVE
parameter. To meet these requirements, you might use the following
syntax:
trans pro=aff, algo=fpe, mtd=rep, key="fpekey123", iterations=20, tweaks=yes,
flddef1=(name=passport_nbr, dt=varchar)
This example
uses the following parameters:
- ALGO=FPE
- The FPE algorithm offers a high level of encryption and produces
masked values that are much less likely to reveal the original source
value than the default.
- MTD=REP
- This parameter produces consistent target values when the same
data is processed multiple times.
- KEY="fpekey123"
- This parameter provides the case-sensitive encryption key for
the FPE algorithm. To produce repeatable results, use the same key
value.
- ITERATIONS=20
- This parameter determines the number of times the FPE algorithm
processes each value to ensure uniqueness.
- TWEAKS=YES
- This parameter ensures that identical strings are masked with
unique strings.
This syntax produces following results:
| Source values |
Masked values |
| US20139999 |
YT03385676 |
| US20139999 |
YT03385676 |
| CA11233388 |
AU96923919 |
Some types of data you can mask using the Affinity
provider are:
- Account numbers - Masking account numbers might require that the
account number format is maintained while producing repeatable results
for testing.
- Driver's license numbers - Masking drivers license numbers might
require that the format is maintained, as the format might be unique
to the issuing agency.
- Passport numbers - Passport number data might contain identical
strings that occur among multiple values.
Syntax
The affinity privacy provider uses
the following syntax:
TRANS PROVIDER = AFFINITY,
[ ALGO = {DEFAULT | FPE } ] ,
METHOD = {REPEATABLE | HASH },
These
parameters are valid when ALGO=FPE is specified.
[ KEY = { "literal" | @environment-variable-name } ] ,
[ ITERATIONS = iterations-value ] ,
[ TWEAKS = { YES | NO } ] ,
This parameter can be used when ALGO=DEFis
specified.
[ SEED = { "literal" | @variable | RANDOM } ] ,
The following are Data definition parameters:
FLDDEFn = ( NAME = colname,
DATATYPE = datatype-value,
[ PRECISION = field-precision-value ],
[ SCALE = field-scale-value ],
[ LENGTH = field-length-value ],
[ CODEPAGE = codepage-value ],
[ CPTYPE = { DB2ZOS | ODBC |ANY |NONE } ] ) ,
[ CODEPAGE = codepage-value ]
,
Use these parameters for CHAR data:
[ LENGTH = { n | MAX } ] ,
[ COPY = (copy-start,copy-len [,"copy-lit"]) … ] ,
The following parameters are for Numeric data:
[ RULE = { NUMERIC | BINARY | SCALED } ] ,
Source value parameters are as follows:
[ CASE = UPPER ] ,
[ REMOVE = "remove-chars" ] ,
[ TRIM = RIGHT ] ,
Processing
parameters:
[ DISCARDLIMIT = discard-limit-value ],
Masking parameters
Parameters
that determine how to mask data.
-
- PROVIDER (or PRO)
- Required. Enter the provider name, AFFINITY (or AFF).
Note: The
PRO parameter must be first in the masking string. All other parameters
can appear in any order.
-
- ALGO
- The algorithm to use. Enter one of the following options:
- DEFAULT (or DEF)
- Mask data by using the default algorithm.
ALGO=DEF is not compatible
with the parameters KEY, ITERATIONS, and TWEAKS.
- FPE
- Mask data by using the FPE algorithm. Use the KEY parameter to
enter the encryption key.
ALGO=FPE is not compatible with the parameters
SEED and MTD=HASH.
- METHOD (or MTD)
- The masking method to use. This parameter is required. For small
or less complex data sets, the HASH method has a better strength of
masking compared to the REPEATABLE method. The HASH method might provide
slightly slower processing compared to the REPEATABLE method. Enter
one of the following options:
- REPEATABLE (or REP)
- Generates consistently repeatable and unique output values. The
same masked value is generated for every instance of a given source
value, and each unique source value has a unique output value.
- HASH
- Generates output values by using a hash algorithm. The output
values might not be unique nor repeatable between masking operations.
MTD=HASH
is not compatible with the parameter ALGO=FPE.
ALGO=FPE parameters
Parameters
for use with ALGO=FPE only.
-
- KEY
- The encryption key. To produce repeatable masked data, use the
same key value.
Enter one of the following options:
- "literal"
- A 1 - 32 character alphanumeric string (including special characters
such as $, @, and #). A NULL value is not valid.
- @environment-variable-name
- Name of an environment variable that provides the encryption key.
The environment variable name must be prefixed with the at sign (@)
and assigned a value before invoking the masking provider.
- ITERATIONS
- The number of times the FPE algorithm processes each source value
to avoid creating a masked value that matches the source value. Enter
an integer in the range 1 - 99. The default is 12. A value less than
12 has a negligible improvement on performance.
- TWEAKS
- If two values contain identical strings, determines whether to
mask the identical strings differently for each combination of non-identical
characters.
TWEAKS is not applicable to numeric data types or with
the parameters ALGO=DEF and RULE.
Enter one of the following
options:
- YES
- Default. Use tweaks to mask identical strings in source values
by using other strings within each source value to influence the masking
process. For example, if TWEAKS=YES, the provider might mask values
AB-1234 and CD-1234 to DX-4795 and NR-3687, rather than DX-4795 and
NR-4795.
- NO
- Do not use tweaks.
ALGO=DEF parameters
Parameters
for use with ALGO=DEF only.
-
- SEED
- Use the SEED parameter to influence the masking process and to
produce repeatable or non-repeatable output values. To produce repeatable
masked data, use the same seed value.
Enter one of the following
options:
- Parameters for producing repeatable values:
- "literal"
- A character string, within enclosing double quotation marks. Any
special characters in the string are ignored. The string is case-sensitive.
Note that there is no particular advantage to providing a lengthy
string. Using a literal potentially exposes the seed value and might
allow reverse engineering to discover the original values.
- @variable
- The name of an environment variable, preceded by the at sign (@).
To prevent reverse engineering to discover the original values, the
variable can be created by the Optim™ administrator
or other authorized person and secured from unauthorized access.
- Parameter for producing non-repeatable values:
- RANDOM (or RAN)
- Specifies a random seed value, which produces a non-repeatable
output value. This option can be used as a test data generator by
concatenating the output of multiple runs. This option might produce
duplicate values each time the process is run.
CHAR parameters
Parameters
for use with CHAR character types.
-
- LENGTH (or LEN)
- For character data types only. Specifies a fixed length for each
destination value, regardless of the length of the source value. The
source value is repeated or truncated to match the LENGTH value before
processing. If this parameter is not specified, the length of the
destination value is the same as the source value.
The trailing
blanks of an unfilled CHAR input value are considered part of a source
value, resulting in an output value that also has trailing blanks.
To include only non-blank characters in the output value, also specify
TRIM=RIGHT.
The destination value might not be unique when the
LENGTH parameter is specified. For example, if LEN=2 and the unique
values test1 and test2 are
input, the provider masks only the first two characters in each value
(te) and generates duplicate output values.
LEN is not compatible
with the parameter RULE.
Enter one of the following options:
- n
- An integer value that specifies the destination value length.
The value cannot exceed the length of the destination column.
- MAX
- Generate a value that matches the length of the destination column.
- COPY
- For character data types only. Specifies one or more pairs of
substrings to copy to the destination value. The provider can also
replace characters with a literal string. Note that this parameter
might cause non-unique destination values.
If the literal string
is shorter than the substring, the provider replicates the literal
until the lengths match. If the literal is longer than the substring,
it is truncated.
Spaces are supported within literals only (for
example, X Y).
COPY is not compatible
with the parameter RULE.
Enter the following options:
- copy-start
- The starting position of the substring to copy to the destination.
- copy-len
- The length of the substring to copy to the destination.
- copy-lit
- A literal string that replaces the source characters in the specified
positions. If the literal string is shorter than the substring, the
provider replicates the literal until the lengths match. If the literal
is longer than the substring, it is truncated.
Numeric parameters
Parameters
for use with numbers and numeric data types.
-
- RULE
- Specifies processing rules to ensure that numeric strings are
properly masked as numeric values.
Enter one of the following options:
- NUMERIC (or NUM)
- For numeric values within character data types only, specifies
that numeric values generate masked numeric values instead of other
characters. Without this option, the value 12345 in a character type
could produce a masked value that contains alphabetic characters.
If a source value is not a number, the provider processes the value
normally without an error or warning message. Use this option for
the following situations:
- To transform integers in a character data type column to match
the values of a numeric data type column. This usage maintains DBMS-type
foreign key integrity across differing data types.
- To correctly transform floating point numbers in character data
type columns. Without this parameter, the exponent character E (or
e) is transformed into another letter.
- To maintain leading zeros that might otherwise be removed. For
example, to prevent a value such as 00015 from being masked as a two-digit
value.
RULE=NUM is not compatible with the parameters COPY and LEN.
- BINARY (or BIN)
- Specifies that source values of a numeric data type are transformed
into values that conform to the provider data type restrictions. The
output value might differ from the source value in order of magnitude
and sign. Without this parameter, source values in numeric data type
columns are converted to character strings and then converted back
to the original data type.
Without RULE=BIN, there is a discrepancy
between the number of digits that are required to represent a numeric
value in character format and the maximum value that a given number
of digits can express. For example, an unsigned short integer needs
five-digit positions to hold the maximum unsigned value of 65535,
but five-digit positions can express a maximum string value of 99999,
which when converted back to an unsigned short integer results in
an overflow. Without RULE=BIN, boundary source values in numeric data
type columns might be transformed into values that use this discrepancy
and do not conform to the storage constraints.
RULE=BIN is not
compatible with the parameters COPY, LEN, TRIM, REMOVE, and CASE.
- SCALED (or SCA)
- Specifies that the output value has the same sign and approximately
the same magnitude as the source value of the numeric data type. For
example, when masking data such as salary figures, use this option
to ensure that a value such as 12,345.67 does not produce a value
such as -3.1.
Source
value parameters
Parameters for handling source values.
-
- CASE
- Specifies that the provider converts the source values to uppercase
before processing.
CASE is not compatible with the parameters RULE=BIN
and RULE=SCALED.
Enter the following option:
- UPPER (or UP)
- Convert the source values to uppercase before processing.
- REMOVE
- Specifies the characters, within enclosing double quotation marks,
to remove from the source value before processing.
REMOVE is not
compatible with the parameters RULE=BIN and RULE=SCALED.
- TRIM
- Specifies to remove trailing spaces from the source value before
processing.
TRIM is not compatible with the parameters RULE=BIN
and RULE=SCALED.
Enter the following option:
- RIGHT
- Right-trim the source value to remove trailing spaces before processing.
Processing
parameters
Parameters for managing provider processes.
- DISCARDLIMIT (or DLIM)
- Specifies the number of failed rows to discard before the provider
stops processing.
Data
definition parameters
Parameters for defining source
and destination data.
-
- FLDDEFn
- Required. Describes a source column to be processed. See Field definition parameter.
- CODEPAGE (or CP)
- An integer value that specifies the codepage or character-set identifier of the source column.
The default codepage in z/OS is derived from the operating
system default value, for example, 037. Codepage 037 is the
basic EBCDIC codepage in the U.S. The CP parameter within the
FLDDEFn parameter overrides this
value.
- CPTYPE (or CPT)
- The codepage type of the source column. The CPT parameter within the FLDDEFn
parameter overrides this value.
When the origin of the data is
DBMS-specific but not tied to any one DBMS, specify the
value as ANY. When the origin of the data is from a non-DBMS
source, specify the value as NONE.
Enter one of the
following values:
| Value |
Description |
| DBZ (or DB2zOS) |
DB2® for
z/OS® |
| ODBC |
ODBC |
| ANY |
Any DBMS |
| NONE |
No DBMS |
Affinity provider - Supported
data types
The Affinity provider supports the following
data types. Note that some data types are not compatible with RULE=BIN
and RULE=SCALED.
| DB2 data type |
ODPP equivalent |
Description |
RULE=BIN and RULE=SCALED |
| CHAR |
CHAR |
Fixed size character data that is left justified and space
padded. |
No |
| INT |
INTEGER |
A 4 byte signed integer value in the range -2,147,483,648
to 2,147,483,647. |
Yes |
| VARCHAR |
VARCHAR |
Character data starting with a short integer value that indicates
the length, in bytes, of the character data to follow. |
|
| SMALLINT |
SMALLINT |
A 2 byte signed integer value in the range -32,768 to 32,767. |
Yes |
| DECIMAL |
DECIMAL_370 |
Packed decimal encoded buffer. |
No |
| FLOAT |
FLOAT or DOUBLE based on precision |
Double precision floating point number in the range 1.7E
+/- 308 (15 digits). Floating point number in the range 3.4E +/- 38
(7 digits). |
Yes |
| BIGINT |
LONG_LONG or U_LONG_LONG |
An 8 byte signed numeric value in the range -9,223,372,036,854,775,808
to 9,223,372,036,854,775,807. An 8 byte unsigned numeric value in
the range 0 to 18,446,744,073,709,551,615. |
Yes |