NID privacy provider

Use the NID privacy provider to mask national ID numbers. The provider can mask national ID numbers with either a repeatable method that preserves part of the source value or a random method that does not preserve any part of the source value. The provider also includes several separator options for the output values (dashes, periods, spaces, or no separators).

Examples

Repeatable with character separators

The following example masks a United States SSN by using a repeatable method with separator characters. The NID privacy provider will mask the area, group, and serial numbers.

pro=nid, swi=us, fmt=(US=3X-2X-4X), mtd=repeatable, flddef1=(name=nidvarchar, dt=varchar)

This example uses the following parameters:

SWI=US
This parameter masks United States SSN values.
FMT=(US=3X-2X-4X)
This parameter masks all values in the SSN and includes a dash separator character between the subfields.
MTD=REP
This parameter produces consistent target values when the same data is processed multiple times.

Repeatable with validation

The following example masks a Canadian SIN by using a repeatable method. The NID privacy provider copies the header values and masks the remaining digits. The output values use spaces as separators. Validation is performed, and invalid source values are preserved.

pro=nid, switch=ca, fmt=(ca=3c 3x 3x), mtd=repeatable, wheninv=pre, val=y, flddef1=(name=nid_fld, dt=wchar, len=20)

This example uses the following parameters:

SWI=CA
This parameter masks Canadian SIN values.
FMT=(CA=3c 3x 3x)
This parameter copies the header values and masks the remaining digits of the SIN. The output values include spaces as separators.
MTD=REP
This parameter produces consistent target values when the same data is processed multiple times.
WHENINV=PRE
This parameter copies invalid source values to the destination field.
VAL=Y
This parameter performs country-specific validation on the source values.

Random

The following example masks a Canadian Social Insurance Number (SIN) by using a random method. The NID privacy provider masks all digits of the SIN, including the header values. The output value includes dash (-) separators.

pro=nid, swi=ca, fmt=(ca=3x-3x-3x), mtd=ran, flddef1=(name=nid_fld, dt=wchar)

This example uses the following parameters:

SWI=CA
This parameter masks Canadian SIN values.
FMT=(ca=3x-3x-3x)
This parameter masks all values in the SIN and includes a dash separator character between the subfields.
MTD=RAN
This parameter uses a random masking algorithm. The output values are not based on the source values.

Syntax

The NID privacy provider uses the following syntax:

Masking parameters
PROVIDER = NID  , 
	[ METHOD = { REPEATABLE | RANDOM } ] ,
NID parameters
	SWITCH = country-identifier  , 
	[ FORMAT = ( format-expression ) ]  , 
SWI=US parameters
	[ AREATABLE = { MAXGROUP | DIST | ZOS } ]  ,
Processing parameters
	[ VALIDATE = { Y | N } ] ,
	[ WHENINVALID = PRESERVE ]  ,
	[ DISCARDLIMIT = discard-limit-value ] ,
Data definition parameters
	FLDDEFn = ( NAME = field-name,    
		DATATYPE = datatype-value, 
		[ PRECISION = field-precision-value ], 
		[ SCALE = field-scale-value ],    
		[ LENGTH = field-length-value ],
		[ CODEPAGE = codepage-value ],
		[ CPTYPE = { DB2ZOS | DB2LUW | ORACLE | SYBASE | ODBC | INFORMIX | NETEZZA | SQLSERVER | TERADATA | ANY | NONE } ] ) , 
	[ CODEPAGE = codepage-value ]  ,
	[ CPTYPE = { DB2ZOS | DB2LUW | ORACLE | SYBASE | ODBC | INFORMIX |
		     NETEZZA | SQLSERVER | TERADATA | ANY | NONE  } ]
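As an illustration of how the syntax fits together, the following Python sketch assembles a masking string from its parts. The function name `build_nid_masking_string` and its arguments are hypothetical helpers, not part of the product. The sketch uses the abbreviated keywords (PRO, MTD, SWI, FMT) and places PRO first.

```python
def build_nid_masking_string(switch, fmt=None, method=None, flddefs=(), **options):
    """Assemble a NID masking string (hypothetical helper, not a product API).

    switch  -- two-character country identifier, for example "CA"
    fmt     -- format expression without the country prefix, for example "3C 3X 3X"
    method  -- "REP" or "RAN"; omitted from the string when None
    flddefs -- sequence of dicts, one per FLDDEFn parameter
    options -- any further parameters, for example val="Y"
    """
    parts = ["PRO=NID"]  # PRO goes first in the masking string
    if method is not None:
        parts.append(f"MTD={method}")
    parts.append(f"SWI={switch}")
    if fmt is not None:
        parts.append(f"FMT=({switch}={fmt})")
    for key, value in options.items():
        parts.append(f"{key.upper()}={value}")
    for n, attrs in enumerate(flddefs, start=1):
        body = ", ".join(f"{k.upper()}={v}" for k, v in attrs.items())
        parts.append(f"FLDDEF{n}=({body})")
    return ", ".join(parts)


print(build_nid_masking_string(
    "CA", fmt="3C 3X 3X", method="REP",
    flddefs=[{"name": "nid_fld", "dt": "wchar"}],
    val="Y", wheninv="PRE",
))
```

The sketch does no validation of the values it receives; it only shows the ordering and nesting of the parameters.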

Masking parameters

Parameters that determine how to mask data.

PROVIDER (or PRO)
Required. Enter the provider name, NID.
Note: The PRO parameter must be first in the masking string. All other parameters can appear in any order.
METHOD (or MTD)
Specifies the masking method to use, repeatable or random. If this parameter is omitted, the provider performs repeatable masking (MTD=REP), and the output value uses the same separators as the source value. Enter one of the following options:
REPEATABLE (or REP)
Default. Mask the source values in a repeatable manner. The output values are based on the source values.
RANDOM (or RAN)
Mask the output values by using a random masking algorithm. The output values are not based on the source values.

If MTD=RAN is specified and the FMT parameter indicates that part of the source value is copied to the output value, the copy instruction is ignored and the output value does not include any source values. However, any separators in the FMT parameter are included in the output value.

METHOD=RAN is not compatible with the parameters VAL=Y and WHENINV=PRE.
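The incompatibility can be checked before a masking string is submitted. The following Python sketch is one way to do that; the function name and the dictionary input are assumptions made for illustration, and the sketch expects the abbreviated keywords MTD, VAL, and WHENINV as keys.

```python
def check_method_compatibility(params):
    """Return a list of conflicts between MTD=RAN and the validation parameters.

    params is a dict such as {"mtd": "ran", "val": "y"} (hypothetical input
    shape). Keys and values are compared case-insensitively.
    """
    normalized = {k.upper(): str(v).upper() for k, v in params.items()}
    method = normalized.get("MTD", "REP")  # repeatable is the default method
    problems = []
    if method in ("RAN", "RANDOM"):
        if normalized.get("VAL") == "Y":
            problems.append("MTD=RAN cannot be combined with VAL=Y")
        if normalized.get("WHENINV") in ("PRE", "PRESERVE"):
            problems.append("MTD=RAN cannot be combined with WHENINV=PRE")
    return problems


print(check_method_compatibility({"mtd": "ran", "val": "y", "wheninv": "pre"}))
```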

NID parameters

Parameters that identify the type of NID to mask and define formatting options specific to the NID.

SWITCH (or SWI)

Required. A two-character value that indicates the type of national ID to mask. Only one switch value is permitted.

Enter one of the following options:

Canadian Social Insurance Number (SIN)
CA
French National Institute for Statistics and Economic Studies Number (INSEE)
FR
Italian Fiscal Code Number (CF)
IT
Spanish Fiscal Identification Number (NIF)
ES
United Kingdom National Insurance Number (NINO)
UK
United States Social Security Number (SSN)
US
FORMAT (or FMT)
Specifies the output format and the parts of the source value to mask. The national ID determines the syntax for this parameter.

If MTD=RAN is specified and the FMT parameter indicates that part of the source value is copied to the output value, the copy instruction is ignored and the output value does not include any source values. However, any separators in the FMT parameter are included in the output value.

SWI=US parameters

Parameters for use with SWI=US only.

AREATABLE (or ATAB)
For use with SWI=US only. Specifies the area list to use for United States Social Security Number (SSN) masking. Enter one of the following options:
MAXGROUP
Default. Use the high group list, which is dated 10/10/2010. This list represents the new method of SSN allocation by the Social Security Administration, which uses all of the area codes and group IDs (99 is the maximum value) for each area.
DIST
Use the Optim™ area list, which is dated 11/01/2006. This list represents the old method of SSN allocation, which uses the area codes and group IDs assigned until 11/01/2006.
ZOS
Use the Optim for z/OS® area list, which is dated 01/02/2008. This list represents the old method of SSN allocation, which uses the area codes and group IDs assigned until 11/01/2008.

Processing parameters

Parameters for managing provider processes.

VALIDATE (or VAL)
Specifies if the provider performs country-specific validation on the source values. If this parameter is omitted, the provider does not perform validation (VAL=N). Enter one of the following options:
Y
Validate the source values. The provider performs the following validations, according to the NID:
Canadian Social Insurance Numbers (SIN)
  • The first digit must not be 8.
  • The check digit must be valid.
French National Institute for Statistics and Economic Studies numbers (INSEE)
  • The commune value must be valid.
  • The check digit must be valid.
Italian Fiscal Codes (CF)
  • The check digit must be valid.
Spanish Fiscal Identification Numbers (NIF) and Foreign Identification Numbers (NIE)
  • The suffix must be valid.
United Kingdom National Insurance Numbers (NINO)
  • No validations.
United States Social Security Numbers (SSN)
  • The serial number must not be 0.
  • The SSN must not be a reserved value such as 078-05-1120 and 457-55-5462.
  • The group number must be in use for the source area number.
N
Default. Do not validate the source values.
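To make the SSN checks listed under VAL=Y more concrete, the following Python sketch applies the first two of them: the serial number must not be zero, and the value must not be reserved. It is an illustration only. The function name is hypothetical, the nine-digit length check is an added assumption, and the reserved set holds only the two values that are named in this topic. The third check (group number in use for the area) requires the area list and is not attempted.

```python
# Only the two reserved values named in this topic; the real list may be longer.
RESERVED_SSNS = {"078051120", "457555462"}


def ssn_basic_checks(ssn):
    """Return True if the SSN passes the serial-number and reserved-value checks."""
    digits = "".join(ch for ch in ssn if ch.isdigit())  # drop separators
    if len(digits) != 9:          # assumption: AAAGGSSSS is exactly nine digits
        return False
    if digits[5:] == "0000":      # the serial number must not be 0
        return False
    if digits in RESERVED_SSNS:   # the SSN must not be a reserved value
        return False
    return True


print(ssn_basic_checks("078-05-1120"))
```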
WHENINVALID (or WHENINV)
Determines how to process invalid source values. If this parameter is omitted, the provider does not copy invalid source values to the destination field and skips rows that contain these values.

WHENINV is not compatible with the parameter MTD=RAN.

Enter the following option:

PRESERVE (or PRE)
Copy invalid source values to the destination field.
DISCARDLIMIT (or DLIM)
Specifies the number of failed rows to discard before the provider stops processing.

Data definition parameters

Parameters for defining source and target data. For further information, see Supported data types.

FLDDEFn
Required. Specifies the attributes of input values to use for processing. See Field definition parameter.
CODEPAGE (or CP)
An integer value that specifies the codepage or character-set identifier of the source fields. The default is UTF-8. The CP parameter within the FLDDEFn parameter overrides this value.
CPTYPE (or CPT)
The codepage type of the source fields. The CPT parameter within the FLDDEFn parameter overrides this value.

When the data originates from a DBMS but is not tied to any one DBMS, specify ANY. When the data originates from a non-DBMS source, specify NONE. Because there are no DBMS-specific code pages for Netezza®, specifying NETEZZA implies NONE.

Enter one of the following values:

Value | Description
DBZ (or DB2ZOS) | DB2® for z/OS
DB2 (or DB2LUW) | DB2 for Linux®, UNIX, and Windows
IFX (or INFORMIX) | Informix®
MSS (or SQLSERVER) | Microsoft SQL Server
NZ (or NETEZZA) | Netezza
ODBC | ODBC
ORA (or ORACLE) | Oracle
SYB (or SYBASE) | Sybase
TD (or TERADATA) | Teradata
ANY | Any DBMS
NONE | No DBMS

Output formats

The following output formats are available, according to NID type:

Canadian Social Insurance Numbers (SIN)

An SIN is a nine-digit number that consists of a one-digit region code number followed by an eight-digit serial number. The first three digits are called the header. The last digit of the serial number is a check digit.

The NID privacy provider generates a masked SIN with a check digit that is calculated based on the preceding masked eight digits of the output value.
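Canadian SINs are checked with the Luhn algorithm, so a check digit for eight masked digits can be computed as in the following Python sketch. The function name is hypothetical, and whether the provider performs exactly this computation internally is an assumption.

```python
def sin_check_digit(first_eight):
    """Compute the Luhn check digit for the first eight digits of a SIN."""
    if len(first_eight) != 8 or not first_eight.isdigit():
        raise ValueError("expected exactly eight digits")
    total = 0
    for index, ch in enumerate(first_eight):
        d = int(ch)
        if index % 2 == 1:   # the 2nd, 4th, 6th, and 8th digits are doubled
            d *= 2
            if d > 9:        # a two-digit product contributes the sum of its digits
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


# 046 454 286 is a widely used sample SIN.
print(sin_check_digit("04645428"))  # prints 6
```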

The following output formats are available for an SIN. C indicates values to copy. X indicates values to mask. For example, 3C4X indicates that the first three characters are copied and the next four characters are masked.

Fields to mask | Format without separator | Format with dash separator | Format with space separator | Format with period separator
Serial number without header digits (MTD=REP default) | CA=3C6X | CA=3C-3X-3X | CA=3C 3X 3X | CA=3C.3X.3X
Serial number and header digits | CA=9X | CA=3X-3X-3X | CA=3X 3X 3X | CA=3X.3X.3X
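The C and X notation can be read as a small instruction list. The following Python sketch parses a format expression (without the country prefix, for example 3C-3X-3X) and applies it to a source value. It is a reading aid under stated assumptions, not the provider's algorithm: the names `parse_format` and `apply_format` are hypothetical, masked positions are replaced with a `#` placeholder rather than real masked digits, and no check digit is recalculated.

```python
import re

# One token is either <count><C|X> or a single separator character.
_TOKEN = re.compile(r"(\d+)([CX])|([-. ])", re.IGNORECASE)


def parse_format(expr):
    """Split an expression such as '3C-3X-3X' into (kind, value) tokens."""
    tokens = []
    pos = 0
    for match in _TOKEN.finditer(expr):
        if match.start() != pos:  # finditer skipped text it could not match
            raise ValueError(f"unexpected text at offset {pos} in {expr!r}")
        if match.group(3) is not None:
            tokens.append(("SEP", match.group(3)))
        else:
            tokens.append((match.group(2).upper(), int(match.group(1))))
        pos = match.end()
    if pos != len(expr):
        raise ValueError(f"unexpected text at offset {pos} in {expr!r}")
    return tokens


def apply_format(source, expr, mask_char=lambda ch: "#"):
    """Copy or mask source characters as the expression directs.

    Separators in the source are dropped; separators in the expression
    are written to the output.
    """
    chars = [ch for ch in source if ch.isalnum()]
    out = []
    i = 0
    for kind, value in parse_format(expr):
        if kind == "SEP":
            out.append(value)
            continue
        segment = chars[i:i + value]
        if len(segment) != value:
            raise ValueError("source value is shorter than the format")
        out.extend(segment if kind == "C" else [mask_char(ch) for ch in segment])
        i += value
    if i != len(chars):
        raise ValueError("source value is longer than the format")
    return "".join(out)


print(apply_format("046 454 286", "3C-3X-3X"))  # prints 046-###-###
```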
French National Institute for Statistics and Economic Studies numbers (INSEE)
An INSEE number is a 15-digit number with the following format: SYYMMDDCCCOOOKK.
S
Sex and citizenship information.
YY
Last two digits of the year of birth.
MM
Month of birth.
DD
Department of origin.
CCC
Commune of origin.
OOO
Order number.
KK
Control key or check digits.

The NID privacy provider masks an INSEE according to the following rules:

  • If the provider masks the department field, the provider also masks the commune field with a compatible value.
  • The provider always masks the order field.
  • The provider calculates the check digit field, which is based on the preceding masked 13 digits of the output value.
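The INSEE control key is conventionally 97 minus the remainder of the first 13 digits divided by 97. The following Python sketch shows that calculation for all-numeric values. The function name is hypothetical, the special handling of Corsican department codes (2A and 2B) is not covered, and whether the provider computes the key in exactly this way is an assumption.

```python
def insee_key(first_13):
    """Compute the two-digit control key for the first 13 digits of an INSEE number."""
    if len(first_13) != 13 or not first_13.isdigit():
        raise ValueError("expected exactly 13 digits")
    return f"{97 - int(first_13) % 97:02d}"


# Sample value 2 69 05 49 588 157 has the key 80.
print(insee_key("2690549588157"))  # prints 80
```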

The following output formats are available for an INSEE. All formats mask the order and check digit fields.

C indicates values to copy. X indicates values to mask. For example, 3C4X indicates that the first three characters are copied and the next four characters are masked.

Fields to mask (in addition to Order and Check Digit) | Format without separator | Format with dash separator | Format with space separator
Sex, Year, Month, Commune (MTD=REP default) | FR=5X2C8X | FR=5X2C6X-2X | FR=5X2C6X 2X
Sex | FR=1X9C5X | FR=1X9C3X-2X | FR=1X9C3X 2X
Sex, Year | FR=3X7C5X | FR=3X7C3X-2X | FR=3X7C3X 2X
Sex, Month | FR=1X2C2X5C5X | FR=1X2C2X5C3X-2X | FR=1X2C2X5C3X 2X
Sex, Commune | FR=1X6C8X | FR=1X6C6X-2X | FR=1X6C6X 2X
Sex, Department | FR=1X4C10X | FR=1X4C8X-2X | FR=1X4C8X 2X
Sex, Year, Month | FR=5X5C5X | FR=5X5C3X-2X | FR=5X5C3X 2X
Sex, Year, Commune | FR=3X4C8X | FR=3X4C6X-2X | FR=3X4C6X 2X
Sex, Year, Department, Commune | FR=3X2C10X | FR=3X2C8X-2X | FR=3X2C8X 2X
Sex, Month, Commune | FR=1X2C2X2C8X | FR=1X2C2X2C6X-2X | FR=1X2C2X2C6X 2X
Sex, Month, Department, Commune | FR=1X2C12X | FR=1X2C10X-2X | FR=1X2C10X 2X
Sex, Year, Month, Department, Commune | FR=15X | FR=13X-2X | FR=13X 2X
Year | FR=1C2X7C5X | FR=1C2X7C3X-2X | FR=1C2X7C3X 2X
Year, Month | FR=1C4X5C5X | FR=1C4X5C3X-2X | FR=1C4X5C3X 2X
Year, Commune | FR=1C2X4C8X | FR=1C2X4C6X-2X | FR=1C2X4C6X 2X
Year, Department | FR=1C2X2C10X | FR=1C2X2C8X-2X | FR=1C2X2C8X 2X
Year, Month, Commune | FR=1C4X2C8X | FR=1C4X2C6X-2X | FR=1C4X2C6X 2X
Year, Month, Department | FR=1C14X | FR=1C12X-2X | FR=1C12X 2X
Month | FR=3C2X5C5X | FR=3C2X5C3X-2X | FR=3C2X5C3X 2X
Month, Commune | FR=3C2X2C8X | FR=3C2X2C6X-2X | FR=3C2X2C6X 2X
Month, Department | FR=3C12X | FR=3C10X-2X | FR=3C10X 2X
Commune | FR=7C8X | FR=7C6X-2X | FR=7C6X 2X
Department | FR=5C10X | FR=5C8X-2X | FR=5C8X 2X

Italian Fiscal Codes (CF)
A CF is a 16-character alphanumeric value with the following format: FFF-NNN-YYMDD-RRRRC.
FFF
Encoded surname.
NNN
Encoded given name.
YY
Year of birth.
M
Month of birth.
DD
Day of birth.
RRRR
Region code.
C
Control character.

The NID privacy provider masks a CF according to the following rules:

  • The provider masks any consonant that appears in the given name or surname fields as a consonant and masks any vowel as a vowel. If an X appears after a vowel, the provider copies it to the output value.
  • The provider calculates the control character field, which is based on the preceding masked 15 characters of the output value.
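The control character of an Italian fiscal code is conventionally derived from the first 15 characters by using separate value tables for odd and even positions, and then taking the sum modulo 26. The following Python sketch follows that public convention. The function name is hypothetical, and whether the provider uses exactly this routine is an assumption.

```python
import string

# Values for characters in odd positions (1st, 3rd, ...), letters A to Z.
_ODD_LETTER_VALUES = [1, 0, 5, 7, 9, 13, 15, 17, 19, 21, 2, 4, 18,
                      20, 11, 3, 6, 8, 12, 14, 16, 10, 22, 25, 24, 23]
ODD_VALUES = dict(zip(string.ascii_uppercase, _ODD_LETTER_VALUES))
# In odd positions, the digits 0 to 9 take the same values as the letters A to J.
ODD_VALUES.update({str(d): _ODD_LETTER_VALUES[d] for d in range(10)})


def cf_control_character(first_15):
    """Compute the control character for the first 15 characters of a CF."""
    code = first_15.upper()
    if len(code) != 15 or not all(ch in ODD_VALUES for ch in code):
        raise ValueError("expected 15 characters from A-Z and 0-9")
    total = 0
    for index, ch in enumerate(code):
        if index % 2 == 0:   # odd positions when counting from 1
            total += ODD_VALUES[ch]
        elif ch.isdigit():   # even positions: digits count as their own value
            total += int(ch)
        else:                # even positions: A=0, B=1, ..., Z=25
            total += ord(ch) - ord("A")
    return chr(ord("A") + total % 26)


print(cf_control_character("RSSMRA85T10A562"))  # prints S
```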

The following output formats are available for a CF. C indicates values to copy. X indicates values to mask. For example, 3C4X indicates that the first three characters are copied and the next four characters are masked.

Fields to mask | Format without separator | Format with dash separator | Format with space separator
Date of birth, Region (MTD=REP default) | IT=6C10X | IT=3C-3C-5X-5X | IT=3C 3C 5X 5X
Surname, Given name, Region | IT=6X5C5X | IT=3X-3X-5C-5X | IT=3X 3X 5C 5X
Surname, Given name, Date of birth | IT=11X4C1X | IT=3X-3X-5X-4C1X | IT=3X 3X 5X 4C1X
Surname, Given name | IT=6X9C1X | IT=3X-3X-5C-4C1X | IT=3X 3X 5C 4C1X
Date of birth | IT=6C5X4C1X | IT=3C-3C-5X-4C1X | IT=3C 3C 5X 4C1X
Region | IT=11C5X | IT=3C-3C-5C-5X | IT=3C 3C 5C 5X
Surname, Given name, Date of birth, Region | IT=16X | IT=3X-3X-5X-5X | IT=3X 3X 5X 5X

Spanish Fiscal Identification Numbers (NIF) and Foreign Identification Numbers (NIE)

An NIF is an eight-character value in the following format: NNNNNNN-A, where the first seven characters are a serial number and the final character is an alphabetic suffix. The suffix is a check digit.

Foreign nationals in Spain use a Foreign Identification Number (NIE), which is a nine-character value that uses the same format as an NIF but is preceded by an X. An NIE uses the following format: X-NNNNNNN-A.

The NID privacy provider generates a masked NIF or NIE with a check digit that is calculated based on the preceding masked 7 digits of the output value.
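Spanish identification numbers conventionally derive the suffix letter from the serial number modulo 23, by using a fixed 23-letter table. The following Python sketch applies that convention to a serial number of any length. The function name is hypothetical, and whether the provider uses this exact table and calculation is an assumption.

```python
# Letter table conventionally used for Spanish DNI/NIF suffixes.
NIF_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def nif_suffix(serial):
    """Return the suffix letter for a numeric serial number."""
    if not serial.isdigit():
        raise ValueError("expected a numeric serial number")
    return NIF_LETTERS[int(serial) % 23]


print(nif_suffix("1234567"))  # prints L
```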

The following output formats are available for an NIF and NIE. The provider masks all characters for each format. NIF and NIE numbers use the same format options. For an NIE source value, the output value always includes the X prefix.

Fields to mask | Format without separator | Format with dash separator | Format with space separator
Serial, Suffix (MTD=REP default) | ES=8X | ES=7X-1X | ES=7X 1X

United Kingdom National Insurance Numbers (NINO)

A NINO consists of three parts: two letters (the prefix), six digits (the number), and one optional letter (the suffix).

The NID privacy provider can mask a NINO without a separator or with a separator in either a three- or five-part format.

The following output formats are available for a NINO. C indicates values to copy. X indicates values to mask. For example, 3C4X indicates that the first three characters are copied and the next four characters are masked.

To create a NINO without a separator, use the following parameters:
Fields to mask | Format without separator
Prefix, Number (MTD=REP only) | UK=8X1C
Number (MTD=REP only, MTD=REP default) | UK=2C6X1C
Prefix, Number, Suffix (MTD=RAN only, MTD=RAN default) | UK=9X

To create a NINO with either a three- or five-part format, use the following parameters:
Fields to mask | Format with dash separator | Format with space separator | Format with period separator
Prefix, Number (three-part) (MTD=REP only) | UK=2X-6X-1C | UK=2X 6X 1C | UK=2X.6X.1C
Prefix, Number (five-part) (MTD=REP only) | UK=2X-2X-2X-2X-1C | UK=2X 2X 2X 2X 1C | UK=2X.2X.2X.2X.1C
Number (three-part) (MTD=REP only) | UK=2C-6X-1C | UK=2C 6X 1C | UK=2C.6X.1C
Number (five-part) (MTD=REP only) | UK=2C-2X-2X-2X-1C | UK=2C 2X 2X 2X 1C | UK=2C.2X.2X.2X.1C
Prefix, Number, Suffix (three-part) (MTD=RAN only) | UK=2X-6X-1X | UK=2X 6X 1X | UK=2X.6X.1X
Prefix, Number, Suffix (five-part) (MTD=RAN only) | UK=2X-2X-2X-2X-1X | UK=2X 2X 2X 2X 1X | UK=2X.2X.2X.2X.1X

United States Social Security Numbers (SSN)

An SSN consists of 3 subfields with the following format: AAAGGSSSS.

AAA
Area number. The state in which the SSN is issued generally determines the area number.
GG
Group number. A group number is assigned based on the area number.
SSSS
Serial number.

The NID privacy provider generates a masked SSN according to the following rules:

  • The provider generates a group number that is appropriate for the area number. The provider uses the most recent group number that is issued by the Social Security Administration for the area.
  • Serial numbers begin with 0001 and are incremented by 1 for each additional SSN generated for the area number. When the serial number exceeds 9999, the serial number is reset to 0001 and the provider uses the group number that precedes the number that is most recently issued for the area number.
  • When MTD=REP, the output value includes an area number that corresponds to the same state as the source area number.

The following output formats are available for an SSN. C indicates values to copy. X indicates values to mask. For example, 3C4X indicates that the first three characters are copied and the next four characters are masked.

Fields to mask | Format without separator | Format with dash separator | Format with space separator | Format with period separator
Group, Serial number (MTD=REP default) | US=3C6X | US=3C-2X-4X | US=3C 2X 4X | US=3C.2X.4X
Area, Group, Serial number | US=9X | US=3X-2X-4X | US=3X 2X 4X | US=3X.2X.4X

Supported data types

The NID privacy provider supports the following data types for source and destination fields, according to NID type:

Canadian Social Insurance Numbers (SIN) and United States Social Security Numbers (SSN)
Data type Description
CHAR Fixed size character data that is left justified and space padded.
WCHAR Fixed size wide character data that is left justified and space padded.
VARCHAR Character data starting with a short integer value that indicates the length, in bytes, of the character data to follow.
WVARCHAR Wide character data starting with a short integer value that indicates the length, in bytes, of the wide character data to follow.
VARCHAR_SZ Character data string which is terminated by a NULL character.
WVARCHAR_SZ Wide character data string that is terminated by a NULL character.
U_LONG_LONG An 8 byte unsigned numeric value in the range 0 to 18,446,744,073,709,551,615.
U_INTEGER A 4 byte unsigned integer value in the range 0 to 4,294,967,295.
DECIMAL_370 Packed decimal encoded buffer.
ORA_VARNUM Oracle varnum encoded buffer.

French National Institute for Statistics and Economic Studies numbers (INSEE), Italian Fiscal Codes (CF), Spanish Fiscal Identification Numbers (NIF) and Foreign Identification Numbers (NIE), and United Kingdom National Insurance Numbers (NINO)
Data type Description
CHAR Fixed size character data that is left justified and space padded.
WCHAR Fixed size wide character data that is left justified and space padded.
VARCHAR Character data starting with a short integer value that indicates the length, in bytes, of the character data to follow.
WVARCHAR Wide character data starting with a short integer value that indicates the length, in bytes, of the wide character data to follow.
VARCHAR_SZ Character data string which is terminated by a NULL character.
WVARCHAR_SZ Wide character data string that is terminated by a NULL character.