Enforced subset conversion

In an enforced subset conversion, any character in the source CCSID that does not have a code point in the target CCSID is converted to a single substitution character.
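As a rough illustration, the following Python sketch applies this rule one character at a time. It assumes Python's built-in `utf-8` and `cp037` codecs as stand-ins for the source and target CCSIDs, and `enforced_subset_convert` is a hypothetical helper, not a DB2 or z/OS interface. The substitution happens per character, not per byte: the three UTF-8 bytes of 日 become one substitution character.

```python
def enforced_subset_convert(data: bytes, source: str, target: str, sub: bytes) -> bytes:
    """Hypothetical helper: convert `data` from `source` to `target`,
    replacing each character that has no code point in `target` with `sub`."""
    out = bytearray()
    for ch in data.decode(source):        # work on characters, not bytes
        try:
            out += ch.encode(target)      # the character exists in the target
        except UnicodeEncodeError:
            out += sub                    # one substitution character per character
    return bytes(out)


# "A" exists in EBCDIC CCSID 37 (X'C1'); the three-byte UTF-8 character 日 does not.
result = enforced_subset_convert("A日".encode("utf-8"), "utf-8", "cp037", b"\x3f")
print(result.hex().upper())  # C13F
```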

The default substitution characters (SUB) are:

X'3F' for SBCS EBCDIC
X'1A' or X'7F' for SBCS ASCII
X'1A' for UTF-8
X'001A' for UTF-16

For DBCS data, the substitution character varies depending on the CCSID.
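The SBCS and Unicode defaults above are different encodings of the same control character, U+001A SUBSTITUTE, which is why an EBCDIC X'3F' comes back as an ASCII X'1A'. The following Python sketch checks this, assuming Python's `cp037` and `cp1252` codecs as representative EBCDIC and ASCII code pages and big-endian byte order for UTF-16; the dictionary names are hypothetical. DBCS substitution characters are omitted because they vary by CCSID.

```python
# Default substitution characters from the list above (DBCS omitted).
DEFAULT_SUB = {
    "SBCS EBCDIC": b"\x3f",
    "SBCS ASCII": b"\x1a",   # some ASCII CCSIDs use X'7F' instead
    "UTF-8": b"\x1a",
    "UTF-16": b"\x00\x1a",   # big-endian byte order assumed
}

# Representative Python codecs (an assumption) used to decode each value.
REPRESENTATIVE_CODEC = {
    "SBCS EBCDIC": "cp037",
    "SBCS ASCII": "cp1252",
    "UTF-8": "utf-8",
    "UTF-16": "utf-16-be",
}

for kind, sub in DEFAULT_SUB.items():
    char = sub.decode(REPRESENTATIVE_CODEC[kind])
    # Every line reports the same Unicode character, U+001A.
    print(f"{kind:12} X'{sub.hex().upper()}' -> U+{ord(char):04X}")
```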

One alternative to an enforced subset conversion is a round-trip conversion, which preserves characters so that converting them back to the original CCSID restores their original code points. Whether a particular conversion uses a round-trip conversion or an enforced subset conversion depends on how conversions are set up on your system. For example, in DB2® for z/OS®, many conversions are defined by z/OS Unicode Services, and each conversion definition specifies whether to use a round-trip or an enforced subset conversion.
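To contrast the two approaches, the following Python sketch builds a toy round-trip table from CCSID 1252 to CCSID 37, using Python's `cp1252` and `cp037` codecs as stand-ins. Characters that exist in both code pages keep their normal EBCDIC code point; each leftover character is paired with an EBCDIC code point that no CCSID 1252 character otherwise uses, so the mapping can be reversed exactly. This only illustrates the idea; it is not the table that z/OS Unicode Services uses, and all function names are hypothetical.

```python
def build_round_trip_table(source: str, target: str) -> dict[str, bytes]:
    """Toy round-trip table for two single-byte code pages (hypothetical helper)."""
    source_chars = []
    for b in range(256):
        try:
            source_chars.append(bytes([b]).decode(source))
        except UnicodeDecodeError:
            pass  # byte is undefined in the source code page

    table: dict[str, bytes] = {}
    leftovers = []
    for ch in source_chars:
        try:
            table[ch] = ch.encode(target)  # character exists in the target
        except UnicodeEncodeError:
            leftovers.append(ch)           # needs a spare target code point

    # Pair each leftover with a target code point that is otherwise unused.
    used = set(table.values())
    spare = [bytes([b]) for b in range(256) if bytes([b]) not in used]
    if len(spare) < len(leftovers):
        raise ValueError("not enough spare code points for a round trip")
    for ch, code_point in zip(sorted(leftovers), spare):
        table[ch] = code_point
    return table


def convert(text: str, table: dict[str, bytes]) -> bytes:
    return b"".join(table[ch] for ch in text)


def convert_back(data: bytes, table: dict[str, bytes]) -> str:
    inverse = {code_point: ch for ch, code_point in table.items()}
    return "".join(inverse[bytes([b])] for b in data)


table = build_round_trip_table("cp1252", "cp037")
ebcdic = convert("Hi™", table)
print(convert_back(ebcdic, table))  # Hi™
```

An enforced subset conversion of the same text would produce X'3F' for ™, and the original character could not be recovered on the way back.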

Example: In ASCII CCSID 1252, the trademark symbol (™) is represented by the code point X'99'. EBCDIC CCSID 37 has no code point for this character. During an enforced subset conversion to EBCDIC CCSID 37, the character is converted to the substitution character X'3F'. When the data is converted back to ASCII CCSID 1252, the character remains a substitution character and is represented by the code point X'1A'.

Example: In ASCII CCSID 5348, the euro symbol (€) is represented by the code point X'80'. EBCDIC CCSID 37 has no code point for this character. During an enforced subset conversion to EBCDIC CCSID 37, the character is converted to the substitution character X'3F'. When the data is converted back to ASCII CCSID 5348, the character remains a substitution character and is represented by the code point X'1A'.
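Both examples can be reproduced with Python's codec machinery under two assumptions: Python's `cp037` codec stands in for EBCDIC CCSID 37, and its `cp1252` codec stands in for both ASCII CCSIDs. Python has no codec for CCSID 5348, but `cp1252` likewise maps X'80' to the euro sign. The error-handler name `enforced_subset` is hypothetical. The handler replaces each unconvertible character with U+001A SUBSTITUTE, which each codec then encodes as its own substitution code point.

```python
import codecs


def _substitute(exc: UnicodeError):
    """Replace every unencodable character with one U+001A SUBSTITUTE."""
    if isinstance(exc, UnicodeEncodeError):
        return "\x1a" * (exc.end - exc.start), exc.end
    raise exc


codecs.register_error("enforced_subset", _substitute)  # hypothetical handler name


def convert(data: bytes, source: str, target: str) -> bytes:
    return data.decode(source).encode(target, errors="enforced_subset")


for label, ascii_bytes in [("trademark", b"\x99"), ("euro", b"\x80")]:
    ebcdic = convert(ascii_bytes, "cp1252", "cp037")   # becomes X'3F'
    back = convert(ebcdic, "cp037", "cp1252")          # stays SUB: X'1A'
    print(label, ebcdic.hex().upper(), back.hex().upper())
```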

z/OS Unicode Services uses enforced subset conversions when converting from Unicode to ASCII or EBCDIC, to handle characters that do not exist in the target CCSID. Enforced subset conversions are required in this case because Unicode has room for more than 1.1 million code points (U+0000 through U+10FFFF), whereas an ASCII or EBCDIC single-byte character set can include at most 256 code points.
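The size mismatch, and a pre-flight check for the characters that an enforced subset conversion would replace, can be sketched in Python as follows. `unmappable` is a hypothetical helper, and Python's `cp037` codec is assumed as a stand-in for an SBCS EBCDIC target.

```python
UNICODE_CODE_POINTS = 0x10FFFF + 1   # U+0000 through U+10FFFF
SBCS_CODE_POINTS = 2 ** 8            # one byte per character

print(f"{UNICODE_CODE_POINTS:,} vs {SBCS_CODE_POINTS}")  # 1,114,112 vs 256


def unmappable(text: str, target: str) -> list[str]:
    """List the characters of `text` that have no code point in `target`
    and would therefore become substitution characters."""
    missing = []
    for ch in text:
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print(unmappable("Grüße: 10 € ™ 日本", "cp037"))  # ['€', '™', '日', '本']
```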