Conversion of character data
The Character Data Representation Architecture (CDRA) system of tags ensures that you can convert character data in a predictable, repeatable way.
Conversion maps the code points assigned to one or more characters in one code page to their corresponding code points in another code page. A conversion might map a single character to a sequence of characters, or a sequence of characters to a single character. Conversion is not the same as translation from one language to another.
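As a minimal sketch of what code point conversion means, the following Python example re-encodes two characters from an EBCDIC code page to an ASCII-based code page. The `convert` helper is hypothetical, and the Python codec names `cp037` and `cp1252` are used here only as stand-ins for CCSIDs; this is not the CDRA implementation.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one code page to another, using Unicode as the pivot."""
    return data.decode(source).encode(target)


# "A" and "é" as stored in the cp037 (EBCDIC) code page.
ebcdic = bytes([0xC1, 0x51])

# The same two characters have different code points in cp1252.
converted = convert(ebcdic, "cp037", "cp1252")
print(converted.hex())  # 41e9
```

The characters are unchanged; only the hexadecimal values that represent them differ between the two code pages.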
Conversion methods
The following methods are used for conversion:
- Round-trip conversion. The integrity of all character data is maintained
from the source coded character set identifier (CCSID) to the target CCSID
and back to the source.
After a round-trip conversion, some characters might be displayed incorrectly in the target CCSID. Their integrity is preserved, however. When the characters are converted back to the source CCSID, they regain their original hexadecimal values and representation.
- Enforced subset match conversion (substitution). Characters that
exist in both the source and target CCSID have their integrity maintained.
Characters in the source CCSID but not in the target CCSID are replaced. Replaced
values are also referred to as substitution characters. For EBCDIC encoding,
these appear on most display stations as a solid block. For ASCII encoding,
these substitution characters appear differently.
This substitution is permanent. When the data is converted back to the source CCSID, the original hexadecimal values cannot be retrieved.
For a list of CCSID conversions that result in substitution characters, see the Default conversion that might use substitution table.
- Linguistic conversion. Also known as best-fit conversion. A partial
mapping is done from the source code page to the target code page. The integrity
of characters that are in both the target CCSID and the source CCSID is preserved.
Characters that are not in the target CCSID are mapped to the most culturally
acceptable alternative for that character.
For example, the source CCSID might support an A grave character (À). The target CCSID might not support this character. During the conversion, the most linguistically acceptable character (a Latin capital A) is substituted for the A grave. After the conversion, characters that are not included in the target CCSID are presented to the user as the most linguistically acceptable substitution characters. This substitution is permanent, as is any loss of character integrity.
Through an application programming interface (API), linguistic conversion is available from any supported single-byte CCSID to any other supported single-byte CCSID.
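The three methods can be contrasted with a small Python sketch. Everything here is an illustrative assumption rather than the CDRA tables or the system API: the function names are hypothetical, Python codec names stand in for CCSIDs, Python's `?` replacement stands in for the substitution character, and Unicode decomposition stands in for a culturally chosen best-fit mapping.

```python
import unicodedata


def round_trip_table(source: str, target: str) -> bytes:
    """Build a one-to-one 256-entry byte table from source to target."""
    table = [None] * 256
    used = set()
    for b in range(256):
        try:
            t = bytes([b]).decode(source).encode(target)
        except UnicodeError:
            continue  # character is missing from one of the code pages
        if len(t) == 1 and t[0] not in used:
            table[b] = t[0]
            used.add(t[0])
    # Park unmatched source bytes on target bytes that are still free.
    # They may display incorrectly, but the mapping stays reversible.
    spare = iter(t for t in range(256) if t not in used)
    for b in range(256):
        if table[b] is None:
            table[b] = next(spare)
    return bytes(table)


def invert(table: bytes) -> bytes:
    """Return the table that maps target bytes back to source bytes."""
    inverse = bytearray(256)
    for src, tgt in enumerate(table):
        inverse[tgt] = src
    return bytes(inverse)


def enforced_subset(data: bytes, source: str, target: str) -> bytes:
    """Replace characters that the target lacks (Python uses '?')."""
    return data.decode(source).encode(target, errors="replace")


def best_fit(data: bytes, source: str, target: str) -> bytes:
    """Map characters that the target lacks to their undecorated base letter."""
    out = []
    for ch in data.decode(source):
        try:
            ch.encode(target)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(unicodedata.normalize("NFD", ch)[0])
    return "".join(out).encode(target, errors="replace")


# Round trip: every byte value regains its original hexadecimal value.
table = round_trip_table("cp037", "cp1252")
every_byte = bytes(range(256))
restored = every_byte.translate(table).translate(invert(table))
print(restored == every_byte)  # True

source_bytes = "À la carte".encode("cp1252")
print(enforced_subset(source_bytes, "cp1252", "ascii"))  # b'? la carte'
print(best_fit(source_bytes, "cp1252", "ascii"))  # b'A la carte'
```

Only the round-trip table can be inverted. After the enforced subset match or best-fit conversion, nothing in the output records that the first character was originally an A grave.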