Unicode considerations for database files: Keyword considerations (positions 45 through 80)

This topic describes how certain DDS keywords work with Unicode data.

The CCSID keyword is used to enable an A-type or G-type field to contain Unicode data.

The value specified on the CCSID keyword must be a CCSID that uses a Unicode encoding scheme. The keyword is supported in both physical and logical files.

For logical files, the following conditions must be met before the CCSID keyword is allowed on a logical file field.
  • If the value specified on the logical file CCSID keyword uses a Unicode encoding scheme, the field data type must be A for UTF-8 or G for UCS-2/UTF-16. Also, the corresponding physical file field must be of type A, G, or O. If the CCSID keyword is specified on the physical file field, it must contain a value other than 65535.
  • If the value specified on the logical file CCSID keyword does not use a Unicode encoding scheme, the field data type must be A, O, or G. Also, the corresponding physical file field must be a Unicode field. The CCSID keyword specified on the logical file field must contain a value other than 65535.
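
The two conditions above can be expressed as a small check. The following Python sketch is illustrative only and is not part of DDS or of any system API; the function name lf_ccsid_allowed and its parameters are hypothetical. It assumes CCSID 1208 for UTF-8, 1200 for UTF-16, and 13488 for UCS-2, and treats any other CCSID as non-Unicode.

```python
# Assumed Unicode CCSIDs: 1208 (UTF-8), 1200 (UTF-16), 13488 (UCS-2).
UTF8_CCSIDS = {1208}
UTF16_CCSIDS = {1200, 13488}
UNICODE_CCSIDS = UTF8_CCSIDS | UTF16_CCSIDS


def lf_ccsid_allowed(lf_ccsid, lf_type, pf_type, pf_ccsid=None):
    """Hypothetical helper: apply the two logical file rules described above.

    lf_ccsid  CCSID keyword value on the logical file field
    lf_type   data type of the logical file field ("A", "G", "O", ...)
    pf_type   data type of the corresponding physical file field
    pf_ccsid  CCSID keyword value on the physical file field, or None
    """
    if lf_ccsid in UNICODE_CCSIDS:
        # Unicode on the logical file: A for UTF-8, G for UCS-2/UTF-16.
        expected_type = "A" if lf_ccsid in UTF8_CCSIDS else "G"
        return (
            lf_type == expected_type
            and pf_type in {"A", "G", "O"}
            and pf_ccsid != 65535
        )
    # Non-Unicode on the logical file: the physical field must be Unicode.
    return (
        lf_type in {"A", "O", "G"}
        and pf_ccsid in UNICODE_CCSIDS
        and lf_ccsid != 65535
    )
```

For example, lf_ccsid_allowed(1208, "A", "A", 37) accepts a UTF-8 logical field over a character physical field, while lf_ccsid_allowed(37, "A", "A", 37) is rejected because the physical field is not a Unicode field.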

The DFT keyword can contain SBCS, bracketed-DBCS, or bracketed-DBCS-graphic character strings when specified on a Unicode-capable field.

On a Unicode-capable field, you can use the COMP keyword only to compare the field with data in another Unicode-capable field. Two equal-length UTF-8 data strings can be compared to each other using their hex values, regardless of character boundaries.
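
As an illustration of comparing UTF-8 data by hex value, the following Python sketch compares the encoded bytes of two strings without looking at where characters begin or end. It models the idea only and does not show how the database performs the comparison; the function name compare_utf8_bytes is hypothetical.

```python
def compare_utf8_bytes(left: str, right: str) -> int:
    """Return -1, 0, or 1 from a byte-wise comparison of two UTF-8 strings."""
    a = left.encode("utf-8")
    b = right.encode("utf-8")
    if len(a) != len(b):
        raise ValueError("the encoded strings must have the same length")
    # bytes objects compare byte by byte, ignoring character boundaries.
    return (a > b) - (a < b)


# "café" encodes to 63 61 66 C3 A9 and "cafe!" to 63 61 66 65 21 (5 bytes each).
# The first difference is C3 against 65, so the first string sorts higher.
print(compare_utf8_bytes("café", "cafe!"))  # prints 1
```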

You can specify a character literal on a select or omit field that is tagged with a Unicode CCSID on the COMP, RANGE, and VALUES keywords. The maximum length of the literal is equal to the number of Unicode code units that is defined in positions 30 to 34 of the DDS specification.
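
Because the limit is counted in code units rather than characters, a literal can be longer than it looks. The following Python sketch counts code units for a literal and checks it against a field length; it assumes one code unit is one byte for UTF-8 and one 16-bit unit for UTF-16. The names code_units and literal_fits are hypothetical.

```python
def code_units(literal: str, encoding: str) -> int:
    """Count the code units a literal occupies in the given Unicode encoding."""
    if encoding == "utf-8":
        # One UTF-8 code unit is one byte.
        return len(literal.encode("utf-8"))
    if encoding == "utf-16":
        # One UTF-16 code unit is two bytes; encode without a byte order mark.
        return len(literal.encode("utf-16-be")) // 2
    raise ValueError(f"unsupported encoding: {encoding}")


def literal_fits(literal: str, encoding: str, field_length: int) -> bool:
    """Check a literal against the length given in positions 30 to 34."""
    return code_units(literal, encoding) <= field_length
```

For example, the single character "é" takes two code units in UTF-8, and a character outside the Basic Multilingual Plane takes two code units in UTF-16 (a surrogate pair).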

The VARLEN keyword can be used on Unicode fields.

Logical files can have UTF-8 and UTF-16 data keys.

Concatenation of Unicode fields

Unicode fields can be used in the CONCAT keyword. The following rules apply:

  • The parameters of the CONCAT keyword can be UCS-2 graphic fields, UTF-16 fields, or a mix of each type. No other field types can be concatenated with UCS-2 or UTF-16 fields.
    • The concatenation result is UTF-16 if one of the parameters is UTF-16.
    • Otherwise, the result is UCS-2.
  • A UTF-8 field can be concatenated only with other UTF-8 fields. No other field type is allowed to be concatenated with a UTF-8 field. The concatenation result is UTF-8.
  • The resulting field must be an input-only field; use I in position 38 of the DDS source statement.
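
The type rules above can be summarized as a small decision function. The following Python sketch is illustrative only; the function name concat_result and the type labels "UCS-2", "UTF-16", and "UTF-8" are hypothetical and are not DDS syntax.

```python
def concat_result(field_types):
    """Return the result type of concatenating fields of the given types.

    Raises ValueError for combinations that the rules above do not allow.
    """
    kinds = set(field_types)
    if not kinds:
        raise ValueError("at least one field is required")
    if kinds == {"UTF-8"}:
        # UTF-8 fields concatenate only with other UTF-8 fields.
        return "UTF-8"
    if kinds <= {"UCS-2", "UTF-16"}:
        # Any UTF-16 parameter makes the result UTF-16; otherwise UCS-2.
        return "UTF-16" if "UTF-16" in kinds else "UCS-2"
    raise ValueError(f"these field types cannot be concatenated: {sorted(kinds)}")
```

For example, concat_result(["UCS-2", "UTF-16"]) returns "UTF-16", while concat_result(["UTF-8", "UTF-16"]) raises ValueError.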

Join logical file support

A UTF-8 field can be joined to another UTF-8 field in a join logical file, and UTF-16 fields can be joined in the same way. To join fields of different types, for example a UTF-8 or UTF-16 field to a CHAR field, one of the fields must be redefined in the logical file so that both join fields are of the same type.

Select/Omit fields in logical files

UTF-8 and UTF-16 fields are allowed as select and omit field specifications.