Format conversion of elementary data

Elementary data items within identifier-2 are converted by the following steps, performed in the order shown. Some steps apply only to certain data items or values.

Conversion to character format:

Elementary data items are converted to character format depending on the type of the data item:

  • Data items of category alphabetic, alphanumeric, alphanumeric-edited, DBCS, national, national-edited, and numeric-edited are not converted, except as required to produce the correct Unicode encoding form.
  • Fixed-point numeric data items other than COMPUTATIONAL-5 (COMP-5) binary data items or binary data items compiled with the TRUNC(BIN) compiler option are converted as if they were moved to a numeric-edited item that has:
    • As many integer positions as the numeric item has, but at least one integer position, which might contain zero (0)
    • An explicit decimal point, if the numeric item has at least one decimal position; the decimal point is represented by a period regardless of whether the DECIMAL-POINT IS COMMA clause is specified in the SPECIAL-NAMES paragraph
    • The same number of decimal positions as the numeric item has
    • A leading '-' picture symbol if the data item is signed (has an S in its PICTURE clause)
  • COMPUTATIONAL-5 (COMP-5) binary data items or binary data items compiled with the TRUNC(BIN) compiler option are converted in the same way as the other fixed-point numeric items, except for the number of integer positions. The number of integer positions is computed depending on the number of '9' symbols in the picture character string as follows:
    • 5 minus the number of decimal positions, if the data item has 1 to 4 '9' picture symbols
    • 10 minus the number of decimal positions, if the data item has 5 to 9 '9' picture symbols
    • 20 minus the number of decimal positions, if the data item has 10 to 18 '9' picture symbols
  • Internal floating-point data items are converted as if they were moved to a data item as follows:
    • For COMP-1: an external floating-point data item with PICTURE -9.9(8)E+99
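    • For COMP-2: an external floating-point data item with PICTURE -9.9(17)E+99 (this PICTURE only describes the output format; it could not be declared in a program because it has more digit positions than an external floating-point item allows)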
  • External floating-point data items are converted as if they were moved to another external floating-point data item with the same precision and scale, and with:
    • A '-' sign for the mantissa
    • An actual decimal point, indicated by a '.' PICTURE symbol
    • A '+' sign for the exponent
    For example, an external floating-point item defined with PICTURE -9(3)V9(5)E-99 would be converted as if it were moved to an external floating-point item defined with PICTURE -9(3).9(5)E+99.
  • Index data items are converted as if they were declared USAGE COMP-5 PICTURE S9(9).
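The rules above for fixed-point and internal floating-point items can be modeled with a short Python sketch. It is not IBM's implementation: the function names, the use of Decimal and Python float, and the assumption that every value already fits its data item are illustrative only. The result is the character form before trimming.

```python
from decimal import Decimal


def integer_positions(nines: int, decimals: int, comp5_or_trunc_bin: bool) -> int:
    """Integer positions of the implied numeric-edited receiving item."""
    if not comp5_or_trunc_bin:
        # As many integer positions as the item has, but at least one.
        return max(1, nines - decimals)
    # COMP-5 or TRUNC(BIN): the width depends on the count of '9' symbols.
    if 1 <= nines <= 4:
        return 5 - decimals
    if 5 <= nines <= 9:
        return 10 - decimals
    if 10 <= nines <= 18:
        return 20 - decimals
    raise ValueError("expected 1 to 18 '9' picture symbols")


def fixed_point_to_character(value: Decimal, nines: int, decimals: int,
                             signed: bool, comp5_or_trunc_bin: bool = False) -> str:
    """Character form of a fixed-point item before trimming (a model only)."""
    ints = integer_positions(nines, decimals, comp5_or_trunc_bin)
    # Scale to a whole number of the smallest decimal unit; int() drops any
    # excess fraction digits. Assumes the value already fits the item.
    scaled = int(abs(value).scaleb(decimals))
    digits = str(scaled).zfill(ints + decimals)
    split = len(digits) - decimals
    text = digits[:split]
    if decimals:
        # The decimal point is always a period, even with DECIMAL-POINT IS COMMA.
        text += "." + digits[split:]
    if signed:
        # Leading '-' picture symbol: '-' for negative values, space otherwise.
        text = ("-" if value < 0 else " ") + text
    return text


def internal_float_to_character(value: float, usage: str) -> str:
    """Layout of PICTURE -9.9(8)E+99 (COMP-1) or -9.9(17)E+99 (COMP-2)."""
    fraction_digits = {"COMP-1": 8, "COMP-2": 17}[usage]
    # Python floats are IEEE binary64, so this models the layout only, not the
    # exact digits produced from the compiler's own floating-point format.
    body = f"{abs(value):.{fraction_digits}E}"  # for example 1.23450000E+02
    return ("-" if value < 0 else " ") + body
```

Under these assumptions, a PIC S9(4)V99 item holding -12.5 becomes "-0012.50", and an index data item (treated as COMP-5 PICTURE S9(9), so 10 integer positions) holding 3 becomes " 0000000003".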

Trimming:

After any conversion to character format, leading and trailing spaces and leading zeroes are eliminated, as described under Trimming of generated JSON data.
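The exact trimming rules are given in a separate section, so the following Python sketch is only an approximation for numeric values. It assumes that a leading minus sign is kept, that zeros after it are removed, and that one zero is kept before a decimal point (or for a zero value) so the result remains a valid JSON number. The function name trim_numeric is hypothetical.

```python
def trim_numeric(text: str) -> str:
    """Hypothetical model of trimming a converted numeric value."""
    text = text.strip(" ")          # leading and trailing spaces
    sign = ""
    if text.startswith("-"):
        # Assumption: a leading minus sign is kept and zeros after it trimmed.
        sign, text = "-", text[1:]
    text = text.lstrip("0")         # leading zeros
    if text == "" or text.startswith("."):
        # Assumption: one zero is kept before the decimal point (or for zero
        # itself) so that the result is still a valid JSON number.
        text = "0" + text
    return sign + text
```

With these assumptions, " 0012.50" trims to "12.50" and "0.50" is left unchanged.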

Conversion to the target encoding:

If identifier-1 is a data item of category national, any nonnational values are converted to national format. Otherwise, all values are converted to UTF-8. Conversion is done according to the CODEPAGE compiler option in effect for the compilation.
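As a rough illustration of this step, the following Python sketch converts an EBCDIC value to either target encoding. It assumes CODEPAGE(1140) and uses Python's cp1140 codec as a stand-in for the compiler's conversion; national data is modeled as UTF-16 big-endian. The function name to_target_encoding is hypothetical.

```python
def to_target_encoding(ebcdic: bytes, national_target: bool,
                       codepage: str = "cp1140") -> bytes:
    """Convert an EBCDIC value to the JSON output encoding (a model only)."""
    # Assumption: CODEPAGE(1140) is in effect; Python's "cp1140" codec stands in
    # for the compiler's conversion tables.
    text = ebcdic.decode(codepage)
    # National data is UTF-16 big-endian; otherwise the output is UTF-8.
    return text.encode("utf-16-be" if national_target else "utf-8")
```

For example, the EBCDIC bytes X'C1C2C3' ("ABC") become X'414243' for UTF-8 output and X'004100420043' for national output.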

Escaping characters:

The characters NX'0022' (") and NX'005C' (\) are escaped as \" and \\, respectively.

For compactness and readability, the following control characters are represented by two-character escape sequences:
  NX'0008' \b - backspace
  NX'0009' \t - tab
  NX'000A' \n - line feed
  NX'000C' \f - form feed
  NX'000D' \r - carriage return

The remaining characters in the range NX'0000' through NX'001F' are escaped as \uhhhh, where "h" represents a hexadecimal digit 0 through F.
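The escaping rules can be sketched in Python as follows. The table holds the two-character escapes listed above; every other character from NX'0000' through NX'001F' becomes \uhhhh. Uppercase hexadecimal digits are assumed from the "0 through F" wording, and the names SHORT_ESCAPES and escape_json_chars are hypothetical.

```python
# Two-character escapes; all other characters U+0000..U+001F use \uhhhh.
SHORT_ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '\b': '\\b',
    '\t': '\\t',
    '\n': '\\n',
    '\f': '\\f',
    '\r': '\\r',
}


def escape_json_chars(text: str) -> str:
    """Escape a string value for JSON output (a sketch of the rules above)."""
    out = []
    for ch in text:
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) <= 0x1F:
            # Remaining control characters: \u followed by four hex digits.
            out.append(f"\\u{ord(ch):04X}")
        else:
            out.append(ch)
    return "".join(out)
```

Characters outside the control range, including '/', pass through unchanged.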

Representation of out-of-range Unicode characters:

Any remaining Unicode character that has a Unicode scalar value greater than NX'FFFF' is represented by a surrogate pair for national output, or a four-byte sequence for UTF-8 output. For example, the musical symbol G clef (U+1D11E) is represented in UTF-16 by the surrogate pair NX'D834' NX'DD1E', and in UTF-8 by the byte sequence x'F09D849E'.
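The G clef values above follow from the standard UTF-16 and UTF-8 encoding algorithms, which this Python sketch applies directly. The function names are hypothetical; the arithmetic is the same for any scalar value from U+10000 through U+10FFFF.

```python
def utf16_surrogates(scalar: int) -> tuple[int, int]:
    """Split a scalar value above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("expected a scalar value from U+10000 to U+10FFFF")
    offset = scalar - 0x10000                 # 20-bit value
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def utf8_four_bytes(scalar: int) -> bytes:
    """Encode a scalar value above U+FFFF as a four-byte UTF-8 sequence."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("expected a scalar value from U+10000 to U+10FFFF")
    return bytes([
        0xF0 | (scalar >> 18),                # 11110xxx
        0x80 | ((scalar >> 12) & 0x3F),       # 10xxxxxx
        0x80 | ((scalar >> 6) & 0x3F),        # 10xxxxxx
        0x80 | (scalar & 0x3F),               # 10xxxxxx
    ])
```

For U+1D11E, the offset is X'D11E', giving the surrogate pair X'D834' X'DD1E' and the UTF-8 bytes X'F09D849E'.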