Determining the length attribute of the final result

When CODEUNITS32, CODEUNITS16, or OCTETS is specified for a function or the CAST specification, the length attribute of the final result string is calculated by applying specific formulas depending on which function is specified.

To determine the final result of a function or the CAST specification, DB2® might need to use an intermediate result string if CODEUNITS32 or CODEUNITS16 is specified, depending on the encoding scheme of the data:

  • ASCII and EBCDIC data require the use of a UTF-16 intermediate result string when either CODEUNITS32 or CODEUNITS16 is specified.
  • UTF-8 data requires the use of a UTF-16 intermediate result string only when CODEUNITS16 is specified.

Regardless of whether an intermediate string is used, when CODEUNITS32, CODEUNITS16, or OCTETS is specified for a function or the CAST specification, the length attribute of the final result string is calculated by applying the formulas that are described in the following table. The length attributes that are calculated at each step in the formulas are measured in bytes, unless indicated otherwise.

Table 1. Formulas for the length attribute of the final result string
Function Determination of the length attribute of the string1

CAST specification

CHAR

CLOB

DBCLOB

GRAPHIC

VARCHAR

VARGRAPHIC

Follow these three steps to determine the length attribute of the final result:
1. Length of the intermediate string (IML)

When CODEUNITS32 or CODEUNITS16 is specified:

  • If the source string is not in Unicode CCSID 1200, 1208, or 367, convert the source string to CCSID 1200, using the formulas in Conversion rules for comparisons to determine the result length of the intermediate string (IML).
  • If source string is in Unicode CCSID 1208 or 367, and CODEUNITS16 is specified, convert the source string to CCSID 1200, using the formulas in Conversion rules for comparisons to determine the result length of the intermediate string (IML).
  • Otherwise, the intermediate string is the same as the source string.

When OCTETS is specified:

  • If the CCSID of the source string is different from the CCSID of the result of the function, convert the source string to the CCSID of the result of the function, using the formulas in Conversion rules for comparisons to determine the result length of the intermediate string (IML).
  • Otherwise, the intermediate string is the same as the source string.

Exception: For the GRAPHIC and VARGRAPHIC function, if the source string is EBCDIC, the source is widened with prefix X'42' before the source string is converted to CCSID 1200 and the length of the intermediate string is determined.

2. Result length attribute of the intermediate string (rl)
The result length (rl) of the intermediate string depends on whether a length argument was explicitly specified.
If length was not specified, the result length (rl) attribute is:
   rl = IML
If length was specified, the result length (rl) attribute is:
   IF (ol * n) < r_IML THEN
      rl = ol * n
   ELSE
        IF intermediate string is in CCSID 1200
        (UTF-16) THEN
           rl = MIN( ol * n , IML + ( r  * 2 )) 
        ELSE
           rl = MIN( ol * n , IML + r ) 
Where:
  • ol = original length argument, expressed in the specified string units
    n =
    4 bytes for CODEUNITS32
    2 bytes for CODEUNITS16
  • IML = length of the intermediate string
  • r_IML = IML rounded up to next multiple of n
  • r = ol - (r_IML/n), expressed in the specified string units

    The calculation for r is an estimate of the difference between the length argument and the estimated number of characters of the input argument, expressed in the specified string units.

3. Length of the final result string (the result of the function)
The result length attribute of the final string is determined by converting the result length (rl) of the intermediate string to the CCSID of the result of the function, using the formulas in Conversion rules for comparisons, if CCSID conversion is necessary. Otherwise, the result length attribute of the final string is rl.
CHARACTER_
LENGTH
LOCATE
LOCATE_
IN_STRING
POSITION
Follow these three steps to determine the length attribute of the final result:
1. Length of the intermediate string (IML)
The length of the intermediate string (IML) is determined the same way as for the CAST specification. (See Length of the intermediate string (IML).)

For the LOCATE, LOCATE_IN_STRING, and POSITION functions, this applies to both the source-string and search-string. If the CCSIDs of intermediate strings for the converted source-string and search-string differ, the intermediate string for the search-string is converted to the CCSID of intermediate string for the source-string.

2. Result length attribute of the intermediate string (rl)
The result length (rl) attribute is always 4 (the length of an integer):
   rl = 4
3. Length of the final result string (the result the function)
The length of the final result of the function is always an integer.
INSERT
OVERLAY
Follow these three steps to determine the length attribute of the final result:
1. Length of the intermediate string (IML)
The length of the intermediate string (IML) for both the source-string and the insert-string is determined the same way as for the CAST specification. (See Length of the intermediate string (IML).)

If the CCSIDs of the intermediate strings for the converted source-string and insert-string differ, the intermediate string for the insert-string is converted to the CCSID of the intermediate string for the source-string.

2. Result length attribute of the intermediate string (rl)
The result length (rl) attribute of the intermediate string depends on whether the start and length arguments are constants.
If the start and length arguments are both constants, the result length attribute is:
   rl = L1 - MIN ( MAX ( 0, L1 - (V2 - 1)
   * n ), V3 * m) + L4 
If at least one argument (the start or length argument) is not a constant, the result length attribute is:
   rl = L1 + L4 
Where:
  • L1 and L4 are the length attributes of the intermediate strings of the source-string and insert-string, respectively.
  • V2 and V3 are the start and length values, respectively, expressed in the specified string units.
    m=
    1 if the intermediate string of the source-string is not CCSID 1200 (UTF-16)
    2 if the intermediate string of the source-string is CCSID 1200 (UTF-16)
    n=
    4 bytes for CODEUNITS32
    2 bytes for CODEUNITS16
3. Length of the final result string (the result the function)
The length of the final result is the same as the length of the final result for the CAST specification. (See Length attribute of the final result string (the result of the function).)

LEFT

RIGHT

Follow these three steps to determine the length attribute of the final result:
1. Length of the intermediate string (IML)
The length of the intermediate string (IML) is determined the same way as for the CAST specification. (See Length of the intermediate string (IML).)
2. Result length attribute of the intermediate string (rl)
The result length (rl) attribute is the same as the length of the intermediate string:
   rl = IML
3. Length of the final result string (the result of the function)
The result length attribute of the final string is determined by converting the result length (rl) of the intermediate string to the CCSID of the result of the function, using the formulas in Conversion rules for comparisons, if CCSID conversion is necessary. Otherwise, the result length attribute of the final string is rl.
The result length attribute of the final string is:
   MIN(length of source string, length of CCSID
   converted string )
SUBSTRING Follow these three steps to determine the length attribute of the final result:
1. Length of the intermediate string (IML)
The length of the intermediate string (IML) is determined the same way as for the CAST specification. (See Length of the intermediate string (IML).)
2. Result length attribute of the intermediate string (rl)
The result length (rl) of the intermediate string depends on whether a length argument was explicitly specified.
If length was not specified, the result length (rl) attribute is:
   rl = IML
If length was specified, the result length (rl) attribute is:
   rl = MIN(ol * n, IML)
Where:
  • ol = original length argument, expressed in the specified string units
    n =
    4 bytes for CODEUNITS32
    2 bytes for CODEUNITS16
  • IML = length of the intermediate string
3. Length of the final result string (the result of the function)
The length of the final result string is the same as for LEFT built-in function.
Note:
  1. The final value of the calculation for each length attribute (IML, rl, and the final result of the function) is limited by the maximum length of the function or by the maximum length of the corresponding data type of the result, whichever is applicable. Each length attribute is expressed in terms of bytes.
Example 1: Assume that T1 is a table encoded in EBCDIC and C1 is a CHAR(26) column (SBCS data with EBCDIC CCSID 37). The CHAR function is invoked in the following statement:
SELECT CHAR(C1,10,CODEUNITS32) as COL1 FROM T1; 
DB2 uses an intermediate string to evaluate the function and determines the intermediate and final result string lengths using these steps:
  1. C1, which is SBCS EBCDIC 37 data, is converted to Unicode 1200 (UTF-16). The result length of the conversion (using the formula from Conversion rules for comparisons, X * 2) is 26 * 2. Thus, the length of the intermediate string is 52 bytes (IML = 52).
  2. The CHAR function is evaluated against the first 10 UTF-32 characters in this string. The result length attribute is 40 bytes (rl = ol * n or 10 * 4) because ol * n < r_IML or 40 < 52.
  3. The 40 bytes of the string are converted back to SBCS EBCDIC 37. The result length of the conversion (using the formula from Conversion rules for comparisons, X * .5) is 40 * .5. Thus, the length of the final result of the functions is 20 bytes.
Example 2: This example is similar to the first example, except that the specified length for the function is 20 instead of 10. Assume that T1 is a table encoded in EBCDIC and C1 is a CHAR(26) column (SBCS data with EBCDIC CCSID 37). The CHAR function is invoked in the following statement:
SELECT CHAR(C1,20,CODEUNITS32) as COL1 FROM T1; 
DB2 uses an intermediate string to evaluate the function and determines the intermediate and final result string lengths using these steps:
  1. C1, which is SBCS EBCDIC 37 data, is converted to Unicode 1200 (UTF-16). The result length of the conversion (using the formula from Conversion rules for comparisons, X * 2) is 26 * 2. Thus, the length of the intermediate result string is 52 bytes (IML = 52).
  2. The CHAR function is evaluated against the first 20 UTF-32 characters in this intermediate string. However, because the estimated number of characters in the intermediate string, as expressed in the specified string units, is only 13 characters (r_IML/n or 52/4), the intermediate string must be padded with 7 padding characters to satisfy the 20 characters that are requested (r = ol - (r_IML/n) or 20 - 13). In Unicode 1200 (UTF-16), each padding character takes 2 bytes.

    The result length attribute is then calculated to be 66 bytes (rl = MIN(ol * n, IML + (r * 2)) or MIN(20 * 4, 52 + 14)) because ol * n < r_IML or 80 < 52 is not true.

  3. The 66 bytes of the string are converted back to SBCS EBCDIC 37. The result length of the conversion (using the formula from Conversion rules for comparisons, X * .5) is 66 * .5. Thus, the length of the final result of the function is 33 bytes.