Application development in unequal code page situations

Depending on the character encoding schemes used by the application code page and the database code page, there might or might not be a change in the length of a string as it is converted from the source code page to the target code page. A change in length is usually associated with conversions between multibyte code pages with different encoding schemes, for example DBCS and EUC.

A possible increase in length is usually more serious than a possible decrease in length, because an over-allocation of memory is less problematic than an under-allocation. Application considerations for sending or retrieving data depending on where the possible expansion might occur need to be dealt with separately. It is also important to note the differences between a best-case and worst-case situation when an expansion or contraction in length is indicated. Positive values, indicating a possible expansion, will give the worst-case multiplying factor. For example, a value of 2 for the SQLERRD(1) or SQLERRD(2) field means that a maximum of twice the string length of storage will be required to handle the data after conversion. This is a worst-case indicator. In this example, best-case would be that after conversion the length remains the same.

Negative values for SQLERRD(1) or SQLERRD(2), indicating a possible contraction, also provide the worst-case expansion factor. For example, a value of -1 means that the maximum storage required is equal to the string length before conversion. It is indeed possible that less storage might be required, but practically this is of little use unless the receiving application knows in advance how the source data is structured.

To ensure that you always have sufficient storage allocated to cover the maximum possible expansion after character conversion, you should allocate storage equal to the value max_target_length obtained from the following calculation:
  1. Determine the expansion factor for the data.
    For data transfer from the application to the database:
       expansion_factor = ABS[SQLERRD(1)] 
       if expansion_factor = 0 
          expansion_factor = 1
    For data transfer from the database to the application:
       expansion_factor = ABS[SQLERRD(2)] 
       if expansion_factor = 0 
          expansion_factor = 1 

    In the calculations shown previously, ABS refers to the absolute value.

    The check for expansion_factor = 0 is necessary because some Db2® products return 0 in SQLERRD(1) and SQLERRD(2). These servers do not support code page conversions that result in the expansion or shrinkage of data; this is represented by an expansion factor of 1.

  2. Intermediate length calculation.
       temp_target_length = actual_source_length * expansion_factor
  3. Determine the maximum length for target data type.
    Target data type
    Maximum length of type (type_maximum_length)
    CHAR
    254
    VARCHAR
    32 672
    LONG VARCHAR
    32 700
    CLOB
    2 147 483 647
  4. Determine the maximum target length.
     
    1   if temp_target_length < actual_source_length
             max_target_length = type_maximum_length 
          else 
    2   if temp_target_length > type_maximum_length 
             max_target_length = type_maximum_length 
          else 
    3   max_target_length = temp_target_length 

    All the checks listed previously, are required to allow for overflow, which might occur during the length calculation. The specific checks are:

    1
    Numeric overflow occurs during the calculation of temp_target_length in step 2.

    If the result of multiplying two positive values together is greater than the maximum value for the data type, the result wraps around and is returned as a value less than the larger of the two values.

    For example, the maximum value of a 2-byte signed integer (which is used for the length of non-CLOB data types) is 32 767. If the actual_source_length is 25 000 and the expansion factor is 2, temp_target_length is theoretically 50 000. This value is too large for the 2-byte signed integer so it gets wrapped around and is returned as -15 536.

    For the CLOB data type, a 4-byte signed integer is used for the length. The maximum value of a 4-byte signed integer is 2 147 483 647.

    2
    temp_target_length is too large for the data type.

    The length of a data type cannot exceed the values listed in step 3.

    If the conversion requires more space than is available in the data type, it might be possible to use a larger data type to hold the result. For example, if a CHAR(250) value requires 500 bytes to hold the converted string, it will not fit into a CHAR value because the maximum length is 254 bytes. However, it might be possible to use a VARCHAR(500) to hold the result after conversion. See the topic on code page conversion string-length overflow in mixed code set environments for more information about what happens when converted data exceeds the limit for a data type.

    3
    temp_target_length is the correct length for the result.

Using the SQLERRD(1) and SQLERRD(2) values returned when connecting to the database and the calculations shown previously, you can determine whether the length of a string will possibly increase or decrease as a result of character conversion. In general, a value of 0 or 1 indicates no expansion; a value greater than 1 indicates a possible expansion in length; a negative value indicates a possible contraction. (Note that values of 0 will only come from previous versions of Db2 products.) Also, these values are undefined for other database server products. The following table lists values to expect for various application code page and database code page combinations when using Db2 database systems.

Table 1. SQLCA.SQLERRD Settings on CONNECT
Application Code Page Database Code Page SQLERRD(1) SQLERRD(2)
SBCS SBCS +1 +1
DBCS DBCS +1 +1
eucJP eucJP +1 +1
eucJP DBCS -1 +2
DBCS eucJP +2 -1
eucTW eucTW +1 +1
eucTW DBCS -1 +2
DBCS eucTW +2 -1
eucKR eucKR +1 +1
eucKR DBCS +1 +1
DBCS eucKR +1 +1
eucCN eucCN +1 +1
eucCN DBCS +1 +1
DBCS eucCN +1 +1
If the SQLERRD(1) or SQLERRD(2) values indicate an expansion at either the database server or the application client, you should consider the following:
  • Expansion at the database server

    If the SQLERRD(1) entry indicates an expansion at the database server, your application must consider the possibility that length-dependent character data that is valid at the client will not be valid at the database server after it is converted. For example, Db2 products require that column names be no more than 128 bytes in length. It is possible that a character string that is 128 bytes in length encoded under a DBCS code page expands past the 128-byte limit when it is converted to an EUC code page. This possibility means that there might be activities that are valid when the application code page and the database code page are equal, and invalid when they are different. Exercise caution when you design EUC and DBCS databases for unequal code page situations.

  • Expansion at the application

    If the SQLERRD(2) entry indicates an expansion at the client application, your application must consider the possibility that length-dependent character data will expand in length after being converted. For example, a row with a CHAR(128) column is retrieved. When the database and application code pages are equal, the length of the data returned is 128 bytes. However, in an unequal code page situation, 128 bytes of data encoded under a DBCS code page might expand past 128 bytes when converted to an EUC code page. Thus, additional storage might have to be allocated to retrieve the complete string.