Application development in unequal code page situations
Depending on the character encoding schemes used by the application code page and the database code page, there might or might not be a change in the length of a string as it is converted from the source code page to the target code page. A change in length is usually associated with conversions between multibyte code pages with different encoding schemes, for example DBCS and EUC.
A possible increase in length is usually more serious than a possible
decrease in length, because an over-allocation of memory is less problematic
than an under-allocation. Application considerations for sending or
retrieving data depending on where the possible expansion might occur
need to be dealt with separately. It is also important to note the
differences between a best-case and worst-case situation
when an expansion or contraction in length is indicated. Positive
values, indicating a possible expansion, will give the worst-case multiplying
factor. For example, a value of 2
for the SQLERRD(1)
or SQLERRD(2) field means that a maximum of twice the string length
of storage will be required to handle the data after conversion. This
is a worst-case indicator. In this example, best-case would
be that after conversion the length remains the same.
Negative values for SQLERRD(1) or SQLERRD(2), indicating a possible
contraction, also provide the worst-case expansion factor.
For example, a value of -1
means that the maximum
storage required is equal to the string length before conversion.
It is indeed possible that less storage might be required, but practically
this is of little use unless the receiving application knows in advance
how the source data is structured.
max_target_length
obtained
from the following calculation: - Determine the expansion factor for the data. For data transfer from the application to the database:
expansion_factor = ABS[SQLERRD(1)] if expansion_factor = 0 expansion_factor = 1
For data transfer from the database to the application:expansion_factor = ABS[SQLERRD(2)] if expansion_factor = 0 expansion_factor = 1
In the calculations shown previously,
ABS
refers to the absolute value.The check for
expansion_factor = 0
is necessary because some Db2® products return 0 in SQLERRD(1) and SQLERRD(2). These servers do not support code page conversions that result in the expansion or shrinkage of data; this is represented by an expansion factor of 1. - Intermediate length calculation.
temp_target_length = actual_source_length * expansion_factor
- Determine the maximum length for target data type.
- Target data type
- Maximum length of type (
type_maximum_length
) - CHAR
- 254
- VARCHAR
- 32 672
- LONG VARCHAR
- 32 700
- CLOB
- 2 147 483 647
- Determine the maximum target length.
1 if temp_target_length < actual_source_length max_target_length = type_maximum_length else 2 if temp_target_length > type_maximum_length max_target_length = type_maximum_length else 3 max_target_length = temp_target_length
All the checks listed previously, are required to allow for overflow, which might occur during the length calculation. The specific checks are:
- 1
- Numeric overflow occurs during the calculation of
temp_target_length
in step 2.If the result of multiplying two positive values together is greater than the maximum value for the data type, the result wraps around and is returned as a value less than the larger of the two values.
For example, the maximum value of a 2-byte signed integer (which is used for the length of non-CLOB data types) is 32 767. If the
actual_source_length
is 25 000 and the expansion factor is 2,temp_target_length
is theoretically 50 000. This value is too large for the 2-byte signed integer so it gets wrapped around and is returned as -15 536.For the CLOB data type, a 4-byte signed integer is used for the length. The maximum value of a 4-byte signed integer is 2 147 483 647.
- 2
temp_target_length
is too large for the data type.The length of a data type cannot exceed the values listed in step 3.
If the conversion requires more space than is available in the data type, it might be possible to use a larger data type to hold the result. For example, if a CHAR(250) value requires 500 bytes to hold the converted string, it will not fit into a CHAR value because the maximum length is 254 bytes. However, it might be possible to use a VARCHAR(500) to hold the result after conversion. See the topic on code page conversion string-length overflow in mixed code set environments for more information about what happens when converted data exceeds the limit for a data type.
- 3
temp_target_length
is the correct length for the result.
Using the SQLERRD(1) and SQLERRD(2) values returned when connecting to the database and the calculations shown previously, you can determine whether the length of a string will possibly increase or decrease as a result of character conversion. In general, a value of 0 or 1 indicates no expansion; a value greater than 1 indicates a possible expansion in length; a negative value indicates a possible contraction. (Note that values of 0 will only come from previous versions of Db2 products.) Also, these values are undefined for other database server products. The following table lists values to expect for various application code page and database code page combinations when using Db2 database systems.
Application Code Page | Database Code Page | SQLERRD(1) | SQLERRD(2) |
---|---|---|---|
SBCS | SBCS | +1 | +1 |
DBCS | DBCS | +1 | +1 |
eucJP | eucJP | +1 | +1 |
eucJP | DBCS | -1 | +2 |
DBCS | eucJP | +2 | -1 |
eucTW | eucTW | +1 | +1 |
eucTW | DBCS | -1 | +2 |
DBCS | eucTW | +2 | -1 |
eucKR | eucKR | +1 | +1 |
eucKR | DBCS | +1 | +1 |
DBCS | eucKR | +1 | +1 |
eucCN | eucCN | +1 | +1 |
eucCN | DBCS | +1 | +1 |
DBCS | eucCN | +1 | +1 |
- Expansion at the database server
If the SQLERRD(1) entry indicates an expansion at the database server, your application must consider the possibility that length-dependent character data that is valid at the client will not be valid at the database server after it is converted. For example, Db2 products require that column names be no more than 128 bytes in length. It is possible that a character string that is 128 bytes in length encoded under a DBCS code page expands past the 128-byte limit when it is converted to an EUC code page. This possibility means that there might be activities that are valid when the application code page and the database code page are equal, and invalid when they are different. Exercise caution when you design EUC and DBCS databases for unequal code page situations.
- Expansion at the application
If the SQLERRD(2) entry indicates an expansion at the client application, your application must consider the possibility that length-dependent character data will expand in length after being converted. For example, a row with a CHAR(128) column is retrieved. When the database and application code pages are equal, the length of the data returned is 128 bytes. However, in an unequal code page situation, 128 bytes of data encoded under a DBCS code page might expand past 128 bytes when converted to an EUC code page. Thus, additional storage might have to be allocated to retrieve the complete string.