DESCRIBE statement in mixed code set environments
A DESCRIBE performed against an EUC database returns information about mixed character and GRAPHIC columns based on the definition of these columns in the database. This information is based on the code page of the server, before it is converted to the client's code page.
When you perform a DESCRIBE against a select list item that is resolved in the application context (for example, VALUES SUBSTR(?,1,2)), then, for any character or graphic data involved, you should evaluate the returned SQLLEN value along with the returned code page. If the returned code page is the same as the application code page, there is no expansion. If the returned code page is the same as the database code page, expansion is possible. Select list items that are FOR BIT DATA (code page 0) or in the application code page are not converted when returned to the application; therefore, there is no expansion or contraction of the reported length.
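The decision rule above can be sketched as a small helper. This is only an illustration: the function name `expansion_possible` is hypothetical, the code pages are passed in as plain integers rather than read from a real SQLDA, and the values 954 and 943 in the usage note are example code page numbers for Japanese EUC and Japanese DBCS.

```python
FOR_BIT_DATA = 0  # code page 0 marks FOR BIT DATA, as described above


def expansion_possible(returned_cp: int, app_cp: int, db_cp: int) -> bool:
    """Return True when a described item may change length on conversion.

    Hypothetical helper: all three code pages are supplied by the caller.
    """
    if returned_cp == FOR_BIT_DATA or returned_cp == app_cp:
        # The item is not converted when returned, so SQLLEN is exact.
        return False
    if returned_cp == db_cp:
        # The item is described in the database context, so the length
        # can change when it is converted to the application code page.
        return True
    raise ValueError(f"unexpected code page: {returned_cp}")
```

For an application at code page 954 connected to a database at code page 943, `expansion_possible(943, 954, 943)` reports that expansion is possible, while an item described with code page 954 or 0 does not need extra storage.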
- EUC application accessing a DBCS database
If your application's code page is an EUC code page, and it issues a DESCRIBE against a database with a DBCS code page, the information for CHAR and GRAPHIC columns is returned in the database context. For example, a CHAR(5) column returned as part of a DESCRIBE has a value of five in the SQLLEN field. For non-EUC data, you allocate five bytes of storage when you fetch the data from this column. With EUC data, five bytes might not be enough. When the code page conversion from DBCS to EUC takes place, the length of the data in CHAR columns can increase because the two schemes encode characters differently. For example, with the Traditional Chinese character set, the maximum increase is double: the maximum character length in the DBCS encoding is two bytes, which might increase to a maximum character length of four bytes in EUC. For the Japanese code set, the maximum increase is also double. The maximum character length grows only from two bytes in Japanese DBCS to three bytes in Japanese EUC, which looks like a factor of 1.5, but the single-byte Katakana characters are one byte long in Japanese DBCS and two bytes long in Japanese EUC, so the worst case is still double.
Possible changes in data length as a result of character conversions apply only to mixed character data. Graphic character data encoding is always the same length, two bytes, regardless of the encoding scheme. To avoid losing data, you need to evaluate whether an unequal code page situation exists, and whether it is between an EUC application and a DBCS database. You can determine the database code page and the application code page from tokens in the SQLCA returned from a CONNECT statement. If such a situation exists, your application needs to allocate additional storage for mixed character data based on the maximum expansion factor for that encoding scheme.
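As a rough illustration of the Japanese worst case and of sizing by the maximum expansion factor, the sketch below uses Python's `cp932` and `euc_jp` codecs as stand-ins for the database and application code pages. The helper name `mixed_char_buffer_bytes` and its default factor of 2 are assumptions made for this example, not part of any database API.

```python
def mixed_char_buffer_bytes(sqllen: int, expansion_factor: int = 2) -> int:
    """Bytes to reserve for a mixed character column described in the
    DBCS database context and fetched by an EUC application.

    Hypothetical helper: the factor of 2 is the maximum increase stated
    above for the Japanese and Traditional Chinese code sets.
    """
    return sqllen * expansion_factor


# Five half-width Katakana characters (U+FF71 to U+FF75).
katakana = "\uff71\uff72\uff73\uff74\uff75"

dbcs = katakana.encode("cp932")   # one byte per character, fits CHAR(5)
euc = katakana.encode("euc_jp")   # two bytes per character after conversion

print(len(dbcs), len(euc), mixed_char_buffer_bytes(len(dbcs)))
```

A value that exactly fills a CHAR(5) column in the DBCS database needs ten bytes once converted, which is what the doubled buffer provides.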
- DBCS application accessing an EUC database
If your application's code page is a DBCS code page, and it issues a DESCRIBE against an EUC database, the situation is similar to that in which an EUC application accesses a DBCS database. However, in this situation your application might require less storage than is indicated by the value of the SQLLEN field. The worst case in this situation is that all of the data is single-byte or double-byte under EUC, meaning that exactly SQLLEN bytes are required under the DBCS encoding scheme. In any other situation, fewer than SQLLEN bytes are required, because a maximum of two bytes is required under the DBCS encoding scheme to store any EUC character.
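The reverse direction can be checked the same way. The following sketch again assumes Python's `euc_jp` and `cp932` codecs as stand-ins for the database and application code pages. The helper `dbcs_bytes_needed` and the sample strings are made up for illustration.

```python
def dbcs_bytes_needed(euc_bytes: bytes) -> int:
    """Length of EUC-encoded data after conversion to the DBCS encoding.

    Hypothetical helper using Python codecs in place of the database
    manager's code page conversion.
    """
    return len(euc_bytes.decode("euc_jp").encode("cp932"))


samples = [
    "ABCDE",                 # single-byte characters in both encodings
    "\u65e5\u672c\u8a9e",    # three kanji, double-byte in both encodings
    "\uff71\uff72\uff73",    # half-width Katakana, two bytes each in EUC
]

for text in samples:
    euc = text.encode("euc_jp")
    # The converted length never exceeds the length in the EUC encoding.
    print(len(euc), dbcs_bytes_needed(euc))
```

For these samples the converted data never needs more bytes than the EUC form, so a buffer of SQLLEN bytes is sufficient, and the Katakana sample needs only half of it.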