Unicode string functions in Db2
When you use string functions in ESQL expressions, certain parameters refer to character positions or counts.
SUBSTRING('Hello World!' FROM 7 FOR 4)
results
in the string Worl
because 7
refers
to the seventh character position, and 4
refers to
four characters.If the string Hello World!
is
represented in an ASCII code page, the seventh byte of that
representation is the seventh character. However, in other code pages
or encoding schemes (for example, some Unicode representations) the
seventh character could start at byte 13, and 4 characters can occupy
8 bytes.
IBM® App Connect Enterprise correctly handles this situation; that is, the numeric parameters and results of these string functions always refer to characters and not the bytes used to represent them.
In some situations, IBM App Connect Enterprise delegates expressions to a database engine for processing. For example, if there is a WHERE clause, in a SELECT function, applied to a database data source, the database is passed the WHERE clause if it can interpret all the functions in the expression.
If there are functions that are not supported by the database, IBM App Connect Enterprise passes only those parts of the expression that can be interpreted, retrieves an unfiltered record set, and performs the remaining filtering itself.
DB2® string functions use byte indexing and not character indexing. Therefore, for Unicode data, the meaning of certain functions differs from the IBM App Connect Enterprise functions, even though they can be ‘interpreted'.
Characters in Unicode UTF8 representation, for example, can occupy from 1-4 bytes, so that the seventh character can start anywhere from byte 7 to byte 25.
- INSERT function
- LEFT function
- LENGTH function
- OVERLAY function
- POSITION function
- RIGHT function
- SPACE function
- SUBSTRING function
These functions either take numeric parameters, or return numeric results that refer to indexes, or counts, of characters in a string.
When expressions involving these functions are passed to the Db2 database, and Unicode string data is manipulated in the database, the results can be unexpected, or an error might occur.
This error might also occur if, for example, an expression of this type is passed directly to the database by using the PASSTHRU function. In this situation, you could modify each expression yourself, as necessary, for the target database.
It is not possible to systematically modify expressions to avoid this problem and IBM App Connect Enterprise does not attempt to do so.
If the Unicode strings do not use any characters that, in UTF8 representation, occupy more than 1 byte each, the functions perform correctly.