last time we found the physical layout of varchar column is not so "utf-8". DB2 chooses to store the data that way out of some undocumented reason. Until it's released, we should not trust what we guess even it's proven correct from limited test.
But the principal we should trust in, whatever db2 saves the data, it should deliver standard byte stream to its client. Serge has the implementation with c. this time I'll use Java.
Take "天朝上国" as example,
in db2, the byte list is,
string byte# byte_value
------------------------------------------------------------
When it is passed to Java,
-- select vchar, varchar(stringscan(vchar),100) byte_list from utf8test where vchar = '天朝上国'
天朝上国 229 164 169 230 156 157 228 184 138 229 155 189
This time it looks all right.
To make it easier for more processing in pure SQL, we need several element function,
1. character_width_utf8(varchar()), how many byte for the first utf-8 character. return -1 when it's not first byte of a utf-8 character.
Then we can easily have more,
2. length_utf8
3. substr_utf8
4. replace_utf8
5. overlay_utf8
etc.
I'm planning a project as db2 external library. will treat utf-8 string functions as first part to implement.