Topic
  • 3 replies
  • Latest Post - ‏2009-03-25T18:26:35Z by SystemAdmin
SystemAdmin
6968 Posts

Pinned topic Could not display DBCS(Chinese) characters after sync database from DB2

‏2009-03-18T03:05:31Z |
Syncing from the DB2e command line to a client DB2e database works, and displaying DBCS characters with DB2eCMD.exe is OK.
But after syncing to a client Derby database with Java code and the sync API, all DBCS characters are displayed as ?.
Env:
DB2 9.5 Workgroup
DB2 Everyplace 9.1.3
Redhat Linux 4.0
Derby 10.0.3
  • SystemAdmin
    6968 Posts

    Re: Could not display DBCS(Chinese) characters after sync database from DB2

    ‏2009-03-18T10:35:58Z  
    The Java client receives data from the server and does not convert characters before feeding them to the Derby engine. Applications have to convert characters themselves when fetching data from or storing data in the Derby database. Please see the site (http://db.apache.org/derby/).
  • SystemAdmin
    6968 Posts

    Re: Could not display DBCS(Chinese) characters after sync database from DB2

    ‏2009-03-24T08:59:38Z  
    I finally figured it out, but it still seems strange to me.

    Step 1:
    Add the following to the sync Java code:
    userProps.put("isync.encoding", "UTF-8");

    Step 2:
    When selecting from or updating the Derby database, you have to convert twice. Otherwise the Simplified Chinese characters are not displayed correctly in the Java client code and are not updated correctly in the remote DB2 database.
    public void selftest() throws Exception {
        String s = "";
        try {
            con = getConnection();
            st = con.createStatement();
            ResultSet rs = st.executeQuery("select * from test");
            System.out.println("jdbc result: ");
            while (rs.next()) {
                s = "uid: " + rs.getString(1);
                // Undo the wrong decoding: recover the raw bytes via GBK, then decode them as UTF-8.
                byte[] bytes = rs.getString(2).getBytes("GBK");
                // Append to the uid instead of overwriting it, so both columns are printed.
                s = s + " " + new String(bytes, "UTF-8");
                System.out.println(s);
            }

            PreparedStatement ps = con.prepareStatement("insert into test values(?, ?)");

            ps.setString(1, "zhangsana");
            // Apply the reverse conversion before writing: take the UTF-8 bytes and reinterpret them as GBK.
            s = "张三a";
            byte[] bytes = s.getBytes("UTF-8");
            s = new String(bytes, "GBK");
            ps.setString(2, s);
            ps.execute();
            con.commit();
        } finally {
            closeConnection();
        }
    }

    My environment:
    DB2 Enterprise 9.5 on Windows 2003 Simplified Chinese
    DB2 Everyplace 9.1.0.3 on Windows 2003 Simplified Chinese
    Derby 10.0.3 on Windows XP Simplified Chinese
    JDK 1.6 on Windows XP Simplified Chinese
    DB2 db sample encoding UTF-8
    DB2 mirror db m_sample encoding UTF-8
  • SystemAdmin
    6968 Posts

    Re: Could not display DBCS(Chinese) characters after sync database from DB2

    ‏2009-03-25T18:26:35Z  
    Congratulations,
    you ran into the same issue as I did (DB2 Everyplace 9.1.2 in my case).
    I had a long discussion with the IBM DB2e development team last year regarding this topic.
    I had the feeling that I was the first person ever syncing non-Latin characters with UTF-8 and Derby!
    Anyhow, your first step is right:
    set "isync.encoding" to "UTF-8".

    Now the sync server and client will exchange data in a UTF-8 encoded transport stream.
    So far so good.

    But the issue is in IBM's "db2jisync.jar"!
    Derby can only read and write Strings on CHAR fields. Reading or writing raw bytes directly is not supported.
    I assumed the driver needs to create Strings out of the UTF-8 encoded byte stream when writing data,
    and to get raw bytes out of Strings and put them on the transport stream when reading.

    I wanted to know what the driver really does in detail when reading from and writing to the Derby database, so I decompiled some classes.
    And guess what?
    It is done exactly the way I assumed, but without passing in the encoding when
    creating Strings and getting bytes from a String!
    So the JVM's default encoding is used instead, which is not UTF-8 in most cases!
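
    To illustrate the effect, here is a minimal sketch that does not use the driver at all. It assumes the JVM default encoding is GBK (as on a Simplified Chinese Windows) and that the transport stream carries UTF-8. The class and method names are made up for this example. It decodes UTF-8 bytes with the wrong charset and then reverses the mistake, which is the same "convert twice" trick used in the previous post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetMojibake {
    // Decode bytes with a given charset, standing in for a driver that relies on the JVM default.
    static String decodeWith(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Reverse a wrong decode: recover the original bytes, then decode them with the right charset.
    static String repair(String garbled, Charset wrong, Charset right) {
        return new String(garbled.getBytes(wrong), right);
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("GBK")) {
            System.out.println("GBK charset is not available in this runtime");
            return;
        }
        Charset gbk = Charset.forName("GBK");
        String original = "\u5f20\u4e09a"; // the test value from the previous post, written as escapes
        byte[] wire = original.getBytes(StandardCharsets.UTF_8); // what the sync stream carries
        String stored = decodeWith(wire, gbk);                   // what a GBK-default JVM would store
        String repaired = repair(stored, gbk, StandardCharsets.UTF_8);
        System.out.println("stored equals original:   " + stored.equals(original));
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```

    Note that this round trip is not guaranteed to be lossless for every string. If a UTF-8 byte sequence is not valid GBK, the wrong decode replaces it and the original bytes cannot be recovered.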

    Try
    "System.out.println(Charset.defaultCharset().name());"
    to verify your default encoding.

    The workaround is to set your JVM's default encoding to UTF-8 as well.
    This can be done by passing the VM argument "-Dfile.encoding=UTF-8"
    when starting the VM. Then verify again. Note that this argument must be set at VM startup, not at runtime!
    Now reset your client and sync again. Everything should be fine now.
    Note that setting the default encoding to UTF-8 will also affect all other I/O operations
    in your application. Be careful!
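
    Since the workaround only helps when both settings agree, a startup check can catch a mismatch early. This is only a sketch: the class name and the helper are hypothetical, and it assumes that userProps is a java.util.Properties holding "isync.encoding" as in step 1 of the previous post.

```java
import java.nio.charset.Charset;
import java.util.Properties;

public class EncodingGuard {
    // True when the JVM default charset is the same charset as the configured sync encoding.
    static boolean defaultMatches(String syncEncoding) {
        return Charset.isSupported(syncEncoding)
                && Charset.defaultCharset().equals(Charset.forName(syncEncoding));
    }

    public static void main(String[] args) {
        Properties userProps = new Properties(); // assumed type of the sync properties
        userProps.put("isync.encoding", "UTF-8");
        String syncEncoding = userProps.getProperty("isync.encoding");

        System.out.println("default charset: " + Charset.defaultCharset().name());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        if (!defaultMatches(syncEncoding)) {
            // The default cannot be changed reliably at runtime, so only report the mismatch here.
            System.err.println("Default charset differs from isync.encoding; "
                    + "restart the JVM with -Dfile.encoding=" + syncEncoding);
        }
    }
}
```

    Run it once without and once with "-Dfile.encoding=UTF-8" on the java command line to see the difference.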

    I wanted to convince IBM that this workaround, even though it works, does not make much sense, because now
    I need to set the encoding twice: "isync.encoding" and "file.encoding". And the whole thing only works
    when both are set to the same value. So in the end one of them is unnecessary!
    And second, changing the VM default encoding has a direct impact on other I/O operations in my application.
    In my opinion this is not transparent to the user!

    The better and simpler solution would be to use the "isync.encoding" property in the "db2jisync.jar" driver as well
    when creating Strings from bytes and getting bytes from Strings. This solution would be independent of the JVM's default encoding
    and more transparent to the user.
    In fact, only 4 lines of code in the driver would need to change!
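
    Roughly, the idea looks like the sketch below. This is not the driver's actual code: the class and method names are hypothetical and userProps is assumed to be a java.util.Properties. It only shows the difference between relying on the default encoding and passing the charset explicitly.

```java
import java.nio.charset.Charset;
import java.util.Properties;

public class ExplicitCharsetConversion {
    // Resolve the charset from "isync.encoding"; fall back to the JVM default if it is unset.
    static Charset syncCharset(Properties userProps) {
        String name = userProps.getProperty("isync.encoding");
        return name != null ? Charset.forName(name) : Charset.defaultCharset();
    }

    // Bytes from the transport stream become a String using the explicit charset.
    static String bytesToString(byte[] wireBytes, Charset cs) {
        return new String(wireBytes, cs);
    }

    // A String read from the database becomes transport bytes using the explicit charset.
    static byte[] stringToBytes(String value, Charset cs) {
        return value.getBytes(cs);
    }

    public static void main(String[] args) {
        Properties userProps = new Properties();
        userProps.put("isync.encoding", "UTF-8");
        Charset cs = syncCharset(userProps);

        String original = "\u5f20\u4e09a";
        byte[] wire = stringToBytes(original, cs);
        String decoded = bytesToString(wire, cs);
        System.out.println("round trip ok: " + decoded.equals(original));
    }
}
```

    With the charset passed in on both sides, the result no longer depends on "file.encoding", so only "isync.encoding" would have to be set.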

    At first IBM agreed and even provided me with a working updated driver (based on version 9.1.2) to test!
    They also promised that this fix would be part of the next official release! I was so proud! ;-)
    But for some unknown reason they withdrew this agreement a few weeks later :-(
    The only response I get now is: "Works as designed - workaround offered - no need to fix!" :-(

    By the way, the reason why you don't have the same issue when using DB2e as the client database is also pretty simple.
    There the driver can read and write raw bytes directly on DB2e CHAR fields. No need to convert to Strings.

    I hope all this helps you and any other users wondering why syncing non-Latin characters with UTF-8 and Derby does not work properly.
    Maybe if more users complain about this issue, IBM will think about their "working design" again and consider
    implementing the transparent solution discussed above.

    So long ....