Converting existing Db2 data to Unicode

If your database and applications handle international data, consider converting your Db2 data to Unicode. Using Unicode might prevent character conversions and thus improve performance and help ensure data integrity. However, Unicode data might require more space. Depending on the data, these characters can be two to three times the size of EBCDIC or ASCII characters.

Before you begin

Before you convert your existing Db2 data to Unicode, think about the following items:
  • Consider the affects on all associated applications and tools. For example, consider the affects on any green screen applications.
  • Understand where the data originally came from. If the data was already converted from its original form, it might contain substitution characters. If so, consider converting the data back to its original form and then converting it to Unicode.

    For example, suppose that you convert data from EBCDIC to Unicode, and the data was originally in ASCII. You might need to convert the data to its original ASCII format and then to Unicode. Do this extra conversion if the original ASCII data underwent a round-trip conversion to EBCDIC and not all of the characters exist in EBCDIC. For example, suppose that you converted data from ASCII CCSID 1252 to EBCDIC CCSID 37. CCSID 1252 contains characters that do not exist in CCSID 37. Thus, the EBCDIC data has control characters in place of any characters that existed in ASCII but not in EBCDIC. (Consider the example of the trademark symbol ™ in Round-trip conversion.) Converting the data to ASCII first recovers the original values before you convert to Unicode.

Procedure

To convert existing Db2 data to Unicode:

  1. Create one or more Unicode tables for this data.
  2. Use one of the following techniques to load the existing data into your new Unicode tables. Db2 converts the data to Unicode when it loads it.
    • Use the INSERT statement with a subselect.
      INSERT INTO UNICODETABLE
           SELECT *
           FROM EBCDICTABLE;
      For this example, make sure that the columns for both UNICODETABLE and EBCDICTABLE are compatible. For example, if the first column of EBCDICTABLE is a character column, the first column of UNICODETABLE should also be a character column; if the second column of EBCDICTABLE is a numeric column, the second column of UNICODETABLE should also be a numeric column.
    • Use the UNLOAD utility to unload the data as is into an EBCDIC or ASCII data set. Then, LOAD the data into your new Unicode table. Specify the appropriate encoding scheme keyword (ASCII, EBCDIC, or CCSID) in the LOAD statement.
      Recommendation: Use the PUNCHDDN option of the UNLOAD utility to generate corresponding LOAD utility statements for the data as Db2 unloads it.
      The following example JCL performs the following actions:
      • STEP1 creates and populates two tables. T1 is a Unicode table. T2 is an EBCDIC table.
      • STEP2 unloads the data from EBCDIC table T2. The UNLOAD statement contains the PUNCHDDN option. This option generates (in the SYSPUNCH data set) corresponding LOAD statements to load the data back into the original table, T2. To use this SYSPUNCH file to load the unloaded data to table T1, you must modify the SYSPUNCH or JCL.
      • STEP3 then loads the data that was unloaded in STEP2 into Unicode table T1. Because the catalog defines the table as Unicode, the data is converted to Unicode when it is loaded.
      • STEP4 outputs the current data in both tables.
      //STEP1    EXEC  TSOBATCH
      //SYSTSIN  DD *
      DSN S(SSTR) R(1) T(1)
      RUN PROGRAM DSNTEP2 PLAN(DSNTEPC1)
      END
      //SYSIN DD *
          DROP DATABASE DB1;
          COMMIT;
          CREATE DATABASE DB1 CCSID UNICODE;
          CREATE TABLESPACE TS1 IN DB1;
          CREATE TABLE T1 (C1 CHAR(7)) IN DB1.TS1;
       
          DROP DATABASE DB2;
          COMMIT;
          CREATE DATABASE DB2 CCSID EBCDIC;
          CREATE TABLESPACE TS2 IN DB2;
          CREATE TABLE T2 (C1 CHAR(7)) IN DB2.TS2;
          INSERT INTO T2 VALUES ('ABCDEFG');
       
          INSERT INTO T1 (SELECT * FROM T2);
       
          SELECT * FROM SYSADM.T1;
          SELECT * FROM SYSADM.T2;
      /*
      //***************************************************
      //STEP2    EXEC DSNUPROC,UID='SMPLUNLD',UTPROC='',SYSTEM='SSTR'
      //SYSPRINT DD SYSOUT=*
      //SYSUT1   DD UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
      //SYSUT2   DD UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
      //SORTOUT  DD UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
      //SYSIN    DD *
        TEMPLATE REC DSN TEST123.STEP2.UNLOAD.SYSREC SPACE(15,5)
                 CYL UNIT(3390) VOLUMES(SCR03)
        TEMPLATE CARD DSN TEST123.STEP2.UNLOAD.SYSPUNCH SPACE(15,5)
                 CYL UNIT(3390) VOLUMES(SCR03)
        UNLOAD DATA FROM TABLE SYSADM.T2
           (C1  CHAR(7))
               UNLDDN REC PUNCHDDN CARD SHRLEVEL CHANGE
      /*
      /*
      //**********************************************************************
      //* DSNUPROC UTILITY STEP
      //**********************************************************************
      //STEP3    EXEC DSNUPROC,UID='LI848.LOAD1',TIME=1440,
      //         UTPROC='',
      //         SYSTEM='SSTR',DB2LEV=DB2A
      //SYSUT1   DD DSN=TEST123.STEP3.LOAD.SYSUT1,DISP=(MOD,DELETE,CATLG),
      //         UNIT=SYSDA,SPACE=(4000,(20,20),,,ROUND)
      //SORTOUT  DD DSN=TEST123.STEP3.LOAD.SORTOUT,DISP=(MOD,DELETE,CATLG),
      //         UNIT=SYSDA,SPACE=(4000,(20,20),,,ROUND)
      //SYSMAP   DD DSN=TEST123.STEP3.LOAD.SYSMAP,DISP=(MOD,DELETE,CATLG),
      //         UNIT=SYSDA,SPACE=(4000,(20,20),,,ROUND)
      //SYSIN    DD *
      TEMPLATE CWPH5APV
           DSN('TEST123.STEP2.UNLOAD.SYSREC')
           DISP(OLD,KEEP,KEEP)
      LOAD DATA INDDN CWPH5APV LOG NO  RESUME YES
        EBCDIC CCSID(00037,00000,00000)
       INTO TABLE
       "SYSADM".
       "T1"
       WHEN(00001:00002) = X'0006'
       NUMRECS                    1
       ( "C1"
        POSITION(  00004:00010) CHAR MIXED(007)
                                NULLIF(00003)=X'FF'
       )
      //********************************************************************
      //STEP4    EXEC TSOBATCH,DB2LEV=DB2A
      //SYSTSIN  DD *
      DSN SYSTEM(SSTR)
      RUN PROGRAM DSNTEP2 PLAN(DSNTEPC1))
      //SYSIN    DD *
      SELECT * FROM SYSADM.T1;
      SELECT * FROM SYSADM.T2;
      /*
    • Use the cross-loader function to load the output of a dynamic SQL SELECT statement into your new Unicode table. The SELECT statement selects the entire table. In the following example, assume that table T1 is in Unicode and table T2 is in EBCDIC. This example uses a cursor to select all data from T2 and then load it into T1. This process is known as the cross-loader function. The data is converted to Unicode when it is loaded.
      //STEP5    EXEC DSNUPROC,UID='LOADIT',TIME=1440,COND=(EVEN),
      //         UTPROC='',
      //         SYSTEM='SSTR',DB2LEV=DB2A
      //SYSUT1   DD DSN=TEST123.STEP5.LOAD.SYSUT1,DISP=(MOD,DELETE,CATLG),
      //         UNIT=SYSDA,SPACE=(4000,(20,20),,,ROUND)
      //SORTOUT  DD DSN=TEST123.STEP5.LOAD.SORTOUT,DISP=(MOD,DELETE,CATLG),
      //         UNIT=SYSDA,SPACE=(4000,(20,20),,,ROUND)
      //SYSCOPY  DD DSN=TEST123.STEP5.LOAD.COPY,DISP=(MOD,DELETE,CATLG),
      //         UNIT=SYSDA,SPACE=(4000,(20,20),,,ROUND)
      //SYSIN    DD *
      EXEC SQL
        DECLARE C1 CURSOR FOR SELECT C1 FROM SYSADM.T2
      ENDEXEC
       
      LOAD DATA REPLACE INCURSOR C1 INTO TABLE SYSADM.T1
       
      //********************************************************************
      //STEP1    EXEC TSOBATCH,DB2LEV=DB2A
      //SYSTSIN  DD *
      DSN SYSTEM(SSTR)
      RUN PROGRAM DSNTEP2 PLAN(DSNTEPC1)
      //SYSIN    DD *
      SELECT * FROM SYSADM.T1;
      SELECT * FROM SYSADM.T2;
      /*
  3. Modify any SQL in your applications to account for length differences. If you use any length functions, such as CHARACTER_LENGTH and SUBSTRING, use the CODEUNITS16 and CODEUNITS32 options to specify how you want Db2 to calculate the length.