Creating custom character encodings

CDC Replication supports a wide variety of character encodings (charsets) provided with Java and ICU (International Components for Unicode). If CDC Replication does not support an encoding that you need, you can add it as a custom encoding.

Before you begin

  • Download and install the ICU makeconv utility. For Windows, refer to the makeconv utility download. For Linux, use your package manager to install it, for example, sudo dnf install icu or sudo apt install icu-devtools.
  • Verify that you have permissions to restart the CDC instance.
  • Ensure that you have access to the Management Console with subscription configuration privileges.

About this task

Custom character encodings allow you to handle data in legacy systems or specialized applications that use non-standard character sets. This procedure applies to both source and target CDC instances.
Note: Adding custom encodings requires restarting the CDC instance, which temporarily interrupts active replications.

Procedure

  1. Create or download ICU CHARMAP UCM (Unicode conversion mapping) files for your encodings.

    For sample UCM files, see <CDC Replication installation directory>/samples/*.ucm. To download UCM files, refer to the ICU GitHub repository.

    Note: UCM files define the mapping between Unicode code points and your custom encoding's byte sequences.
  2. Edit the UCM files as needed.
  3. Compile the *.ucm files to create *.cnv files by using the ICU makeconv utility.
  4. Place the *.cnv files in <CDC Replication installation directory>/lib/user/charset.
  5. Restart the CDC instance by performing these steps:
    1. Stop all subscriptions.
    2. Run dmshutdown -I <instancename> to stop the instance.
    3. Wait for the instance to shut down completely (check the process list or service status).
    4. Start the instance again.
    5. Start the subscriptions again, except for subscriptions you plan to modify.
  6. Specify encoding overrides for columns in table mappings by using the Management Console.

    For more information, refer to the Management Console documentation.
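
Example

The following Python sketch illustrates steps 1 through 4 under stated assumptions; it is not part of CDC Replication. The sample charset name my-custom, its three mappings, and the helper names parse_charmap, duplicate_roundtrip_bytes, and compile_ucm are hypothetical. The parser handles only the common entry form (<Uxxxx> followed by \xHH bytes and an optional |n precision flag), so it is a quick sanity check before compiling rather than a full UCM validator. The compile helper assumes that makeconv is on the PATH and that it names the output file after the input file.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Hypothetical sample: a tiny single-byte charset with three round-trip mappings.
SAMPLE_UCM = r"""
<code_set_name> "my-custom"
<mb_cur_max> 1
<mb_cur_min> 1
<uconv_class> "SBCS"
<subchar> \x3F
CHARMAP
<U0041> \x41 |0
<U0042> \x42 |0
<U20AC> \x80 |0
END CHARMAP
"""

# One CHARMAP entry: <Uxxxx>, one or more \xHH bytes, optional |n precision flag.
ENTRY = re.compile(r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|(\d))?")


def parse_charmap(text):
    """Return (code_point, byte_sequence, precision_flag) tuples from UCM text."""
    entries, in_map = [], False
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line == "CHARMAP":
            in_map = True
        elif line == "END CHARMAP":
            in_map = False
        elif in_map and (m := ENTRY.match(line)):
            hex_bytes = re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2))
            raw = bytes(int(h, 16) for h in hex_bytes)
            # A missing flag is treated as |0 (round-trip mapping).
            entries.append((int(m.group(1), 16), raw, int(m.group(3) or 0)))
    return entries


def duplicate_roundtrip_bytes(entries):
    """Return byte sequences that appear in more than one round-trip (|0) entry."""
    seen, dupes = set(), []
    for _code_point, raw, flag in entries:
        if flag != 0:
            continue
        if raw in seen and raw not in dupes:
            dupes.append(raw)
        seen.add(raw)
    return dupes


def compile_ucm(ucm_path, out_dir):
    """Run makeconv on one UCM file and return the expected .cnv path."""
    makeconv = shutil.which("makeconv")
    if makeconv is None:
        raise FileNotFoundError("makeconv was not found on PATH")
    # -d sets the destination directory for the compiled converter file.
    subprocess.run([makeconv, "-d", str(out_dir), str(ucm_path)], check=True)
    # Assumption: the .cnv file takes its base name from the .ucm file.
    return Path(out_dir) / (Path(ucm_path).stem + ".cnv")


if __name__ == "__main__":
    parsed = parse_charmap(SAMPLE_UCM)
    print(f"{len(parsed)} mappings parsed")
    print("duplicate round-trip bytes:", duplicate_roundtrip_bytes(parsed))
```

To compile a real file, you might call compile_ucm with the path to your *.ucm file and a destination directory, then place the resulting *.cnv file in <CDC Replication installation directory>/lib/user/charset as described in step 4.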