IBM Support

AIX Globalization: Examining and understanding data encoding

How To


Summary

How to examine data to determine character encoding, and resolve data rendering conflicts.

Objective

The characters rendered on screen depend on the client encoding. Data files might not have headers that indicate the encoding. To determine the appropriate encoding for rendering or processing, ask the database administrator or the data submitter to confirm the encoding. Another option is to examine some strings in the file to determine the likely encoding.

Steps

Important note: The locale of a command's runtime can affect the output. In the next example, the locale of the 'od' command runtime affects the displayed data. In some of the later examples, the "perl -e 'print'" command simply writes the raw bytes to the terminal window, so the displayed text depends on the terminal encoding.
A) Check the data for UTF-8 encoding.
PuTTY encoding: UTF-8
# LANG=FR_FR.UTF-8;od -xc French.dat
0000000     5ec2    cace    d4db    e2ea    eef4    fb00
           ^ 302 312 316 324 333 342 352   îôû   ôû   û
  • Results:
    • The hex values in the top row are the byte values of the data. In the second row, 'od' displays octal values instead of characters because the bytes above 0x7f do not form valid UTF-8 sequences, so this data is not UTF-8.
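
The same check can be scripted instead of read from the 'od' output. The following is a minimal sketch, not part of the original procedure: the function name is_valid_utf8 is hypothetical, and it assumes that a UTF-8 to UTF-16 converter is installed and that 'iconv' exits with a nonzero status when it meets an invalid input sequence.

```shell
# Hypothetical helper: succeed only if every byte sequence in the file
# decodes as UTF-8.
# Assumptions: a UTF-8 to UTF-16 converter is available, and iconv returns
# a nonzero exit status on an invalid input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-16 "$1" > /dev/null 2>&1
}
```

For example, "is_valid_utf8 French.dat && echo valid || echo invalid" would be expected to report "invalid" for the data shown above, because the byte 0xc2 is not followed by a UTF-8 continuation byte. A successful decode only shows that the data is consistent with UTF-8; it does not prove that UTF-8 was the intended encoding.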
B) Check the data for ISO8859-15 encoding.
PuTTY encoding: UTF-8
# LANG=fr_FR.8859-15;od -xc French.dat
0000000     5ec2    cace    d4db    e2ea    eef4    fb00
           ^   ▒   ▒   ▒   ▒   ▒   ▒   ▒   ▒   ▒   ▒
  • Results:
    • Substitute characters were displayed. The bytes map to valid characters in the fr_FR.8859-15 locale, but the PuTTY terminal is still decoding the output as UTF-8, so the characters cannot be displayed.
C) Now, check the data for ISO8859-15 encoding, in an ISO8859-15 encoded PuTTY terminal client.
  • Change the PuTTY encoding to ISO8859-15:
    • Configuration->Window->Translation->Remote Character Set->ISO-8859-15:1999 (Latin-9, "euro") ->Apply
PuTTY encoding: ISO8859-15
# LANG=fr_FR.8859-15;od -xc French.dat
0000000     5ec2    cace    d4db    e2ea    eef4    fb00
           ^   Â   Ê   Î   Ô   Û   â   ê   î   ô   û
  • Results:
    • The characters render correctly, so this data appears to be ISO8859-15 encoded.
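
The "od -x" output used in the checks above groups the data into 2-byte words in the byte order of the host, and pads an odd-length file with a zero byte (the trailing "00" in "fb00"). On a little-endian system, the two bytes of each word appear swapped. As a hedged alternative, the following sketch prints one hex value per byte, in file order; the function name hex_bytes is hypothetical.

```shell
# Hypothetical helper: print the bytes of a file as space-separated hex
# values, in file order, without word grouping or padding.
hex_bytes() {
    # -An omits the offset column, -v prints repeated lines in full,
    # -tx1 prints each byte as two hex digits.
    # The command substitution is unquoted on purpose: word splitting
    # collapses the padding and line breaks that od emits.
    echo $(od -An -v -tx1 "$1")
}
```

For the French.dat file shown above, "hex_bytes French.dat" would be expected to print "5e c2 ca ce d4 db e2 ea ee f4 fb".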
Exploring More:
For more information, you can use the "od -xc" command to get the hex code points of data. Then, you can examine the Locale Development Toolkit charmap files in /usr/lib/nls/charmap to compare code points for different characters.
For example,
1) Install the bos.loc.adt.locale file set.
2) Determine code points for some characters.  The following example uses diacritic circumflex characters common in the ISO8859-15 encoding.
# cd /usr/lib/nls/charmap
# myString=$(grep CIRCUMFLEX ISO8859-15 | cut -f2 -d">" | while read myChar ; do printf "\%s" $myChar ;done)
# echo "$myString"
\x5e\xc2\xca\xce\xd4\xdb\xe2\xea\xee\xf4\xfb

3) Now, use a simple Perl one-liner to generate a string.
PuTTY encoding: UTF-8
# perl -e 'print "\x5e\xc2\xca\xce\xd4\xdb\xe2\xea\xee\xf4\xfb"'
^▒▒▒▒▒▒▒▒▒▒
  • Results:
    • Substitute characters were displayed.  
      • "Garbage" characters could be displayed, depending on the code points, and PuTTY encoding.
4) To render the correct characters, use one of the following options (A or B):
A) Convert the string to UTF-8 encoding.
#  perl -e 'print "\x5e\xc2\xca\xce\xd4\xdb\xe2\xea\xee\xf4\xfb"' | iconv -f ISO8859-15 -t UTF-8
^ÂÊÎÔÛâêîôû

B) Change the PuTTY encoding.
  • Configuration->Window->Translation->Remote Character Set->ISO-8859-15:1999 (Latin-9, "euro") ->Apply
PuTTY encoding: ISO8859-15
#  perl -e 'print "\x5e\xc2\xca\xce\xd4\xdb\xe2\xea\xee\xf4\xfb"'
^ÂÊÎÔÛâêîôû
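
Option A converts a string on the fly. When the data is in a file, the same 'iconv' conversion can write a UTF-8 copy and leave the original untouched. The following is a minimal sketch under stated assumptions: the function name convert_to_utf8 and its argument order are hypothetical, and the source encoding must be a codeset name that the local 'iconv' recognizes.

```shell
# Hypothetical helper: write a UTF-8 copy of a file.
#   $1 = source file, $2 = source codeset name, $3 = destination file
# The destination must not be the same file as the source, because the
# redirection truncates it before iconv reads the input.
convert_to_utf8() {
    src=$1
    from=$2
    dest=$3
    # Remove the partial output if the conversion fails.
    iconv -f "$from" -t UTF-8 "$src" > "$dest" || { rm -f "$dest"; return 1; }
}
```

For example, "convert_to_utf8 French.dat ISO8859-15 French.utf8.dat" would be expected to produce a file that displays correctly in a UTF-8 encoded terminal.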
Related Errors:
As stated in the important note under Steps, the locale of the command runtime can affect the output. If you edit the data with the "vi" command, the encoding of the "vi" command runtime locale is in effect, so you might see "invalid data" errors when that encoding does not match the data.
For example,
PuTTY encoding: ISO-8859-15:1999
AIX locale: LANG=EN_US.UTF-8
# perl -e 'print "\x5e\xc2\xca\xce\xd4\xdb\xe2\xea\xee\xf4\xfb"' > iso8859-15.dat
# cat iso8859-15.dat
^ÂÊÎÔÛâêîôû
# vi iso8859-15.dat
"iso8859-15.dat"Incomplete or invalid multibyte character, conversion failed

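The "vi" error above comes from a mismatch between the codeset of the runtime locale (UTF-8) and the encoding of the file (ISO8859-15). As a rough sketch, a script can compare the two before it opens an editor. The function name check_runtime_codeset is hypothetical, and the sketch assumes that "locale charmap" reports the runtime codeset with the same spelling that you pass in; codeset names can differ between platforms.

```shell
# Hypothetical helper: compare the codeset a file is believed to use with
# the codeset of the runtime locale.
#   $1 = codeset of the file (for example, ISO8859-15)
#   $2 = runtime codeset (optional; defaults to the output of "locale charmap")
check_runtime_codeset() {
    file_codeset=$1
    runtime_codeset=${2:-$(locale charmap)}
    if [ "$runtime_codeset" = "$file_codeset" ]; then
        echo "match: runtime codeset is $runtime_codeset"
    else
        echo "mismatch: runtime is $runtime_codeset, file is $file_codeset"
        return 1
    fi
}
```

When the check reports a mismatch, either convert the file to the runtime codeset or run the editor under a locale that uses the file's codeset.
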
Additional Information

SUPPORT

If you require more assistance, use the following step-by-step instructions to contact IBM to open a case for software with an active and valid support contract.  

1. Document (or collect screen captures of) all symptoms, errors, and messages related to your issue.

2. Capture any logs or data relevant to the situation.

3. Contact IBM to open a case:

   - For electronic support, see the IBM Support Community:
     https://www.ibm.com/mysupport
   - If you require telephone support, see the web page:
     https://www.ibm.com/planetwide/

4. Provide a clear, concise description of the issue.

 - For more information, see: Working with IBM AIX Support: Describing the problem.

5. If the system is accessible, collect a system snap, and upload all of the details and data for your case.

 - For more information, see: Working with IBM AIX Support: Collecting snap data


Document Information

Modified date:
26 July 2021

UID

ibm16475321