IBM Support

PQ83328: INCORRECT TRANSLATION FROM UTF-8 TO IBM-1026 FOR THE LATIN CAPITAL LETTER U WITH DIAERESIS CHARACTER

A fix is available

APAR status

  • Closed as program error.

Error description

  • When the EDCICONV procedure is used to convert UTF-8 text to
    IBM-1026, LATIN CAPITAL LETTER U WITH DIAERESIS (x'7F' in
    IBM-1026) is converted to x'3F'. For example, with input data
    x'7FC5D5C4', first issue the command:
    iconv -f IBM-1026 -t UTF-8 infile > outfile
    The outfile contains x'C39C454E440A'.
    Next issue:
    iconv -f UTF-8 -t IBM-1026 outfile > outfile2
    The outfile2 now contains x'3FC5D5C4'.
    The x'7F' was not converted correctly in this round-trip
    conversion. This happens because conversion between UTF-8 and
    IBM-1026 internally goes through UCS-2, i.e. the conversion is
    UTF-8 --> UCS-2 --> IBM-1026, and uconvdef() must define a
    value for an undefined character to perform the conversion
    between UCS-2 and IBM-1026. However, in a code page where all
    256 code points are used, uconvdef() cannot find an unused code
    point and uses x'7F', which causes the later conversion to be
    incorrect.
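
The expected round trip for the sample data can be sketched as follows. This is only an illustration: it assumes Python's standard `cp1026` codec as a stand-in for a correct IBM-1026 converter and does not exercise the z/OS iconv utility or EDCICONV.

```python
# Expected round trip for the APAR's sample bytes. Python's "cp1026" codec
# is used as a stand-in for a correct IBM-1026 converter (an assumption;
# this is not the z/OS iconv implementation).
original = bytes.fromhex("7FC5D5C4")   # sample input data from the APAR

text = original.decode("cp1026")       # IBM-1026 -> Unicode
utf8 = text.encode("utf-8")            # Unicode  -> UTF-8
round_trip = utf8.decode("utf-8").encode("cp1026")  # UTF-8 -> IBM-1026

print(utf8.hex().upper())
print(round_trip.hex().upper())
print(round_trip == original)          # a correct converter preserves x'7F'
```

The trailing x'0A' shown for outfile in the description is not produced by this sketch, which converts the four sample bytes only.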
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: Users of iconv(), converting codepoints to   *
    *                 IBM-1026 from another code page involving    *
    *                 indirect conversion using direct unicode     *
    *                 converters.  The codepoint affected is       *
    *                 x7f (LATIN CAPITAL LETTER U WITH DIAERESIS)  *
    *                 in IBM-1026.                                 *
    ****************************************************************
    * PROBLEM DESCRIPTION: Customer is using iconv() to convert    *
    *                      UTF-8 coded text to IBM-1026, the       *
    *                      LATIN CAPITAL LETTER U WITH DIAERESIS   *
    *                      character is converted incorrectly. It  *
    *                      is converted to X'3F' (substitute       *
    *                      character) when the customer expects it *
    *                      to be converted back to x'7F' when      *
    *                      performing a round trip                 *
    *                      IBM-1026=>UTF-8=>IBM-1026 conversion.   *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    uconvdef must define a value for an undefined character.
    It places this value in the UCS-2<-><codepage> conversion
    table for every UCS-2<-><codepage> pair that has no mapping.
    A problem arises when a code page has valid code points
    for all 256 single-byte values: iconv() has no means of
    distinguishing an undefined character from a valid character
    with the same code point.  If character substitution is turned
    on, iconv() replaces any undefined character it encounters in
    the conversion table with the substitution character.  In this
    case, the UTF-8 character x'C39C' was correctly translated to
    the IBM-1026 character x'7F' from the conversion table and then
    incorrectly changed to the IBM-1026 substitution character
    x'3F', because x'7F' was interpreted as an undefined character.
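
The collision described above can be modelled with a toy table. This is a minimal sketch, not the library's code: the table holds only four real mappings, and the names `table` and `convert_buggy` are hypothetical.

```python
UNDEFINED = 0x7F    # value stored for unmapped entries (per the APAR)
SUBSTITUTE = 0x3F   # IBM-1026 substitution character

# Toy UCS-2 -> IBM-1026 table indexed by UCS-2 value. Every entry starts
# as "undefined"; only the mappings needed for the sample are filled in.
table = [UNDEFINED] * 0x200
table[0x00DC] = 0x7F   # U+00DC (U with diaeresis) -> x'7F', a valid mapping
table[0x0045] = 0xC5   # 'E'
table[0x004E] = 0xD5   # 'N'
table[0x0044] = 0xC4   # 'D'

def convert_buggy(ucs2_values):
    """Substitute every x'7F' result, valid or not (the reported behaviour)."""
    out = bytearray()
    for u in ucs2_values:
        b = table[u]
        if b == UNDEFINED:      # cannot tell a valid x'7F' from "undefined"
            b = SUBSTITUTE
        out.append(b)
    return bytes(out)

result = convert_buggy([0x00DC, 0x0045, 0x004E, 0x0044])
print(result.hex().upper())     # prints 3FC5D5C4
```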
    

Problem conclusion

  • The UCS-2<->IBM-1026 conversion table is laid out so that the
    index into the table for a UCS-2 to IBM-1026 conversion equals
    the UCS-2 value being converted.  The _F2M_sbcs function in the
    direct unicode converter EDCUF8EW.C was changed to check
    whether the index equals the UCS-2 equivalent of the undefined
    character.  If it does, the x'7F' is a valid code point and
    must not be substituted.  Any other x'7F' in the
    UCS-2<->IBM-1026 conversion table has an index that differs
    from the UCS-2 equivalent of the undefined character; in that
    case the x'7F' represents an undefined character and can
    safely be replaced with the substitution character.
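
The index check described in this conclusion might look like the sketch below. It is a model under stated assumptions, not the actual _F2M_sbcs source: the toy table holds only four real mappings, and `UNDEFINED_UCS2`, `table`, and `convert_fixed` are hypothetical names.

```python
UNDEFINED = 0x7F         # value stored for unmapped entries (per the APAR)
SUBSTITUTE = 0x3F        # IBM-1026 substitution character
UNDEFINED_UCS2 = 0x00DC  # UCS-2 value whose valid IBM-1026 mapping is x'7F'

# Toy UCS-2 -> IBM-1026 table indexed by UCS-2 value.
table = [UNDEFINED] * 0x200
table[0x00DC] = 0x7F     # valid mapping that collides with UNDEFINED
table[0x0045] = 0xC5     # 'E'
table[0x004E] = 0xD5     # 'N'
table[0x0044] = 0xC4     # 'D'

def convert_fixed(ucs2_values):
    """Substitute x'7F' only when the table index is not UNDEFINED_UCS2."""
    out = bytearray()
    for u in ucs2_values:
        b = table[u]
        if b == UNDEFINED and u != UNDEFINED_UCS2:
            b = SUBSTITUTE   # a genuinely undefined entry
        out.append(b)
    return bytes(out)

fixed = convert_fixed([0x00DC, 0x0045, 0x004E, 0x0044])
unmapped = convert_fixed([0x0100])   # unmapped entry in this toy table
print(fixed.hex().upper())           # prints 7FC5D5C4
print(unmapped.hex().upper())        # prints 3F
```

The check relies on the table being indexed by the UCS-2 value, so exactly one entry can hold a legitimate x'7F'.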
    

Temporary fix

Comments

APAR Information

  • APAR number

    PQ83328

  • Reported component name

    LE C LIBRARY

  • Reported component ID

    568819805

  • Reported release

    703

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2004-01-16

  • Closed date

    2004-02-25

  • Last modified date

    2004-04-03

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UQ85399 UQ85400 UQ85401 UQ85402 UQ85403

Modules/Macros

  • CEHUF8EW EDC404D6
    

Fix information

  • Fixed component name

    LE C LIBRARY

  • Fixed component ID

    568819805

Applicable component levels

  • R703 PSY UQ85399

       UP04/03/09 P F403

  • R705 PSY UQ85400

       UP04/03/09 P F403

  • R706 PSY UQ85401

       UP04/03/09 P F403

  • R707 PSY UQ85402

       UP04/03/09 P F403

  • R708 PSY UQ85403

       UP04/03/09 P F403

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
03 April 2004