A fix is available
APAR status
Closed as program error.
Error description
When the EDCICONV procedure is used to convert UTF-8 coded text to IBM-1026, the LATIN CAPITAL LETTER U WITH DIAERESIS (x'7F' in IBM-1026) is converted to x'3F'.
An example follows, with input data x'7FC5D5C4'. First issue the command:
    iconv -f IBM-1026 -t UTF-8 infile > outfile
Viewing outfile shows that it contains x'C39C454E440A'. Next issue:
    iconv -f UTF-8 -t IBM-1026 outfile > outfile2
outfile2 now contains x'3FC5D5C4'. The x'7F' was not converted correctly in this round-trip conversion.
This happens because conversion between UTF-8 and IBM-1026 internally goes through UCS-2, that is, the conversion is UTF-8 --> UCS-2 --> IBM-1026, and uconvdef() needs to define a value for an undefined character in order to perform the conversion between UCS-2 and IBM-1026. However, in a code page where all 256 code points are used, uconvdef() cannot find an unused code point and uses x'7F', which causes the later conversion to be incorrect.
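The intermediate UCS-2 value in the example above can be checked by hand. The following is a minimal sketch in C, not code from the C library; the helper name utf8_2byte_to_ucs2 is hypothetical and only handles two-byte UTF-8 sequences.

```c
#include <stdint.h>

/* Decode a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into its UCS-2 value.
   Returns 0xFFFF if the bytes are not a well-formed two-byte sequence.
   Hypothetical helper for illustration only; not part of the C library. */
uint16_t utf8_2byte_to_ucs2(uint8_t b0, uint8_t b1)
{
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80)
        return 0xFFFF;              /* wrong lead or continuation byte */
    if (b0 < 0xC2)
        return 0xFFFF;              /* overlong encoding */
    /* 5 payload bits from the lead byte, 6 from the continuation byte */
    return (uint16_t)(((b0 & 0x1F) << 6) | (b1 & 0x3F));
}
```

For the bytes x'C39C' from outfile, utf8_2byte_to_ucs2(0xC3, 0x9C) gives x'00DC', which is LATIN CAPITAL LETTER U WITH DIAERESIS. The first conversion is therefore correct, and the fault lies in the UCS-2 to IBM-1026 step.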
Local fix
Problem summary
****************************************************************
USERS AFFECTED: Users of iconv() converting code points to IBM-1026 from another code page, where the conversion is indirect and uses the direct unicode converters. The code point affected is x'7F' (LATIN CAPITAL LETTER U WITH DIAERESIS) in IBM-1026.
****************************************************************
PROBLEM DESCRIPTION: Customer is using iconv() to convert UTF-8 coded text to IBM-1026, and the LATIN CAPITAL LETTER U WITH DIAERESIS character is converted incorrectly. It is converted to x'3F' (substitute character) when the customer expects it to be converted back to x'7F' when performing a round trip IBM-1026=>UTF-8=>IBM-1026 conversion.
****************************************************************
RECOMMENDATION:
****************************************************************
uconvdef needs to define a value for an undefined character. It places this undefined character into the UCS-2<-><codepage> conversion table for any UCS-2<-><codepage> conversion pair that doesn't exist.
A problem arises when a code page has valid code points for all 256 single-byte values. iconv() has no means of distinguishing an undefined character from a valid character with the same code point. If character substitution is turned on, iconv() replaces any undefined characters encountered in the conversion table with the substitution character.
In this case the UTF-8 character x'C39C' was correctly translated to the IBM-1026 character x'7F' from the conversion table, and then incorrectly changed to the IBM-1026 substitution character x'3F' because x'7F' was being interpreted as an undefined character.
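The ambiguity described above can be sketched with a toy table. This is an illustration under stated assumptions, not the library's code: the 256-entry table, the names build_toy_table and lookup_before_fix, and the macros are all hypothetical, and only the four mappings taken from the example in this APAR are filled in. Character substitution is assumed to be turned on.

```c
#include <stdint.h>
#include <stddef.h>

#define UNDEF_MARK 0x7F  /* value described as chosen for undefined entries */
#define SUB_CHAR   0x3F  /* IBM-1026 substitution character */

/* Fill a toy UCS-2 -> IBM-1026 table indexed by UCS-2 value.
   Every entry starts as "undefined"; a few real mappings are then set. */
void build_toy_table(uint8_t table[256])
{
    for (size_t i = 0; i < 256; i++)
        table[i] = UNDEF_MARK;
    table[0x0045] = 0xC5;  /* E */
    table[0x004E] = 0xD5;  /* N */
    table[0x0044] = 0xC4;  /* D */
    table[0x00DC] = 0x7F;  /* U with diaeresis: a valid x'7F' */
}

/* Lookup as described before the fix: any result equal to the
   undefined marker is replaced, so the valid x'7F' is lost too. */
uint8_t lookup_before_fix(const uint8_t table[256], uint16_t ucs2)
{
    uint8_t out = (ucs2 < 256) ? table[ucs2] : UNDEF_MARK;
    return (out == UNDEF_MARK) ? SUB_CHAR : out;
}
```

With this table, looking up x'00DC' returns x'3F' rather than x'7F', while x'0045' still returns x'C5', which matches the x'3FC5D5C4' seen in outfile2.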
Problem conclusion
The UCS-2<->IBM-1026 conversion table is laid out such that the index into the table for a UCS-2 to IBM-1026 conversion is equal to the UCS-2 value being converted from. The _F2M_sbcs function located in the direct unicode converter EDCUF8EW.C has been changed to check whether the index is equal to the UCS-2 equivalent value for the undefined character. If it is, the x'7F' is a valid code point and is not substituted. Any other x'7F' encountered in the UCS-2<->IBM-1026 conversion table will not have an index equal to the UCS-2 equivalent value for the undefined character. In that case the x'7F' represents an undefined character and can safely be replaced with the substitution character.
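The index check can be sketched as follows. This is a simplified illustration and not the actual _F2M_sbcs source: the function and macro names are hypothetical, the table is a toy 256-entry array with only a few mappings set, and x'00DC' is assumed to be the UCS-2 equivalent of the undefined character because it is the UCS-2 value that maps to x'7F' in the example.

```c
#include <stdint.h>
#include <stddef.h>

#define UNDEF_BYTE 0x7F    /* marker stored in undefined table entries */
#define UNDEF_UCS2 0x00DC  /* assumed UCS-2 equivalent of that marker */
#define SUBST_BYTE 0x3F    /* IBM-1026 substitution character */

/* Toy UCS-2 -> IBM-1026 table, indexed by the UCS-2 value. */
void fill_sketch_table(uint8_t table[256])
{
    for (size_t i = 0; i < 256; i++)
        table[i] = UNDEF_BYTE;
    table[0x0044] = 0xC4;        /* D */
    table[0x0045] = 0xC5;        /* E */
    table[UNDEF_UCS2] = 0x7F;    /* valid mapping that collides with the marker */
}

/* Lookup with the index check: x'7F' is treated as undefined only
   when the index is not the UCS-2 value that really maps to x'7F'. */
uint8_t lookup_after_fix(const uint8_t table[256], uint16_t ucs2)
{
    if (ucs2 >= 256)
        return SUBST_BYTE;       /* outside the toy table */
    uint8_t out = table[ucs2];
    if (out == UNDEF_BYTE && ucs2 != UNDEF_UCS2)
        return SUBST_BYTE;       /* genuinely undefined entry */
    return out;                  /* valid code point, including x'7F' */
}
```

With this check, x'00DC' converts to x'7F', while an index whose entry merely holds the marker still converts to x'3F'.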
Temporary fix
Comments
APAR Information
APAR number
PQ83328
Reported component name
LE C LIBRARY
Reported component ID
568819805
Reported release
703
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2004-01-16
Closed date
2004-02-25
Last modified date
2004-04-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UQ85399 UQ85400 UQ85401 UQ85402 UQ85403
Modules/Macros
CEHUF8EW EDC404D6
Fix information
Fixed component name
LE C LIBRARY
Fixed component ID
568819805
Applicable component levels
R703 PSY UQ85399 UP04/03/09 P F403
R705 PSY UQ85400 UP04/03/09 P F403
R706 PSY UQ85401 UP04/03/09 P F403
R707 PSY UQ85402 UP04/03/09 P F403
R708 PSY UQ85403 UP04/03/09 P F403
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
03 April 2004