PI97434: MAKE ENTERPRISE COBOL UNICODE SURROGATE PAIR AWARE, HANDLE 4 BYTE CHARACTERS (COBOL RTE)

A fix is available

APAR status

Closed as new function.

Error description

Make Enterprise COBOL Unicode surrogate pair aware, handle 4
byte characters (COBOL RTE)

Local fix

```
N/A
```

Problem summary

****************************************************************
* USERS AFFECTED: Users of Enterprise COBOL V6.2 who want to   *
*                 use intrinsic functions ULENGTH, UPOS,       *
*                 USUBSTR, UWIDTH, and REVERSE to process      *
*                 NATIONAL data items and have them be         *
*                 aware of surrogate pair characters.          *
****************************************************************
* PROBLEM DESCRIPTION: New Function: Users need a way to       *
*                      process NATIONAL data items even when   *
*                      the UTF-16 data contains surrogate pair *
*                      characters. This APAR adds support for  *
*                      NATIONAL data items and includes adding *
*                      surrogate pair awareness to functions   *
*                      ULENGTH, UPOS, USUBSTR, UWIDTH, and     *
*                      REVERSE.                                *
****************************************************************
* RECOMMENDATION: Apply the provided PTF.                      *
****************************************************************
Customers find that they need to create their own user COBOL
libraries if they want to process NATIONAL data items using
Enterprise COBOL. This APAR provides added support in intrinsic
functions ULENGTH, UPOS, USUBSTR, UWIDTH, and REVERSE for
processing NATIONAL data items.

Problem conclusion

Temporary fix

Comments

The compiler was changed to add support for processing NATIONAL
data items with intrinsic functions ULENGTH, UPOS, USUBSTR,
UWIDTH, and REVERSE.
+-------------------------------------------------------------+
| Start of changes for:                                       |
| Enterprise COBOL for z/OS Language Reference, SC27-8713-01  |
Chapter 21. Intrinsic functions
UVALID:
Change "character string consists of" to
"character data item contains", and remove "Unicode" in 2
places in the description of UVALID:
If a character data item contains valid UTF-8 or UTF-16 data,
the UVALID function returns the value zero. If a character
data item contains invalid UTF-8 or UTF-16 data, the UVALID
function returns the index of the first invalid Unicode
element.
***Please Insert the examples below after Table 55 in
UVALID  ***
Example 1
If A is an alphabetic or alphanumeric data item that
contains value x'4BC3A4666572' in UTF-8 encoding, the returned
value from UVALID(A) is zero.
Example 2
If B is a national data item that contains value
x'005400F6006200750072D858DC6B0073' in UTF-16 encoding, the
returned value from UVALID(B) is zero.
Example 3
If C is a national data item that contains value
x'0054D9C3006200750072D858DC6B0073' in UTF-16 encoding, the
returned value from UVALID(C) is two because the high
surrogate x'D9C3' is not followed by a low surrogate.
Example 4
If D is a national data item that contains value
x'005400F60062DC010072D858DC6B0073' in UTF-16 encoding,
the returned value from UVALID(D) is four because the low
surrogate x'DC01' is not preceded by a high surrogate.
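The UTF-16 examples above can be reproduced with a small Python model.
This is only a sketch of the documented behavior, not the COBOL runtime
implementation; the function name uvalid_utf16 is hypothetical, and the
sketch assumes big-endian data with an even byte length and an index that
counts 2-byte elements, as Examples 3 and 4 suggest.

```python
def uvalid_utf16(data: bytes) -> int:
    """Return 0 for well-formed UTF-16BE data, otherwise the 1-based
    index of the first invalid 2-byte element (assumed semantics)."""
    # Assumption: len(data) is even, as for a national data item.
    units = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, len(data), 2)]
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:            # high surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2                          # well-formed pair
                continue
            return i + 1                        # no low surrogate follows
        if 0xDC00 <= unit <= 0xDFFF:            # low surrogate with no high before it
            return i + 1
        i += 1
    return 0


B = bytes.fromhex("005400F6006200750072D858DC6B0073")
C = bytes.fromhex("0054D9C3006200750072D858DC6B0073")
D = bytes.fromhex("005400F60062DC010072D858DC6B0073")
print(uvalid_utf16(B), uvalid_utf16(C), uvalid_utf16(D))
```

The sketch covers only the national (UTF-16) case; the examples above do
not show what index is returned for invalid UTF-8 data, so that case is
left out.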
USUPPLEMENTARY
Change "character string argument that is encoded in UTF-8 or
UTF-16" to "character data item that contains UTF-8 or UTF-16
data" in the description of USUPPLEMENTARY:
The USUPPLEMENTARY function returns an integer value that is
equal to the index of the first Unicode supplementary
character in a character data item argument
that contains UTF-8 or UTF-16 data.
USUBSTR:
Change "character string argument that is encoded in UTF-8"
to "character data item that contains UTF-8 or UTF-16 data" in
the description of USUBSTR:
The USUBSTR function returns a substring of the data in a
character data item argument that contains UTF-8 or
UTF-16 data.
Add this line:
The function type is alphanumeric or national, depending
on the class of argument-1.
Add UTF-16 to description of argument-1:
argument-1
Must be of class alphabetic, alphanumeric, or national.
argument-1 must contain valid
UTF-8 or UTF-16 encoded characters:
- If argument-1 is of class alphabetic or alphanumeric,
it must contain valid UTF-8 data.
- If argument-1 is of class national, it must contain
valid UTF-16 data.
Rewrite the following paragraph to remove character string
and add UTF-16:
Change:
Suppose argument-2 = n and argument-3 = m, the returned
value is an alphanumeric character string that contains
m UTF-8 characters in argument-1, starting with the
nth character.
To:
Suppose argument-1 is alphabetic or alphanumeric,
argument-2 = n and argument-3 = m, the returned value is
an alphanumeric item that contains m UTF-8 characters
from argument-1, starting with the nth character.
Suppose argument-1 is a national data item, argument-2 = n
and argument-3 = m, the returned value is a national item
that contains m UTF-16 characters from argument-1,
starting with the nth character.
Add a second example:
Example 2
If B is a national item that contains the UTF-16
value x'005400F6006200750072D858DC6B0073',
the returned values are as follows:
- USUBSTR(B 1 2) returns x'005400F6'
- USUBSTR(B 2 1) returns x'00F6'
- USUBSTR(B 2 2) returns x'00F60062'
- USUBSTR(B 3 2) returns x'00620075'
- USUBSTR(B 5 2) returns x'0072D858DC6B'
- USUBSTR(B 6 2) returns x'D858DC6B0073'
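As a cross-check of Example 2, the following Python sketch models the
national case. It is not the COBOL implementation; the helper names
utf16_chars and usubstr_national are hypothetical, and the sketch assumes
valid big-endian UTF-16 input and in-range values of n and m (the text
above does not describe out-of-range behavior).

```python
def utf16_chars(data: bytes) -> list:
    """Split valid UTF-16BE data into characters of 2 or 4 bytes."""
    chars, pos = [], 0
    while pos < len(data):
        # A first byte in x'D8'..x'DB' starts a high surrogate,
        # so the character occupies 4 bytes instead of 2.
        width = 4 if 0xD8 <= data[pos] <= 0xDB else 2
        chars.append(data[pos:pos + width])
        pos += width
    return chars


def usubstr_national(data: bytes, n: int, m: int) -> bytes:
    """Return m UTF-16 characters of data, starting with the nth."""
    return b"".join(utf16_chars(data)[n - 1:n - 1 + m])


B = bytes.fromhex("005400F6006200750072D858DC6B0073")
for n, m in [(1, 2), (2, 1), (2, 2), (3, 2), (5, 2), (6, 2)]:
    print(n, m, usubstr_national(B, n, m).hex().upper())
```

Because the slice works on whole characters, a surrogate pair is never
split: the substring starting at character 5 with length 2 is 6 bytes
long, not 4.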
UPOS
Change "character string argument that is encoded in"
to "character data item that contains" and add UTF-16
to the description of UPOS:
The UPOS function returns an integer value that is equal
to the index of the nth UTF-8 or UTF-16 character in a
character data item argument that contains UTF-8 or UTF-16 data.
argument-1
Must be of class alphabetic, alphanumeric, or national.
argument-1 must contain valid
UTF-8 or UTF-16 encoded characters.
- If argument-1 is of class alphabetic or alphanumeric,
it must contain valid UTF-8 data.
- If argument-1 is of class national, it must contain
valid UTF-16 data.
Suppose argument-1 is alphabetic or alphanumeric and
argument-2=n, the returned value is the byte position
of the nth UTF-8 character in argument-1.
Suppose argument-1 is a national data item and
argument-2=n, the returned value is the byte
position of the nth UTF-16 character in argument-1.
If argument-2 is not positive or if argument-2
is larger than ULENGTH(argument-1),
zero is returned. Otherwise, if argument-2=n,
the returned value is the byte position
in argument-1 where the nth UTF-8 or
UTF-16 character starts.
Add a second example:
Example 2
If B is a national data item that contains the UTF-16
value x'005400F6006200750072D858DC6B0073',
the returned values are as follows:
- UPOS (B 1 ) returns 1
- UPOS (B 2 ) returns 3
- UPOS (B 3 ) returns 5
- UPOS (B 4 ) returns 7
- UPOS (B 5 ) returns 9
- UPOS (B 6 ) returns 11
- UPOS (B 7 ) returns 15
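The byte positions in Example 2 can be checked with a short Python model
of the national case. This is a sketch under stated assumptions (valid
big-endian UTF-16 input; the function name upos_national is
hypothetical), not the COBOL runtime code.

```python
def upos_national(data: bytes, n: int) -> int:
    """Return the 1-based byte position where the nth UTF-16 character
    starts, or 0 if n is not positive or exceeds the character count."""
    pos, count = 0, 0                 # pos is a 0-based byte offset
    while pos < len(data):
        count += 1
        if count == n:
            return pos + 1            # convert to a 1-based position
        # Skip 4 bytes for a surrogate pair, 2 bytes otherwise.
        pos += 4 if 0xD8 <= data[pos] <= 0xDB else 2
    return 0


B = bytes.fromhex("005400F6006200750072D858DC6B0073")
print([upos_national(B, n) for n in range(1, 8)])
```

Character 7 starts at byte 15 rather than 13 because character 6 is a
surrogate pair that occupies bytes 11 through 14.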
ULENGTH
Change "character string argument that is encoded in" to
"character data item that contains" and add UTF-16 to the
description of ULENGTH:
The ULENGTH function returns an integer value that is equal to
the number of UTF-8 or UTF-16 characters in a character data
item argument that contains UTF-8 or UTF-16 data.
argument-1
Must be of class alphabetic, alphanumeric, or national.
The returned value is the number of UTF-8 or UTF-16 characters
in argument-1.
- If argument-1 is of class alphabetic or alphanumeric, it must
contain valid UTF-8 data.
- If argument-1 is of class national, it must contain valid
UTF-16 data.
If argument-1 is a national data item that contains UTF-16 data
and argument-1 contains surrogate pairs, each pair of low and
high surrogates will be counted as one UTF-16 character. For
example, if B is a national item that contains the UTF-16
value x'005400F6006200750072D858DC6B0073', the returned value
from ULENGTH(B) will be 7. Character x'D858DC6B' is counted
as 1 UTF-16 character.
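The difference between counting 2-byte elements and counting characters
can be illustrated in Python. This is only a model of the behavior
described above, assuming valid big-endian UTF-16 data; the name
ulength_national is hypothetical.

```python
def ulength_national(data: bytes) -> int:
    """Count UTF-16 characters, treating each surrogate pair as one."""
    # Python strings are sequences of code points, so decoding valid
    # UTF-16BE data and taking len() counts a surrogate pair once.
    return len(data.decode("utf-16-be"))


B = bytes.fromhex("005400F6006200750072D858DC6B0073")
code_units = len(B) // 2            # number of 2-byte elements
characters = ulength_national(B)    # number of UTF-16 characters
print(code_units, characters)
```

For this value the data holds 8 two-byte elements but only 7 characters,
which matches the ULENGTH(B) result stated above.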
UWIDTH
Change "character string argument that is encoded in" to
"character data item that contains" and add UTF-16 to the
description of UWIDTH:
The UWIDTH function returns an integer value that is equal to
the width in bytes of the nth UTF-8 or UTF-16 character in a
character data item argument that contains UTF-8
or UTF-16 data.
argument-1
Must be of class alphabetic, alphanumeric, or national.
- If argument-1 is of class alphabetic or alphanumeric,
it must contain valid UTF-8 data.
- If argument-1 is of class national, it must contain valid
UTF-16 data.
argument-2
Must be an integer.
If argument-2 is not positive or if argument-2 is larger than
ULENGTH(argument-1), zero is returned. Otherwise, if
argument-2=n, the returned value is the width in bytes of the
nth UTF-8 or UTF-16 character in argument-1.
The returned value is an integer.
For example, if B is a national data item that contains the
UTF-16 value x'005400F6006200750072D858DC6B0073', the returned
values are as follows:
- UWIDTH (B 1) returns 2
- UWIDTH (B 2) returns 2
- UWIDTH (B 3) returns 2
- UWIDTH (B 4) returns 2
- UWIDTH (B 5) returns 2
- UWIDTH (B 6) returns 4
- UWIDTH (B 7) returns 2
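The widths listed above can be reproduced with a short Python model of
the national case. It is a sketch that assumes valid big-endian UTF-16
input, and the function name uwidth_national is hypothetical.

```python
def uwidth_national(data: bytes, n: int) -> int:
    """Return the width in bytes of the nth UTF-16 character, or 0 if
    n is not positive or exceeds the number of characters."""
    pos, count = 0, 0
    while pos < len(data):
        # A surrogate pair is 4 bytes wide; any other character is 2.
        width = 4 if 0xD8 <= data[pos] <= 0xDB else 2
        count += 1
        if count == n:
            return width
        pos += width
    return 0


B = bytes.fromhex("005400F6006200750072D858DC6B0073")
print([uwidth_national(B, n) for n in range(1, 8)])
```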
REVERSE
Change "character string" to "character value" in the
description of REVERSE, and add national and UTF-16 details:
The REVERSE function returns a character value of the same
length as the argument, whose characters are the same as those
specified in the argument except that they are in reverse order.
For arguments of type national, character positions are
reversed; each surrogate pair is treated as one character,
as is each UTF-16 character that is not part of a
surrogate pair.
argument-1
Must be class alphabetic, alphanumeric, or national and must be
at least one character in length.
- If argument-1 is of class alphabetic or alphanumeric, it must
contain valid UTF-8 data.
- If argument-1 is of class national, it must contain valid
UTF-16 data.
Add these examples:
Example 1
If argument-1 is an alphanumeric data item that contains the
UTF-8 value x'4BC3A4666572', the returned value is
x'726566C3A44B'.
Example 2
If argument-1 is a national data item that contains the UTF-16
value x'0054 00F6 D847DDF3 0062 0075 0072 D858DC6B 0073',
the returned value is
x'0073 D858DC6B 0072 0075 0062 D847DDF3 00F6 0054'.
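Both REVERSE examples can be checked with a few lines of Python. This
models character-level reversal only and is not the COBOL
implementation; the function name reverse_chars is hypothetical, and the
national value is assumed to be big-endian UTF-16.

```python
def reverse_chars(data: bytes, encoding: str) -> bytes:
    """Reverse data character by character rather than byte by byte."""
    # Decoding yields one Python character per Unicode code point, so a
    # multibyte UTF-8 sequence or a UTF-16 surrogate pair stays intact.
    return data.decode(encoding)[::-1].encode(encoding)


example_1 = bytes.fromhex("4BC3A4666572")
example_2 = bytes.fromhex("005400F6D847DDF3006200750072D858DC6B0073")
print(reverse_chars(example_1, "utf-8").hex().upper())
print(reverse_chars(example_2, "utf-16-be").hex().upper())
```

A plain byte reversal of Example 2 would swap the halves of each
surrogate pair and produce invalid UTF-16, which is why the reversal has
to operate on whole characters.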
| End of changes for:                                         |
| Enterprise COBOL for z/OS Language Reference, SC27-8713-01  |
+-------------------------------------------------------------+

APAR Information

APAR number
PI97434
Reported component name
ENT COBOL FOR Z
Reported component ID
5655EC600
Reported release
620
Status
CLOSED UR1
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2018-05-01
Closed date
2018-05-28
Last modified date
2019-06-04

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Modules/Macros

IGY8RWTU IGYCASMB IGYCCBE  IGYCCCRT IGYCCICS IGYCCSRV IGYCDGEN
IGYCDIAG IGYCDMAP IGYCDOPT IGYCEN$0 IGYCEN$1 IGYCEN$2 IGYCEN$3
IGYCEN$4 IGYCEN$5 IGYCEN$8 IGYCEN$D IGYCEN$R IGYCFGEN IGYCFREE
IGYCINIT IGYCJA$0 IGYCJA$1 IGYCJA$2 IGYCJA$3 IGYCJA$4 IGYCJA$5
IGYCJA$8 IGYCJA$D IGYCJA$R IGYCLIBH IGYCLIBO IGYCLIBR IGYCLSTR
IGYCLVL0 IGYCLVL1 IGYCLVL2 IGYCLVL3 IGYCLVL8 IGYCMALL IGYCOB2
IGYCOPI  IGYCOPT  IGYCOSCN IGYCPGEN IGYCRCTL IGYCRDPR IGYCRDSC
IGYCREAL IGYCRWT  IGYCSCAN IGYCSIMD IGYCUE$0 IGYCUE$1 IGYCUE$2
IGYCUE$3 IGYCUE$4 IGYCUE$5 IGYCUE$8 IGYCUE$D IGYCUE$R IGYCXREF
IGYDRV   IGYEQCWI IGYMSGE  IGYMSGK  IGYMSGT  IGYQCBE  IGYZQDRV
IGYZQENU IGYZQJPN

*Publications Referenced*
SC27871301

Fix information

Fixed component name
ENT COBOL FOR Z
Fixed component ID
5655EC600

Applicable component levels

R620 PSY UI56120
UP18/06/01 P F805
R621 PSY UI56121
UP18/06/01 P F805
R622 PSY UI56122
UP18/06/01 P F805
R62H PSY UI56123
UP18/06/01 P F805

Fix is available

Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
12 December 2023
