UPOS
The UPOS function returns an integer value that is equal to the byte position at which the nth UTF-8 or UTF-16 character starts in a character data item argument that contains UTF-8 or UTF-16 data.
The function type is integer.
- argument-1
- Must be of class alphabetic, alphanumeric, national, or UTF-8. argument-1 must contain valid UTF-8 or UTF-16 encoded characters:
- If argument-1 is of class alphabetic, alphanumeric or UTF-8, it must contain valid UTF-8 data.
- If argument-1 is of class national, it must contain valid UTF-16 data.
- argument-2
- Must be an integer.
If argument-2 is not positive or if argument-2 is larger than ULENGTH(argument-1), zero is returned. Otherwise, if argument-2=n, the returned value is the byte position in argument-1, counted from 1, where the nth character starts:
- If argument-1 is of class alphabetic, alphanumeric, or UTF-8, the returned value is the byte position of the nth UTF-8 character.
- If argument-1 is of class national, the returned value is the byte position of the nth UTF-16 character.
The returned value of UPOS is a 9-digit integer if LP(32) is in effect or an 18-digit integer if LP(64) is in effect.
Example 1
If A is an alphanumeric item that contains the UTF-8 value x'4BC3A4666572' ('Käfer'), the returned values are as follows:
- UPOS(A 1) returns 1
- UPOS(A 2) returns 2
- UPOS(A 3) returns 4
- UPOS(A 4) returns 5
- UPOS(A 5) returns 6
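The byte positions in Example 1 can be reproduced outside COBOL. The following Python sketch is a model of the rule described above, not the COBOL runtime's implementation. The function name `upos_utf8` is hypothetical, and the sketch assumes that the input is already valid UTF-8.

```python
def upos_utf8(data: bytes, n: int) -> int:
    """Model of UPOS for UTF-8 data: 1-based byte position of the nth character, or 0."""
    if n <= 0:
        return 0  # argument-2 is not positive
    count = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; any other byte starts a character.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                return offset + 1  # convert 0-based offset to 1-based position
    return 0  # n is larger than the number of characters


a = bytes.fromhex("4BC3A4666572")  # the value of A in Example 1
print([upos_utf8(a, n) for n in range(0, 7)])  # [0, 1, 2, 4, 5, 6, 0]
```

The first and last entries show the zero result for an argument-2 value that is not positive or that exceeds the character count.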
Example 2
If B is a national item that contains the UTF-16 value nx'005400F6006200750072D858DC6B0073' ('Töbur𦁫s'), the returned values are as follows:
- UPOS(B 1) returns 1
- UPOS(B 2) returns 3
- UPOS(B 3) returns 5
- UPOS(B 4) returns 7
- UPOS(B 5) returns 9
- UPOS(B 6) returns 11
- UPOS(B 7) returns 15
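The jump from 11 to 15 occurs because the sixth character is stored as a surrogate pair (x'D858DC6B'), which occupies four bytes. The following Python sketch models this behavior. It is an illustration only: the function name `upos_utf16` is hypothetical, and the sketch assumes valid big-endian UTF-16 input.

```python
def upos_utf16(data: bytes, n: int) -> int:
    """Model of UPOS for big-endian UTF-16 data: 1-based byte position of the nth character, or 0."""
    if n <= 0:
        return 0  # argument-2 is not positive
    count = 0
    for offset in range(0, len(data) - 1, 2):
        unit = int.from_bytes(data[offset:offset + 2], "big")
        # A low (trailing) surrogate belongs to the character started by the previous unit.
        if 0xDC00 <= unit <= 0xDFFF:
            continue
        count += 1
        if count == n:
            return offset + 1  # convert 0-based offset to 1-based position
    return 0  # n is larger than the number of characters


b = bytes.fromhex("005400F6006200750072D858DC6B0073")  # the value of B in Example 2
print([upos_utf16(b, n) for n in range(1, 9)])  # [1, 3, 5, 7, 9, 11, 15, 0]
```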
Example 3
If argument-1 is a UTF-8 encoded item that contains combining character sequences, each combining character is counted as a separate character. For example, when encoded in UTF-8, the Unicode character ä can be x'C3A4' (precomposed form) or x'61CC88' (base letter followed by a combining diaeresis). The returned values of the UPOS function differ depending on which of the two encodings argument-1 contains. See the following table for details.
argument-1 | Unicode encoding | UTF-8 encoding | Returned values of the UPOS function |
---|---|---|---|
C = äK | U+00E4 + U+004B (precomposed form, latin small letter a with diaeresis + latin capital letter K) | x'C3A44B' (äK) | UPOS(C 1) returns 1; UPOS(C 2) returns 3; UPOS(C 3) returns 0 |
C = äK | U+0061 + U+0308 + U+004B (canonical decomposition, latin small letter a + combining diaeresis + latin capital letter K) | x'61CC884B' (äK) | UPOS(C 1) returns 1; UPOS(C 2) returns 2; UPOS(C 3) returns 4 |
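
The two encodings in the table correspond to the Unicode normalization forms NFC (precomposed) and NFD (canonical decomposition). The following Python sketch uses the standard `unicodedata` module to produce both forms and lists the byte position at which each code point starts. The helper name `char_starts` is hypothetical, and the sketch only illustrates why the positions differ; it does not call UPOS.

```python
import unicodedata


def char_starts(text: str) -> list[int]:
    """Return the 1-based byte position at which each code point starts in UTF-8."""
    starts, position = [], 1
    for ch in text:
        starts.append(position)
        position += len(ch.encode("utf-8"))  # 1 to 4 bytes per code point
    return starts


composed = unicodedata.normalize("NFC", "a\u0308K")    # U+00E4 + U+004B
decomposed = unicodedata.normalize("NFD", "\u00e4K")   # U+0061 + U+0308 + U+004B

print(composed.encode("utf-8").hex(), char_starts(composed))      # c3a44b [1, 3]
print(decomposed.encode("utf-8").hex(), char_starts(decomposed))  # 61cc884b [1, 2, 4]
```

The precomposed form has only two characters, which is why UPOS(C 3) returns 0 in the first table row. If consistent positions are needed, one option is to normalize the data to a single form before it is stored.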