UPOS

The UPOS function returns an integer value that is equal to the byte position of the nth UTF-8 or UTF-16 character in a character data item argument that contains UTF-8 or UTF-16 data.

The function type is integer.

Format

FUNCTION UPOS(argument-1 argument-2)
argument-1
Must be of class alphabetic, alphanumeric, national or UTF-8. argument-1 must contain valid UTF-8 or UTF-16 encoded characters:
  • If argument-1 is of class alphabetic, alphanumeric or UTF-8, it must contain valid UTF-8 data.
  • If argument-1 is of class national, it must contain valid UTF-16 data.
argument-2
Must be an integer.

If argument-1 is of class alphabetic, alphanumeric, or UTF-8 and argument-2 = n, the returned value is the byte position of the nth UTF-8 character in argument-1. If argument-1 is of class national and argument-2 = n, the returned value is the byte position of the nth UTF-16 character in argument-1.

If argument-2 is not positive or if argument-2 is larger than ULENGTH(argument-1), zero is returned. Otherwise, if argument-2=n, the returned value is the byte position in argument-1 where the nth UTF-8 or UTF-16 character starts.
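The rule above can be modeled outside COBOL. The following Python sketch is only an illustration of the described behavior, not the compiler's implementation. The function name upos and the encoding parameter are hypothetical, and big-endian UTF-16 is assumed for national data.

```python
def upos(data: bytes, n: int, encoding: str) -> int:
    """Illustrative model of UPOS: 1-based byte position of the nth character.

    encoding is "utf-8" for alphabetic, alphanumeric, or UTF-8 items and
    "utf-16-be" for national items (big-endian is an assumption here).
    """
    text = data.decode(encoding)      # raises an error if the data is not valid
    ulength = len(text)               # number of characters, as ULENGTH would count
    if n <= 0 or n > ulength:
        return 0                      # out-of-range character index
    # The nth character starts right after the bytes of the first n - 1 characters.
    return len(text[:n - 1].encode(encoding)) + 1


zoe = "Zoë".encode("utf-8")           # x'5A6FC3AB'
print(upos(zoe, 3, "utf-8"))          # 3
print(upos(zoe, 0, "utf-8"))          # 0 (argument-2 is not positive)
print(upos(zoe, 4, "utf-8"))          # 0 (argument-2 exceeds the character count)
```

The model returns zero in exactly the two cases that the text names: a nonpositive index and an index larger than the character count.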

The returned value of UPOS is a 9-digit integer if LP(32) is in effect or an 18-digit integer if LP(64) is in effect.

Example 1

If A is an alphanumeric item that contains the UTF-8 value x'4BC3A4666572' ('Käfer'), the returned values are as follows:

  • UPOS(A 1) returns 1
  • UPOS(A 2) returns 2
  • UPOS(A 3) returns 4
  • UPOS(A 4) returns 5
  • UPOS(A 5) returns 6
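The values in Example 1 follow from the UTF-8 byte layout: ä occupies two bytes (x'C3A4'), so every later character starts one byte further along. The Python sketch below reproduces the list by scanning for bytes that start a character. It assumes valid UTF-8 input, and the name upos_utf8 is hypothetical.

```python
def upos_utf8(data: bytes, n: int) -> int:
    """Illustrative model of UPOS for valid UTF-8 data."""
    if n <= 0:
        return 0
    count = 0
    for offset, byte in enumerate(data):
        # Bytes of the form 10xxxxxx continue the previous character;
        # any other byte starts a new character.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                return offset + 1     # convert 0-based offset to 1-based position
    return 0                          # n is larger than the number of characters


A = bytes.fromhex("4BC3A4666572")     # 'Käfer'
print([upos_utf8(A, n) for n in range(1, 7)])   # [1, 2, 4, 5, 6, 0]
```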

Example 2

If B is a national item that contains the UTF-16 value nx'005400F6006200750072D858DC6B0073' ('Töbur𦁫s'), the returned values are as follows:

  • UPOS(B 1) returns 1
  • UPOS(B 2) returns 3
  • UPOS(B 3) returns 5
  • UPOS(B 4) returns 7
  • UPOS(B 5) returns 9
  • UPOS(B 6) returns 11
  • UPOS(B 7) returns 15
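In Example 2, the sixth character is encoded as the surrogate pair x'D858DC6B' and occupies four bytes, which is why the seventh character starts at byte 15 rather than byte 13. The Python sketch below models this by skipping low surrogates. It assumes valid big-endian UTF-16 input, and the name upos_utf16 is hypothetical.

```python
def upos_utf16(data: bytes, n: int) -> int:
    """Illustrative model of UPOS for valid big-endian UTF-16 data."""
    if n <= 0:
        return 0
    count = 0
    for offset in range(0, len(data) - 1, 2):
        unit = int.from_bytes(data[offset:offset + 2], "big")
        # A low surrogate (DC00-DFFF) is the second half of a surrogate pair,
        # so it does not start a new character.
        if 0xDC00 <= unit <= 0xDFFF:
            continue
        count += 1
        if count == n:
            return offset + 1         # convert 0-based offset to 1-based position
    return 0                          # n is larger than the number of characters


B = bytes.fromhex("005400F6006200750072D858DC6B0073")
print([upos_utf16(B, n) for n in range(1, 9)])  # [1, 3, 5, 7, 9, 11, 15, 0]
```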

Example 3

If argument-1 is a UTF-8 encoded item that contains combining character sequences, each combining character is counted as a separate character. For example, the character ä can be encoded in UTF-8 either as the precomposed character x'C3A4' (U+00E4) or as the decomposed sequence x'61CC88' (U+0061 followed by U+0308). The returned values of the UPOS function differ depending on which of the two encodings argument-1 contains. See the following table for details.

Table 1. Returned values of the UPOS function

argument-1  Unicode encoding                            UTF-8 encoding     Returned values of the UPOS function
C = äK      U+00E4 + U+004B                             x'C3A44B' (äK)     UPOS(C 1) returns 1
            (precomposed form,                                             UPOS(C 2) returns 3
            latin small letter a with diaeresis                            UPOS(C 3) returns 0
            + latin capital letter K)

            U+0061 + U+0308 + U+004B                    x'61CC884B' (äK)   UPOS(C 1) returns 1
            (canonical decomposition,                                      UPOS(C 2) returns 2
            latin small letter a + combining diaeresis                     UPOS(C 3) returns 4
            + latin capital letter K)
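The two rows of Table 1 correspond to the Unicode normalization forms NFC (precomposed) and NFD (canonical decomposition). The Python sketch below builds both encodings with the standard unicodedata module and lists where each character starts. It is an illustration only, and the helper names char_starts and upos_utf8 are hypothetical.

```python
import unicodedata


def char_starts(data: bytes) -> list:
    """1-based byte positions at which a UTF-8 character starts (valid input assumed)."""
    return [i + 1 for i, b in enumerate(data) if b & 0xC0 != 0x80]


def upos_utf8(data: bytes, n: int) -> int:
    """Illustrative model of UPOS for UTF-8 data; 0 when n is out of range."""
    starts = char_starts(data)
    return starts[n - 1] if 1 <= n <= len(starts) else 0


composed = unicodedata.normalize("NFC", "\u00e4K").encode("utf-8")    # x'C3A44B'
decomposed = unicodedata.normalize("NFD", "\u00e4K").encode("utf-8")  # x'61CC884B'

print([upos_utf8(composed, n) for n in (1, 2, 3)])     # [1, 3, 0]
print([upos_utf8(decomposed, n) for n in (1, 2, 3)])   # [1, 2, 4]
```

If positions must be stable for text that looks identical, one option is to normalize the data to a single form before it reaches the program.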