USUBSTR
The USUBSTR function returns a substring of a character argument that contains UTF-8 or UTF-16 data. The starting position and length are counted in characters, not bytes.
The function type is alphanumeric, national, or UTF-8, depending on the class of argument-1.
- argument-1
  - Must be of class alphabetic, alphanumeric, national, or UTF-8, and must contain valid UTF-8 or UTF-16 encoded characters:
    - If argument-1 is of class alphabetic, alphanumeric, or UTF-8, it must contain valid UTF-8 data.
    - If argument-1 is of class national, it must contain valid UTF-16 data.
- argument-2
  - Must be an integer that is greater than zero. It represents the starting position of a substring in argument-1.
- argument-3
  - Must be an integer that is greater than or equal to zero. It represents the length of a substring in argument-1.
If argument-1 is alphabetic or alphanumeric, argument-2 is n, and argument-3 is m, the returned value is an alphanumeric item that contains m UTF-8 characters from argument-1, starting with the nth UTF-8 character. If argument-1 is national, the returned value is a national item that contains m UTF-16 characters from argument-1, starting with the nth UTF-16 character. If argument-1 is of class UTF-8, the same counting applies and the returned value is a UTF-8 item. In every case a multibyte UTF-8 character or a UTF-16 surrogate pair counts as one character.
Example 1
If A is an alphanumeric item that contains the UTF-8 value x'4BC3A4666572' ('Käfer'), the returned values are as follows:
- USUBSTR(A 1 2) returns x'4BC3A4' ('Kä')
- USUBSTR(A 2 1) returns x'C3A4' ('ä')
- USUBSTR(A 2 2) returns x'C3A466' ('äf')
- USUBSTR(A 3 2) returns x'6665' ('fe')
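
The values in Example 1 can be checked with a small model of the counting rule. The sketch below is Python, not COBOL, and `usubstr_utf8` is a hypothetical helper named for illustration only. It assumes that one UTF-8 character corresponds to one Unicode code point, and it does not model what USUBSTR does when the requested range extends past the end of argument-1.

```python
def usubstr_utf8(data: bytes, start: int, length: int) -> bytes:
    """Model of USUBSTR on UTF-8 data: count characters, not bytes."""
    if start < 1 or length < 0:
        raise ValueError("start must be > 0 and length must be >= 0")
    chars = data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    # Positions are 1-based in USUBSTR and 0-based in Python slices.
    return chars[start - 1:start - 1 + length].encode("utf-8")


A = bytes.fromhex("4BC3A4666572")  # 'Käfer'

print(usubstr_utf8(A, 1, 2).hex().upper())  # 4BC3A4
print(usubstr_utf8(A, 2, 1).hex().upper())  # C3A4
print(usubstr_utf8(A, 2, 2).hex().upper())  # C3A466
print(usubstr_utf8(A, 3, 2).hex().upper())  # 6665
```

Note that position 3 refers to 'f' even though 'f' is the fourth byte, because 'ä' occupies two bytes but counts as one character.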
Example 2
If B is a national item that contains the UTF-16 value nx'005400F6006200750072D858DC6B0073' ('Töbur𦁫s'), the returned values are as follows:
- USUBSTR(B 1 2) returns nx'005400F6' ('Tö')
- USUBSTR(B 2 1) returns nx'00F6' ('ö')
- USUBSTR(B 2 2) returns nx'00F60062' ('öb')
- USUBSTR(B 3 2) returns nx'00620075' ('bu')
- USUBSTR(B 5 2) returns nx'0072D858DC6B' ('r𦁫')
- USUBSTR(B 6 2) returns nx'D858DC6B0073' ('𦁫s')
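
The last two results show that a surrogate pair (x'D858DC6B') is counted as one character. A minimal Python model of that behavior follows. The helper name `usubstr_utf16` is hypothetical, and the sketch assumes big-endian UTF-16, which matches the byte order of the hex values shown in this example.

```python
def usubstr_utf16(data: bytes, start: int, length: int) -> bytes:
    """Model of USUBSTR on UTF-16 data: a surrogate pair is one character."""
    if start < 1 or length < 0:
        raise ValueError("start must be > 0 and length must be >= 0")
    # Assumption: big-endian UTF-16, as in the hex values of Example 2.
    chars = data.decode("utf-16-be")
    # Python strings index by code point, so a surrogate pair occupies one slot.
    return chars[start - 1:start - 1 + length].encode("utf-16-be")


B = bytes.fromhex("005400F6006200750072D858DC6B0073")

print(usubstr_utf16(B, 1, 2).hex().upper())  # 005400F6
print(usubstr_utf16(B, 5, 2).hex().upper())  # 0072D858DC6B
print(usubstr_utf16(B, 6, 2).hex().upper())  # D858DC6B0073
```

B holds eight 16-bit code units but only seven characters, so position 6 selects the whole surrogate pair and position 7 selects the final 's'.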
Example 3
If argument-1 is UTF-8 encoded and contains combining character sequences, each combining character is counted as a separate character. For example, the Unicode character ä can be encoded in UTF-8 as x'C3A4' (precomposed) or as x'61CC88' (the letter a followed by a combining diaeresis). USUBSTR returns different values depending on which of the two forms argument-1 contains. See the following table for details.
| argument-1 | Unicode encoding | UTF-8 encoding | Returned values of the USUBSTR function |
|---|---|---|---|
| C = äK | U+00E4 + U+004B (precomposed form: latin small letter a with diaeresis + latin capital letter K) | x'C3A44B' (äK) | USUBSTR(C 1 1) returns x'C3A4' (ä)<br>USUBSTR(C 2 1) returns x'4B' (K)<br>USUBSTR(C 1 2) returns x'C3A44B' (äK) |
| C = äK | U+0061 + U+0308 + U+004B (canonical decomposition: latin small letter a + combining diaeresis + latin capital letter K) | x'61CC884B' (äK) | USUBSTR(C 1 1) returns x'61' (a)<br>USUBSTR(C 2 1) returns x'CC88' (¨)<br>USUBSTR(C 1 2) returns x'61CC88' (ä)<br>USUBSTR(C 1 3) returns x'61CC884B' (äK) |
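
The two rows of the table can be reproduced with the following Python sketch. It is a model of the counting rule, not of the COBOL implementation. `usubstr_utf8` is a hypothetical helper, and the sketch assumes that each Unicode code point, including a combining mark, counts as one character.

```python
import unicodedata


def usubstr_utf8(data: bytes, start: int, length: int) -> bytes:
    """Model of USUBSTR on UTF-8 data: each code point is one character."""
    chars = data.decode("utf-8")
    return chars[start - 1:start - 1 + length].encode("utf-8")


# The same visible text 'äK' in its two canonical forms.
precomposed = unicodedata.normalize("NFC", "a\u0308K").encode("utf-8")
decomposed = unicodedata.normalize("NFD", "\u00e4K").encode("utf-8")

print(precomposed.hex().upper())  # C3A44B
print(decomposed.hex().upper())   # 61CC884B

# Precomposed form: 'ä' is a single character.
print(usubstr_utf8(precomposed, 1, 1).hex().upper())  # C3A4

# Decomposed form: 'a' and the combining diaeresis are counted separately.
print(usubstr_utf8(decomposed, 1, 1).hex().upper())   # 61
print(usubstr_utf8(decomposed, 2, 1).hex().upper())   # CC88
print(usubstr_utf8(decomposed, 1, 2).hex().upper())   # 61CC88
```

If substring positions need to be the same for both forms, one possible approach is to normalize the data to a single form before it is passed to the function. Whether that is appropriate depends on the application and is not something this section prescribes.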