UVALID

If a character string consists of valid Unicode UTF-8 or UTF-16 data, the UVALID function returns the value zero. If a character string contains invalid Unicode data, the UVALID function returns the index of the first invalid element.

The function type is integer.

Format

FUNCTION UVALID(argument-1)
argument-1
Must be of class alphabetic, alphanumeric, or national.
The returned value is an integer that depends on the class and content of argument-1:
  • If argument-1 is of class alphabetic or alphanumeric, and it consists of valid UTF-8 encoded Unicode data, the returned value is zero.
  • If argument-1 is of class alphabetic or alphanumeric, and it contains invalid UTF-8 encoded Unicode data, the returned value is the position of the first byte where the invalid UTF-8 data starts.
  • If argument-1 is of class national, and it consists of valid UTF-16 encoded Unicode data, the returned value is zero.
  • If argument-1 is of class national, and it contains invalid UTF-16 encoded Unicode data, the returned value is the position of the first UTF-16 encoding unit where the invalid UTF-16 data starts. This position is one plus the number of well-formed UTF-16 encoding units that precede the invalid data.
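The four cases above can be modelled outside COBOL. The following Python sketch is an approximation for illustration only, not the COBOL implementation: the name uvalid_model and its national flag are hypothetical, it assumes that Python's strict UTF-8 and UTF-16 decoders reject the same sequences that UVALID rejects, and it assumes that national data is stored as big-endian UTF-16.

```python
def uvalid_model(data: bytes, national: bool = False) -> int:
    """Hypothetical model: 0 for well-formed data, else a 1-based position."""
    # Assumption: national items hold big-endian UTF-16, two bytes per encoding unit.
    codec = "utf-16-be" if national else "utf-8"
    unit_size = 2 if national else 1
    try:
        data.decode(codec, errors="strict")
    except UnicodeDecodeError as err:
        # err.start is a 0-based byte offset of the invalid data;
        # convert it to a 1-based byte or encoding-unit position.
        return err.start // unit_size + 1
    return 0


print(uvalid_model(b"A\xc3\xa9B"))                          # well-formed UTF-8
print(uvalid_model(b"A\xc0B"))                              # x'C0' is never valid
print(uvalid_model(b"\x00A\xd8\x00\x00B", national=True))   # unpaired nx'D800'
```

In this model, alphanumeric positions count bytes and national positions count 16-bit encoding units, which mirrors the distinction drawn in the list above.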
Note: The UVALID function indicates whether the character string contains well-formed Unicode UTF-8 or UTF-16 data. It does not indicate whether any or all of the Unicode code points represented by the character string are assigned to characters.
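The difference between well-formed data and assigned characters can be illustrated with U+FFFF, a Unicode noncharacter. The Python sketch below is not COBOL; it assumes that Python's strict UTF-8 decoder applies a comparable well-formedness check, and it uses the standard unicodedata module to show that the code point has no assigned character.

```python
import unicodedata

encoded = b"\xef\xbf\xbf"        # UTF-8 encoding of U+FFFF, a noncharacter
text = encoded.decode("utf-8")   # strict decoding succeeds: the bytes are well-formed
category = unicodedata.category(text)

# General category "Cn" means the code point is not assigned to a character.
print(f"U+{ord(text):04X} has general category {category}")
```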
For UTF-8 data, the validity of a byte varies according to its range as listed in the table:
Table 1. Byte validity for UTF-8 data
Value range       Dependency                                                Validity
x'00' - x'7F'     None                                                      Valid
x'80' - x'C1'     None                                                      Invalid
x'C2' - x'DF'     Followed by another byte in the range x'80' to x'BF'      Valid
x'E0' - x'EF'     First byte x'E0', followed by two more bytes:             Valid
                    • The second byte is in the range x'A0' to x'BF'
                    • The third byte is in the range x'80' to x'BF'
                  First byte x'E1' to x'EC', followed by two more bytes,    Valid
                  both in the range x'80' to x'BF'
                  First byte x'ED', followed by two more bytes:             Valid
                    • The second byte is in the range x'80' to x'9F'
                    • The third byte is in the range x'80' to x'BF'
                  First byte x'EE' to x'EF', followed by two more bytes,    Valid
                  both in the range x'80' to x'BF'
x'F0' - x'F4'     First byte x'F0', followed by three more bytes:           Valid
                    • The second byte is in the range x'90' to x'BF'
                    • The third byte is in the range x'80' to x'BF'
                    • The fourth byte is in the range x'80' to x'BF'
                  First byte x'F1' to x'F3', followed by three more bytes,  Valid
                  all in the range x'80' to x'BF'
                  First byte x'F4', followed by three more bytes:           Valid
                    • The second byte is in the range x'80' to x'8F'
                    • The third byte is in the range x'80' to x'BF'
                    • The fourth byte is in the range x'80' to x'BF'
x'F5' - x'FF'     None                                                      Invalid
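The rules in Table 1 can be read as a table-driven scan over the bytes. The Python sketch below is one way to express them and is not the COBOL implementation; the helper name utf8_first_invalid is hypothetical. It assumes that a sequence cut off by the end of the data counts as invalid, and that the reported position is that of the lead byte of the failing sequence.

```python
def utf8_first_invalid(data: bytes) -> int:
    """Hypothetical helper: 0 if data follows Table 1, else the 1-based
    position of the byte that starts the first invalid sequence."""
    any_trail = (0x80, 0xBF)
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            trail = []
        elif 0xC2 <= lead <= 0xDF:
            trail = [any_trail]
        elif lead == 0xE0:
            trail = [(0xA0, 0xBF), any_trail]
        elif 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
            trail = [any_trail] * 2
        elif lead == 0xED:
            trail = [(0x80, 0x9F), any_trail]
        elif lead == 0xF0:
            trail = [(0x90, 0xBF)] + [any_trail] * 2
        elif 0xF1 <= lead <= 0xF3:
            trail = [any_trail] * 3
        elif lead == 0xF4:
            trail = [(0x80, 0x8F)] + [any_trail] * 2
        else:
            return i + 1  # x'80'-x'C1' and x'F5'-x'FF' cannot start a sequence
        following = data[i + 1 : i + 1 + len(trail)]
        if len(following) < len(trail):
            return i + 1  # assumption: a truncated sequence is invalid
        for value, (low, high) in zip(following, trail):
            if not low <= value <= high:
                return i + 1
        i += 1 + len(trail)
    return 0


print(utf8_first_invalid(b"A\xc3\xa9B"))    # well-formed two-byte sequence
print(utf8_first_invalid(b"A\xed\xa0\x80")) # x'ED' followed by x'A0' is out of range
```

The narrower second-byte ranges for x'E0', x'ED', x'F0', and x'F4' are what exclude overlong encodings, surrogate code points, and values above U+10FFFF.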
For UTF-16 data, the validity of an encoding unit varies according to its range as listed in the table:
Table 2. Encoding unit validity for UTF-16 data
Value range           Dependency                                  Validity   Number of bytes if converted to UTF-8
nx'0000' - nx'007F'   None                                        Valid      1
nx'0080' - nx'07FF'   None                                        Valid      2
nx'0800' - nx'D7FF'   None                                        Valid      3
nx'D800' - nx'DBFF'   Must be followed by a second encoding unit  Valid      4 (a Unicode surrogate pair)
                      with a value in the range nx'DC00' to
                      nx'DFFF'
                      Other cases                                 Invalid    Not applicable
nx'E000' - nx'FFFF'   None                                        Valid      3
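The surrogate rule in Table 2 amounts to a single pass over the 16-bit encoding units. The Python sketch below is illustrative only and is not the COBOL implementation; the helper name utf16_first_invalid is hypothetical, and it reads "Other cases" as covering any unpaired surrogate, including a value in the range nx'DC00' to nx'DFFF' that is not preceded by a value in the range nx'D800' to nx'DBFF'.

```python
def utf16_first_invalid(units: list[int]) -> int:
    """Hypothetical helper: 0 if the 16-bit units follow Table 2, else the
    1-based position of the unit that starts the first invalid sequence."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed at once by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i + 1
        if 0xDC00 <= unit <= 0xDFFF:
            # Assumption: a low surrogate with no high surrogate before it is invalid.
            return i + 1
        i += 1
    return 0


print(utf16_first_invalid([0x0041, 0xD83D, 0xDE00]))  # valid surrogate pair
print(utf16_first_invalid([0x0041, 0xD800, 0x0042]))  # nx'D800' has no partner
```

The position returned for invalid data is one plus the number of well-formed units already consumed, because the scan advances only past units that have been accepted.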