UVALID

If a character string consists of valid Unicode UTF-8 or UTF-16 data, the UVALID function returns the value zero. If a character string contains invalid Unicode data, the UVALID function returns the index of the first invalid element.

The function type is integer.

Format

FUNCTION UVALID (argument-1)

argument-1: Must be of class alphabetic, alphanumeric, or national.

The returned value is an integer, which differs based on argument-1:

If argument-1 is of class alphabetic or alphanumeric, and it consists of valid UTF-8 encoded Unicode data, the returned value is zero.
If argument-1 is of class alphabetic or alphanumeric, and it contains invalid UTF-8 encoded Unicode data, the returned value is the position of the first byte where the invalid UTF-8 data starts.
If argument-1 is of class national, and it consists of valid UTF-16 encoded Unicode data, the returned value is zero.
If argument-1 is of class national, and it contains invalid UTF-16 encoded Unicode data, the returned value is the position of the first UTF-16 encoding unit where the invalid UTF-16 data starts. This position is one plus the number of well-formed UTF-16 encoding units that precede the invalid data.

Note: The UVALID function indicates whether the character string contains well-formed Unicode UTF-8 or UTF-16 data. It does not indicate whether any or all of the Unicode code points represented by the character string are assigned to characters.
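The distinction in the note can be illustrated outside COBOL. The following Python sketch is not part of UVALID; the variable names are illustrative. It takes the code point U+FFFF, which is a Unicode noncharacter and is not assigned to any character, and shows that its UTF-8 encoding is still a well-formed three-byte sequence under the rules in Table 1.

```python
import unicodedata

# U+FFFF is a noncharacter: it is not assigned to any character.
code_point = "\uffff"

# Its UTF-8 form is x'EF' x'BF' x'BF', which is structurally well formed.
encoded = code_point.encode("utf-8")

# General category "Cn" means "not assigned".
category = unicodedata.category(code_point)

print(encoded.hex().upper(), category)
```

A check of well-formedness alone would therefore accept this data even though no character is assigned to the code point.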

For UTF-8 data, the validity of a byte sequence depends on the value of its first byte and, for multibyte sequences, on the bytes that follow it, as listed in the table:

Table 1. Byte validity for UTF-8 data
Value Range	Dependency	Validity
x'00' - x'7F'	None	Valid
x'80' - x'C1'	None	Invalid
x'C2' - x'DF'	Followed by another byte that is in the range x'80' to x'BF'	Valid
x'E0' - x'EF'	If the first byte is x'E0', it is followed by two more bytes that meet the following requirements: the second byte is in the range x'A0' to x'BF'; the third byte is in the range x'80' to x'BF'.	Valid
	If the first byte is in the range x'E1' to x'EC', both the second and third bytes are in the range x'80' to x'BF'.	Valid
	If the first byte is x'ED', it is followed by two more bytes that meet the following requirements: the second byte is in the range x'80' to x'9F'; the third byte is in the range x'80' to x'BF'.	Valid
	If the first byte is in the range x'EE' to x'EF', both the second and third bytes are in the range x'80' to x'BF'.	Valid
x'F0' - x'F4'	If the first byte is x'F0', it is followed by three more bytes that meet the following requirements: the second byte is in the range x'90' to x'BF'; the third byte is in the range x'80' to x'BF'; the fourth byte is in the range x'80' to x'BF'.	Valid
	If the first byte is in the range x'F1' to x'F3', the second, third, and fourth bytes are all in the range x'80' to x'BF'.	Valid
	If the first byte is x'F4', it is followed by three more bytes that meet the following requirements: the second byte is in the range x'80' to x'8F'; the third byte is in the range x'80' to x'BF'; the fourth byte is in the range x'80' to x'BF'.	Valid
x'F5' - x'FF'	None	Invalid
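
The rules in Table 1 can be sketched as a small Python function. This is a minimal illustration, not the UVALID implementation: the name utf8_first_invalid is hypothetical, and reporting a truncated trailing sequence at the position of its first byte is an assumption. Like UVALID, the sketch returns zero for well-formed data and otherwise the 1-based position of the byte where the first invalid sequence starts.

```python
def utf8_first_invalid(data: bytes) -> int:
    """Return 0 if data is well-formed UTF-8 per Table 1, else the 1-based
    position of the byte where the first invalid sequence starts."""
    cont = (0x80, 0xBF)  # ordinary continuation-byte range
    i = 0
    while i < len(data):
        first = data[i]
        # Pick the allowed range for each byte that must follow the first byte.
        if first <= 0x7F:
            ranges = []
        elif 0xC2 <= first <= 0xDF:
            ranges = [cont]
        elif first == 0xE0:
            ranges = [(0xA0, 0xBF), cont]
        elif first == 0xED:
            ranges = [(0x80, 0x9F), cont]
        elif 0xE1 <= first <= 0xEF:  # x'E1'-x'EC' and x'EE'-x'EF'
            ranges = [cont, cont]
        elif first == 0xF0:
            ranges = [(0x90, 0xBF), cont, cont]
        elif 0xF1 <= first <= 0xF3:
            ranges = [cont, cont, cont]
        elif first == 0xF4:
            ranges = [(0x80, 0x8F), cont, cont]
        else:  # x'80'-x'C1' or x'F5'-x'FF' can never start a sequence
            return i + 1
        following = data[i + 1 : i + 1 + len(ranges)]
        if len(following) < len(ranges):
            return i + 1  # assumption: truncated sequence reported at its start
        for byte, (low, high) in zip(following, ranges):
            if not low <= byte <= high:
                return i + 1
        i += 1 + len(ranges)
    return 0


print(utf8_first_invalid(b"A\xc3\x28A"))  # x'C3' is not followed by x'80'-x'BF'
```

In the usage line, the second byte x'C3' requires a following byte in the range x'80' to x'BF' but is followed by x'28', so the sketch reports position 2.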

For UTF-16 data, the validity of an encoding unit varies according to its range as listed in the table:

Table 2. Encoding unit validity for UTF-16 data
Value Range	Dependency	Validity	Number of bytes if converted to UTF-8
nx'0000' - nx'007F'	None	Valid	1
nx'0080' - nx'07FF'	None	Valid	2
nx'0800' - nx'D7FF'	None	Valid	3
nx'D800' - nx'DBFF'	Must be followed by a second encoding unit with a value in the range nx'DC00' to nx'DFFF'	Valid	4 (a Unicode surrogate pair)
nx'D800' - nx'DBFF'	Other cases	Invalid	Not applicable
nx'DC00' - nx'DFFF'	Not immediately preceded by an encoding unit in the range nx'D800' to nx'DBFF'	Invalid	Not applicable
nx'E000' - nx'FFFF'	None	Valid	3
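
The rules in Table 2 can be sketched in Python in the same way. This is an illustration only, not the UVALID implementation: the name utf16_first_invalid is hypothetical, and the input is modeled as a list of 16-bit integer values. The sketch also treats an unpaired unit in the range nx'DC00' to nx'DFFF' as invalid, which follows the Unicode definition of well-formed UTF-16 but is an assumption about UVALID itself. The result is zero for well-formed data, otherwise one plus the number of well-formed encoding units that precede the invalid data.

```python
def utf16_first_invalid(units: list[int]) -> int:
    """Return 0 if units is well-formed UTF-16, else the 1-based position
    of the encoding unit where the first invalid data starts."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # a complete surrogate pair
                continue
            return i + 1
        if 0xDC00 <= unit <= 0xDFFF:
            return i + 1  # assumption: a lone low surrogate is invalid
        i += 1  # any other unit is valid on its own
    return 0


print(utf16_first_invalid([0x0041, 0xD83D, 0x0041]))  # high surrogate is unpaired
```

In the usage line, nx'D83D' is followed by nx'0041' rather than by a unit in the range nx'DC00' to nx'DFFF', so the sketch reports position 2.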