REGEXP_LIKE predicate
The REGEXP_LIKE predicate searches for a regular expression pattern in a string.
If the pattern-expression is found, the result is true. If the pattern-expression is not found, the result is false. If source-string and pattern-expression are empty strings, the result is true. If source-string or pattern-expression is the empty string (but not both), the result is false. If the value of any of the arguments is null, the result is unknown.
- source-string
- An expression that specifies the string in which the search is to take place. The expression must return a value that is a built-in character string, graphic string, numeric, or datetime data type. If the value is not a UTF-16 DBCLOB, it is implicitly cast to a UTF-16 DBCLOB before searching for the regular expression pattern. A character string with the FOR BIT DATA attribute or a binary string is not supported. The length of a string must not be greater than 1 gigabyte.
- pattern-expression
- An expression that specifies the regular expression string that
is the pattern for the search. The expression must return a value
that is a built-in character string, graphic string, numeric, or datetime
data type. If the value is not a UTF-16 DBCLOB, it is implicitly cast
to a UTF-16 DBCLOB before searching for the regular expression pattern.
A character string with the FOR BIT DATA attribute or a binary string
is not supported. The length of the string must not be greater than
32K.
A valid pattern-expression consists of a set of characters and control characters that describe the pattern of the search. For a description of the valid control characters, see Regular expression control characters.
- start
- An expression that specifies the position within source-string at which the search is to start. The expression must return a value of any built-in numeric, character-string, or graphic-string data type. The argument is cast to INTEGER before searching for the regular expression pattern. For more information about converting to INTEGER, see INTEGER or INT. The value of the integer must be greater than or equal to 1. If the value of the integer is greater than the actual length of the source-string, the result of the predicate is false.
- flags
- An expression that specifies flags that control aspects of the
pattern matching. The expression must return a value that is a built-in
character string or graphic string data type. A character string with
the FOR BIT DATA attribute or a binary string is not supported. The
string can include one or more valid flag values and the combination
of flag values must be valid. An empty string is the same as the value
'c'.
For a description of the valid flag characters, see Regular expression flag values.
Regular expression flag values
Flag value | Description |
---|---|
c | Specifies that matching is case sensitive. This is the default value if neither 'c' nor 'i' is specified. This value must not be specified with a value of 'i'. |
i | Specifies that matching is case insensitive. This value must not be specified with a value of 'c'. |
m | Specifies that the input data could contain more than one line. By default, the '^' and the '$' in a pattern will only match the start and the end, respectively, of the input string. If this flag is set, "^" and "$" will also match at the start and end of each line within the input string. |
n | Specifies that the '.' character in a pattern matches a line terminator in the input string. By default, the '.' in a pattern will not match a line terminator. A carriage-return and line-feed pair in the input string behaves as a single line terminator, and will match a single "." in a pattern. |
s | Specifies that the '.' character in a pattern matches a line terminator in the input string. This is a synonym for the 'n' value. |
x | Specifies that white space characters in a pattern are ignored, unless escaped. |
Regular expression control characters
The regular expression processing is performed using the International Components for Unicode (ICU) regular expression API on Unicode data with regular expression patterns that use the control characters listed below.
Note that only halfwidth characters are recognized. Any fullwidth characters that correspond to the characters in these tables will not be recognized.
Character | Outside of sets | [Inside sets] | Description |
---|---|---|---|
\a | ✓ | ✓ | Match a BELL, \u0007 |
\A | ✓ | Match at the beginning of the input. Differs from ^ in that \A will not match after a new line within the input. | |
\b | ✓ | Match if the current position is a word boundary. Boundaries occur at the transitions between word (\w) and non-word (\W) characters, with combining marks ignored. | |
\B | ✓ | Match if the current position is not a word boundary. | |
\cX | ✓ | ✓ | Match a control-X character. |
\d | ✓ | ✓ | Match any character with the Unicode General Category of Nd (Number, Decimal Digit.) |
\D | ✓ | ✓ | Match any character that is not a decimal digit. |
\e | ✓ | ✓ | Match an ESCAPE, \u001B. |
\E | ✓ | ✓ | Terminates a \Q ... \E quoted sequence. |
\f | ✓ | ✓ | Match a FORM FEED, \u000C. |
\G | ✓ | Match if the current position is at the end of the previous match. | |
\n | ✓ | ✓ | Match a LINE FEED, \u000A. |
\N{UNICODE CHARACTER NAME} | ✓ | ✓ | Match the named character. |
\p{UNICODE PROPERTY NAME} | ✓ | ✓ | Match any character with the specified Unicode Property. |
\P{UNICODE PROPERTY NAME} | ✓ | ✓ | Match any character not having the specified Unicode Property. |
\Q | ✓ | ✓ | Quotes all following characters until \E. |
\r | ✓ | ✓ | Match a CARRIAGE RETURN, \u000D |
\s | ✓ | ✓ | Match a white space character. White space is defined as [\t\n\f\r\p{Z}]. |
\S | ✓ | ✓ | Match a non-white space character. |
\t | ✓ | ✓ | Match a HORIZONTAL TABULATION, \u0009. |
\uhhhh | ✓ | ✓ | Match the character with the hex value hhhh. |
\Uhhhhhhhh | ✓ | ✓ | Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff. |
\w | ✓ | ✓ | Match a word character. Word characters are
[\p{Alphabetic}\p{Mark}\p{Decimal_Number} \p{Connector_Punctuation}\u200c\u200d]. |
\W | ✓ | ✓ | Match a non-word character. |
\x{hhhh} | ✓ | ✓ | Match the character with hex value hhhh. From one to six hex digits may be supplied. |
\xhh | ✓ | ✓ | Match the character with two digit hex value hh |
\X | ✓ | Match a Grapheme Cluster | |
\Z | ✓ | Match if the current position is at the end of input, but before the final line terminator, if one exists. | |
\z | ✓ | Match if the current position is at the end of input. | |
\n | ✓ | Back Reference. Match whatever the nth capturing group matched. n must be a number > 1 and < total number of capture groups in the pattern. | |
\0ooo | ✓ | ✓ | Match an Octal character. 'ooo' is from one to three octal digits. 0377 is the largest allowed Octal character. The leading zero is required; it distinguishes Octal constants from back references. |
[pattern] | ✓ | ✓ | Match any one character from the set. |
. | ✓ | Match any character. | |
^ | ✓ | Match at the beginning of a line. | |
$ | ✓ | Match at the end of a line. | |
\ | ✓ | Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ . | |
\ | ✓ | Quotes the following character. Characters that must be quoted to be treated as literals are [ ] \ Characters that may need to be quoted, depending on the context are - & |
Operator | Description |
---|---|
| | Alternation. A|B matches either A or B. |
* | Match 0 or more times. Match as many times as possible. |
+ | Match 1 or more times. Match as many times as possible. |
? | Match zero or one times. Prefer one |
{n} | Match exactly n times |
{n,} | Match at least n times. Match as many times as possible. |
{n,m} | Match between n and m times. Match as many times as possible, but not more than m. |
*? | Match 0 or more times. Match as few times as possible. |
+? | Match 1 or more times. Match as few times as possible. |
?? | Match zero or one times. Prefer zero. |
{n}? | Match exactly n times |
{n,}? | Match at least n times, but no more than required for an overall pattern match |
{n,m}? | Match between n and m times. Match as few times as possible, but not less than n. |
*+ | Match 0 or more times. Match as many times as possible when first encountered, do not retry with fewer even if overall match fails (Possessive Match) |
++ | Match 1 or more times. Possessive match |
?+ | Match zero or one times. Possessive match |
{n}+ | Match exactly n times |
{n,}+ | Match at least n times. Possessive Match. |
{n,m}+ | Match between n and m times. Possessive Match. |
( ... ) | Capturing parentheses. Range of input that matched the parenthesized subexpression is available after the match. |
(?: ... ) | Non-capturing parentheses. Groups the included pattern, but does not provide capturing of matching text. Somewhat more efficient than capturing parentheses. |
(?> ... ) | Atomic-match parentheses. First match of the parenthesized subexpression is the only one tried; if it does not lead to an overall pattern match, back up the search for a match to a position before the "(?>" |
(?# ... ) | Free-format comment (?# comment ). |
(?= ... ) | Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position. |
(?! ... ) | Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position. |
(?<= ... ) | Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.) |
(?<! ... ) | Negative Look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.) |
(?ismwx-ismwx: ... ) | Flag settings. Evaluate the parenthesized expression with the specified flags enabled or disabled. |
(?ismwx-ismwx) | Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, (?i) changes to a case insensitive match. |
Example | Description |
---|---|
[abc] | Match any of the characters a, b or c |
[^abc] | Negation - match any character except a, b or c |
[A-M] | Range - match any character from A to M. The characters to include are determined by Unicode code point ordering. |
[\u0000-\U0010ffff] | Range - match all characters. |
[\p{Letter}]
[\p{General_Category=Letter}] [\p{L}] |
Characters with Unicode Category = Letter. All forms shown are equivalent. |
[\P{Letter}] | Negated property. (Upper case \P) Match everything except Letters. |
[\p{numeric_value=9}] | Match all numbers with a numeric value of 9. Any Unicode Property may be used in set expressions. |
[\p{Letter}&&\p{script=cyrillic}] | Logical AND or intersection. Match the set of all Cyrillic letters. |
[\p{Letter}--\p{script=latin}] | Subtraction. Match all non-Latin letters. |
[[a-z][A-Z][0-9]]
[[a-zA-Z0-9]] |
Implicit Logical OR or Union of Sets. The examples match ASCII letters and digits. The two forms are equivalent. |
[:script=Greek:] | Alternate POSIX-like syntax for properties. Equivalent to \p{script=Greek} |
Notes
Prerequisites: In order to use the REGEXP_LIKE predicate, the International Components for Unicode (ICU) option must be installed
Processing: The regular expression processing is done using the International Components for Unicode (ICU) regular expression interface. For more information see, http://userguide.icu-project.org/strings/regexp.
If only three arguments are specified, the third argument may be a start or flags argument. If the third argument is a string, it is interpreted as a flags argument. Otherwise, it is interpreted as a start argument.
Examples
- Example 1: Select the employee number where the last name
is spelled LUCCHESSI, LUCHESSI, or LUCHESI from the EMPLOYEE table
without considering upper or lower case letters.
The result is 1 row with EMPNO value '000110'.SELECT EMPNO FROM EMPLOYEE WHERE REGEXP_LIKE(LASTNAME,'luc+?hes+?i','i')
- Example 2: Select any invalid product identifier values
from the PRODUCT table. The expected format is 'nnn-nnn-nn' where
'n' is a digit from 0 to 9.
The result is 0 rows because all the product identifiers match the pattern.SELECT PID FROM PRODUCT WHERE NOT REGEXP_LIKE(pid,'[0-9]{3}-[0-9]{3}-[0-9]{2}')