Matches regular expression check
This data rule definition logic checks to see if your data matches a regular expression.
Syntax
Positive:
source-data MATCHES_REGEX pattern-string
Negative:
source data NOT MATCHES_REGEX pattern-string
Description
You can use a regular expression to search for a particular character, or a character string in the beginning, middle, or end of a word. You can also use regular expressions to find a particular variable, a range of characters, or specific kinds of characters, such as printable or unprintable characters. When positive, it finds rows for which the source data matches the specified expression. When negative, it finds rows for which the source data does not match the specified expression.
Any lowercase character that you specify in a pattern string searches for that specific lowercase character. Any uppercase character that you specify in a pattern string indicates that particular uppercase letter. Any specific digit you specify, from zero to nine, indicates the particular digit that you specified in the pattern string. You can use these operators in combination with other valid characters to build very complex expressions.
| Operator | Function |
|---|---|
| . (period) | Indicates a match for a single character. |
| $ (dollar sign) | Indicates the end of a line. |
| ^ (caret) | Indicates that the pattern string starts at the beginning of a line. |
| [uppercase character] | Indicates a match for a specific uppercase character. |
| [lowercase character] | Indicates a match for a specific lowercase character. |
| [digit from 0–9] | Indicates a match for a specific single digit. |
| \d [a particular digit from 0–9] | Indicates a match for a specific single digit. |
| \b | Indicates a match at a word boundary. |
| \s | Indicates a match for any whitespace character. |
| \S | Indicates a match for any non-whitespace character. |
| \w | Indicates a match for any single alphanumeric character. |
| [ ] | Indicates the start for the character class definition that you specify. |
| [^X] | Indicates a match for the start of a line, followed by the specific characters that you specify. |
| X? | Indicates a match for no or one occurrence of the qualifier. It also extends the meaning of the qualifier. |
| X* | Indicates a match for a qualifier of zero or more. |
| X+ | Indicates a match for one or more qualifiers. |
| {X, Y} | Indicates a match between X and Y number of occurrences. |
| (X|Y) + | Indicates the start of an alternative branch. |
| ? | Indicates an optional match. |
Regular expression support
Additional regular expression information is provided by the Perl Compatible Regular Expression (PCRE) library package, which is open source software, written by Philip Hazel, and copyright by the University of Cambridge, England. The PCRE documentation can be found at http://regexkit.sourceforge.net/Documentation/pcre/pcrepattern.html.
Notes
- You can use a expression that is made up of scalar functions, arithmetic operations and variables as a source data. If you do this, make sure that each expression evaluates to a character or string data type.
- To perform a simple format check, you can use the matches format check.
- To prevent incorrect evaluation results of data that is padded with the null character (ASCII '00') apply the appropriate trim function (trim, ltrim, or rtrim) in the expression.
Examples
- Postal Codes
- You are searching for all US postal codes in a column:
'\b[0-9]{5}(?:-[0-9]{4})?\b' - You are searching for all Canadian postal codes in a column:
'\b[ABCEGHJKLMNPRSTVXY][0-9][A-Z] [0-9][A-Z][0-9]\b' - You are searching for all UK postal codes in a column:
'\b[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}\b' - You are searching for all US postal codes in a column that start
with '02', such as 02112, 02113, 02114, and so on:
'\b02[0-9]{3}(?:-[0-9]{4})?\b'
- You are searching for all US postal codes in a column:
- Phone numbers
- You are searching for phone numbers in the standard North American
format. You want to account for variations in the format such as 123-456-7890,
(123)-456-7890, 123 456 7890 , 123.456.7890 and so on:
'\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}' - You want to search for phone numbers with the US area code of
212:
\(?212\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}'
- You are searching for phone numbers in the standard North American
format. You want to account for variations in the format such as 123-456-7890,
(123)-456-7890, 123 456 7890 , 123.456.7890 and so on:
- IP addresses
- You want to find IP addresses that match 0.0.0.0 to 999.999.999.999:
or'\b([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\b''\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b' - You want to verify that all values in a column have the format
of an IP address:
ipaddress matches_regex '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
- You want to find IP addresses that match 0.0.0.0 to 999.999.999.999:
- Specific words
- You are searching for a word:
'\b%word%\b' - You are searching for words that occur more than twice in a column:
'\b(\w+)\s+\1\b' - You are searching for pairs of words from the following list:
firstword, secondword, thirdword. You want to find the words that
are separated by at least one word, but up to five words:
'\b(firstword|secondword|thirdword)(?:\W+\w+){1,5}\W+ (firstword|secondword|thirdword)\b' - You are searching from one of the following words: firstword,
secondword, or thirdword.
'\b(?:firstword|secondword|thirdword)\b' - You are searching for a word that does not begin with uppercase
characters:
'\b(a-z)+(a-z,A-Z)*\b'
- You are searching for a word:
- Others
- Find names at the beginning of a line, which start with "Smi,":
firstnamecolumn matches_regex '^Smi' - Find all file names that begin with an uppercase A, followed by any number of characters ending with ".txt":
filenamecolumn matches_regex '^A.*\.txt$'- You are searching for an employee named Kristie Jones who has
accounts on various applications, in which she has used the first
names Kristie, Kristin, and Kristen as her first name:
firstname matches_regex ('Kristie'|'Kristin'|'Kristen')[ ]'Jones'
- Find names at the beginning of a line, which start with "Smi,":