regexec Subroutine
Purpose
Compares the null-terminated string that is specified by the value of the String parameter against the compiled basic or extended regular expression Preg, which must have previously been compiled by a call to the regcomp subroutine.
Library
Standard C Library (libc.a)
Syntax
#include <regex.h>

int regexec (const regex_t *Preg, const char *String, size_t NMatch, regmatch_t PMatch[], int EFlags);
Description
The regexec subroutine compares the null-terminated string in the String parameter with the compiled basic or extended regular expression in the Preg parameter that is initialized by a previous call to the regcomp subroutine. If a match is found, the regexec subroutine returns a value of 0. The regexec subroutine returns a nonzero value if it finds no match or it finds an error.
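The compile-then-match sequence described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the string_matches helper and its 1/0/-1 return convention are hypothetical names introduced here, and the pattern is assumed to be a basic regular expression.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 on a match, 0 on no match, -1 on any error. */
static int string_matches(const char *pattern, const char *string)
{
    regex_t preg;
    int rc;

    assert(pattern != NULL && string != NULL);

    /* No REG_EXTENDED, so the pattern is a basic regular expression.
       REG_NOSUB states that no substring offsets are needed. */
    if (regcomp(&preg, pattern, REG_NOSUB) != 0)
        return -1;

    /* NMatch is 0, so the PMatch argument is ignored and may be NULL. */
    rc = regexec(&preg, string, 0, NULL, 0);
    regfree(&preg);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

A call such as string_matches("b.d", "abcde") is expected to report a match, because the pattern is not anchored and can match anywhere in the string.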
If the NMatch parameter has a value
of 0, or if the REG_NOSUB flag was set on the call to the regcomp subroutine,
the regexec subroutine ignores the PMatch parameter.
Otherwise, the PMatch parameter points to an array of at least
the number of elements specified by the NMatch parameter. The regexec subroutine
fills in the elements of the array pointed to by the PMatch parameter
with offsets of the substrings of the String parameter. The
offsets correspond to the parenthetic subexpressions of the original pattern parameter
that was specified to the regcomp subroutine.
The rm_so member of each element is the byte offset of the
beginning of the substring, and the rm_eo member is one greater than the
byte offset of the end of the substring. Subexpression i begins at the
ith matched open parenthesis, counting from 1. Element 0 of the array
corresponds to the entire pattern. Unused elements of the PMatch parameter, up to
PMatch[NMatch - 1], are filled with -1. If the pattern contains more
subexpressions than the NMatch parameter allows for (the
pattern itself counts as a subexpression), only the first
NMatch - 1 subexpressions are recorded.
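The layout of the PMatch array can be illustrated with a short sketch. The match_offsets helper and the NMATCH constant are hypothetical names chosen for this example, and the pattern is assumed to be an extended regular expression so that plain parentheses delimit subexpressions.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define NMATCH 4 /* element 0 (whole match) plus up to three subexpressions */

/* Hypothetical helper: matches an extended regular expression and stores
   the offsets in pmatch. Returns 0 on a match, otherwise the nonzero code
   returned by regcomp or regexec. */
static int match_offsets(const char *pattern, const char *string,
                         regmatch_t pmatch[NMATCH])
{
    regex_t preg;
    int rc;

    assert(pattern != NULL && string != NULL && pmatch != NULL);

    rc = regcomp(&preg, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    /* On success, pmatch[0] describes the whole match and pmatch[i]
       describes the ith parenthesized subexpression. */
    rc = regexec(&preg, string, NMATCH, pmatch, 0);
    regfree(&preg);
    return rc;
}
```

For the pattern ([a-z]+)@([a-z]+) and the string "to: bob@example", element 0 covers bytes 4 through 14 (rm_so is 4, rm_eo is 15), elements 1 and 2 cover "bob" and "example", and the unused element 3 holds -1 in both members.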
When a basic or extended regular expression is being matched, any given parenthetic subexpression of the pattern parameter might match several different substrings of the String parameter, or it might not match any substring even though the pattern as a whole did match.
The following rules are used to determine which substrings to report in the PMatch parameter when regular expressions are matched:
- If a subexpression in a regular expression participated in the match several times, the offset of the last matching substring is reported in the PMatch parameter.
- If a subexpression did not participate in a match, the byte offsets in the
PMatch parameter have a value of -1. A subexpression does not participate in a
match if any of the following are true:
  - An * (asterisk) or \{ \} (backslash, left brace, backslash, right brace) appears immediately after the subexpression in a basic regular expression and the subexpression did not match (matched 0 times).
  - An * (asterisk), ? (question mark), or { } (left and right braces) appears immediately after the subexpression in an extended regular expression and the subexpression did not match (matched 0 times).
  - A | (pipe) is used in an extended regular expression to select either the subexpression that didn't match or another subexpression, and the other subexpression matched.
- If a subexpression is contained in another subexpression, the data in the PMatch parameter refers to the last such subexpression.
- If a subexpression is contained in another subexpression and the byte offsets in the PMatch parameter for the containing subexpression have a value of -1, the offsets for the contained subexpression also have a value of -1.
- If a subexpression matched a zero-length string, the offsets in the PMatch parameter refer to the byte immediately following the matching string.
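The reporting rules above can be observed with a small sketch. The group_span helper and MAX_GROUPS constant are hypothetical names used only for this illustration, and extended regular expressions are assumed.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define MAX_GROUPS 4 /* element 0 plus up to three subexpressions */

/* Hypothetical helper: matches an extended regular expression and copies
   the offsets recorded for one subexpression into *so and *eo.
   Returns 0 on a match, otherwise the code from regcomp or regexec. */
static int group_span(const char *pattern, const char *string,
                      size_t group, long *so, long *eo)
{
    regex_t preg;
    regmatch_t pmatch[MAX_GROUPS];
    int rc;

    assert(group < MAX_GROUPS && so != NULL && eo != NULL);

    rc = regcomp(&preg, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    rc = regexec(&preg, string, MAX_GROUPS, pmatch, 0);
    regfree(&preg);

    if (rc == 0) {
        *so = (long)pmatch[group].rm_so;
        *eo = (long)pmatch[group].rm_eo;
    }
    return rc;
}
```

With the pattern (a|b)* and the string "ab", subexpression 1 participates twice and the offsets of the last iteration (1 and 2) are reported. With (a)|(b) and the string "b", subexpression 1 does not participate and both of its offsets are -1. With (a*)b and the string "b", subexpression 1 matches a zero-length string and both offsets are 0.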
If the REG_NOSUB flag was set in the cflags parameter
in the call to the regcomp subroutine, and the NMatch parameter
is not equal to 0 in the call to the regexec subroutine, the
content of the PMatch array is unspecified.
If the REG_NEWLINE flag was not set
in the cflags parameter when the regcomp subroutine
was called, then a new-line character in the pattern or String parameter
is treated as an ordinary character. If the REG_NEWLINE flag
was set when the regcomp subroutine was called, the new-line
character is treated as an ordinary character except as follows:
- A new-line character in the String parameter is not matched by a period
outside of a bracket expression or by any form of a nonmatching list. A nonmatching list expression
begins with a ^ (circumflex) and specifies a list that matches any character or collating element
except the expressions in the list after the leading circumflex. For example, the regular expression
[^abc] matches any character except a, b, or c. The circumflex has this special meaning only when it is the first character in the list, immediately following the left bracket.
- A ^ (circumflex) in the pattern parameter, when used to specify expression anchoring, matches the zero-length string immediately after a new-line character in the String parameter, regardless of the setting of the REG_NOTBOL flag.
- A $ (dollar sign) in the pattern parameter, when used to specify expression anchoring, matches the zero-length string immediately before a new-line character in the String parameter, regardless of the setting of the REG_NOTEOL flag.
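The difference that the REG_NEWLINE flag makes can be checked with a sketch like the following. The matches_with_cflags helper is a hypothetical name for this illustration; it passes the caller's compile flags to the regcomp subroutine and reports only whether a match was found.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles pattern with the given regcomp flags and
   returns 1 on a match, 0 on no match, -1 on any error. */
static int matches_with_cflags(const char *pattern, const char *string,
                               int cflags)
{
    regex_t preg;
    int rc;

    assert(pattern != NULL && string != NULL);

    /* REG_NOSUB is added because only a yes/no answer is needed. */
    if (regcomp(&preg, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&preg, string, 0, NULL, 0);
    regfree(&preg);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

For the two-line string "a\nb", the pattern ^b matches only when REG_NEWLINE is set, because the circumflex can then anchor after the new-line character. The pattern a.b matches only when REG_NEWLINE is not set, because the period then treats the new-line character as an ordinary character.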
Parameters
| Item | Description |
|---|---|
| Preg | Contains the compiled basic or extended regular expression to compare against the String parameter. |
| String | Contains the data to be matched. |
| NMatch | Contains the number of subexpressions to match. |
| PMatch | Contains the array of offsets into the String parameter that match the corresponding subexpression in the Preg parameter. |
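| EFlags | Modifies the interpretation of the contents of the String parameter. It is the bitwise inclusive OR of 0 or more of the following flags, which are defined in the regex.h file: REG_NOTBOL (the first character of the String parameter is not the beginning of a line, so a ^ (circumflex) used as an anchor does not match at the beginning of the String parameter) and REG_NOTEOL (the last character of the String parameter is not the end of a line, so a $ (dollar sign) used as an anchor does not match at the end of the String parameter). |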
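As a rough illustration of the EFlags parameter, the following sketch passes REG_NOTBOL or REG_NOTEOL to the regexec subroutine and reports whether an anchored pattern still matches. The matches_with_eflags helper is a hypothetical name, and the pattern is assumed to be a basic regular expression compiled without REG_NEWLINE.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 on a match, 0 on no match, -1 on error.
   The eflags value is passed unchanged to regexec. */
static int matches_with_eflags(const char *pattern, const char *string,
                               int eflags)
{
    regex_t preg;
    int rc;

    assert(pattern != NULL && string != NULL);

    if (regcomp(&preg, pattern, REG_NOSUB) != 0)
        return -1;

    /* REG_NOTBOL and REG_NOTEOL change how ^ and $ treat the ends of
       string; they do not change the compiled expression. */
    rc = regexec(&preg, string, 0, NULL, eflags);
    regfree(&preg);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

With the string "abc", the pattern ^a matches when EFlags is 0 but not when it is REG_NOTBOL, and the pattern c$ matches when EFlags is 0 but not when it is REG_NOTEOL.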
Return Values
On successful completion, the regexec subroutine
returns a value of 0 to indicate that the contents of the String parameter
matched the contents of the pattern parameter, or the REG_NOMATCH value to indicate
that no match occurred. The REG_NOMATCH error is defined in
the regex.h file.
Error Codes
If the regexec subroutine is unsuccessful, it returns a nonzero value indicating the type of problem. The following macros for possible error codes that can be returned are defined in the regex.h file:
| Item | Description |
|---|---|
| REG_NOMATCH | Indicates the basic or extended regular expression was unable to find a match. |
| REG_BADPAT | Indicates a basic or extended regular expression that is not valid. |
| REG_ECOLLATE | Indicates a collating element referenced that is not valid. |
| REG_ECTYPE | Indicates a character class-type reference that is not valid. |
| REG_EESCAPE | Indicates a trailing \ (backslash) in the pattern. |
| REG_ESUBREG | Indicates a number in \digit is not valid or is in error. |
| REG_EBRACK | Indicates a [ ] (left and right brackets) imbalance. |
| REG_EPAREN | Indicates a \( \) (backslash, left parenthesis, backslash, right parenthesis) or ( ) (left and right parentheses) imbalance. |
| REG_EBRACE | Indicates a \{ \} (backslash, left brace, backslash, right brace) imbalance. |
| REG_BADBR | Indicates the content of \{ \} (backslash, left brace, backslash, right brace) is unusable (not a number, number too large, more than two numbers, or first number larger than second). |
| REG_ERANGE | Indicates an unusable end point in a range expression. |
| REG_ESPACE | Indicates out of memory. |
| REG_BADRPT | Indicates a ? (question mark), * (asterisk), or + (plus sign) not preceded by a valid basic or extended regular expression. |
If the value of the Preg parameter to the regexec subroutine is not a compiled basic or extended regular expression that is returned by the regcomp subroutine, the result is undefined.
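One way to turn a nonzero return value into readable text is the POSIX regerror subroutine, which is declared in the regex.h file alongside regcomp and regexec. The sketch below assumes an extended regular expression; the match_or_explain helper is a hypothetical name, and the message text that regerror produces is implementation-defined.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles and matches, and on any nonzero result
   stores a message in errbuf. Returns 0 on a match, otherwise the code
   returned by regcomp or regexec. */
static int match_or_explain(const char *pattern, const char *string,
                            char *errbuf, size_t errbuf_size)
{
    regex_t preg;
    int rc;

    assert(errbuf != NULL && errbuf_size > 0);
    errbuf[0] = '\0';

    rc = regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        /* Compilation failed, so there is nothing to pass to regfree. */
        regerror(rc, &preg, errbuf, errbuf_size);
        return rc;
    }

    rc = regexec(&preg, string, 0, NULL, 0);
    if (rc != 0)
        regerror(rc, &preg, errbuf, errbuf_size);

    regfree(&preg);
    return rc;
}
```

A caller can compare the result with REG_NOMATCH to separate "no match" from a real failure, and print errbuf in either case.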
Examples
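The following example demonstrates how the REG_NOTBOL flag
can be used with the regexec subroutine to find all substrings in a line that
match a pattern that is supplied by a user. The p variable is a char pointer that tracks the
unsearched remainder of buffer, because the offsets in pm are relative to the string passed to each
call. (For simplicity, little error-checking is done in this example.)
(void) regcomp (&re, pattern, 0);
p = &buffer[0];
/* this call to regexec finds the first match on the line */
error = regexec (&re, p, 1, &pm, 0);
while (error == 0) { /* while matches found */
    /* substring found between p + pm.rm_so and p + pm.rm_eo */
    if (pm.rm_eo == 0) break; /* empty match at p: stop rather than loop forever */
    p += pm.rm_eo; /* offsets are relative to p, so advance past the match */
    /* This call to regexec finds the next match */
    error = regexec (&re, p, 1, &pm, REG_NOTBOL);
}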
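A self-contained variant of this loop might look like the sketch below. It is one possible arrangement, not the only one: the print_all_matches helper is a hypothetical name, the pattern is assumed to be a basic regular expression, and an empty match is handled by stepping forward one byte.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: prints every match of a basic regular expression
   in line and returns the number of matches, or -1 if the pattern does
   not compile. */
static int print_all_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t pm;
    const char *p = line; /* start of the part not yet searched */
    int eflags = 0;       /* the first call may match at beginning of line */
    int count = 0;

    assert(pattern != NULL && line != NULL);

    if (regcomp(&re, pattern, 0) != 0)
        return -1;

    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        /* Offsets are relative to p, not to the start of line. */
        printf("%.*s\n", (int)(pm.rm_eo - pm.rm_so), p + pm.rm_so);
        count++;

        if (pm.rm_eo > pm.rm_so) {
            p += pm.rm_eo;
        } else {
            /* Empty match: step one byte, but never past the terminator. */
            if (p[pm.rm_eo] == '\0')
                break;
            p += pm.rm_eo + 1;
        }
        /* Later calls no longer start at the beginning of the line. */
        eflags = REG_NOTBOL;
    }

    regfree(&re);
    return count;
}
```

For example, print_all_matches("ab*", "xabbb a ab") prints abbb, a, and ab on separate lines and returns 3, while the pattern ^a applied to "aaa" yields a single match because the later calls pass REG_NOTBOL.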