regcomp() — Compile regular expression
Standards
Standards / Extensions | C or C++ | Dependencies |
---|---|---|
XPG4 |
both |
Format
#include <regex.h>
int regcomp(regex_t *_restrict_ preg, const char *_restrict_ pattern, int cflags);
General description
Compiles the regular expression specified by pattern into an executable string of op-codes.
preg is a pointer to a compiled regular expression.
pattern is a pointer to a character string defining a source regular expression (described below).
- REG_EXTENDED
- Support extended regular expressions.
- REG_ICASE
- Ignore case in match.
- REG_NEWLINE
- Eliminate any special significance to the newline character.
- REG_NOSUB
- Report only success or fail in regexec(), that is, verify the syntax of a regular expression. If this flag is set, the regcomp() function sets re_nsub to the number of parenthesized sub-expressions found in pattern. Otherwise, a sub-expression results in an error.
The regcomp() function under z/OS XL C/C++ will use the definition of characters according to the current LC_SYNTAX category. The characters, [, ], {, }, |, ^, and $, have varying code points in different encoded character sets.
Regular expressions
The functions regcomp(), regerror(), regexec(), and regfree() use regular expressions in a similar way to the UNIX awk, ed, grep, and egrep commands.
- Symbol
- Description
- .
- The period symbol matches any one character except the terminal newline character.
- [character–character]
- The hyphen symbol, within square brackets, means “through”. It fills in the intervening characters according to the current collating sequence. For example, [a–z] can be equivalent to [abc…xyz] or, with a different collating sequence, it can be equivalent to [aAbBcC…xXyYzZ].
- [string]
- A string within square brackets specifies any of the characters
in string. Thus [abc],
if compared to other strings, would match any that contained a, b,
or c.
No assumptions are made at compile time about the actual characters contained in the range.
- {m} {m,} {m,u}
- Integer values enclosed in {} indicate the number of times to
apply the preceding regular expression. m is
the minimum number, and u is the maximum
number. u must not be greater than RE_DUP_MAX
(see limits.h).
If you specify only m, it indicates the exact number of times to apply the regular expression. {m,} is equivalent to {m,u}. They both match m or more occurrences of the expression.
- *
- The asterisk symbol indicates 0 or more of any characters. For example, [a*e] is equivalent to any of the following: 99ae9, aaaaae, a999e99.
- $
- The dollar symbol matches the end of the string. (Use \n to match a newline character.)
- character+
- The plus symbol specifies one or more occurrences of a character. Thus, smith+ern is equivalent to, for example, smithhhern.
- [^string]
- The caret symbol, when inside square brackets, negates the characters within the square brackets. Thus [^abc], if compared to other strings, would fail to match any that contains even one a, b, or c.
- (expression)$n
- Stores the value matched by the enclosed regular expression in the (n+1)th ret parameter. Ten enclosed regular expressions are allowed. Assignments are made unconditionally.
- (expression)
- Groups a sub-expression allowing an operator, such as *, +, or [].], to work on the sub-expression enclosed in parentheses. For example, (a*(cb+)*)$0.
- Do not use multibyte characters.
- You can use the ] (right square bracket) alone within a pair of square brackets, but only if it immediately follows either the opening left square bracket or if it immediately follows [^. For example: []–] matches the ] and – characters.
- All the preceding symbols are special. You precede them with \ to use the symbol itself. For example, a\.e is equivalent to a.e.
- You can use the – (hyphen) by itself, but only if it is the first or last character in the expression. For example, the expression []--0] matches either the ] or else the characters – through 0. Otherwise, use \–.
Returned value
If successful, regcomp() returns 0.
If unsuccessful, regcomp() returns nonzero, and the content of preg is undefined.
Example
/* CELEBR07
This example compiles an extended regular expression.
*/
#include <regex.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
main() {
regex_t preg;
char *string = "a simple string";
char *pattern = ".*(simple).*";
int rc;
if ((rc = regcomp(&preg, pattern, REG_EXTENDED)) != 0) {
printf("regcomp() failed, returning nonzero (%d)", rc);
exit(1);
}
}