regcmp or regex Subroutine
Purpose
Compiles and matches regular-expression patterns.
Libraries
Standard C Library (libc.a)
Programmers Workbench Library (libPW.a)
Syntax
Description
The regcmp subroutine compiles a regular expression (or Pattern) and returns a pointer to the compiled form. The regcmp subroutine allows multiple String parameters. If more than one String parameter is given, then the regcmp subroutine treats them as if they were concatenated together. It returns a null pointer if it encounters an incorrect parameter.
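As a rough illustration of the concatenation rule, the following sketch joins a variable-length list of strings into one pattern string. The helper name concat_patterns is hypothetical and is not part of either library. The (char *)0 terminator follows the usual calling convention for the regcmp subroutine. The sketch only shows how several String parameters can be viewed as a single pattern; it does not compile anything.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins its string arguments, up to a terminating
 * (char *)0, into one newly allocated string. Returns NULL on failure.
 * The caller frees the result. */
char *concat_patterns(const char *first, ...)
{
    va_list ap;
    size_t total = 0;
    const char *s;
    char *out;

    if (first == NULL)
        return NULL;

    /* First pass: measure the combined length. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        total += strlen(s);
    va_end(ap);

    out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy each piece in order. */
    out[0] = '\0';
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        strcat(out, s);
    va_end(ap);

    return out;
}
```

Under this view, a call such as concat_patterns("[a-z]", "+", "[0-9]*", (char *)0) yields the same pattern text as the single string "[a-z]+[0-9]*".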
You can use the regcmp command to compile regular expressions into your C program, frequently eliminating the need to call the regcmp subroutine at run time.
The regex subroutine compares a compiled Pattern to the Subject string. Additional parameters are used to receive values. Upon successful completion, the regex subroutine returns a pointer to the next unmatched character. If the regex subroutine fails, it returns a null pointer. A global character pointer, __loc1, points to where the match began.
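The return convention described above (a pointer to the next unmatched character, a separate pointer to the start of the match, and extra parameters that receive values) can be sketched with the POSIX regcomp and regexec functions. This is a stand-in technique, not the regcmp or regex subroutines themselves. The helper name match_next and its parameters are hypothetical, and the pattern is interpreted with POSIX extended syntax rather than the syntax described in this section.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper built on POSIX regcomp/regexec (not regcmp/regex).
 * On a match, *start is set to where the match began (the role __loc1
 * plays for regex), the first parenthesized group is copied into group0
 * (similar to a ret parameter), and the return value points to the next
 * unmatched character. Returns NULL if there is no match or on error. */
const char *match_next(const char *pattern, const char *subject,
                       const char **start, char *group0, size_t group0_size)
{
    regex_t re;
    regmatch_t m[2];
    const char *next = NULL;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    if (regexec(&re, subject, 2, m, 0) == 0) {
        *start = subject + m[0].rm_so;
        next = subject + m[0].rm_eo;

        if (group0 != NULL && group0_size > 0) {
            group0[0] = '\0';
            if (m[1].rm_so != -1) {
                size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
                if (len >= group0_size)
                    len = group0_size - 1;   /* truncate to fit */
                memcpy(group0, subject + m[1].rm_so, len);
                group0[len] = '\0';
            }
        }
    }

    regfree(&re);
    return next;
}
```

For the pattern "([0-9]+)-" and the subject "abc123-def", the start pointer lands on "123-def", the returned pointer lands on "def", and the group buffer receives "123".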
The regcmp and regex subroutines are borrowed from the ed command; however, the syntax and semantics have been changed slightly. You can use the following symbols with the regcmp and regex subroutines:
| Item | Description |
|---|---|
| [ ] * . ^ | These symbols have the same meaning as they do in the ed command. |
| - | The minus sign (or hyphen) within brackets that are used with the regex subroutine means "through," according to the current collating sequence. For example, [a-z] can be equivalent to [abcd . . . xyz] or [aBbCc . . . xYyZz]. You can use the - by itself if the - is the last or first character. For example, the character class expression []-] matches the ] (right bracket) and - (minus) characters. The regcmp subroutine does not use the current collating sequence, and the minus sign in brackets controls only a direct ASCII sequence. For example, [a-z] always means |
| $ | Matches the end of the string. Use the \n character to match a new-line character. |
| + | A regular expression followed by + (plus sign) means one or more times. For example, [0-9]+ is equivalent to [0-9][0-9]*. |
| {m} {m,} {m,u} | Integer values enclosed in {} (braces) indicate the number of times to apply the preceding regular expression. The m value is the minimum number and the u value is the maximum number. The u value must be less than 256. If you specify only m, it indicates the exact number of times to apply the regular expression. {m,} is like {m,u} without an upper limit and matches m or more occurrences of the expression. The + (plus sign) and * (asterisk) operations are equivalent to {1,} and {0,}, respectively. |
| ( . . . )$n | This stores the value that is matched by the enclosed regular expression in the (n+1)th ret parameter. Ten enclosed regular expressions are allowed. The regex subroutine makes the assignments unconditionally. |
| ( . . . ) | Parentheses group subexpressions. An operator, such as *, +, or [ ] works on a single character or on a regular expression that is enclosed in parentheses. For example, (a*(cb+)*)$0. |
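The equivalences stated in the table for + (plus sign), * (asterisk), and the brace forms can be checked with a small sketch. It uses the POSIX regcomp and regexec functions with extended syntax as a stand-in, on the assumption that these particular operators behave the same way there. The helper name match_length is hypothetical.

```c
#include <regex.h>

/* Returns the length of the leftmost match of an extended POSIX pattern
 * in subject, or -1 if there is no match or the pattern is invalid. */
int match_length(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m;
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 1, &m, 0) == 0)
        len = (int)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

With the subject "ab1234", the patterns "[0-9]+", "[0-9][0-9]*", and "[0-9]{1,}" all report a match of length 4, "[0-9]{2}" reports 2, and "[0-9]{2,3}" reports 3.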
All of the preceding defined symbols are special. Precede them with a
\ (backslash) if you want to match the special symbol itself. For example, \$
matches a dollar sign.
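When a pattern must match arbitrary literal text, the backslash rule can be applied mechanically. The following sketch is one possible way to do that. The helper name escape_special is hypothetical, and the set of characters it escapes is an assumption drawn from the table above: the minus sign is left alone because it is described as special only within brackets, and the backslash itself is escaped as well.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies in to out, placing a backslash before each
 * character in the special set. Returns 0 on success, or -1 if out is
 * too small. The special set is an assumption drawn from the table. */
int escape_special(const char *in, char *out, size_t out_size)
{
    static const char special[] = "[]*.^$+{}()\\";
    size_t used = 0;

    for (; *in != '\0'; in++) {
        int needs_escape = strchr(special, *in) != NULL;
        size_t need = needs_escape ? 2 : 1;

        /* Keep one byte in reserve for the terminating null. */
        if (used + need + 1 > out_size)
            return -1;
        if (needs_escape)
            out[used++] = '\\';
        out[used++] = *in;
    }
    if (used + 1 > out_size)
        return -1;
    out[used] = '\0';
    return 0;
}
```

For example, the input a.b$ becomes a\.b\$, which is then suitable for passing as a String parameter when the literal text is wanted.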
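The following replacement for the malloc subroutine hands out a single static buffer, so that repeated calls to the regcmp subroutine reuse the same space:
/* . . . Your Program . . . */
#include <stddef.h>

/* Replaces malloc: every request that fits is served from one static buffer. */
void *malloc(size_t n)
{
    static int rebuf[256];
    return (n <= sizeof(rebuf)) ? rebuf : NULL;
}
The regcmp subroutine produces code values that the regex subroutine can interpret as the regular expression. For instance, [a-z] indicates a range expression, which the regcmp subroutine compiles into a string containing the two end points (a and z).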
The regex subroutine interprets the range statement according to the current collating sequence. The expression [a-z] can be equivalent either to [abcd . . . xyz] or to [aBbCcDd . . . xXyYzZ], as long as the character preceding the minus sign has a lower collating value than the character following the minus sign.
The behavior of a range expression depends on the collation sequence. If
you want to match a specific set of characters, you should list each one. For
example, to select the letters a, b, or c, use [abc] rather than
[a-c].
- No assumptions are made at compile time about the actual characters that are contained in the range.
- Do not use multibyte characters.
- You can use the ] (right bracket) itself within a pair of brackets if it immediately follows the leading [ (left bracket) or [^ (a left bracket followed immediately by a circumflex).
- You can also use the minus sign (or hyphen) if it is the first or last character in the expression. For example, the expression []-0] matches either the right bracket ( ] ), or the characters - through 0.
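The difference between a range taken by character code and a range taken by collating sequence, and the advice to list characters explicitly, can be sketched in plain C. All three function names are hypothetical. The collation test uses the standard strcoll function on one-character strings, which is only an approximation of how a matcher might consult the locale; it is not taken from the regex implementation.

```c
#include <string.h>

/* Range test by raw character code, as regcmp is described as doing. */
int in_range_ascii(char c, char lo, char hi)
{
    return (unsigned char)lo <= (unsigned char)c &&
           (unsigned char)c <= (unsigned char)hi;
}

/* Range test by the current collating sequence, using strcoll on
 * one-character strings. The result depends on the LC_COLLATE locale. */
int in_range_collate(char c, char lo, char hi)
{
    char cs[2] = { c, '\0' };
    char los[2] = { lo, '\0' };
    char his[2] = { hi, '\0' };

    return strcoll(los, cs) <= 0 && strcoll(cs, his) <= 0;
}

/* Explicit set membership, the locale-independent alternative. */
int in_set(char c, const char *set)
{
    return c != '\0' && strchr(set, c) != NULL;
}
```

In the default C locale the two range tests agree. After a call to setlocale that selects a different collating sequence, in_range_collate('B', 'a', 'z') may report a match where in_range_ascii does not, whereas in_set('b', "abc") gives the same answer in every locale.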
Parameters
| Item | Description |
|---|---|
| Subject | Specifies a comparison string. |
| String | Specifies the Pattern to be compiled. |
| Pattern | Specifies the expression to be compared. |
| ret | Points to an address at which to store comparison data. The regex subroutine allows multiple ret parameters. |