regexec() — Execute Compiled Regular Expression

Format

#include <regex.h>
int regexec(const regex_t *preg, const char *string,
            size_t nmatch, regmatch_t *pmatch, int eflags);

Language Level

XPG4

Threadsafe

Yes

Locale Sensitive

The behavior of this function might be affected by the LC_CTYPE and LC_COLLATE categories of the current locale. This function is not available when LOCALETYPE(*CLD) is specified on the compilation command. For more information, see Understanding CCSIDs and Locales.

Description

The regexec() function compares the null-ended string against the compiled regular expression preg to find a match between the two.

The nmatch value is the number of substrings in string that the regexec() function should try to match with subexpressions in preg. The array you supply for pmatch must have at least nmatch elements.

The regexec() function fills in the elements of the array pmatch with offsets of the substrings in string that correspond to the parenthesized subexpressions of the original pattern given to the regcomp() function to create preg. The zeroth element of the array corresponds to the entire pattern. If there are more than nmatch subexpressions, only the first nmatch - 1 are stored. If nmatch is 0, or if the REG_NOSUB flag was set when preg was created with the regcomp() function, the regexec() function ignores the pmatch argument.
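
The relationship between nmatch and pmatch can be illustrated with a minimal sketch. The helper name print_groups and the fixed array size of 3 are assumptions made for this illustration only; they are not part of <regex.h>. The pattern is compiled as an extended regular expression, and any slot whose rm_so is -1 is skipped.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not part of <regex.h>): matches an extended regular
   expression against text and prints every element of pmatch that holds a
   match. Returns the number of such elements, or -1 if the pattern does
   not compile or no match is found. */
int print_groups(const char *pattern, const char *text)
{
   enum { NMATCH = 3 };   /* element 0 = whole match, 1-2 = subexpressions */
   regex_t    preg;
   regmatch_t pmatch[NMATCH];
   int        i, filled = 0;

   if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
      return -1;

   if (regexec(&preg, text, NMATCH, pmatch, 0) != 0) {
      regfree(&preg);
      return -1;
   }

   for (i = 0; i < NMATCH; i++) {
      if (pmatch[i].rm_so == -1)   /* unused or non-participating element */
         continue;
      /* rm_eo is the offset of the first character after the match. */
      printf("pmatch[%d] = \"%.*s\" (rm_so %d, rm_eo %d)\n", i,
             (int)(pmatch[i].rm_eo - pmatch[i].rm_so), &text[pmatch[i].rm_so],
             (int)pmatch[i].rm_so, (int)pmatch[i].rm_eo);
      filled++;
   }
   regfree(&preg);
   return filled;
}
```

Called as print_groups("([0-9]+)-([0-9]+)-([0-9]+)", "id 12-34-56"), the pattern has three subexpressions, but with nmatch set to 3 only the whole match and the first two subexpressions are stored.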

The eflags flag defines customizable behavior of the regexec() function:

eflags value   Description
REG_NOTBOL     Indicates that the first character of string is not the beginning of line.
REG_NOTEOL     Indicates that the last character of string is not the end of line.
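
The effect of the two flags on the ^ and $ anchors can be sketched as follows. The helper name try_match is an assumption made for this illustration; it is not a library function.

```c
#include <regex.h>

/* Hypothetical helper (not part of <regex.h>): compiles pattern as a basic
   regular expression and runs it against text with the given eflags.
   Returns 1 if a match is found, 0 if not, and -1 if the pattern does not
   compile. REG_NOSUB is used because no offsets are needed. */
int try_match(const char *pattern, const char *text, int eflags)
{
   regex_t preg;
   int     rc;

   if (regcomp(&preg, pattern, REG_NOSUB) != 0)
      return -1;

   rc = regexec(&preg, text, 0, (regmatch_t *)0, eflags);
   regfree(&preg);
   return rc == 0 ? 1 : 0;
}
```

With this sketch, try_match("^abc", "abc", 0) finds a match, while try_match("^abc", "abc", REG_NOTBOL) does not, because ^ no longer matches at the start of string. In the same way, try_match("abc$", "abc", REG_NOTEOL) finds no match. A typical use of REG_NOTBOL is to pass a pointer into the middle of a larger buffer without letting ^ match at that position.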

When a basic or extended regular expression is matched, any given parenthesized subexpression of the original pattern could participate in the match of several different substrings of string. The following rules determine which substrings are reported in pmatch:

  1. If subexpression i in a regular expression is not contained within another subexpression, and it participated in the match several times, then the byte offsets in pmatch[i] will delimit the last such match.
  2. If subexpression i is not contained within another subexpression, and it did not participate in an otherwise successful match, the byte offsets in pmatch[i] will be -1. A subexpression does not participate in the match when any of the following conditions is true:
    • * or \{ \} appears immediately after the subexpression in a basic regular expression, and the subexpression did not match (matched 0 times).
    • *, ?, or { } appears immediately after the subexpression in an extended regular expression, and the subexpression did not match (matched 0 times).
    • | is used in an extended regular expression to select this subexpression or another, and the other subexpression matched.
  3. If subexpression i is contained within another subexpression j, and i is not contained within any other subexpression that is contained within j, and a match of subexpression j is reported in pmatch[j], then the match or non-match of subexpression i reported in pmatch[i] will be as described in 1. and 2. above, but within the substring reported in pmatch[j] rather than the whole string.
  4. If subexpression i is contained in subexpression j, and the byte offsets in pmatch[j] are -1, then the offsets in pmatch[i] also will be -1.
  5. If subexpression i matched a zero-length string, then both byte offsets in pmatch[i] will be the byte offset of the character or null terminator immediately following the zero-length string.

If the REG_NOSUB flag was set when preg was created by the regcomp() function, the contents of pmatch are unspecified. If the REG_NEWLINE flag was set when preg was created, new-line characters are allowed in string.
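
The reporting rules above can be checked with a small sketch. The helper name group_offsets and the limit of four pmatch elements are assumptions made for this illustration. Patterns are compiled as extended regular expressions.

```c
#include <regex.h>

/* Hypothetical helper (not part of <regex.h>): matches pattern against text
   and copies the offsets reported for one subexpression into *so and *eo.
   Returns 0 if the overall match succeeds, or -1 if the group index is out
   of range, the pattern does not compile, or no match is found. */
int group_offsets(const char *pattern, const char *text, int group,
                  int *so, int *eo)
{
   enum { NMATCH = 4 };   /* whole match plus up to three subexpressions */
   regex_t    preg;
   regmatch_t pmatch[NMATCH];
   int        rc;

   if (group < 0 || group >= NMATCH)
      return -1;
   if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
      return -1;

   rc = regexec(&preg, text, NMATCH, pmatch, 0);
   if (rc == 0) {
      *so = (int)pmatch[group].rm_so;
      *eo = (int)pmatch[group].rm_eo;
   }
   regfree(&preg);
   return rc == 0 ? 0 : -1;
}
```

On an implementation that follows these rules, matching "(a|b)*c" against "abc" is expected to report offsets 1 and 2 for subexpression 1, which is its last match (rule 1). Matching "(a)|(b)" against "b" is expected to report -1 and -1 for subexpression 1, because the other alternative matched (rule 2). Matching "(x*)b" against "ab" is expected to report 1 and 1 for subexpression 1, a zero-length match (rule 5).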

Return Value

If a match is found, the regexec() function returns 0. If no match is found, the regexec() function returns REG_NOMATCH. Otherwise, it returns a nonzero value indicating an error. A nonzero return value can be used in a call to the regerror() function.
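
The three outcomes can be told apart as in the following sketch, which passes any nonzero return value to regerror() to obtain a message. The helper name match_or_report and the caller-supplied message buffer are assumptions made for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of <regex.h>): compiles pattern as an
   extended regular expression and runs it against text. Returns the value
   from regcomp() if compilation fails; otherwise returns the value from
   regexec() (0 for a match, REG_NOMATCH for no match). For any nonzero
   value, msg receives the text produced by regerror(). */
int match_or_report(const char *pattern, const char *text,
                    char *msg, size_t msgsize)
{
   regex_t preg;
   int     rc;

   if (msgsize > 0)
      msg[0] = '\0';

   rc = regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB);
   if (rc != 0) {
      regerror(rc, &preg, msg, msgsize);
      return rc;                  /* preg was not created, so no regfree() */
   }

   rc = regexec(&preg, text, 0, NULL, 0);   /* nmatch 0: pmatch is ignored */
   if (rc != 0)
      regerror(rc, &preg, msg, msgsize);

   regfree(&preg);
   return rc;
}
```

A caller can compare the result with 0 and REG_NOMATCH first and treat any other value as an error. The wording of the message in msg depends on the implementation.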

Example

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
 
int main(void)
{
   regex_t    preg;
   const char *string = "a very simple simple simple string";
   const char *pattern = "\\(sim[a-z]le\\) \\1";
   int        rc;
   size_t     nmatch = 2;
   regmatch_t pmatch[2];
 
   if (0 != (rc = regcomp(&preg, pattern, 0))) {
      printf("regcomp() failed, returning nonzero (%d)\n", rc);
      exit(EXIT_FAILURE);
   }
 
   if (0 != (rc = regexec(&preg, string, nmatch, pmatch, 0))) {
      printf("Failed to match '%s' with '%s', returning %d.\n",
             string, pattern, rc);
   }
   else {
      /* regoff_t is not necessarily int, so cast before printing with
         %d or using the value as a %.*s precision. */
      printf("With the whole expression, "
             "a matched substring \"%.*s\" is found at position %d to %d.\n",
             (int)(pmatch[0].rm_eo - pmatch[0].rm_so), &string[pmatch[0].rm_so],
             (int)pmatch[0].rm_so, (int)pmatch[0].rm_eo - 1);
      printf("With the sub-expression, "
             "a matched substring \"%.*s\" is found at position %d to %d.\n",
             (int)(pmatch[1].rm_eo - pmatch[1].rm_so), &string[pmatch[1].rm_so],
             (int)pmatch[1].rm_so, (int)pmatch[1].rm_eo - 1);
   }
   regfree(&preg);
   return 0;
 
   /****************************************************************************
      The output should be similar to :
 
      With the whole expression, a matched substring "simple simple" is found
      at position 7 to 19.
      With the sub-expression, a matched substring "simple" is found
      at position 7 to 12.
   ****************************************************************************/
}

Related Information