A regular expression is a string of characters designed to search for or replace another string of characters. At first glance, this appears to be a pretty basic function. Most users are familiar with simple search and replace functions in just about every graphical text editor or word processing application. If this basic search and replace functionality is compared to a calculator, regular expressions can be compared to a full-fledged computer. The power of using regular expressions for search criteria should not be underestimated.
Regular expressions are used by some of the most powerful UNIX-based command-line tools, including awk (and some programming languages, including Perl). Learning how to use regular expressions is a required step in moving from a basic user of the UNIX command line to a true power user. There are a few different versions of regular expression syntax and multiple versions of awk, so this tutorial focuses on the most common constructs that are fairly standard across each implementation. Don't forget to reference the man pages on your system to get specifics on syntax and command-line options.
Before exploring UNIX applications that use regular expressions, it is important to learn the basics. In this section, simply read along. Later you'll try some examples yourself.
A regular expression uses strings of normal characters combined with special characters that indicate the criteria for the search. In the most basic case, no special characters are used at all. For instance, if you simply want to use the term golf as your search criteria, you type the following:
    golf
This is a regular expression! It searches for all instances of the word golf. Regular expressions are case sensitive, so this search finds all instances of golf, but it will not find instances of Golf.
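As a rough illustration of the literal, case-sensitive match described above, here is a minimal sketch that uses Python's re module as a stand-in for the UNIX tools this tutorial covers. The sample text and variable names are invented for the example.

```python
import re

# Hypothetical sample text containing three capitalizations of the word.
text = "I play golf. Golf is fun. GOLF!"

# A pattern with no special characters matches itself, case-sensitively.
matches = re.findall(r"golf", text)
print(matches)
```

Only the all-lowercase occurrence is reported; Golf and GOLF are skipped because the pattern is compared character for character.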
To search for both golf and Golf, you can use brackets (which are special characters in regular expressions) and list the individual characters that may appear at that position:
    [Gg]olf
This is akin to a search within a search (which is the magic behind regular expressions).
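The bracket construct can be sketched the same way. This example again assumes Python's re module rather than a UNIX tool, and the sample text is made up.

```python
import re

# Hypothetical sample text with three capitalizations.
text = "golf Golf GOLF"

# [Gg] accepts either G or g in the first position; "olf" must still
# match exactly, so the all-uppercase GOLF is not found.
matches = re.findall(r"[Gg]olf", text)
print(matches)
```

The brackets stand for exactly one character in the text, chosen from the listed set.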
The same concept works for any list of characters -- it's not just used for case sensitivity. For instance, you might want to search for golf and gelf, a new sport that you made up:
    g[oe]lf
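A bracketed list in the middle of a pattern behaves the same way. The sketch below assumes Python's re module and an invented word list; gulf is included only to show a word that the list rejects.

```python
import re

# Hypothetical candidate words.
words = ["golf", "gelf", "gulf"]

# [oe] accepts either o or e as the second character.
pattern = re.compile(r"g[oe]lf")
matched = [w for w in words if pattern.search(w)]
print(matched)
```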
Now imagine you have a third sport, gilf, that you also want to check for. One method, using what you have learned so far, is to add i to the bracketed list in your search criteria. But as your search grows, you might want to find everything that starts with g and ends with lf with one character in between. To do this, use another special character, a period (.), which matches any single character:
    g.lf
This finds all strings that begin with g and end with lf with one character in between. To expand your search to all strings that begin with g and end with f with two characters in between, you can use two periods:
    g..f
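To compare the bracketed list with the period, the following sketch runs three patterns over one invented word list. It assumes Python's re module, and it uses re.fullmatch so that each word is tested as a whole; tools that search within lines of text would also report matches inside longer strings.

```python
import re

# Hypothetical candidate words.
words = ["golf", "gelf", "gilf", "gulf", "glf", "goof", "goalf"]

# Bracketed list: only o, e, or i may sit between g and lf.
listed = [w for w in words if re.fullmatch(r"g[oei]lf", w)]

# One period: any single character between g and lf.
one_between = [w for w in words if re.fullmatch(r"g.lf", w)]

# Two periods: any two characters between g and f.
two_between = [w for w in words if re.fullmatch(r"g..f", w)]

print(listed)
print(one_between)
print(two_between)
```

Each period consumes exactly one character, so glf (nothing in between) and goalf (too many characters) fail both period patterns, while goof matches only the two-period form.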