UNIX tips and tricks for a new user, Part 3: Introducing filters and regular expressions

Using grep, sed, and awk

Tim McIntire, Consultant, Freelance Writer
Photo of Tim McIntire
Tim McIntire works as a consultant and co-founder of Cluster Corporation, a market leader in HPCC software, support, and consulting. He also contributes periodically to IBM developerWorks and Apple Developer Connection. Tim's research, conducted while leading the computer science effort at Scripps Institution of Oceanography's Digital Image Analysis Lab, has been published in a variety of journals, including Concurrency and Computation and IEEE Transactions on Geoscience and Remote Sensing. You can visit TimMcIntire.net to learn more.

Summary:  Discover the power of UNIX® filters. In this tutorial, you'll learn about the grep family in depth, including the syntax of regular expressions in many UNIX utilities. You'll also find out more about the stream editor, sed, as well as examine the awk pattern scanning language through examples and explanations.


Date:  12 May 2006
Level:  Intermediate


Using awk on the command line

This tutorial started with a basic explanation of regular expressions and subsequently introduced grep and sed. grep is a powerful search utility, while sed is an even more powerful search and replace utility. awk takes the next step, using regular expressions in a full-fledged command-line programming language. Like sed, awk takes line-based input when used on the command line. awk reads one line of input at a time, but unlike sed, it splits each line into fields and exposes each field as a variable ($1, $2, and so on) that inline code can read, modify, and print.
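To make the idea of per-line variables concrete, here is a minimal sketch. The sample input line is made up for illustration; $1, NF, and $NF are standard awk built-ins for the first field, the number of fields, and the last field.

```shell
# Feed awk one sample line (invented for this illustration).
fields=$(printf 'alpha beta gamma\n' | awk '{ print NF ": first=" $1 ", last=" $NF }')

# Show what awk produced for that line.
echo "$fields"
```

Because awk splits on blank space by default, the line above is seen as three separate fields.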

Note that AWK (capitalized) is a full-fledged programming language that you can use to write scripts (as opposed to just being used on the command line), but this tutorial focuses on awk, the command-line utility that interprets AWK commands on the fly.

By the way, if you are trying to think of real-world uses for everything you have learned so far, I just used grep to search through some old code for good awk examples:

grep awk */*.pl

Most system administrators or programmers find uses for these tools on a daily basis. Here are a few lines of my output:

Edaemon/m_checkcurrentdisk.pl:$freespace = `awk '(NR==1) {print \$4 / 1024 / 1024}' grep.tmp`;
Edaemon/m_getdatetime.pl:$month = `awk '(NR==1) {print \$2}' datetime.txt`;
Odaemon/odaemon.beowulf.dvd.pl:$filesize = `awk '(NR==1) {print \$1}' temp.txt`;
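The (NR==1) pattern in these lines limits the action to the first line of input, because NR is awk's built-in count of lines read so far. A minimal sketch of the same idea, using made-up two-line input rather than the files named above:

```shell
# Two invented input lines; the action should run for the first one only.
first=$(printf 'one 2048\ntwo 4096\n' | awk '(NR==1) { print $2 / 1024 }')

# Only the value computed from line 1 is captured.
echo "$first"
```

The second line is still read, but it produces no output because the pattern does not match it.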

These are good examples because they show a very basic use of awk. For your first try, make it even simpler. For your tests with awk, create the following files in an empty directory (the content of each file is irrelevant, and they can be empty):

Screenshot_1.jpg
Screenshot_2.jpg
Screenshot_3.jpg
awk.txt
regular.txt
sed.txt
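One possible way to set this up is sketched below. The use of mktemp -d for a scratch directory is an assumption made for this example; any empty directory works equally well.

```shell
# Work in a fresh, empty scratch directory (mktemp -d creates one).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# touch creates each file empty if it does not already exist.
touch Screenshot_1.jpg Screenshot_2.jpg Screenshot_3.jpg awk.txt regular.txt sed.txt

# List the directory to confirm the six files are there.
ls
```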

Using the output of ls as input into awk

By default, awk reads each line of its input and splits the content into variables (fields) separated by blank space (spaces or tabs). In a very simple example, you could use the output of ls as input into awk and print the results. This example uses ls with the pipe character (|) to send the output into awk:

ls | awk ' { print $1 } '

awk subsequently prints the first item on each line, which in this case is the only item on each line:

Screenshot_1.jpg
Screenshot_2.jpg
Screenshot_3.jpg
awk.txt
regular.txt
sed.txt
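One caveat is worth a quick sketch: because awk splits on blank space, $1 is only the first word of a line. The file name below is hypothetical and is fed in with printf rather than ls, so nothing needs to exist on disk; $0 is awk's built-in for the whole line.

```shell
# A hypothetical file name containing a space, supplied via printf.
first_field=$(printf 'my notes.txt\n' | awk '{ print $1 }')
whole_line=$(printf 'my notes.txt\n' | awk '{ print $0 }')

echo "$first_field"   # only the text before the space
echo "$whole_line"    # the complete line
```

None of the example files contain spaces, which is why $1 is enough here.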

Using ls -l to generate multicolumn input for awk

That was really basic. For the next example, use ls -l to generate multicolumn input for awk:

ls -l

Implementations of ls vary a bit from system to system, but this is some example output:

total 432
-rw-rw-rw-   1 guest  guest  169074 Oct 15 14:51 Screenshot_1.jpg
-rw-rw-rw-   1 guest  guest   23956 Oct 15 20:56 Screenshot_2.jpg
-rw-rw-rw-   1 guest  guest   12066 Oct 15 20:57 Screenshot_3.jpg
-rw-r--r--   1 tuser  tuser     227 Oct 15 20:16 awk.txt
-rw-r--r--   1 tuser  tuser     233 Oct 15 19:35 regular.txt
-rw-r--r--   1 tuser  tuser     227 Oct 15 23:16 sed.txt

Note that the file owner is the third item on each line and the file name is the ninth item on each line (items are separated by spaces in awk by default). You can use awk to pull out the file owner and file name from this list by printing the third and ninth variables on each line. Here's how:

ls -l | awk ' { print $3 " " $9 } '

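You'll notice the print command in awk contains a space enclosed in double quotes (" "). This simply prints a space between the file owner and the file name in your output. (The "total" line has no third or ninth item, so awk prints only the space for it; that line is omitted here.)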

guest Screenshot_1.jpg
guest Screenshot_2.jpg
guest Screenshot_3.jpg
tuser awk.txt
tuser regular.txt
tuser sed.txt

You can put text in quotes between variables in an awk print statement.
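As a small sketch of that point, the command below prints labels between the fields. The sample line is copied from the example ls -l output above, and the labels "owner=" and "file=" are made up for illustration.

```shell
# One line copied from the example ls -l output above.
line='-rw-r--r--   1 tuser  tuser     227 Oct 15 20:16 awk.txt'

# Quoted strings are printed literally between the fields.
labeled=$(echo "$line" | awk '{ print "owner=" $3 ", file=" $9 }')

# A comma between items inserts the output field separator (a space by default).
spaced=$(echo "$line" | awk '{ print $3, $9 }')

echo "$labeled"
echo "$spaced"
```

The comma form gives the same result as writing " " by hand between the two variables.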

Using regular expressions to specify lines

Now you've learned the basics of how to use awk, but isn't this tutorial about regular expressions? In awk, regular expressions are used heavily. The most common example is to precede an awk command with a regular expression that specifies the lines you want to operate on. As with sed, regular expressions in awk are encapsulated in forward slashes. For instance, if you only want to operate on files owned by tuser, you could use the following command:

ls -l | awk ' /tuser/ { print $3 " " $9 } '

The command produces this output:

tuser awk.txt
tuser regular.txt
tuser sed.txt
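Keep in mind that /tuser/ is tested against the whole line, so it would also match a file that merely has tuser in its name. The sketch below contrasts that with a comparison against the third field only; the second sample line (tuser_notes.txt owned by guest) is hypothetical and added just to show the difference.

```shell
# Sample ls -l style input; the second line is a hypothetical file whose
# name, not its owner, contains "tuser".
listing='-rw-r--r--   1 tuser  tuser     227 Oct 15 20:16 awk.txt
-rw-rw-rw-   1 guest  guest     100 Oct 15 21:00 tuser_notes.txt'

# Pattern tested against the whole line: both lines match.
anywhere=$(echo "$listing" | awk '/tuser/ { print $3 " " $9 }')

# Comparison against the third field only: just the file owned by tuser.
owner_only=$(echo "$listing" | awk '$3 == "tuser" { print $3 " " $9 }')

echo "$anywhere"
echo "$owner_only"
```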

Changing file extensions

In another example, you might want to change the file extension of each of your text files without touching your image files. To do this, you'll want to separate your input variables with a period instead of a space, and then use a regular expression to indicate you only want to search for text files. To split variables based on a period, use the -F flag followed by the character you want to use in quotes. Try this example with the awk output piped to a shell (which will execute the awk generated commands):

ls | awk -F"." ' /txt/ { print "mv " $1 "." $2 " " $1 ".doc" } ' | bash

A subsequent ls -l will show the new file names:

-rw-rw-rw-   1 guest  guest  169074 Oct 15 14:51 Screenshot_1.jpg
-rw-rw-rw-   1 guest  guest   23956 Oct 15 20:56 Screenshot_2.jpg
-rw-rw-rw-   1 guest  guest   12066 Oct 15 20:57 Screenshot_3.jpg
-rw-r--r--   1 tuser  tuser     227 Oct 15 20:16 awk.doc
-rw-r--r--   1 tuser  tuser     233 Oct 15 19:35 regular.doc
-rw-r--r--   1 tuser  tuser     227 Oct 15 23:16 sed.doc
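Because commands piped to a shell run immediately, it can be helpful to preview them first. The sketch below is a cautious variation, not the command used above: it feeds the file names in with printf, anchors the pattern so that only names ending in .txt match, and leaves off the final pipe to bash so nothing is renamed.

```shell
# Some of the example file names, supplied via printf instead of ls.
names=$(printf '%s\n' Screenshot_1.jpg awk.txt regular.txt sed.txt)

# The pattern is anchored to names that end in ".txt", and the output is
# NOT piped to bash, so the mv commands are only displayed.
preview=$(echo "$names" | awk -F"." '/\.txt$/ { print "mv " $1 "." $2 " " $1 ".doc" }')

echo "$preview"
```

Once the printed mv commands look right, the same awk output can be piped to bash as shown earlier.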

Remember, these are the basics to get started with awk, but AWK is a full-fledged programming language capable of much more than the material presented in this tutorial. Take a look at the awk man page. If you want to learn even more, it would be wise to invest in a good book.
