Using awk on the command line
This tutorial started with a basic explanation of regular expressions and then introduced grep and sed. grep is a powerful search utility, and sed is an even more powerful search-and-replace utility. awk takes the next step, using regular expressions in a full-fledged command-line programming language. Like sed, awk takes line-based input when used on the command line. awk interprets one line of input at a time, but unlike sed, it splits each line into fields and exposes them as variables ($1, $2, and so on) that inline code can read and print.
It should be noted that AWK (capitalized) is a full-fledged programming language that you can use to write scripts (as opposed to just being used on the command line), but this tutorial focuses on awk, which is the command-line utility that interprets AWK commands on the fly.
By the way, if anybody is reading this and trying to think of real-world uses for everything you have learned, I just used grep to search through some old code for good awk examples:
grep awk */*.pl
Most system administrators or programmers find uses for these tools on a daily basis. Here are a few lines of my output:
Edaemon/m_checkcurrentdisk.pl:$freespace = `awk '(NR==1) {print \$4 / 1024 / 1024}' grep.tmp`;
Edaemon/m_getdatetime.pl:$month = `awk '(NR==1) {print \$2}' datetime.txt`;
Odaemon/odaemon.beowulf.dvd.pl:$filesize = `awk '(NR==1) {print \$1}' temp.txt`;
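The three lines above share one pattern: (NR==1) is a condition that selects only the first line of input (NR is awk's built-in line counter), and the braces hold the action to run on that line. The backslash before each $ is there only because the commands sit inside Perl backticks; on the command line it is not needed. The following is a minimal sketch of the same pattern. The input and the variable name first_line are made up for illustration and are not taken from the original scripts:

```shell
# Hypothetical two-line input; only the first line should be processed.
first_line=$(printf 'disk1 100 20 2097152\ndisk2 100 50 1048576\n' |
  awk 'NR == 1 { print $4 / 1024 / 1024 }')

# The fourth field of line 1 is 2097152, and 2097152 / 1024 / 1024 is 2.
printf '%s\n' "$first_line"   # prints 2
```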
These are good examples because they show a very basic use of awk. For your first try, make it even simpler. For your tests with awk, create the following files in an empty directory (the content of each file is irrelevant, and they can be empty):
Screenshot_1.jpg
Screenshot_2.jpg
Screenshot_3.jpg
awk.txt
regular.txt
sed.txt
Using the output of ls as input into awk
By default, awk reads each line in an input file and separates the content into variables determined by blank spaces. In a very simple example, you could use the output of ls as input into awk and print the results. This example uses ls with the pipe character (|) to send the output into awk:
ls | awk ' { print $1 } '
awk subsequently prints the first item on each line, which in this case is the only item on each line:
Screenshot_1.jpg
Screenshot_2.jpg
Screenshot_3.jpg
awk.txt
regular.txt
sed.txt
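To see the default splitting in isolation, it may help to feed awk input where the spacing is irregular. The sketch below uses made-up sample text (the names sample and first_fields are illustrative only). It relies on awk's built-in NF variable, which holds the number of fields found on the current line:

```shell
# Hypothetical sample input: fields separated by several spaces and a tab,
# with leading spaces on the second line.
sample=$(printf 'alpha   beta\tgamma\n   delta epsilon\n')

# With the default separator, runs of spaces and tabs count as one break
# and leading whitespace is ignored.
first_fields=$(printf '%s\n' "$sample" | awk '{ print NF ": " $1 }')

printf '%s\n' "$first_fields"
# prints:
# 3: alpha
# 2: delta
```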
Using ls -l to generate multicolumn input for awk
That was really basic. For the next example, use ls -l to generate multicolumn input for awk:
ls -l
Implementations of ls vary a bit from system to system, but this is some example output:
total 432
-rw-rw-rw- 1 guest guest 169074 Oct 15 14:51 Screenshot_1.jpg
-rw-rw-rw- 1 guest guest 23956 Oct 15 20:56 Screenshot_2.jpg
-rw-rw-rw- 1 guest guest 12066 Oct 15 20:57 Screenshot_3.jpg
-rw-r--r-- 1 tuser tuser 227 Oct 15 20:16 awk.txt
-rw-r--r-- 1 tuser tuser 233 Oct 15 19:35 regular.txt
-rw-r--r-- 1 tuser tuser 227 Oct 15 23:16 sed.txt
Note that the file owner is the third item on each line and the file name is the ninth item on each line (items are separated by spaces in awk by default). You can use awk to pull out the file owner and file name from this list by printing the third and ninth variable on each line. Here's how:
ls -l | awk ' { print $3 " " $9 } '
You'll notice the print command in awk contains a space enclosed in double quotes. This simply prints a space between the file owner and the file name in your output:
guest Screenshot_1.jpg
guest Screenshot_2.jpg
guest Screenshot_3.jpg
tuser awk.txt
tuser regular.txt
tuser sed.txt
You can put text in quotes between variables in an awk print statement.
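As a small illustration of quoted text in print, the sketch below runs awk on a single hard-coded line in ls -l format, so it does not depend on real files. The variable names line, labeled, and spaced are made up for this example. It also shows that a comma between items prints awk's output field separator, which is a single space by default:

```shell
# Hypothetical line in `ls -l` format.
line='-rw-r--r-- 1 tuser tuser 227 Oct 15 20:16 awk.txt'

# Quoted strings and fields written next to each other are joined together.
labeled=$(printf '%s\n' "$line" | awk '{ print "owner=" $3 ", file=" $9 }')

# A comma inserts the output field separator (one space by default).
spaced=$(printf '%s\n' "$line" | awk '{ print $3, $9 }')

printf '%s\n%s\n' "$labeled" "$spaced"
# prints:
# owner=tuser, file=awk.txt
# tuser awk.txt
```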
Using regular expressions to specify lines
Now you've learned the basics of how to use awk, but isn't this tutorial about regular expressions? In awk, regular expressions are used heavily. The most common example is to precede an awk command with a regular expression that specifies the lines you want to operate on. As with sed, regular expressions in awk are encapsulated in forward slashes. For instance, if you only want to operate on files owned by tuser, you could use the following command:
ls -l | awk ' /tuser/ { print $3 " " $9 } '
The command produces this output:
tuser awk.txt
tuser regular.txt
tuser sed.txt
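One thing to keep in mind is that /tuser/ matches anywhere on the line, not just in the owner column. The sketch below uses a made-up listing in which a file owned by guest happens to have tuser in its name, and compares the line-wide match with a test on the third field alone. The listing and the variable names are hypothetical:

```shell
# Hypothetical listing: the second file is owned by guest but has
# "tuser" in its name.
listing=$(printf '%s\n' \
  '-rw-r--r-- 1 tuser tuser 227 Oct 15 20:16 awk.txt' \
  '-rw-rw-rw- 1 guest guest 512 Oct 15 20:20 tuser_notes.txt')

# /tuser/ matches anywhere on the line, so both lines are printed.
anywhere=$(printf '%s\n' "$listing" | awk '/tuser/ { print $3 " " $9 }')

# $3 == "tuser" compares only the owner field, so one line is printed.
owner_only=$(printf '%s\n' "$listing" | awk '$3 == "tuser" { print $3 " " $9 }')

printf '%s\n---\n%s\n' "$anywhere" "$owner_only"
# prints:
# tuser awk.txt
# guest tuser_notes.txt
# ---
# tuser awk.txt
```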
In another example, you might want to change the file extension of each of your text files without touching your image files. To do this, you'll want to separate your input variables with a period instead of a space, and then use a regular expression to indicate you only want to operate on text files. To split variables based on a period, use the -F flag followed by the character you want to use, in quotes. Try this example with the awk output piped to a shell (which will execute the awk-generated commands):
ls | awk -F"." ' /txt/ { print "mv " $1 "." $2 " " $1 ".doc" } ' | bash
A subsequent ls -l will show the new file names:
-rw-rw-rw- 1 guest guest 169074 Oct 15 14:51 Screenshot_1.jpg
-rw-rw-rw- 1 guest guest 23956 Oct 15 20:56 Screenshot_2.jpg
-rw-rw-rw- 1 guest guest 12066 Oct 15 20:57 Screenshot_3.jpg
-rw-r--r-- 1 tuser tuser 227 Oct 15 20:16 awk.doc
-rw-r--r-- 1 tuser tuser 233 Oct 15 19:35 regular.doc
-rw-r--r-- 1 tuser tuser 227 Oct 15 23:16 sed.doc
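Because piping generated commands into a shell runs them immediately, it can be worth printing them first and checking them by eye. The sketch below is one cautious variation, not the only way to do it. It leaves off the final pipe to bash so nothing is renamed, and it tightens the regular expression to /\.txt$/ so that only names ending in .txt match. The file names (including txt_backup.jpg) and the variable names are made up for illustration:

```shell
# Hypothetical file names; nothing is renamed because the output is
# not piped to bash.
names=$(printf '%s\n' awk.txt regular.txt txt_backup.jpg Screenshot_1.jpg)

# /\.txt$/ matches only names that end in ".txt", so txt_backup.jpg
# is skipped even though it contains "txt".
commands=$(printf '%s\n' "$names" |
  awk -F"." '/\.txt$/ { print "mv " $1 "." $2 " " $1 ".doc" }')

printf '%s\n' "$commands"
# prints:
# mv awk.txt awk.doc
# mv regular.txt regular.doc
```

Even with the tighter pattern, names that contain spaces or more than one period would still produce wrong mv commands, so this approach suits simple file names like the ones in this tutorial.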
Remember, these are the basics to get started with awk, but AWK is a full-fledged programming language capable of much more than the material presented in this tutorial. Take a look at the awk man page. If you want to learn even more, it would be wise to invest in a good book.




