Tip: Reading text streams in chunks with head and tail

Get to know your textutils

Jacek Artymiak (jacek@artymiak.com), Freelance author and consultant
Jacek Artymiak works as a freelance consultant, developer, and writer. Since 1991 he's been developing software for many commercial and free variants of UNIX and BSD operating systems (AIX, HP-UX, IRIX, Solaris, Linux, FreeBSD, NetBSD, OpenBSD, and others), as well as MS-DOS, Microsoft Windows, Mac OS, and Mac OS X. Jacek specializes in business and financial application development, Web design, network security, computer graphics, animation, and multimedia. He's a prolific writer on technology subjects and the coauthor of "Install, Configure, and Customize Slackware Linux" (Prima Tech, 2000) and "StarOffice for Linux Bible" (IDG Books, 2000). Many of Jacek's software projects can be found at SourceForge. You can learn more about him at his personal Web site and contact him at jacek@artymiak.com.

Summary:  In this tip, Jacek introduces the head and tail commands, which can be useful for processing chunks of data from both static and dynamic files.

Date:  01 Nov 2002
Level:  Introductory

Suppose you want to process only a part of some file, say a few lines from the start or end of it. What can you do? Use either head (which sends the first 10 lines to the standard output) or tail (which sends the last 10 lines to the standard output).

You can change the number of lines that these commands send to their standard output with the -n option (of course, your results will vary depending on the contents of your XF86Config file):


Listing 1. Sending a selected number of lines of XF86Config to standard output

$ head -n 4 /etc/X11/XF86Config
# File generated by anaconda.
# **********************************************************************
# Refer to the XF86Config(4/5) man page for details about the format of
# this file.

$ tail -n 4 /etc/X11/XF86Config
Modes       "1600x1200"
ViewPort    0 0
EndSubsection
EndSection
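If you do not have an XF86Config file at hand, the same behavior can be reproduced on generated input. The following is a minimal sketch, assuming the seq and mktemp commands are available; the sample file and the variable names are made up for illustration.

```shell
# Hypothetical sample input: the numbers 1 through 25, one per line.
sample=$(mktemp)
seq 1 25 > "$sample"

head "$sample"    # no -n given: prints the first 10 lines (1 through 10)
tail "$sample"    # no -n given: prints the last 10 lines (16 through 25)

# With -n, the line count is whatever you ask for.
first_three=$(head -n 3 "$sample")   # lines 1, 2, 3
last_two=$(tail -n 2 "$sample")      # lines 24, 25
printf '%s\n' "$first_three" "$last_two"
```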

What if you'd rather tell head or tail to count bytes instead of lines? Use the -c option instead of -n. To display the first 200 bytes, use head -c 200 file; to display the last 200 bytes, use tail -c 200 file. (Because -c counts bytes, not characters, a multibyte character at the boundary may be split.) If you follow the number with b (for blocks), it is multiplied by 512. Similarly, k (for kilobytes) multiplies the number by 1024, and m (for megabytes) multiplies it by 1048576.
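A short sketch of -c in action follows. It assumes the GNU versions of head and tail, since other implementations may not accept the size suffixes; the input string and variable names are only illustrative.

```shell
# Byte counts on a 10-byte string.
first=$(printf 'abcdefghij' | head -c 3)   # abc
last=$(printf 'abcdefghij' | tail -c 3)    # hij
echo "$first $last"

# Size suffixes (assumes GNU head): 1k is 1024 bytes, 2b is 2 * 512 bytes.
kilo=$(head -c 1k /dev/zero | wc -c)
blocks=$(head -c 2b /dev/zero | wc -c)
echo "$kilo $blocks"
```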

Remember that there is an important difference between head file1 file2 file3 and cat file1 file2 file3 | head. The former prints the specified number of lines from each file, preceding each group with a header that begins with ==> followed by the name of the file. The latter prints the specified number of lines from the input stream made up of the files listed after the cat command, treated as one single file. You can switch the file name headers off with the -q (for quiet) option. The opposite of -q is -v (for verbose).
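The difference is easiest to see on two tiny files. This is a sketch under stated assumptions: the file names a.txt and b.txt are hypothetical, and -q is assumed to be supported by your head (it is in the GNU version).

```shell
# Two small hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/a.txt"
printf 'b1\nb2\nb3\n' > "$dir/b.txt"

# One head, two files: each file gets its own "==> name <==" header.
per_file=$(head -n 2 "$dir/a.txt" "$dir/b.txt")

# cat first: head sees a single stream, so only a1 and a2 are printed.
as_stream=$(cat "$dir/a.txt" "$dir/b.txt" | head -n 2)

# -q drops the headers but still takes two lines from each file.
quiet=$(head -q -n 2 "$dir/a.txt" "$dir/b.txt")

printf '%s\n---\n' "$per_file" "$as_stream" "$quiet"
```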

If a file keeps growing while you read it (for example, when another command is still writing to it), the -f option tells tail to keep reading data from that file and sending it to standard output as it arrives. The option is ignored if tail reads its input from a pipe. Therefore, cat file | tail -f will not work as expected, but tail -f file will.

(If more than one file is being read by tail, lines will be separated with the standard header, beginning with ==>, to indicate where they are coming from.)

This option is perfect for monitoring system logs. For example, tail -f /var/log/access.log, run in a separate terminal window (or on a separate console), keeps printing new Apache access log entries as each hit is logged, until you stop it with Ctrl-C.
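To try -f without a live Web server, you can simulate a growing log. The sketch below is only an illustration: the log file, the background writer loop, and the sleep intervals are all made up, and kill stands in for the Ctrl-C you would press at an interactive terminal.

```shell
# Hypothetical log file that a background writer appends to once per second.
log=$(mktemp)
out=$(mktemp)
( for i in 1 2 3; do echo "entry $i" >> "$log"; sleep 1; done ) &
writer=$!

# Follow the file by name; captured to a file here so it can be inspected.
tail -f "$log" > "$out" &
follower=$!

wait "$writer"     # let the writer finish its three lines
sleep 2            # give tail a moment to pick up the last line
kill "$follower"   # the scripted equivalent of pressing Ctrl-C

cat "$out"
```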

By combining head and tail, you can read a chunk of data of a given length from the middle of a file. Here is how it is done: suppose that you want to read a chunk of 789 bytes starting at byte 1000, counting the first byte of the file as byte 1. The chunk ends at byte 1000 + 789 - 1 = 1788, so you first keep the first 1788 bytes with head and then keep the last 789 of those with tail: cat file | head -c 1788 | tail -c 789.
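The same arithmetic works for any start position and length. Here is a minimal sketch on a small input where the result is easy to check by eye; the variable names start and len and the alphabet input are assumptions made for illustration.

```shell
# Extract len bytes starting at byte number start (the first byte is byte 1).
start=5
len=3
end=$((start + len - 1))   # last byte of the chunk: 5 + 3 - 1 = 7

# head keeps bytes 1 through 7 (abcdefg); tail keeps the last 3 of those.
chunk=$(printf 'abcdefghijklmnopqrstuvwxyz' | head -c "$end" | tail -c "$len")
echo "$chunk"   # efg
```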

Reversing files with tac

What if you wanted to reverse the order of lines in a file? That is the job of the tac command. (Note that tac is cat spelled backwards.) It reverses the order of the lines or fields in a list of files.

It does not reverse the order of the files themselves -- you must do that yourself by listing them in reverse order after the tac command. For an example of how tac works, compare the results of ls -l | tail and ls -l | tail | tac on some files in your home directory.
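A small, repeatable illustration of both points follows. It assumes tac is installed (it ships with the GNU tools), and the file names first and second are hypothetical.

```shell
# Lines come out in reverse order.
reversed=$(printf 'one\ntwo\nthree\n' | tac)
echo "$reversed"

# With several files, each file is reversed on its own, in the order listed.
dir=$(mktemp -d)
printf '1\n2\n' > "$dir/first"
printf '3\n4\n' > "$dir/second"
both=$(tac "$dir/first" "$dir/second")      # 2, 1, 4, 3
swapped=$(tac "$dir/second" "$dir/first")   # 4, 3, 2, 1
printf '%s\n---\n' "$both" "$swapped"
```

Listing the files in reverse order, as in the last command, is what produces a full reversal of the combined input.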

Questions or comments? I'd love to hear from you -- send mail to jacek@artymiak.com.

Next time, we'll take a look at the sort and tsort commands. See you then!

