Cultured Perl: One-liners 101

Perl as a command-line utility

Those who use Perl as a programming language frequently forget that it is just as useful as a quick and dirty scripting engine for command-line operations. From the command line Perl can accomplish, in just a single line, tasks that require pages of code in most other languages. Join Teodor as he takes you through some useful examples.

Teodor Zlatanov (tzz@iglou.com), Programmer, Northern Light

Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, 3-tier client-server database architectures, Unix system administration, CORBA, and project management. Suggestions and corrections are welcome over e-mail. Contact Teodor at tzz@iglou.com.



01 April 2001

In order to complete this how-to, you'll need to have Perl 5.6.0 installed on your system. Preferably, your system should be a recent (2000 or later) Linux or Unix installation, but other operating systems may work as well. The examples all use the tcsh shell (though bash and others will work too). Although these examples may work with earlier versions of Perl, Linux, and other operating systems, if they fail, their failure to function should be considered an exercise for the reader to solve.

The first point I'd like to make is that quick and dirty solutions shouldn't be shunned by the experienced programmer. In other columns I have emphasized documentation and thoroughness. This column will concentrate on the dark side of programming, where documentation is optional and caffeine isn't. We've all been there.

The second point, just as important as the first, is that quick and dirty solutions are hard to do right. If you know how to document, test, and debug a complete script, you have a much greater chance of succeeding at one-liners. If you don't, this will be like trying to cut down a redwood tree with a herring (your skills being the herring).

As a first step, you should learn your shell's peculiarities, the way that Unix passes command-line arguments to Perl, and Perl's interpretation of those arguments.

The essentials of the command line

In Unix, the basic unit of execution is the process, which is usually a program loaded into memory. Processes are started by other processes; the exceptions are the init process, which is usually started by the kernel, and sometimes the kernel's own processes. As far as the user is concerned, starting a process requires a shell or a launcher. So when the user types "xeyes" at the command line of a shell, or selects the X Eyes application from a launcher menu (like the GNOME taskbar), the shell or the launcher creates a new process to run that program.

Processes get command-line arguments, so "perl" and "perl -w", for instance, are two different invocations of the same program. The shell splits the command line into arguments on whitespace. Internally, Perl (similar to C) passes arguments to the script it interprets in the @ARGV array. But unlike C, Perl takes some of those arguments for its own purposes before the script sees them. For instance, the script being interpreted does not see a "-w" switch meant for the Perl interpreter, unless the switch comes after arguments that Perl has already left for the script.

The "-e" argument to Perl tells it to take whatever follows the "-e" on the command line and run it as a script. The "-M" argument says to take whatever follows and import it as a module, like a "use ModuleName" in a regular script. See the perldoc perlrun page for further information on the switches that Perl has to offer from the command line.

Perhaps some examples would be best at this point. In the spirit of this column, let's use one-liners. The -MData::Dumper -e'print Dumper \@ARGV' part of each command simply prints out the contents of the @ARGV array.

Listing 1. Command-line arguments
# at the command line, type each line after the '>'
# and you'll get the output that 
# follows it

# print the @ARGV contents with no program arguments
> perl -MData::Dumper -e'print Dumper \@ARGV'
$VAR1 = [];

# print the @ARGV contents with arguments "a" and "b"
> perl -MData::Dumper -e'print Dumper \@ARGV' a b 
$VAR1 = [
          'a',
          'b'
        ];

# print the @ARGV contents with warnings on, and arguments "a" and "b"
> perl -w -MData::Dumper -e'print Dumper \@ARGV' a b 
$VAR1 = [
          'a',
          'b'
        ];

# print the @ARGV contents with arguments "a", "b", and "-w"
# note how the -w is not stolen by Perl if it follows arguments
# that Perl knows it doesn't want
> perl -MData::Dumper -e'print Dumper \@ARGV' a b -w
$VAR1 = [
          'a',
          'b',
          '-w'
        ];

You can pass as many arguments as you want to Perl, unless your shell limits their number or length. Reading from the magical filehandle <> in Perl treats every argument passed to the script as a filename, opens each file in turn, and reads its contents line by line. The $_ variable will hold each line, by default.

Shells make everything between quotes a single argument. That's why in Listing 1 we could say -e'print Dumper \@ARGV' and Perl saw it as a single one-liner script. Single quotes are better, because then you can use double quotes inside the one-liner. In Perl, double quotes interpolate variables and escape sequences such as \n in whatever is between them. Perhaps another example will help to illustrate further:

Listing 2. Single vs. double quotes
# print the Perl process ID, followed by a newline
> perl -e'print "$$\n"'
2063

# error: the first two double quotes go together, the rest is passed
# to the script directly

> perl -e"print "$$\n""
Bareword found where operator expected at -e line 1, near "1895n"
        (Missing operator before n?)
syntax error at -e line 1, next token ???
Execution of -e aborted due to compilation errors.

Things are a little better in bash than tcsh, because bash allows the inside double quotes to be escaped with a \ character. But the shell still interprets $$ inside double quotes before passing the script to Perl. The bottom line is, don't use double quotes to specify your -e one-line script argument. See perldoc perlrun for more details, but basically you should find out what works on your system and stick with that.

So far you have seen the -e and -M switches in action: import a module, and run a statement. Below I've listed a few other useful switches; the more complex ones have been omitted in the interest of sanity. See perldoc perlrun for the complete list and some usage ideas.

Cleanliness

Switch     Purpose
-w         turn warnings on
-Mstrict   turn the strict pragma on

Data

Switch     Purpose
-0         (that's a zero) specify the input record separator
-a         split data into an array named @F
-F         specify pattern for -a to use when splitting (see perldoc -f split)
-i         edit files in place (see perldoc perlrun for the many details)
-n         run through all the @ARGV arguments as files, one by one, using <>
-p         same as -n, but will also print the contents of $_

Execution control

Switch     Purpose
-e         specify a string to execute as a script (multiples add up)
-M         import a module
-I         specify directories to search for modules before standard places

File operations

Say you have a few files in a directory that need to be renamed in a specific way. For instance, all files whose names contain the word "aaa" should be renamed to have the word "bbb" instead. We will not use the Unix "mv" command, because Perl's rename() function renames files well enough (see perldoc -f rename for details on when rename() won't do a good job).

See Listing 3 for a one-line script that renames files from aaa to bbb.

The find . command prints out the list of all files and directories in the current directory and under. Give find the "-type f" parameter if you want only the files. Take the output of find, a list of files, and pass it to the one-liner.

The one-line script uses the -ne parameters, which means that it could be rewritten as:

Listing 4. Renaming files from aaa to bbb, decomposed
while (<>) 
{
 chomp;                                 # trim the newline from the filename
 next unless -e;                        # the filename ($_) must exist
 $oldname = $_;                         # $oldname is now $_
 s/aaa/bbb/;                            # change the first "aaa" in $_ to "bbb"
 next if -e;                            # the new filename mustn't exist
 rename $oldname, $_;                   # rename the old to the new name
}

As you can see, this is a fairly complex seven-line script. The -n switch simplified things by supplying the while loop, but you still must know the $_ variable and the s/// and -e operators (see the perldoc perlop page for details). The File::Find standard Perl module could have found the files instead of the Unix find command, but then the script would probably have been too large to be a one-liner.

One-liners are a delicate balance between usefulness and obfuscation, and you have to be prepared to rewrite them as real scripts if necessary instead of keeping baby Frankenstein programs around.

Here's another example of file processing: look through a directory of MP3 files with a known naming structure and extract the album name. Let's assume that the name of the file is "Artist-Album-Track#-Song.mp3".

Listing 5. Finding album names for Artist-Album-Track#-Song.mp3
> find . -name "*.mp3" | perl -pe 's/.\/\w+-(\w+)-.*/$1/' | sort | uniq

This script is very simple. It relies on find printing a "./" before each filename, which it does when the starting point is ".". It then substitutes $_ with only the album name, and the -p switch automatically prints the album names. Finally, sort and uniq in sequence ensure that repeated album names will be printed only once. All the find, sort and uniq invocations could have been done with Perl, but why bother when the operating system already has those written for us? It is interesting as an exercise, but in practice the one-liner would become 20-30 lines of unnecessary code.

Let's decompose the Perl script (in a simplified fashion, omitting some of the complexities of the -p switch):

Listing 6. Finding album names for Artist-Album-Track#-Song.mp3, decomposed
while (<>) 
{
 s/.\/\w+-(\w+)-.*/$1/;                 # extract the album name into $_
} continue
{
 print;                                 # print the album name
}

Again, note how Perl was an intermediate tool between find, sort and uniq. Don't try to write everything in Perl. You can, and sometimes you should, but one-liners are about reuse. Also, see how simple the regular expression is. Sure, we may get a few aberrant album names if the MP3 files are not named right, but is it worth the effort to perfect that regular expression? If you need to do that much work, you probably should be using a CPAN MP3 ID3 tag module instead of parsing filenames. Know when one-liners are becoming a nuisance, rather than a tool. This is what I meant earlier when I said that you should know Perl well before starting on the one-liners. Using all your tools in your programming approach will make you a good Perl programmer, and a good programmer altogether.


Data operations

The concepts above carry over to data manipulation as well. You should also remember the -i switch because it lets you edit a file in place, something that few tools can do at all. Here's how you would edit a file's contents, replacing every "aaa" with a "bbb":

Listing 7. Editing a file's contents to replace "aaa" with "bbb"
> cat test
aaa
bbb
ccc
ddd
aaa
> perl -pi -e's/aaa/bbb/' test
> cat test
bbb
bbb
ccc
ddd
bbb

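We could use any regular expression instead of "aaa", of course. Note that s/aaa/bbb/ changes only the first match on each line, which is enough for the file above; add the /g modifier if a line may contain more than one match.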

Note that we use the -p switch to print $_ for every line. That's necessary because the output of the Perl script is what goes inside the file! This means we can do some interesting tricks. For instance:

Listing 8. Inserting line numbers in a file
> perl -pi -e'$_ = sprintf "%04d %s", $., $_' test

This script inserts 4-digit line numbers before every line in the file. If you get a headache looking at the syntax, focus on the nearest person and ask them if they know the joke about the two camels in the zoo. They'll hit you over the head with something heavy, distracting you from your headache for a while, after which you can get back to work.

Now for something more involved. We'll use Uri Guttman's excellent File::ReadBackwards module to look through a log file backwards for interesting events. (You have to install File::ReadBackwards from CPAN.) We'll search for the string "sshd" to see all notices from the sshd daemon.

Listing 9. Looking through a file backwards for sshd messages
> perl -MFile::ReadBackwards -e'foreach my $name (@ARGV) \
   { $f = File::ReadBackwards->new($name) || next;       \
     while( $_ = $f->readline ) {print $_ if m/sshd/}}'  \
  /var/log/messages

The \ characters at the end of each line tell the shell that more is coming; the line is not over yet. (That is how tcsh, the shell used here, wants a quoted string continued. bash accepts a bare newline inside single quotes, so there the backslashes inside the quoted script should be dropped.) This 3-line script is about as large as a one-liner could be before you have to rewrite it as a real script. The same effect could be achieved with less code by saving all the lines in the file and then printing them backwards, but that's a lot less efficient than File::ReadBackwards, which actually will read the file backwards and stop at newlines. This efficiency is something you could not achieve easily from the command line.

But why stop here? Let's extract all IP addresses mentioned in the sshd log messages.

Listing 10. Looking through a file backwards for IPs in sshd messages
> perl -MFile::ReadBackwards -e'foreach my $name (@ARGV) \
   { $f = File::ReadBackwards->new($name) || next;       \
     while( $_ = $f->readline ) \
     {print "$1\n" if m/sshd/ && m/connection from\D*([\d.]+)/ }}' \
  /var/log/messages

This is getting ugly! We should be moving this into a real script just about now.

Note how the regular expression above captures only digits and dots, after "connection from" and a string of non-digits. This is not perfect, but it works just fine in the real world with IPv4 addresses. You should understand what's needed from your one-liner, and do exactly that. Don't over-engineer a throwaway script. You'll be sorry. On the other hand, know when a script will not be thrown away, and write your code accordingly!


A real-world example

My wife was renaming a bunch of pictures from our vacation in Windows. The files came out with names like "Our Christmas Tree.jpg", which is well and good. When I attempted to run indexpage.pl, a Perl script that creates HTML pages for image collections, the script failed. The author had not been cautious about filenames, and quotes and spaces caused problems.

Instead of fixing indexpage.pl myself (a fine exercise, but one I had no time for at 2 AM), I used a one-liner. See Listing 11 for a one-line script that renames JPG files.

It was tricky, because I could not use a single quote inside the script. I finally used the ASCII value for a single quote, 39, to put a single quote in the $quote variable, and used it indirectly in the substitution.

This printed out a series of "mv" commands that I could examine to make sure I was doing the right thing. Finally, I saved the commands to a file and used the shell "source" command to run every command in that file. Listing 12 shows the JPG renaming in action.

After the renaming, the indexpage.pl script ran fine.


Conclusion

I hope you have by now figured out that one-liners are not easy to get right. Perfect your scripting skills before you jump to the one-liners, or you may have a lot of trouble getting them right. Make sure you know your regular expressions, flow control, and default variable operations.

Balance power with legibility. One-liners should be thrown away, like prototypes. Otherwise you will see them again, like a nice poodle wandering off and coming back as Cujo.

A few exceptions to the general usage of one-liners are acceptable. They are throwaways, not pyramids.

You should never run a one-liner outright; always print out what it will do before you actually run the command. You'll save yourself a lot of white hairs.

Use your one-liner skills sparingly. It's better to err on the side of caution when dealing with such wild beasts.

Finally, have a lot of fun. One-liners are the best way to make Perl do the dirty work for you. Look on the Usenet newsgroups and mailing lists dedicated to Perl for ideas and critiques.

Resources

  • Read Ted's other Perl articles in the "Cultured Perl" series on developerWorks.
  • CPAN is the Comprehensive Perl Archive Network. It aims to contain all the Perl material you will need. As of January 2001 it contains 749 megabytes of information, and is mirrored at more than one hundred sites around the world.
  • Visit Perl.com for Perl information and related resources. This site contains all things of interest to the Perl community.
  • Programming Perl, 3rd Edition (O'Reilly & Associates 2000), by Larry Wall, Tom Christiansen, and Jon Orwant, is the best guide to Perl today, up-to-date with 5.005 and 5.6.0 now.
  • Unix Power Tools, 2nd Edition, Jerry Peek, Tim O'Reilly & Mike Loukides (O'Reilly & Associates 1997) is a great guide to getting started with Unix shells and related tools. A little dated, but still excellent.
