Skip to main content

The road to better programming: Chapter 4

Functional programming

Teodor Zlatanov (tzz@iglou.com), Programmer, Gold Software Systems
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, 3-tier client-server database architectures, UNIX system administration, CORBA, and project management.

Summary:  This series of articles on developerWorks comprises a complete guide to better programming in Perl. In this fourth installment, Teodor introduces functional programming and several essential Perl idioms important for Perl programmers looking for speed and elegance in their code, such as the map() and grep() functions, and the Schwartzian and Guttman-Rosler transforms.

Date:  01 Jan 2002
Level:  Introductory
Activity:  1741 views

Functional programming (FP) can be an approach, a solution, or a religion. I prefer it to be anything but a religion, because it should be just one more tool in a programmer's arsenal.

From the "comp.lang.functional FAQ"

Functional programming is a style of programming that emphasizes the evaluation of expressions, rather than execution of commands. The expressions are formed by using functions to combine basic values. A functional language is a language that supports and encourages programming in a functional style.

It's just as good if we fake it

As I often mention, avoid using a tool just because it's there and you can use it. Hammers work on nails; screwdrivers work on screws. Know all your tools and use the right one; your life will be much easier for it.

A simplistic view of functional programming is that it deals with applying functions to values, whereas procedural (traditional) programming deals with using functions for their side effects. The references cited in the comp.lang.functional FAQ (see the Resources later in this article) are the best starting point for understanding the methodology and intent of functional programming.

In Perl, a procedural language at heart, functional programming is only possible as an approach. The actual solution will commonly use the map() or grep() functions to simulate a functional solution. The functional programming approach is valuable for three reasons. First, FP presents the programmer with a new view of the problem, and possibly a better solution will come about because of that. Second, the Schwartzian and Guttman-Rosler transforms, and many other Perl idioms, are hard to use or understand without the aid of map() and grep(). Third, implementing some algorithms without map() and grep() can slow them down significantly because function invocation in Perl is fairly expensive.


The map() function

The map() function is like a rubber stamp applied to all the elements of a list (see "perldoc -f map" for more information). It consists of two parts: a block or an expression, and a list:


Listing 1. How map() works

# map {BLOCK GOES HERE} LIST;
map {$_++} @p;                          # increment every element of @p 
# map EXPRESSION, LIST;
map -f,@p;                              # file test every element of @p

The foreach() loop will usually do better than map() in benchmarks, so don't use map() for CPU-intensive calculations unless you test its performance. Do use it when it makes your code more elegant and simple without a loss of efficiency.

There are many neat things that map() can do. First of all, it can modify the array elements as it goes through them by changing the $_ variable. Inside the block or (less commonly) the expression, $_ is the current element of the list. You don't know which element you are looking at -- the whole point is to map one single function to all the elements independently. It is my experience that in about 80% of loops over an array you don't need to know the offset of the current element inside the array. Thus, map() can improve coding efficiency and style by forcing the programmer to think independently of array offsets.


Listing 2. foreach() vs. map()

# "normal" (procedural)  way
foreach (sort keys %ENV)
{
 print "$_ => $ENV{$_}\n";
}
# FP way
map { print "$_ => $ENV{$_}\n" } sort keys %ENV;

In Listing 2, you can see how the FP way is not fundamentally different, yet manages to convey a flow of functions from right to left. First the list of keys is obtained, then it is sorted, then the print() function is applied to each element of the sorted key list.


Listing 3. Modifying a list on the fly with map()

# these are the users
@users = ('joe', 'ted', 'larry');
# and this is an on-the-fly substitution of user names with hash references
map { $_ = { $_ => length } } @users;
# @users is now ( { 'joe' => 3 },
#                 { 'ted' => 3 },
#                 { 'larry' => 5 } )

Listing 3 demonstrates how the list passed to map() can be completely rewritten. In this case, the array @users contained only user names, but after map() was applied, the array contained hash references with one key-value pair, username => byte length of user name. How about quickly filling in file information?


Listing 4. Modifying a list on the fly with map(), part 2

use File::stat;
use Data::Dumper;
@files = ('/etc/passwd', '/etc/group', '/etc/fstab', '/etc/vfstab');
print Dumper 
     map { $sb = stat $_; 
           $_ = (defined $sb) ? 
                { 
                  name => $_,  
                  size =>$sb->size(), 
                  mtime => $sb->mtime() 
                } : 
                undef
         } @files;

In Listing 4 we create a list of files, and then in one statement create a list of hashes with entries for the name, size, and mtime (modification time) of each file. Furthermore, non-existent files get an "undef" reference instead of a hash reference. Finally, Data::Dumper is used to print out a nice view of the entire product list.

Needless to say, code like Listing 4 should be heavily documented for other people's sake. Don't try to cram it all into one line, either. Elegant code is just an ugly duckling without proper formatting and comments.


The grep() function

For anyone who has used UNIX, the grep() function is simple to learn and use. It acts just like the grep utility -- elements that satisfy a test pass through, while everything else gets dropped.

The syntax of grep() is just like map(). A block or an expression can be passed, and $_ is aliased to the current element under examination. It is not a good idea to modify elements of the list passed to grep(). That's what map() is for. Use grep() to grep, and map() to map. The only exception to this rule is if you must create temporary hash fields or array entries while sorting, but make sure you remove them afterwards.


Listing 5. How grep() works

# grep {BLOCK GOES HERE} LIST;
grep {$_ > 1} @p;                       # only accept numbers more than 1
grep {$_++} @p;                         # please don't do this
# grep EXPRESSION, LIST;
grep /hi/,@p;                           # only accept matching elements
grep !/hi/,@p;                          # do not accept matching elements

It can be very convenient to use grep() for quick filtering, but remember that a foreach() loop may be faster under some circumstances. When in doubt, benchmark.


Listing 6. Using grep() to filter out odd numbers

my @list = (1, 2, 3, 'hi');
my @results;
# the procedural approach
foreach (@list)
{
 push @results, $_
  unless $_%2;
}
# using grep - FP approach
@results = grep { !($_ % 2) } @list;

Here is another example. Say we need to look in a directory and retrieve all the file names from it:


Listing 7. Using grep() to get all the filenames in a directory

opendir(DIR, ".") || die "can't opendir: $!"; # get the directory handle
my @f = grep { /^[^\.]/ && -f } readdir(DIR); # filter only files into @f

Line 1 of Listing 7 just opens the current directory, or exits the program with the appropriate notice.

Line 2 invokes the readdir() function, which returns a list of filenames, and runs a grep() that filters out hidden files (filenames must not begin with a "." character) and non-file objects such as directories.

In two lines we do as much work as four or five lines of a foreach() loop might do. Don't forget to comment such tight code; the short comments shown are not sufficient for production code. Sometimes, grep() is used in scalar context (for instance, to test if Perl interpreters are running [the Proc::ProcessTable module is from CPAN]):


Listing 8. Using grep() to get all the Perl processes running

use Proc::ProcessTable;                 # get this module from CPAN
use strict;
my $table = new Proc::ProcessTable;
my @procs;
if (@procs = grep { defined $_->cmndline && 
                            $_->cmndline =~ /^perl/ } 
                @{$table->table})
{
 print $_->pid, "\n" foreach @procs;
}
else
{
 print "No Perl interpreters seem to be running.\n";
}

Here, we simultaneously assign the return from grep() to the @procs array, and we test to see if it contained any elements at all. If there were no elements matching the pattern, we print a message to that effect. The same code with a foreach() loop would probably take five or six lines. In case it hasn't been said enough, code like Listing 8 should be commented well enough that someone else could look at it and immediately know the intent and effect of that code. It's no use writing production code if you are the only one who will ever be able to read it.


Sorting with map: the Schwartzian and Guttman-Rosler transforms

The sort() function in Perl is "sort of" procedural, in that it takes a code block or a function name and uses it to sort all elements. The comparison function has to be written as if it were only looking at two elements -- it doesn't know which ones specifically out of the whole list. Like map() and grep(), sort() deals with references to the values being compared, so modifying them would modify the values being compared. Don't do this (for more information on the sort() function, see "perldoc -f sort").

Perl's sorting abilities are remarkably simple to use. In its simplest form, a sort can be done like this:


Listing 9. The default sort()

@new_list = sort @old_list;             # sort @old_list alphabetically

The default sort uses simple string comparisons on all the scalars in the list. This is fine if the list contains dictionary words to be sorted alphabetically, but not so great if the list contains numbers. Why? Because "1" comes before "010" in a string comparison, for example. Numbers have to be compared by value, not as strings.

Fortunately, this is easy to do and is in fact a common Perl idiom:


Listing 10. The numeric sort()

@old_list = (1, 2, 5, 200, '010');
@new_list = sort { $a <=> $b } @old_list; # sort @old_list numerically

I quoted 010, because in Perl numbers that begin with 0 are interpreted as octal, so 010 octal would have been 8 decimal. Try it without the quotes and see for yourself. This also demonstrates how Perl scalars are automatically converted to numbers when necessary.

If you run the default sort from Listing 9 on the @old_list in Listing 10, you will see that 200 is before 5, for example. That's what we are trying to avoid.

To reverse the sorted list, you can either apply the reverse() function to the list after it's sorted, or you can change the sorting function. For example, the reverse sort of the one in Listing 10 would have the comparison code block be { $b <=> $a }.

See "perldoc perlop" for more information on the <=> and cmp operators, which are essential to all sorting in Perl. The cmp operators are what's used in the default search in Listing 9 behind the scenes.

Well, fine, so we can sort scalars. That's not enough -- most sorting is done on data structures such as arrays and hashes. Perl supports almost any kind of sorting because of its flexible syntax. For instance, say we need to sort a bunch of hash references, where the 'name' key in the hash is the sorting field. We want a regular alphabetical sorting order, so the cmp operator should be used:


Listing 11. The sort by a hash member

# create a list with two hash references
@old_list = ( { name => "joe" }, { name => "moe" } );
# sort @old_list by the value of the 'name' key
@new_list = sort { $a->{name} cmp $b->{name} } @old_list;

Now we get into the interesting stuff. What if it's expensive to obtain data from the objects being sorted? Say we need to apply the split() function to a string every time we need to obtain the value to sort by. It would be computationally expensive to run a split() every time the comparison value is needed, and your co-workers would laugh at you. You could build a temporary list of the comparison values, but that's not so easy to do and can easily introduce bugs. You are probably better off using the Schwartzian transform, named after Randal Schwartz.

The Schwartzian transform looks like this:


Listing 12. The Schwartzian transform

# create a list with some strings
@old_list = ( '5 eagles', '10 hawks', '2 bulls', '8 cows');
# sort @old_list by the first word in each string, numerically
@new_list = map($_->[1], 
                sort( { $a->[0] <=> $b->[0] }
                       map( [ (split)[0], $_ ], 
                            @old_list)));

Look at it from right to left. First, we apply a map() to the @old_list list to create a new temporary list. Remember that map() can transform a list. In this case, we rewrite @old_list to contain an array consisting of the first value from a split() of the string (this is the comparison value) and the string itself, for each string in @old_list. The result is a new list; no changes are made to @old_list.

Next, we sort by the comparison value (first element of the array elements in @old_list). Note that @old_list is not actually modified in this whole process.

Then, we perform another map() on the sorted list to reduce it back to just strings by mapping only the second array element into the current variable. $_->[1] means "rewrite $_ to be the value stored in the second object in the list that $_ refers to."

Right about now your eyes are probably glazed from looking at the Schwartzian transform. It really does look frightening at first, but deep down inside it's just a little pussycat. If you are still unclear on it, see the Resources below.

The Guttman-Rosler transform is fascinating in its own right, but discussion of it will only make your eyes glaze further. Look at the Resources for a paper on the GRT, which explains it best. The paper is a very nice introduction to sorting in general. I recommend taking a look at that paper if you do not have at least a little bit of background knowledge on sorting algorithms. The theory behind sorting is extremely useful to a programmer, and understanding O(n) notation can be an invaluable tool not just for sorting, but also for profiling, debugging, and writing good code.


When to use FP

Once again, I will say this: know your tools. Functional programming is an excellent tool, as we have seen so far. It can simplify some pretty hairy problems and make others a little easier. So when should you use FP?

  • First of all, remember that in Perl, FP is only an approach. The actual solution will be procedural, even though it simulates a functional solution. The question is not when FP should be used, but how much it should be used, from "not at all" to "as much as possible."
  • Any time you need to do complex sorting, see if the Schwartzian transform or the Guttman-Rosler transform are appropriate. They are drop-in functional replacements for regular sorting.
  • If your functions are chained often, consider FP. For example, modification of a list in steps by several functions can probably be accomplished with an FP approach.
  • If you have a lot of temporary variables that are thrown away as soon as they are used, consider FP to decrease their number.
  • Filtering, sorting, and general transforming functions applied to lists or hashes are candidates for FP.
  • If your functions have a lot of side effects, and their parameters are more than a few, FP is probably not going to work too well.
  • Recursive algorithms can go either way with regard to FP. They are not clearly better or worse when done with the FP approach.
  • Avoid FP if performance is very important. Use the Benchmark module to check your approach -- sometimes FP will speed things up considerably (for example, the Schwartzian transform is significantly faster because of its cache of comparison values), but sometimes it will cause the performance of the code to drop significantly.
  • One-liners work well with FP.
  • Obfuscated Perl code has always favored grep() and map() as ways to obscure the actions of code. Unless you are writing obfuscated Perl code for a contest, don't use grep() and map() without at least some commenting.
  • Learn, practice, and use FP in your daily programming work. You will gain insight into all of your other code, see new ways ahead, and make life easier. Don't use FP just because it's there, but do use it because it works well for your specific problem.

Exercises

  1. Write a code snippet that uses map() to transform a list of user names into user IDs.

  2. Write a program that uses (1) to look up user IDs. Allow filtering of the user list by partial names.

  3. Write code to check if any processes owned by root are running (on a UNIX system).

  4. Benchmark the Schwartzian transform versus your own sorting code versus the Guttman-Rosler transform on a small data set. Use the Benchmark module to do this.

  5. Do (4) on a large data set. For instance, sort all the file names on your system by size. Look at the File::Find module for ideas.

  6. Read about Erlang, Scheme, and Haskell in the comp.lang.functional FAQ. Look at other FP languages, and see if any of them have neat ideas that you can use in Perl.

  7. Write a Perl version of the grep program. Did you think of the grep() function right away? You shouldn't use grep() in this case because you may have to process a large file, and there's no sense in keeping the contents of the whole file in memory just to run grep() on them. Think of a better solution.


Resources

About the author

Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, 3-tier client-server database architectures, UNIX system administration, CORBA, and project management.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Linux, Open source
ArticleID=11184
ArticleTitle=The road to better programming: Chapter 4
publish-date=01012002
author1-email=tzz@iglou.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers