Functional programming (FP) can be an approach, a solution, or a religion. I prefer it to be anything but a religion, because it should be just one more tool in a programmer's arsenal.
It's just as good if we fake it
As I often mention, avoid using a tool just because it's there and you can use it. Hammers work on nails; screwdrivers work on screws. Know all your tools and use the right one; your life will be much easier for it.
A simplistic view of functional programming is that it deals with applying functions to values, whereas procedural (traditional) programming deals with using functions for their side effects. The references cited in the comp.lang.functional FAQ (see the Resources later in this article) are the best starting point for understanding the methodology and intent of functional programming.
In Perl, a procedural language at heart, functional programming is only possible as an approach. The actual solution will commonly use the map() or grep() functions to simulate a functional solution. The functional programming approach is valuable for three reasons. First, FP presents the programmer with a new view of the problem, and possibly a better solution will come about because of that. Second, the Schwartzian and Guttman-Rosler transforms, and many other Perl idioms, are hard to use or understand without the aid of map() and grep(). Third, implementing some algorithms without map() and grep() can slow them down significantly because function invocation in Perl is fairly expensive.
The map() function is like a rubber stamp applied to all the elements of a list (see "perldoc -f map" for more information). It consists of two parts: a block or an expression, and a list:
Listing 1. How map() works
# map {BLOCK GOES HERE} LIST;
map {$_++} @p; # increment every element of @p
# map EXPRESSION, LIST;
map -f,@p; # file test every element of @p
|
The foreach() loop will usually do better than map() in benchmarks, so don't use map() for CPU-intensive calculations unless you test its performance. Do use it when it makes your code more elegant and simple without a loss of efficiency.
There are many neat things that map() can do. First of all, it can modify the array elements as it goes through them by changing the $_ variable. Inside the block or (less commonly) the expression, $_ is the current element of the list. You don't know which element you are looking at -- the whole point is to map one single function to all the elements independently. It is my experience that in about 80% of loops over an array you don't need to know the offset of the current element inside the array. Thus, map() can improve coding efficiency and style by forcing the programmer to think independently of array offsets.
Listing 2. foreach() vs. map()
# "normal" (procedural) way
foreach (sort keys %ENV)
{
print "$_ => $ENV{$_}\n";
}
# FP way
map { print "$_ => $ENV{$_}\n" } sort keys %ENV;
|
In Listing 2, you can see how the FP way is not fundamentally different, yet manages to convey a flow of functions from right to left. First the list of keys is obtained, then it is sorted, then the print() function is applied to each element of the sorted key list.
Listing 3. Modifying a list on the fly with map()
# these are the users
@users = ('joe', 'ted', 'larry');
# and this is an on-the-fly substitution of user names with hash references
map { $_ = { $_ => length } } @users;
# @users is now ( { 'joe' => 3 },
# { 'ted' => 3 },
# { 'larry' => 5 } )
|
Listing 3 demonstrates how the list passed to map() can be completely rewritten. In this case, the array @users contained only user names, but after map() was applied, the array contained hash references with one key-value pair, username => byte length of user name. How about quickly filling in file information?
Listing 4. Modifying a list on the fly with map(), part 2
use File::stat;
use Data::Dumper;
@files = ('/etc/passwd', '/etc/group', '/etc/fstab', '/etc/vfstab');
print Dumper
map { $sb = stat $_;
$_ = (defined $sb) ?
{
name => $_,
size =>$sb->size(),
mtime => $sb->mtime()
} :
undef
} @files;
|
In Listing 4 we create a list of files, and then in one statement create a list of hashes with entries for the name, size, and mtime (modification time) of each file. Furthermore, non-existent files get an "undef" reference instead of a hash reference. Finally, Data::Dumper is used to print out a nice view of the entire product list.
Needless to say, code like Listing 4 should be heavily documented for other people's sake. Don't try to cram it all into one line, either. Elegant code is just an ugly duckling without proper formatting and comments.
For anyone who has used UNIX, the grep() function is simple to learn and use. It acts just like the grep utility -- elements that satisfy a test pass through, while everything else gets dropped.
The syntax of grep() is just like map(). A block or an expression can be passed, and $_ is aliased to the current element under examination. It is not a good idea to modify elements of the list passed to grep(). That's what map() is for. Use grep() to grep, and map() to map. The only exception to this rule is if you must create temporary hash fields or array entries while sorting, but make sure you remove them afterwards.
Listing 5. How grep() works
# grep {BLOCK GOES HERE} LIST;
grep {$_ > 1} @p; # only accept numbers more than 1
grep {$_++} @p; # please don't do this
# grep EXPRESSION, LIST;
grep /hi/,@p; # only accept matching elements
grep !/hi/,@p; # do not accept matching elements
|
It can be very convenient to use grep() for quick filtering, but remember that a foreach() loop may be faster under some circumstances. When in doubt, benchmark.
Listing 6. Using grep() to filter out odd numbers
my @list = (1, 2, 3, 'hi');
my @results;
# the procedural approach
foreach (@list)
{
push @results, $_
unless $_%2;
}
# using grep - FP approach
@results = grep { !($_ % 2) } @list;
|
Here is another example. Say we need to look in a directory and retrieve all the file names from it:
Listing 7. Using grep() to get all the filenames in a directory
opendir(DIR, ".") || die "can't opendir: $!"; # get the directory handle
my @f = grep { /^[^\.]/ && -f } readdir(DIR); # filter only files into @f
|
Line 1 of Listing 7 just opens the current directory, or exits the program with the appropriate notice.
Line 2 invokes the readdir() function, which returns a list of filenames, and runs a grep() that filters out hidden files (filenames must not begin with a "." character) and non-file objects such as directories.
In two lines we do as much work as four or five lines of a foreach() loop might do. Don't forget to comment such tight code; the short comments shown are not sufficient for production code. Sometimes, grep() is used in scalar context (for instance, to test if Perl interpreters are running [the Proc::ProcessTable module is from CPAN]):
Listing 8. Using grep() to get all the Perl processes running
use Proc::ProcessTable; # get this module from CPAN
use strict;
my $table = new Proc::ProcessTable;
my @procs;
if (@procs = grep { defined $_->cmndline &&
$_->cmndline =~ /^perl/ }
@{$table->table})
{
print $_->pid, "\n" foreach @procs;
}
else
{
print "No Perl interpreters seem to be running.\n";
}
|
Here, we simultaneously assign the return from grep() to the @procs array, and we test to see if it contained any elements at all. If there were no elements matching the pattern, we print a message to that effect. The same code with a foreach() loop would probably take five or six lines. In case it hasn't been said enough, code like Listing 8 should be commented well enough that someone else could look at it and immediately know the intent and effect of that code. It's no use writing production code if you are the only one who will ever be able to read it.
Sorting with map: the Schwartzian and Guttman-Rosler transforms
The sort() function in Perl is "sort of" procedural, in that it takes a code block or a function name and uses it to sort all elements. The comparison function has to be written as if it were only looking at two elements -- it doesn't know which ones specifically out of the whole list. Like map() and grep(), sort() deals with references to the values being compared, so modifying them would modify the values being compared. Don't do this (for more information on the sort() function, see "perldoc -f sort").
Perl's sorting abilities are remarkably simple to use. In its simplest form, a sort can be done like this:
Listing 9. The default sort()
@new_list = sort @old_list; # sort @old_list alphabetically |
The default sort uses simple string comparisons on all the scalars in the list. This is fine if the list contains dictionary words to be sorted alphabetically, but not so great if the list contains numbers. Why? Because "1" comes before "010" in a string comparison, for example. Numbers have to be compared by value, not as strings.
Fortunately, this is easy to do and is in fact a common Perl idiom:
Listing 10. The numeric sort()
@old_list = (1, 2, 5, 200, '010');
@new_list = sort { $a <=> $b } @old_list; # sort @old_list numerically
|
I quoted 010, because in Perl numbers that begin with 0 are interpreted as octal, so 010 octal would have been 8 decimal. Try it without the quotes and see for yourself. This also demonstrates how Perl scalars are automatically converted to numbers when necessary.
If you run the default sort from Listing 9 on the @old_list in Listing 10, you will see that 200 is before 5, for example. That's what we are trying to avoid.
To reverse the sorted list, you can either apply the reverse() function to the list after it's sorted, or you can change the sorting function. For example, the reverse sort of the one in Listing 10 would have the comparison code block be { $b <=> $a }.
See "perldoc perlop" for more information on the <=>
and cmp operators, which are essential to all sorting in Perl. The cmp operators are what's used in the default search in Listing 9 behind the scenes.
Well, fine, so we can sort scalars. That's not enough -- most sorting is done on data structures such as arrays and hashes. Perl supports almost any kind of sorting because of its flexible syntax. For instance, say we need to sort a bunch of hash references, where the 'name' key in the hash is the sorting field. We want a regular alphabetical sorting order, so the cmp operator should be used:
Listing 11. The sort by a hash member
# create a list with two hash references
@old_list = ( { name => "joe" }, { name => "moe" } );
# sort @old_list by the value of the 'name' key
@new_list = sort { $a->{name} cmp $b->{name} } @old_list;
|
Now we get into the interesting stuff. What if it's expensive to obtain data from the objects being sorted? Say we need to apply the split() function to a string every time we need to obtain the value to sort by. It would be computationally expensive to run a split() every time the comparison value is needed, and your co-workers would laugh at you. You could build a temporary list of the comparison values, but that's not so easy to do and can easily introduce bugs. You are probably better off using the Schwartzian transform, named after Randal Schwartz.
The Schwartzian transform looks like this:
Listing 12. The Schwartzian transform
# create a list with some strings
@old_list = ( '5 eagles', '10 hawks', '2 bulls', '8 cows');
# sort @old_list by the first word in each string, numerically
@new_list = map($_->[1],
sort( { $a->[0] <=> $b->[0] }
map( [ (split)[0], $_ ],
@old_list)));
|
Look at it from right to left. First, we apply a map() to the @old_list list to create a new temporary list. Remember that map() can transform a list. In this case, we rewrite @old_list to contain an array consisting of the first value from a split() of the string (this is the comparison value) and the string itself, for each string in @old_list. The result is a new list; no changes are made to @old_list.
Next, we sort by the comparison value (first element of the array elements in @old_list). Note that @old_list is not actually modified in this whole process.
Then, we perform another map() on the sorted list to reduce it back to just strings by mapping only the second array element into the current variable. $_->[1] means "rewrite $_ to be the value stored in the second object in the list that $_ refers to."
Right about now your eyes are probably glazed from looking at the Schwartzian transform. It really does look frightening at first, but deep down inside it's just a little pussycat. If you are still unclear on it, see the Resources below.
The Guttman-Rosler transform is fascinating in its own right, but discussion of it will only make your eyes glaze further. Look at the Resources for a paper on the GRT, which explains it best. The paper is a very nice introduction to sorting in general. I recommend taking a look at that paper if you do not have at least a little bit of background knowledge on sorting algorithms. The theory behind sorting is extremely useful to a programmer, and understanding O(n) notation can be an invaluable tool not just for sorting, but also for profiling, debugging, and writing good code.
Once again, I will say this: know your tools. Functional programming is an excellent tool, as we have seen so far. It can simplify some pretty hairy problems and make others a little easier. So when should you use FP?
- First of all, remember that in Perl, FP is only an approach. The actual solution will be procedural, even though it simulates a functional solution. The question is not when FP should be used, but how much it should be used, from "not at all" to "as much as possible."
- Any time you need to do complex sorting, see if the Schwartzian transform or the Guttman-Rosler transform are appropriate. They are drop-in functional replacements for regular sorting.
- If your functions are chained often, consider FP. For example, modification of a list in steps by several functions can probably be accomplished with an FP approach.
- If you have a lot of temporary variables that are thrown away as soon as they are used, consider FP to decrease their number.
- Filtering, sorting, and general transforming functions applied to lists or hashes are candidates for FP.
- If your functions have a lot of side effects, and their parameters are more than a few, FP is probably not going to work too well.
- Recursive algorithms can go either way with regard to FP. They are not clearly better or worse when done with the FP approach.
- Avoid FP if performance is very important. Use the Benchmark module to check your approach -- sometimes FP will speed things up considerably (for example, the Schwartzian transform is significantly faster because of its cache of comparison values), but sometimes it will cause the performance of the code to drop significantly.
- One-liners work well with FP.
- Obfuscated Perl code has always favored
grep()andmap()as ways to obscure the actions of code. Unless you are writing obfuscated Perl code for a contest, don't usegrep()andmap()without at least some commenting. - Learn, practice, and use FP in your daily programming work. You will gain insight into all of your other code, see new ways ahead, and make life easier. Don't use FP just because it's there, but do use it because it works well for your specific problem.
Write a code snippet that uses
map()to transform a list of user names into user IDs.Write a program that uses (1) to look up user IDs. Allow filtering of the user list by partial names.
Write code to check if any processes owned by root are running (on a UNIX system).
Benchmark the Schwartzian transform versus your own sorting code versus the Guttman-Rosler transform on a small data set. Use the Benchmark module to do this.
Do (4) on a large data set. For instance, sort all the file names on your system by size. Look at the File::Find module for ideas.
Read about Erlang, Scheme, and Haskell in the comp.lang.functional FAQ. Look at other FP languages, and see if any of them have neat ideas that you can use in Perl.
Write a Perl version of the grep program. Did you think of the
grep()function right away? You shouldn't usegrep()in this case because you may have to process a large file, and there's no sense in keeping the contents of the whole file in memory just to rungrep()on them. Think of a better solution.
- Read the previous chapters of The road to better programming.
- The comp.lang.functional FAQ offers a wide array of information on all aspects of functional programming languages, including general topics, such as history and motivation, journals and conferences, and schools and workshops; technical topics, such as performance and applications; and other resources, such as Web pages, research groups, and newsgroups. This is the best starting point for anyone interested in learning more about functional programming.
- Go to the Schwartzian transform for "Far More Than Everything You've Ever Wanted to Know About Sorting."
- Read A Fresh Look at Efficient Perl Sorting by
Uri Guttman and Larry Rosler.
- Read Teodor's other Perl articles in the developerWorks "Cultured Perl" series:
- A programmer's Linux-oriented setup
- Application configuration with Perl
- Automating UNIX system administration with Perl
- Debugging Perl with ease
- The elegance of JAPH
- Genetic algorithms applied with Perl
- One-liners 101
- Parsing with Perl modules
- Perl 5.6 for C and Java programmers
- Reading and writing Excel files with Perl
- Review of Programming Perl, Third Edition
- Small observations about the big picture
- Writing Perl programs that speak English
- Browse more Linux resources on developerWorks.
- Browse more Open source resources on developerWorks.
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, 3-tier client-server database architectures, UNIX system administration, CORBA, and project management.
Comments (Undergoing maintenance)





