Perl-based approaches to application configuration range from XML or YAML to the Getopt module to Perl code in the configuration file. The downside of these approaches is, of course, that the Perl programmer will pick one of these and disregard the others.
Instead, programmers should strive to balance the complexity of the code with the needs of the users. Generally speaking, the simplest, most natural configurations are most difficult to support because natural language is simply too organic. Many users, especially Unix users, have been conditioned to believe that certain configuration file formats are "natural" and easy to learn but in fact those formats, while certainly well-established and terrific in their own right, are arcane to a newcomer. Compare, for example, the Unix /etc/passwd file format:
tzz:x:500:29:Ted Zlatanov,,,:/home/tzz:/bin/tcsh
with the fetchmail configuration file format:
# Configuration created Mon Mar 13 14:42:23 2000 by fetchmailconf
set postmaster "postmaster"
set bouncemail
set properties ""
set daemon 30
poll mail.server.net with proto POP3
user tzz there with password PASSWORD is tzz here options warnings 3600
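To make the contrast concrete, here is a minimal sketch of what a program has to know to read the /etc/passwd line shown above. The field names and the parse_passwd_line() helper are assumptions made for illustration; the point is only that the meaning of each field lives in the reader's head (or the code), not in the file.

```perl
use strict;
use warnings;

# Assumed field names, in the order the passwd format stores them.
my @fields = qw(name password uid gid gecos home shell);

# Hypothetical helper: turn one passwd line into a hash of named fields.
sub parse_passwd_line {
    my ($line) = @_;
    chomp $line;
    my %entry;
    # The -1 limit keeps trailing empty fields instead of dropping them.
    @entry{@fields} = split /:/, $line, -1;
    return \%entry;
}

my $entry = parse_passwd_line('tzz:x:500:29:Ted Zlatanov,,,:/home/tzz:/bin/tcsh');
print "$entry->{name} logs in with $entry->{shell}\n";
```

The format is trivial for the program to parse, yet a newcomer cannot guess what the fourth field means without a manual. The fetchmail format makes the opposite trade.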
A point often missed by beginner programmers is that configuration support is hard; you should spend time on it accordingly. In addition, thinking about the configuration of your program often forces you to codify what it can do and how the users can influence its actions. This is a useful line of thought, so I strongly encourage programmers to start writing their programs by locking down the configuration file format. (Of course, the next thing to do after codifying the program's configuration is to write the documentation -- yes, that's right, even before starting on the code. But this is a topic for another day.)
AppConfig is an excellent piece of code, full of practical, useful approaches to the common problem of configuring an application. It strikes a nice balance between power and simplicity, with the added benefit of a unified command-line and configuration-file parser.
I discussed AppConfig in a previous two-part series (see Resources), so we won't cover the basics of the module in this article. You should be comfortable with Perl and multi-level configurations which means that this article will probably only benefit new Perl programmers if they are willing to invest time in understanding the basic topics discussed (the links in Resources will help).
About AppConfig as compared with other modules. I am not dedicated to any particular Perl modules, including AppConfig. I use the module that fits the job. Thus, if AppConfig is not the best tool for a particular job, I won't attempt to drive a nail with a screwdriver. You shouldn't, either.
Observations on configurations
The essential skill necessary for writing configurations is to simplify and abstract the users' needs until they can be expressed in only a few narrow ways. For example, "delete this file" is the purpose of the rm command in Unix. The common options Unix users know are -r and -f: the first removes a directory recursively (on an empty directory, it's actually faster to type than rmdir, which is the proper command for directory removal), and the second forces removal without ever asking users whether they are sure.
The options the users have can be expressed as "remove this, I mean it" and "remove everything starting from here." Note there is no option to make rm do something unusual such as to create a link. Of course, the most important option is the list of files to be erased; that list is assumed to be everything on the command line that doesn't start with a - (more or less; the actual rules are a little more complicated). A common prank is to make a file called -rf and another called * so when the poor user tries to remove them he ends up removing everything (because * will be expanded by the shell before rm ever sees it).
These nuances of the rm command show how even a simple, well-designed program with options that make sense can have subtleties that can be dangerous to the user. Yes, it's the shell's fault that * gets expanded. Yes, the -i switch is necessary for new users. The point is that options introduce complexity; the program (and the programmer) must be prepared to handle that complexity.
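As a rough illustration of how options and the file list are separated, the sketch below uses the core Getopt::Long module (not rm's real parser, and not AppConfig). The parse_rm_args() helper and its option names are hypothetical. It shows the conventional escape hatch: a bare -- ends option processing, so a file really named -rf can be passed safely.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Allow single-letter switches to be bundled, as in "-rf".
Getopt::Long::Configure('bundling');

# Hypothetical helper: split an rm-style argument list into options and files.
sub parse_rm_args {
    my @args = @_;
    my %opt = (recursive => 0, force => 0);
    GetOptionsFromArray(
        \@args,
        'r' => \$opt{recursive},
        'f' => \$opt{force},
    ) or die "unrecognized option\n";
    return (\%opt, \@args);    # whatever is left over is the file list
}

# "-rf" is read as two switches; "olddir" is the only file.
my ($opt, $files) = parse_rm_args('-rf', 'olddir');
print "recursive=$opt->{recursive} force=$opt->{force} files=@$files\n";

# After "--", even "-rf" is treated as a file name.
my ($opt2, $files2) = parse_rm_args('--', '-rf', '*');
print "recursive=$opt2->{recursive} force=$opt2->{force} files=@$files2\n";
```

Note that the * here is a literal string. In a real shell it would already have been expanded before the program started, which is exactly the subtlety described above.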
Thus, I present a simple rule, Zlatanov's Law of Configurations: You can have only two of these three -- simple code, simple configurations, or simple operation. This article will show you how to balance all three, but it can't change the underlying complexity of the user experience.
What are complex layered configurations?
Complex layered configurations are configurations that:
- Need extra processing.
- Have configurations nested more than one layer deep (up to one layer can be handled with the AppConfig hashes).
- Meet both conditions.
Notice that you are pushing AppConfig past its bleeding edge when you do this. There are many reasons to use AppConfig (not the least of which is the simplicity of its configuration files and command-line options), but if you find yourself nesting configurations eight layers deep on a Saturday afternoon, it's time to look into XML, YAML, or another portable data format that was designed for this purpose.
The methods we're describing here are intended as an intermediate step before you move to a powerful data format such as XML. (XML has nothing to do with command-line switches, so you may have to do those differently, maybe with Getopt, which will add to the complexity of your program.)
What configurations need extra processing? Usually, configurations that are not entirely known before the program starts. For instance, the -f switch commonly used to tell a program to read a configuration file complicates the configuration because the file may override or supplement options given on the command line.
Layered configurations are those that have items with a complex structure. For instance, a program may need to be told the list of users that can use it and what privileges each one should be given (the sudo program is an example of such a program -- sudo has to keep track of users and groups, on what machines they are allowed to use sudo, and what privileges they can reach). So the configuration entries are user-or-group.machine(s).privilege(s) or three levels with multiple entries possible at each sub-level (sudo also supports options like NOPASSWD which are "sideways" appendages to the main configuration entry so they don't add layers).
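To picture the three levels, here is a small sketch of how such a user-or-group, machine, privilege layout might look once it is loaded into Perl data. The %access structure, its entries, and the can_run() helper are invented for illustration and do not reflect sudo's actual file format or rules.

```perl
use strict;
use warnings;

# Hypothetical three-level layout: user or group -> machine -> privileges.
my %access = (
    'tzz'    => { 'web1' => [qw(restart_httpd)], 'db1' => [qw(backup restore)] },
    '%admin' => { 'ALL'  => [qw(ALL)] },
);

# Check one (who, machine, privilege) triple; 'ALL' acts as a wildcard.
sub can_run {
    my ($who, $machine, $privilege) = @_;
    my $machines = $access{$who} or return 0;
    for my $m ($machine, 'ALL') {
        my $privs = $machines->{$m} or next;
        return 1 if grep { $_ eq $privilege || $_ eq 'ALL' } @$privs;
    }
    return 0;
}

print "tzz may back up db1\n"   if can_run('tzz', 'db1', 'backup');
print "tzz may not reboot db1\n" unless can_run('tzz', 'db1', 'reboot');
```

Each extra level adds another lookup and another place where a wildcard or default has to be defined, which is why depth drives complexity so quickly.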
The most important capability of AppConfig is that it unifies command-line options and configuration file options into one data structure. This can create complications, though.
Out of the proverbial CPAN box, AppConfig can do extra processing but you have to decide the order in which it will happen. To avoid surprises, start with command-line options and then read files specified by the command-line switches. This means that options in the configuration file will override the ones from the command line. Doing it the other way (just use the file switch from the command line, read the file, then process the other command-line switches) is also possible. We'll demonstrate both approaches.
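The effect of the processing order can be sketched with plain hashes, independent of AppConfig. The three option sources below and their values are assumptions for illustration; the only point is that whichever source is applied last wins.

```perl
use strict;
use warnings;

# Hypothetical option sources; the names and values are illustrative only.
my %defaults     = (debug => 0, retries => 3);
my %command_line = (debug => 1);
my %config_file  = (debug => 0, retries => 5);

# Later sources in each list overwrite earlier ones, so the order is the policy.
my %file_wins = (%defaults, %command_line, %config_file);
my %args_win  = (%defaults, %config_file,  %command_line);

print "file read last: debug=$file_wins{debug} retries=$file_wins{retries}\n";
print "args read last: debug=$args_win{debug} retries=$args_win{retries}\n";
```

Whichever order you choose, document it for your users, because a debug switch that is silently turned off by the file is hard to diagnose.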
For layered configurations, AppConfig needs to have custom functions installed to interpret hashes deeper than one level. Beware of extending this too far; more than three or four levels simply becomes unmanageable (but then again, if you have more than three levels of layers in your configuration, AppConfig is almost certainly not the right tool). We'll offer an example of this approach too.
Command-line options, then files
To parse the command-line options first, you just call the args() method.
Listing 1. Calling args() to start with command-line options then files
#!/usr/bin/perl -w

use strict;
use AppConfig qw/:argcount/;

my $config = AppConfig->new();

$config->define(
    CONFIG_FILE =>
        { ARGCOUNT => ARGCOUNT_ONE, ALIAS => 'F' },
    DEBUG =>
        { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0, ALIAS => 'H' },
);

$config->args();
print "Debug is ", ($config->DEBUG()) ? 'on' : 'off', " before reading the file\n";

$config->file($config->CONFIG_FILE())
    if $config->CONFIG_FILE();
print "Debug is ", ($config->DEBUG()) ? 'on' : 'off', " after reading the file\n";
The args() method consumes the contents of @ARGV that it believes are switches. For example:
- If @ARGV contains -a and then hello, then args() may or may not consume hello.
- If -a is a boolean switch, then hello will not be consumed.
- If -a is a switch with arguments (such as ARGCOUNT_LIST, ARGCOUNT_HASH, ARGCOUNT_ONE), then hello will be consumed.
- If -a is a hash switch, then just hello will create a key of hello with an undef value.
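The rules in this list can be mimicked in a few lines of plain Perl. The consume_switches() function below is a hypothetical stand-in written for this illustration; it is not AppConfig's code and covers only the four cases just described.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the consumption rules above; not AppConfig's code.
# %$kinds maps a switch name to 'none', 'one', 'list', or 'hash'.
sub consume_switches {
    my ($kinds, @argv) = @_;
    my %got;
    while (@argv && $argv[0] =~ /^-(\w+)$/ && exists $kinds->{$1}) {
        my $name = $1;
        my $kind = $kinds->{$name};
        shift @argv;
        if ($kind eq 'none') {          # boolean: takes no argument
            $got{$name} = 1;
            next;
        }
        last unless @argv;              # nothing left to consume
        my $value = shift @argv;        # every other kind eats the next word
        if ($kind eq 'one') {
            $got{$name} = $value;
        }
        elsif ($kind eq 'list') {
            push @{ $got{$name} }, $value;
        }
        else {                          # hash: "key=value", or a bare key
            my ($key, $val) = split /=/, $value, 2;
            $got{$name}{$key} = $val;   # a bare key leaves $val undefined
        }
    }
    return (\%got, \@argv);
}

# Boolean -a: "hello" is left behind for the program to use.
my ($bool, $bool_rest) = consume_switches({ a => 'none' }, '-a', 'hello');
print "boolean switch leaves: @$bool_rest\n";

# Hash -a: "hello" is consumed and becomes a key with no value.
my ($hash, $hash_rest) = consume_switches({ a => 'hash' }, '-a', 'hello');
print "hash switch keys: ", join(',', keys %{ $hash->{a} }), "\n";
```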
Once the command-line options have told us that a particular file is to be read, I just call the file() method. If the file can't be found or read, AppConfig will print a warning by default but the program will go on. This is usually the right thing to do for the users, but if you want to be more strict about wrong options you can enable the PEDANTIC option explained in the manual.
Files before command-line options
Processing the configuration file before the command-line switches is tricky. You may think that you can just save @ARGV, restore it, and call args() again after reading the configuration file. This will create double entries for any lists given at the command line.
Let's say the option was -call to call someone (it's a list) and at the command line -call joe was specified. If you call args() twice on that, you'll get two calls to Joe. If you eliminate duplicates in the list, you may be breaking the intended use of making two calls to Joe!
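The duplication is easy to reproduce without AppConfig. In this sketch, collect_calls() is a hypothetical handler for a -call list option; running it twice over the same saved arguments stands in for calling args() twice.

```perl
use strict;
use warnings;

# Hypothetical list-option handler: every "-call NAME" appends NAME to the list.
sub collect_calls {
    my ($calls, @args) = @_;
    while (@args) {
        my $arg = shift @args;
        push @$calls, shift @args if $arg eq '-call' && @args;
    }
    return $calls;
}

my @saved = ('-call', 'joe');
my @calls;
collect_calls(\@calls, @saved);    # first pass over the arguments
collect_calls(\@calls, @saved);    # second pass after "restoring" the arguments
print "calls: @calls\n";
```

Joe ends up in the list twice, and nothing in the data says whether the user asked for that.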
In my opinion the easiest solution is to parse @ARGV manually as shown in the following listing.
Listing 2. Parsing @ARGV manually
#!/usr/bin/perl -w

use strict;
use AppConfig qw/:argcount/;

my $config = AppConfig->new();

$config->define(
    CONFIG_FILE =>
        { ARGCOUNT => ARGCOUNT_ONE, ALIAS => 'F' },
    DEBUG =>
        { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0, ALIAS => 'H' },
);

my $file_switch_found = 0;

foreach my $arg (@ARGV)
{
    # note that this should come FIRST
    # or it will pick up $file_switch_found from the case below
    if ($file_switch_found)
    {
        $config->CONFIG_FILE($arg);
        $file_switch_found = 0;
    }

    # and this should come SECOND
    if (lc $arg eq '-f' || lc $arg eq '-config_file')
    {
        $file_switch_found = 1;
    }
}

print "Debug is ", ($config->DEBUG()) ? 'on' : 'off', " before reading the file\n";

$config->file($config->CONFIG_FILE())
    if $config->CONFIG_FILE();
print "Debug is ", ($config->DEBUG()) ? 'on' : 'off', " after reading the file\n";

$config->args();
print "Debug is ", ($config->DEBUG()) ? 'on' : 'off', " after processing the arguments\n";
Besides the fact that it's harder to maintain, this approach may also be more confusing to the users. It does work, though. Multiple -f switches will produce just one, the last one, as the file to be read.
A tricky point in the @ARGV loop is the order of the two checks. If you test $file_switch_found after the check for the -f switch, the flag will already be set while the -f switch itself is still being processed, so -f would be stored as the file name. (Notice that this loop avoids knowledge of where it is in the @ARGV array -- this is an essential step to clean coding in Perl.)
As soon as you start keeping track of array offsets, your code will get much more complicated so any time you can avoid iterating through an array by index you should.
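For comparison, here is one more index-free way to find the file switch, sketched under the assumption that consuming a copy of the argument list is acceptable. The find_config_file() helper is hypothetical; it shifts the word after the switch directly, so no flag variable and no ordering of checks is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last file named by -f or -config_file.
sub find_config_file {
    my @args = @_;    # work on a copy; the caller's list is untouched
    my $file;
    while (defined(my $arg = shift @args)) {
        next unless lc($arg) eq '-f' || lc($arg) eq '-config_file';
        $file = shift @args if @args;    # the word after the switch is the file
    }
    return $file;
}

my $file = find_config_file(qw(-h -f first.conf -f second.conf));
print "config file: $file\n";
```

As in Listing 2, the last -f given is the one that counts, and the real @ARGV is left intact for the later call to args().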
Interpreting multi-level hashes
Normally, if you say -hashoption a=b=c to AppConfig, it will assign the key "a" the value "b=c" immediately. I will show you how to create a deeply nested hash with arbitrary nested levels (although, again, I would caution you against going too deep with this approach instead of using something more appropriate like XML or YAML).
In addition, let's modify the state of AppConfig directly through the $state->{VARIABLE} hash. This may not work with versions of AppConfig after 1.56, depending on the AppConfig author's desire, but it's unlikely that the author would make such a change.
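The default one-level behavior described above amounts to splitting on the first "=" only. The following sketch reproduces that effect in plain Perl; split_once() is a hypothetical name and this is not AppConfig's own implementation.

```perl
use strict;
use warnings;

# Sketch of the one-level behavior only; this is not AppConfig's own code.
sub split_once {
    my ($text) = @_;
    # A limit of 2 stops splitting after the first "=".
    my ($key, $value) = split /=/, $text, 2;
    return ($key, $value);
}

my ($key, $value) = split_once('a=b=c');
print "key=$key value=$value\n";
```

Everything after the first "=" stays in the value as an opaque string, which is what the custom action in the next listing takes apart.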
Listing 3. Interpreting multi-level hashes
#!/usr/bin/perl -w

use strict;
use AppConfig qw/:argcount/;
use Data::Dumper;

my $config = AppConfig->new();

$config->define(
    NESTED =>
    {
        ARGCOUNT => ARGCOUNT_HASH,
        ALIAS    => 'N',
        ACTION   => \&nesting_action
    },
);

$config->args();
print "Nested option is ", Dumper($config->NESTED());

sub nesting_action
{
    my $state = shift @_;
    my $vname = shift @_;
    my $value = shift @_;

    # AppConfig can handle this
    return 1 unless ($value =~ m/=.*=/);

    # both groups capture the same text, so every token appears twice
    # (no "if $value" modifier here: "my ... if" is unreliable in Perl,
    # and $value is known to be non-empty at this point)
    my @matches = ($value =~ m/(("[^"]+"|[^=]+))=?/g);

    # keep one copy of each token and strip surrounding quotation marks
    my @m;
    my $m = scalar @matches;
    foreach (0..($m/2-1))
    {
        my $lost  = shift @matches;
        my $found = shift @matches;
        $found =~ s/^"(.*)"$/$1/;
        push @m, $found;
    }
    @matches = @m;

    # we will always match this
    my $firstkey = shift @matches;

    # nothing to do if we can't get a first key
    return unless defined $firstkey;

    # now, put in the parsed value
    $state->{VARIABLE}->{$vname}->{$firstkey} = {};
    my $hash = $state->{VARIABLE}->{$vname}->{$firstkey};

    while (scalar @matches > 2)
    {
        my $key = shift @matches;
        $hash->{$key} = {};
        $hash = $hash->{$key};
    }

    # note we could have zero @matches, so check first
    if (scalar @matches > 1)
    {
        # we use pop and not shift because of the order of evaluation
        $hash->{pop @matches} = pop @matches;
    }
}
Of course, all the interesting code is in the nesting_action() function.
First, get all the matches with a regular expression that looks pretty complex. Just for fun, the "=" character is allowed inside values as well. In those cases, the user has to surround the value with quotation marks and then can't use quotation marks inside the value (they can be used by themselves in a value, though). If you're wondering why the parsing is fairly simplistic, I didn't think code more complex than that would be fun or educational.
If there is no more than one "=" character in the user-given value, let AppConfig deal with it. Doing otherwise would complicate the code for no gain.
In Perl 6, the regular expression would be much simpler and the processing would be easier because that version distinguishes between grouping for matching and grouping for data extraction. (Perl 5 does offer non-capturing groups, written (?:...), but this listing does not use them.) Here, the expression captures each value twice, so the loop keeps only one copy of each. It also removes the surrounding quotation marks if present. The final result is in the @matches array.
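As an alternative sketch of the tokenizing step, a single capturing group returns each token once, so no de-duplication loop is needed. The tokenize() function is a hypothetical name and is a variation on the listing, not the code the listing uses.

```perl
use strict;
use warnings;

# Hypothetical variation on the listing's parsing step.
sub tokenize {
    my ($text) = @_;
    # One capture group: either a quoted chunk or a run of non-"=" characters.
    my @tokens = $text =~ /("[^"]+"|[^=]+)=?/g;
    s/^"(.*)"$/$1/ for @tokens;    # drop surrounding quotation marks
    return @tokens;
}

my @tokens = tokenize('a=b="c=d"');
print join(' | ', @tokens), "\n";
```

The quoted chunk keeps its embedded "=" and loses only the surrounding quotation marks, matching the rule described above.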
When this action is called, the value is already stored in the $state->{VARIABLE} hash. I overwrote it with the first key, then I looped through the intermediate keys to insert more sub-levels (note that we leave the last sub-level for last).
Finally, when there are just two items left in the @matches array, insert them as a key-value pair in the current sub-level.
I used pop() instead of shift() at the end because of the order of evaluation. I could have also called shift() twice in two separate lines and stored the results in temporary variables, but I thought this was a cleaner if slightly more advanced and less maintainable solution.
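For readers who prefer the temporary-variable route, the nesting step might be sketched as follows. The nest() helper is hypothetical and builds a standalone hash rather than writing into AppConfig's state, but the walk through the intermediate keys is the same idea.

```perl
use strict;
use warnings;

# Hypothetical helper: turn (k1, k2, ..., kN, value) into nested hashes.
sub nest {
    my @path = @_;
    return unless @path >= 2;    # need at least one key and a value
    my $value = pop @path;       # the last item is the value
    my $last  = pop @path;       # the one before it is the innermost key
    my %root;
    my $hash = \%root;
    for my $key (@path) {        # remaining items become intermediate levels
        $hash->{$key} = {};
        $hash = $hash->{$key};
    }
    $hash->{$last} = $value;
    return \%root;
}

my $nested = nest(qw(a b c d));
print "a.b.c is $nested->{a}{b}{c}\n";
```

Because the key and the value are taken off the list in separate statements, the result does not depend on which side of an assignment Perl evaluates first.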
AppConfig shines as a way of configuring applications in Perl in the simple cases, but occasionally you need more power in command-line processing and configuration-file parsing. Instead of using data formats such as XML or YAML, you can use AppConfig with some extra work to achieve more advanced goals. This is especially true if you need to process complex command-line switches to create multi-level hashes.
Resist the temptation to make your configuration files simple Perl that can be evaluated. This is a dangerous and fragile solution -- a user error or insecure permissions can compromise your software. It's always better to do the extra work and validate your input than to run unknown and untrustworthy code.
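A minimal sketch of the validating alternative follows. The %allowed whitelist, its option names, and the parse_config_lines() helper are assumptions for illustration; the idea is simply that each line is matched against known keys and value patterns, and nothing from the file is ever executed.

```perl
use strict;
use warnings;

# Hypothetical whitelist: option name => pattern its value must match.
my %allowed = (
    debug   => qr/^[01]$/,
    retries => qr/^\d+$/,
    logfile => qr{^[\w./-]+$},
);

# Parse "key = value" lines, collecting errors instead of running anything.
sub parse_config_lines {
    my @lines = @_;
    my (%config, @errors);
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#.*)?$/;    # skip comments and blank lines
        my ($key, $value) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
        if (!defined $key || !exists $allowed{$key}) {
            push @errors, "unknown or malformed line: $line";
        }
        elsif ($value !~ $allowed{$key}) {
            push @errors, "bad value for $key: $value";
        }
        else {
            $config{$key} = $value;
        }
    }
    return (\%config, \@errors);
}

my ($config, $errors) = parse_config_lines(
    'debug = 1',
    '# a comment',
    'retries = many',
    'system("rm -rf /")',
);
print "accepted: ", join(', ', sort keys %$config), "\n";
print "rejected: ", scalar(@$errors), " line(s)\n";
```

A line that would have been dangerous under eval is reported as just another malformed entry.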
If you find AppConfig useful, you should also look at the AppConfig::Std module on CPAN; it extends AppConfig with some common and useful options.
Resources
- Find AppConfig 1.56, the Perl5 module for reading configuration files and parsing command-line arguments, in the CPAN archives.
- The XML::Simple module, which provides a simple API layer on top of an underlying XML parsing module, is also on CPAN.
- CPAN.org offers a wonderfully complete module archive.
- Try the fetchmail Home Page for more on fetchmail, the full-featured, robust, well-documented remote-mail retrieval and forwarding utility intended to be used over on-demand TCP/IP links.
- YAML is a machine-parsable data-serialization format designed for human readability and interaction with scripting languages; it is optimized for data serialization, configuration settings, log files, and Internet messaging and filtering.
- "Application configuration with Perl" (developerWorks, October 2000) and "Application configuration with Perl, Part 2" (developerWorks, July 2002) both provide an overview of the AppConfig module.
- "Writing programs that speak English" (developerWorks, July 2002) is a good introductory article on using the Parse::RecDescent module to create a user interface grammar in plain English.
- Read all of Ted's Perl articles in the Cultured Perl series on developerWorks.

Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work, Perl, text parsing, three-tier client-server database architectures, and Unix system administration. Contact Ted with suggestions and corrections at tzz@bu.edu.