Cultured Perl: Complex layered configurations with AppConfig

Employing extra effort to make AppConfig process complex command-line switches

Teodor Zlatanov (tzz@bu.edu), Programmer, Gold Software Systems
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work, Perl, text parsing, three-tier client-server database architectures, and Unix system administration. Contact Ted with suggestions and corrections at tzz@bu.edu.

Summary:  AppConfig shines as a way of configuring applications in Perl in the simple cases, but occasionally you need more power in command-line processing and configuration-file parsing. Instead of using data formats such as XML or YAML, you can apply a little extra effort and alter AppConfig so it can process complex command-line switches to create multi-level hashes.

Date:  31 Mar 2005
Level:  Intermediate

Perl-based approaches to application configuration range from XML or YAML, to the Getopt modules, to Perl code in the configuration file. The downside of having so many approaches is, of course, that a Perl programmer will often pick one and disregard the others.

Instead, programmers should strive to balance the complexity of the code with the needs of the users. Generally speaking, the simplest, most natural configurations are the most difficult to support, because natural language is simply too organic. Many users, especially Unix users, have been conditioned to believe that certain configuration file formats are "natural" and easy to learn. In fact those formats, while well-established and terrific in their own right, are arcane to a newcomer. Compare, for example, the Unix /etc/passwd file format:

tzz:x:500:29:Ted Zlatanov,,,:/home/tzz:/bin/tcsh

with the fetchmail configuration file format:

# Configuration created Mon Mar 13 14:42:23 2000 by fetchmailconf
set postmaster "postmaster"
set bouncemail
set properties ""
set daemon 30
poll mail.server.net with proto POP3
       user tzz there with password PASSWORD is tzz here options warnings 3600
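The passwd line is compact, but nothing in it tells a newcomer which field is which. The following small Perl sketch (not from the original article) makes that positional knowledge explicit by naming the seven fields of the sample line. The parse_passwd_line helper is hypothetical. Real programs would normally call Perl's built-in getpwnam() rather than splitting the file by hand.

```perl
use strict;
use warnings;

# The seven colon-separated fields of an /etc/passwd entry, in order.
my @passwd_fields = qw(name password uid gid gecos home shell);

# Hypothetical helper: split one passwd line into a hash keyed by field name.
# The -1 limit keeps trailing empty fields instead of dropping them.
sub parse_passwd_line {
    my ($line) = @_;
    my %entry;
    @entry{@passwd_fields} = split /:/, $line, -1;
    return \%entry;
}

my $entry = parse_passwd_line('tzz:x:500:29:Ted Zlatanov,,,:/home/tzz:/bin/tcsh');
print "$_: $entry->{$_}\n" for @passwd_fields;
```

The fetchmail format needs no such lookup table, because every value is introduced by a keyword.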

A point often missed by beginner programmers is that configuration support is hard; you should spend time on it accordingly. In addition, thinking about the configuration of your program often forces you to codify what it can do and how the users can influence its actions. This is a useful line of thought, so I strongly encourage programmers to start writing their programs by locking down the configuration file format. (Of course, the next thing to do after codifying the program's configuration is to write the documentation -- yes, that's right, even before starting on the code. But this is a topic for another day.)

AppConfig strikes a nice balance between power and simplicity. It is an excellent piece of code, full of practical approaches to the common problem of configuring an application, and it has the added benefit of a unified command-line and configuration-file parser.

I discussed AppConfig in a previous two-part series (see Resources), so this article won't cover the basics of the module. You should be comfortable with Perl and with multi-level data structures. New Perl programmers will probably benefit from this article only if they are willing to invest time in understanding those basic topics (the links in Resources will help).

AppConfig compared with other modules

I am not dedicated to any particular Perl module, including AppConfig; I use the module that fits the job. If AppConfig is not the best tool for a particular job, I won't attempt to drive a nail with a screwdriver. You shouldn't, either.

Observations on configurations

The essential skill in writing configurations is to simplify and abstract the users' needs until they can be expressed in only a few narrow ways. For example, the purpose of the Unix rm command is "delete this file." The options most Unix users know are -r and -f. The -r option removes a directory recursively (on an empty directory, it's actually faster than typing rmdir, which is the proper command for directory removal). The -f option forces removal, so rm never asks whether the user is sure.

A note about rm

It may surprise some novice Unix users to learn that, for files they can write to, rm's default behavior is never to ask questions. They get a "yes/no" question because whoever set up their Unix account turned on the -i switch by default, often with a shell alias such as alias rm='rm -i'. With -i, rm acts like a doting grandmother and conditions users to be a little lax about typing rm. Opinions differ on whether -i should be on by default. Mine is that it should be on for novice users, but they should be told how to turn it off.

Some shells also catch the rm * command as a separate caution, but that's irrelevant to the -i switch issue.

The options available to users can be expressed as "remove this, I mean it" and "remove everything starting from here." Note that there is no option to make rm do something unusual, such as create a link. Of course, the most important option is the list of files to be erased. That list is assumed to be everything on the command line that doesn't start with a - (more or less; the actual rules are a little more complicated). A common prank is to create a file called -rf and another called *. When the poor user tries to remove them with rm *, he ends up removing everything. The shell expands * into every file name in the directory, including -rf, before rm ever sees the arguments, and rm then treats -rf as its options.

These nuances of the rm command show how even a simple, well-designed program with sensible options can have subtleties that are dangerous to the user. Yes, it's the shell's fault that * gets expanded. Yes, the -i switch is necessary for new users. The point is that options introduce complexity, and the program (and the programmer) must be prepared to handle it.

Thus, I present a simple rule, Zlatanov's Law of Configurations: You can have only two of these three -- simple code, simple configurations, or simple operation. This article will show you how to balance all three, but it can't change the underlying complexity of the user experience.


What are complex layered configurations?

Complex layered configurations are configurations that:

  • Need extra processing.
  • Have configurations nested more than one layer deep (AppConfig's built-in hash variables handle only a single layer).
  • Meet both conditions.

Note that you are pushing AppConfig past its bleeding edge when you do this. There are many reasons to use AppConfig, not least the simplicity of its configuration files and command-line options. But if you find yourself nesting configurations eight layers deep on a Saturday afternoon, it's time to look into XML, YAML, or another portable data format designed for this purpose.

The methods we're describing here are intended as an intermediate step before you move to a powerful data format such as XML. (XML has nothing to do with command-line switches, so you may have to do those differently, maybe with Getopt, which will add to the complexity of your program.)

What configurations need extra processing? Usually, configurations that are not entirely known before the program starts. For instance, the -f switch commonly used to tell a program to read a configuration file complicates the configuration because the file may override or supplement options given on the command line.

Layered configurations are those whose items have a complex structure. For instance, a program may need to know which users can use it and what privileges each one should be given. The sudo program is an example of such a program. It has to keep track of users and groups, the machines on which they are allowed to use sudo, and the privileges they can reach. Its configuration entries therefore have the form user-or-group.machine(s).privilege(s). That makes three levels, with multiple entries possible at each sub-level. (sudo also supports options such as NOPASSWD, which are "sideways" appendages to the main configuration entry and don't add layers.)
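A three-level structure like this maps naturally onto nested Perl hashes. The following sketch is only loosely modeled on sudo and is not sudoers syntax. The %privileges data, the host and command names, and the can_run helper are all hypothetical. ALL acts as a wildcard at the machine and command levels, and a leading % marks a group name.

```perl
use strict;
use warnings;

# Hypothetical three-level data: user-or-group -> machine -> commands.
# 'ALL' is a wildcard; a leading '%' marks a group name.
my %privileges = (
    'tzz'    => { alpha => ['/bin/mount', '/bin/umount'], beta => ['ALL'] },
    '%wheel' => { ALL   => ['/sbin/shutdown'] },
);

# Hypothetical helper: may $who run $command on $host?
sub can_run {
    my ($who, $host, $command) = @_;
    my $machines = $privileges{$who} or return 0;    # level 1
    for my $machine ($host, 'ALL') {                  # level 2
        my $commands = $machines->{$machine} or next;
        # level 3: an exact command or the ALL wildcard
        return 1 if grep { $_ eq 'ALL' || $_ eq $command } @$commands;
    }
    return 0;
}

print can_run('tzz', 'alpha', '/bin/mount') ? "allowed\n" : "denied\n";
```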


AppConfig capabilities

The most important capability of AppConfig is that it unifies command-line options and configuration file options into one data structure. This can create complications, though.

Out of the proverbial CPAN box, AppConfig can do extra processing, but you have to decide the order in which it happens. To avoid surprises, start with the command-line options and then read any files named by command-line switches. This means that options in the configuration file override the ones from the command line. You can also work in the opposite order: take only the file switch from the command line, read the file, and then process the other command-line switches. We'll demonstrate both approaches.
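For scalar options, whichever source is read last wins. The sketch below imitates that layering with plain hashes instead of AppConfig, so the two orders can be compared side by side. The merge_layers helper and the option values are hypothetical. This simple merge does not model AppConfig list options, which accumulate values rather than override them.

```perl
use strict;
use warnings;

# Hypothetical helper: merge hashes left to right, so later layers win.
sub merge_layers {
    my %merged;
    for my $layer (@_) {
        @merged{ keys %$layer } = values %$layer;
    }
    return \%merged;
}

my %defaults = (debug => 0, verbose => 0);
my %cmdline  = (debug => 1);               # user passed -debug
my %file     = (debug => 0, verbose => 1); # config file disagrees

# Command line first, then file: the file overrides the command line.
my $file_wins = merge_layers(\%defaults, \%cmdline, \%file);

# File first, then command line: the command line has the last word.
my $cmdline_wins = merge_layers(\%defaults, \%file, \%cmdline);

printf "file read last: debug=%d; command line read last: debug=%d\n",
    $file_wins->{debug}, $cmdline_wins->{debug};
```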

For layered configurations, AppConfig needs custom functions installed to interpret hashes deeper than one level. Beware of extending this too far, because more than three or four levels quickly become unmanageable. (Then again, if your configuration has more than three levels, AppConfig is almost certainly not the right tool.) We'll offer an example of this approach too.

Command-line options, then files

To parse the command-line options first, you just call the args() method.


Listing 1. Calling args() to start with command-line options then files

#!/usr/bin/perl -w

use strict;
use AppConfig qw/:argcount/;

my $config = AppConfig->new();

$config->define(
            CONFIG_FILE       =>
            { ARGCOUNT => ARGCOUNT_ONE, ALIAS => 'F' },

            DEBUG        =>
            { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0, ALIAS => 'H' },

             );

$config->args();

print "Debug is ", ($config->DEBUG())? 'on':'off', " before reading the file\n";

$config->file($config->CONFIG_FILE())
 if $config->CONFIG_FILE();

print "Debug is ", ($config->DEBUG())? 'on':'off', " after reading the file\n";

The args() method consumes the elements of @ARGV that it believes are switches, together with their arguments. Whether the word after a switch is consumed depends on the type of the switch. For example, suppose @ARGV contains -a followed by hello:

  • If -a is a boolean switch, hello will not be consumed.
  • If -a is a switch with arguments (ARGCOUNT_ONE, ARGCOUNT_LIST, or ARGCOUNT_HASH), hello will be consumed.
  • If -a is a hash switch, the bare word hello will create a key of hello with an undef value.
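These rules can be approximated with a toy parser. It is not AppConfig's implementation. The consume_switches function and its 'none', 'one', and 'hash' kinds are hypothetical stand-ins for ARGCOUNT_NONE, ARGCOUNT_ONE, and ARGCOUNT_HASH. The model also assumes that parsing stops at the first word that isn't a switch.

```perl
use strict;
use warnings;

# Toy model only, not AppConfig's code.  $kinds maps a switch name to
# 'none' (boolean), 'one' (takes a value), or 'hash' (takes key=value).
sub consume_switches {
    my ($argv, $kinds) = @_;
    my %opts;
    # Assumption: parsing stops at the first word that is not a switch.
    while (@$argv && $argv->[0] =~ /^-(\w+)$/) {
        my $name = $1;
        shift @$argv;
        my $kind = $kinds->{$name} || 'none';
        if ($kind eq 'none') {
            $opts{$name} = 1;                  # the next word is left alone
        }
        elsif ($kind eq 'one') {
            $opts{$name} = shift @$argv;       # the next word is the value
        }
        else {
            my $pair = shift @$argv;
            next unless defined $pair;
            # "key=value" sets a key; a bare "key" gets an undef value
            my ($key, $value) = split /=/, $pair, 2;
            $opts{$name}{$key} = $value;
        }
    }
    return \%opts;
}

for my $kind (qw(none one hash)) {
    my @argv = ('-a', 'hello');
    consume_switches(\@argv, { a => $kind });
    printf "%-4s -> left in \@argv: (%s)\n", $kind, join(', ', @argv);
}
```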

Once the command-line options have named a file to read, just call the file() method. If the file can't be found or read, AppConfig prints a warning by default, but the program goes on. This is usually the right behavior for users. If you want to be stricter about bad options, you can enable the PEDANTIC option described in the AppConfig documentation.

Files before command-line options

Processing the configuration file before the command-line switches is tricky. You might think you can save a copy of @ARGV, call args() to find the file switch, restore @ARGV, read the configuration file, and then call args() again. That approach, however, creates duplicate entries for any list options given on the command line, because each call to args() appends their values again.

Say there is a list option, -call, that names someone to call, and the user specified -call joe on the command line. If you call args() twice on that, you'll call joe twice. And if you eliminate duplicates from the list, you may break the legitimate case in which the user really did want two calls to joe!
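A toy stand-in for a list option shows the effect. The parse_calls function here is hypothetical and not part of AppConfig. It appends to @calls on every -call, the way a list variable accumulates values, and it is run twice over the same saved arguments.

```perl
use strict;
use warnings;

# Hypothetical list option: every "-call NAME" appends NAME, the way an
# AppConfig list variable accumulates values across calls.
my @calls;

sub parse_calls {
    my @args = @_;
    while (@args) {
        my $arg = shift @args;
        push @calls, shift @args if $arg eq '-call' && @args;
    }
}

my @saved_argv = ('-call', 'joe');
parse_calls(@saved_argv);   # first pass, to find the file switch
parse_calls(@saved_argv);   # second pass, after reading the file
print "calls: @calls\n";    # prints "calls: joe joe"
```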

In my opinion, the easiest solution is to parse @ARGV manually, as shown in the following listing.


Listing 2. Parsing @ARGV manually

#!/usr/bin/perl -w

use strict;
use AppConfig qw/:argcount/;

my $config = AppConfig->new();

$config->define(
            CONFIG_FILE       =>
            { ARGCOUNT => ARGCOUNT_ONE, ALIAS => 'F' },

            DEBUG        =>
            { ARGCOUNT => ARGCOUNT_NONE, DEFAULT => 0, ALIAS => 'H' },

             );

my $file_switch_found = 0;

foreach my $arg (@ARGV)
{
 # note that this should come FIRST
 # or it will pick up $file_switch_found from the case below
 if ($file_switch_found)
 {
  $config->CONFIG_FILE($arg);
  $file_switch_found = 0;
 }

 # and this should come SECOND
 if (lc $arg eq '-f' || lc $arg eq '-config_file')
 {
  $file_switch_found = 1;
 }

}

print "Debug is ", ($config->DEBUG())? 'on':'off', " before reading the file\n";

$config->file($config->CONFIG_FILE())
 if $config->CONFIG_FILE();

print "Debug is ", ($config->DEBUG())? 'on':'off', " after reading the file\n";

$config->args();

print "Debug is ", ($config->DEBUG())? 'on':'off', " after processing the arguments\n";

Besides being harder to maintain, this approach may also confuse users. It does work, though. If several -f switches are given, only the last one names the file to be read.

A tricky point is the order of the two checks in the @ARGV loop. If the $file_switch_found check came after the -f check, the flag would already be on while the -f switch itself was being processed, and -f would be taken as the file name. (Notice also that the loop never needs to know where it is in the @ARGV array. This is an essential step toward clean coding in Perl.)

As soon as you start keeping track of array offsets, your code gets much more complicated. Whenever you can avoid iterating through an array by index, you should.
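The scanning loop from Listing 2 can also be packaged as a standalone function, which makes it easy to test without AppConfig. This is a sketch, and find_config_file is a hypothetical name. It keeps Listing 2's check order, with the flag test before the switch test, and it keeps Listing 2's behavior where the last file switch wins.

```perl
use strict;
use warnings;

# Return the file named by the last -f or -config_file switch, or undef.
# Like Listing 2, it never looks at array indexes; a flag remembers
# that the previous word was the file switch.
sub find_config_file {
    my @args = @_;
    my $file;
    my $file_switch_found = 0;
    foreach my $arg (@args) {
        # FIRST: the word after the switch is the file name
        if ($file_switch_found) {
            $file = $arg;
            $file_switch_found = 0;
        }
        # SECOND: is this word the switch itself?
        if (lc($arg) eq '-f' || lc($arg) eq '-config_file') {
            $file_switch_found = 1;
        }
    }
    return $file;
}

print find_config_file('-debug', '-f', 'one.conf', '-config_file', 'two.conf'), "\n";
```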

Interpreting multi-level hashes

Normally, if you say -hashoption a=b=c to AppConfig, it will assign the key "a" the value "b=c" immediately. I will show you how to create a deeply nested hash with arbitrary nested levels (although, again, I would caution you against going too deep with this approach instead of using something more appropriate like XML or YAML).

In addition, the code modifies AppConfig's state directly through the $state->{VARIABLE} hash. Because this relies on AppConfig internals, it may not work with versions of AppConfig after 1.56 if the author changes them, although such a change seems unlikely.


Listing 3. Interpreting multi-level hashes

#!/usr/bin/perl -w

use strict;
use AppConfig qw/:argcount/;
use Data::Dumper;

my $config = AppConfig->new();

$config->define(
            NESTED       =>
            {
             ARGCOUNT => ARGCOUNT_HASH,
             ALIAS => 'N',
             ACTION => \&nesting_action
            },
             );

$config->args();

print "Nested option is ", Dumper($config->NESTED());

sub nesting_action
{
 my $state = shift @_;
 my $vname = shift @_;
 my $value = shift @_;

 # AppConfig can handle this
 return 1 unless ($value =~ m/=.*=/);

 # two nested capturing groups, so each part comes back twice
 my @matches = ($value =~ m/(("[^"]+"|[^=]+))=?/g);

 my @m;
 my $m = scalar @matches;

 foreach (0..($m/2-1))
 {
  my $lost = shift @matches;
  my $found = shift @matches;
  $found =~ s/^"(.*)"$/$1/;
  push @m, $found;
 }

 @matches = @m;

 # we will always match this
 my $firstkey = shift @matches;

 # nothing to do if we can't get a first key
 return unless defined $firstkey;

 # now, put in the parsed value
 $state->{VARIABLE}->{$vname}->{$firstkey} = {};
 my $hash = $state->{VARIABLE}->{$vname}->{$firstkey};

 while (scalar @matches > 2)
 {
  my $key = shift @matches;
  $hash->{$key} = {};
  $hash = $hash->{$key};
 }

 # note we could have zero @matches, so check first
 if (scalar @matches > 1)
 {
  # we use pop and not shift because of the order of evaluation
  $hash->{pop @matches} = pop @matches;
 }

 # report success to AppConfig explicitly
 return 1;
}

Of course, all the interesting code is in the nesting_action() function.

First, collect all the parts of the value with a regular expression that looks fairly complex. Just for fun, I allow the "=" character inside values as well. In those cases, the user has to surround the value with quotation marks and then can't use quotation marks inside the value (an unquoted value may still contain them). If you're wondering why the parsing is fairly simplistic, I didn't think more complex code would be fun or educational.

If there is no more than one "=" character in the user-given value, let AppConfig deal with it. Doing otherwise would complicate the code for no gain.

Because the pattern has two nested capturing groups, every match returns the same text twice. The loop therefore keeps only one copy of each part and removes the surrounding quotation marks if present. The final result is in the @matches array. Perl 6 distinguishes grouping for matching from grouping for data extraction, but Perl 5 can avoid the duplication too: the outer parentheses here are redundant, and (?:...) groups without capturing.
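Here is a sketch of that simplification, which is not the article's listing. The hypothetical split_parts function uses a single capturing group, so each part appears once and no deduplication loop is needed. The quotation marks are still stripped afterward.

```perl
use strict;
use warnings;

# One capturing group: each match yields each part exactly once.
# A part in double quotes may contain "="; the quotes are stripped.
sub split_parts {
    my ($value) = @_;
    my @parts = ($value =~ m/("[^"]+"|[^=]+)=?/g);
    s/^"(.*)"$/$1/ for @parts;   # $_ aliases each element
    return @parts;
}

print join(' | ', split_parts('a=b="c=d"')), "\n";   # prints a | b | c=d
```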

When this action is called, AppConfig has already stored the value in the $state->{VARIABLE} hash (in the earlier example, as key "a" with value "b=c"). The code overwrites that entry with an empty hash under the first key. It then loops through the intermediate keys to insert more sub-levels, leaving the last key-value pair for the end.

Finally, when there are just two items left in the @matches array, insert them as a key-value pair in the current sub-level.

I used pop() instead of shift() at the end because of the order of evaluation. I could have also called shift() twice in two separate lines and stored the results in temporary variables, but I thought this was a cleaner if slightly more advanced and less maintainable solution.
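For experimenting outside AppConfig, the same parsing can be written as a standalone function that returns a fresh nested hash instead of editing $state->{VARIABLE}. This sketch rests on a few assumptions. The name parse_nested is hypothetical, the function uses a single capturing group, and it takes the last key and value off with two separate pop() calls, the alternative mentioned above.

```perl
use strict;
use warnings;
use Data::Dumper;

# Turn "k1=k2=...=value" into a nested hash reference without AppConfig.
# Returns undef unless there is at least one key and one value.
sub parse_nested {
    my ($value) = @_;
    my @parts = ($value =~ m/("[^"]+"|[^=]+)=?/g);
    s/^"(.*)"$/$1/ for @parts;
    return undef if @parts < 2;

    # two separate pops: value first, then the key it belongs to
    my $leaf_value = pop @parts;
    my $leaf_key   = pop @parts;

    my %result;
    my $hash = \%result;
    for my $key (@parts) {       # one sub-hash per intermediate key
        $hash->{$key} = {};
        $hash = $hash->{$key};
    }
    $hash->{$leaf_key} = $leaf_value;
    return \%result;
}

$Data::Dumper::Sortkeys = 1;
print Dumper(parse_nested('a=b=c=d'));
```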


Conclusion

AppConfig shines as a way of configuring applications in Perl in the simple cases, but occasionally you need more power in command-line processing and configuration-file parsing. Instead of using data formats such as XML or YAML, you can use AppConfig with some extra work to achieve more advanced goals. This is especially true if you need to process complex command-line switches to create multi-level hashes.

Resist the temptation to make your configuration files simple Perl that can be evaluated. This is a dangerous and fragile solution -- a user error or insecure permissions can compromise your software. It's always better to do the extra work and validate your input than to run unknown and untrustworthy code.
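Here is a sketch of the validating alternative. The option names and the parse_safely helper are hypothetical, and real code would also check the values. It reads simple name = value lines and accepts only whitelisted names. Anything else is rejected rather than executed.

```perl
use strict;
use warnings;

# Hypothetical whitelist of option names this program understands.
my %allowed = map { $_ => 1 } qw(debug config_file);

# Parse "name = value" lines; never evaluate them as Perl code.
sub parse_safely {
    my @lines = @_;
    my (%config, @errors);
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#|$)/;   # skip comments and blank lines
        if ($line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/ && $allowed{ lc $1 }) {
            $config{ lc $1 } = $2;
        }
        else {
            push @errors, "rejected: $line";
        }
    }
    return (\%config, \@errors);
}

my ($config, $errors) = parse_safely('# sample', 'debug = 1', 'exit 0');
print "$_\n" for @$errors;   # prints "rejected: exit 0"
```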

If you find AppConfig useful, you should also look at the AppConfig::Std module on CPAN; it extends AppConfig with some common and useful options.


Resources

  • Find AppConfig 1.56, the Perl5 module for reading configuration files and parsing command-line arguments, in the CPAN archives.

  • The XML::Simple module, which provides a simple API layer on top of an underlying XML parsing module, is also on CPAN.

  • CPAN.org offers a wonderfully complete module archive.

  • Try the fetchmail Home Page for more on fetchmail, the full-featured, robust, well-documented remote-mail retrieval and forwarding utility intended to be used over on-demand TCP/IP links.

  • YAML is a machine-parsable data-serialization format designed for human readability and interaction with scripting languages; it is optimized for data serialization, configuration settings, log files, and Internet messaging and filtering.

  • "Application configuration with Perl" (developerWorks, October 2000) and "Application configuration with Perl, Part 2" (developerWorks, July 2002) both provide an overview of the AppConfig module.

  • "Writing programs that speak English" (developerWorks, July 2002) is a good introductory article on using the Parse::RecDescent module to create a user interface grammar in plain English.

  • Read all of Ted's Perl articles in the Cultured Perl series on developerWorks.
