
Cultured Perl: Writing Perl programs that speak English

Using Parse::RecDescent to create a simple and efficient command-line user interface

Teodor Zlatanov (tzz@bu.edu), Programmer, Gold Software Systems
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, 3-tier client-server database architectures, Unix system administration, CORBA, and project management. He can be contacted at tzz@bu.edu.

Summary:  Designing the user interface for a program can be difficult and time consuming. Teodor Zlatanov discusses how to use the Parse::RecDescent module to create a user interface grammar in plain English. He also shows how easy it is to change the grammar when features are added or removed from the program. The advantages and disadvantages of this approach are discussed and compared to a standard CLI parser and a GUI.

Date:  01 Aug 2000
Level:  Introductory

Cool user interfaces evolve with functionality

Because it is the initial gateway to a program, the user interface must be able to serve multiple purposes. It has to provide the user appropriate access to all the features of the program. It has to be extensible, in the almost certain case that more features are added to the program. It has to be flexible, accepting abbreviations and shortcuts for common commands. It should not have a cascade of menus or waterfall of words in which the user will get lost. All of these are admittedly complex constraints without one single perfect solution to them all. Many software product developers tackle the user interface last, adding it on as an afterthought. Others concentrate first on the user interface, letting functionality be only a consequence of the interface design choices. Neither approach is desirable. The user interface (UI) should evolve along with the functionality of a program, as two sides of the same coin.

Here we use a parsing-oriented approach to the user interface. Although this approach can be adapted to a GUI, GUI design is not discussed in this article; we will focus exclusively on a text-based UI. First I will briefly present the standard text UI design choices to familiarize you with the environment. Then we will look at a Parse::RecDescent solution, which proves to be flexible, intuitive, and easy to write!

Note: Some of the programs we discuss will require the Parse::RecDescent CPAN module in order to run.


A simple user interface done the old-fashioned Unix way

Unix users are intimately familiar with the text-based UI model. Let's first look at a simple implementation of this model for a fictitious Perl program. The standard Getopt::Std module simplifies the parsing of the command-line arguments. The program is only a demonstration of the Getopt::Std module (it does not do anything useful). See Resources at the end of this article for more information on Getopt::Std.


Command-line switches with Getopt::Std
		
#!/usr/bin/perl -w

use strict;				# always use strict, it's a good habit
use Getopt::Std;			# see "perldoc Getopt::Std"

my %options;
getopts('f:hl', \%options);		# read the options with getopts

# uncomment the following two lines to see what the options hash contains
#use Data::Dumper;
#print Dumper \%options;

$options{h} && usage();			# the -h switch

# use the -f switch, if it's given, or use a default configuration filename
my $config_file = $options{f} || 'first.conf';

print "Configuration file is $config_file\n";

# check for the -l switch
if ($options{l})
{
 system('/bin/ls -l');
}
else
{
 system('/bin/ls');
}

# print out the help and exit
sub usage
{
 print <<EOHIPPUS;
first.pl [-l] [-h] [-f FILENAME]

Lists the files in the current directory, using either /bin/ls or
/bin/ls -l.  The -f switch selects a different configuration file.
The -h switch prints this help.
EOHIPPUS
exit;
}       
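The commented-out Data::Dumper lines above are one way to inspect the options hash. As a small sketch, getopts() can also be fed an explicit argument list by localizing @ARGV, which makes its behavior easy to check. The helper parse_switches() is hypothetical, written only for this illustration; it is not part of Getopt::Std.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: run getopts() on an explicit list instead of the
# real command line, and return the options plus any leftover arguments.
sub parse_switches
{
 local @ARGV = @_;                  # getopts() reads and consumes @ARGV
 my %options;
 getopts('f:hl', \%options) or die "unknown switch\n";
 return (\%options, [@ARGV]);       # copy whatever getopts() left behind
}

# -l is a plain switch (set to 1); -f takes the next word as its value.
my ($opts, $rest) = parse_switches('-l', '-f', 'my.conf', 'extra');
print "l=$opts->{l} f=$opts->{f} rest=@$rest\n";   # l=1 f=my.conf rest=extra

# Switches can be bundled: in -lf, the f takes its value from the next word.
my ($bundled) = parse_switches('-lf', 'other.conf');
print "l=$bundled->{l} f=$bundled->{f}\n";         # l=1 f=other.conf
```

Option processing stops at the first word that is not a switch, which is why extra is left over rather than rejected.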


A simple event loop

When command-line arguments are not enough, the next step is to write an event loop. Command-line arguments are still accepted in this scheme, and are sometimes sufficient on their own. An event loop, however, enables the user to invoke the program without any parameters and to see a prompt. The help command is usually available at the prompt and will print out more detailed help. Sometimes the help will even be a separate input prompt with an entire software subsystem dedicated to it.


Event loop with command-line switches

#!/usr/bin/perl -w

use strict; # always use strict, it's a good habit
use Getopt::Std; # see "perldoc Getopt::Std"

my %options;
getopts('f:hl', \%options); # read the options with getopts

# uncomment the following two lines to see what the options hash contains
#use Data::Dumper;
#print Dumper \%options;

$options{h} && usage(1); # the -h switch, with exit option

# use the -f switch, if it's given, or use a default configuration filename
my $config_file = $options{f} || 'first.conf';

print "Configuration file is $config_file\n";

# check for the -l switch
if ($options{l})
{
 system('/bin/ls -l');
}
else
{
 my $input; # a variable to hold user input 
 do
 {
  print "Type 'help' for help, or 'quit' to quit\n-> ";
  $input = <STDIN>;
  print "You entered $input\n"; # let the user know what we got

  # note that 'listlong' matches /list/, so listlong has to come first
  # also, the i switch is used so upper/lower case makes no difference
  if ($input =~ /listlong/i)
  {
   system('/bin/ls -l');
  }
  elsif ($input =~ /list/i)
  {
   system('/bin/ls');
  }
  elsif ($input =~ /help/i)
  {
   usage();
  }
  elsif ($input =~ /quit/i)
  {
   exit;
  }
 }
 while (1); # only 'quit' or ^C can exit the loop
}

exit; # implicit exit here anyway

# print out the help and exit
sub usage
{
 my $exit = shift @_ || 0; # don't exit unless explicitly told so
 print <<EOHIPPUS;
first.pl [-l] [-h] [-f FILENAME]

The -l switch lists the files in the current directory, using /bin/ls -l.
The -f switch selects a different configuration file.  The -h
switch prints this help.  Without the -l or -h arguments, will show
a command prompt.

Commands you can use at the prompt:

 list:                   list the files in the current directory
 listlong:               list the files in the current directory in long format
 help:                   print out this help
 quit:                   quit the program
 
EOHIPPUS
 exit if $exit;
}     
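The comment in the loop about testing listlong before list points at a general weakness of unanchored matching: the order of the tests matters, and a command can match inside unrelated input. The minimal sketch below compares the loop's style with patterns anchored to the whole line. The classify_* helpers are hypothetical and return a command name instead of running anything.

```perl
use strict;
use warnings;

# Mirrors the loop above: unanchored, so 'listlong' must be tested first.
sub classify_unanchored
{
 my $input = shift;
 return 'listlong' if $input =~ /listlong/i;
 return 'list'     if $input =~ /list/i;
 return 'help'     if $input =~ /help/i;
 return 'quit'     if $input =~ /quit/i;
 return 'unknown';
}

# Anchoring each pattern to the whole trimmed line removes the dependency
# on test order and rejects input such as "blacklist".
sub classify_anchored
{
 my $input = shift;
 $input =~ s/^\s+|\s+$//g;          # trim surrounding whitespace and the newline
 return 'list'     if $input =~ /^list$/i;
 return 'listlong' if $input =~ /^listlong$/i;
 return 'help'     if $input =~ /^help$/i;
 return 'quit'     if $input =~ /^quit$/i;
 return 'unknown';
}

for my $line ("listlong\n", "blacklist\n", "LIST\n")
{
 printf "%-10s unanchored=%-9s anchored=%s\n",
  $line =~ s/\n//r, classify_unanchored($line), classify_anchored($line);
}
```

The anchored version no longer depends on test order, but every abbreviation now has to be spelled out as its own pattern.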

One of three things will usually happen at this point:

  • The program's UI will get unbearably complex due to the many combinations of switches that can happen.
  • The UI will evolve into a GUI.
  • The UI will be redone from the ground up with at least some parsing capabilities.

The first option is too gruesome to contemplate. The second is not discussed here, but does present interesting challenges in backward compatibility and flexibility. The third option is the topic of the rest of this article.


A quick Parse::RecDescent tutorial

Parse::RecDescent is a module for parsing text. With a few simple constructs it can be used for almost any parsing task. The more advanced grammar constructs can be daunting, but they are not needed for most purposes.

Parse::RecDescent is an object-oriented module. It creates a parser object around a grammar. A grammar is a collection of rules in text form. The following example is a single rule that matches a word:


The word rule

word: /\w+/       

The rule matches a word character (\w) one or more times. The part that follows the colon is called a production. A rule must contain at least one production. A production is made of subrules (references to other rules), of tokens to match directly (regular expressions or literal strings), or of a mix of both. The following example is a rule that can match a word, another rule (non_word), or an error (the <error> directive, reached only if the other two fail). Rule names must be identifiers, so non_word uses an underscore rather than a hyphen:


Alternate productions

token: word | non_word | <error>
word: /\w+/
non_word: /\W+/
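To try these rules, hand the grammar text to Parse::RecDescent->new(), which returns undef if the grammar does not compile. Every rule then becomes a method of the parser object, and a rule without an action returns whatever its last item matched. A minimal sketch:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The grammar from the two examples above, as a single string.
my $grammar = q{
  token: word | non_word | <error>
  word: /\w+/
  non_word: /\W+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";

# Leading whitespace is skipped before each token by default.
for my $text ('  hello world', '+-+')
{
 my $result = $parser->token($text);
 print defined $result ? "token matched '$result'\n" : "no match\n";
}
```

Calling $parser->word('+-+') directly simply returns undef, because the word rule has no <error> alternative to report the failure.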

Each production can also contain an action, enclosed in braces:


An action in the production

print: /print/i { ::print_function(); }   # the :: prefix calls print_function() in package main

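An action's value is the value of its last statement. The action is a sort of null production that always matches unless that value is undef; any defined value, even 0 or the empty string, counts as a match. When the action is the last thing in the production, its value also becomes the value the rule returns. Actions run inside the parser's own package, so subroutines from the main program are called with a leading :: (for example, ::print_function() rather than print_function()).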
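A small sketch shows the success rule in action: an action whose value is undef makes the match fail, while a defined but false value such as 0 still counts as success. The rule names and the helper even_or_undef() are hypothetical; the helper lives in the main program, so the action calls it with a leading ::, because actions run in the parser's own package.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical helper in package main: the number if it is even, else undef.
sub even_or_undef
{
 my $n = shift;
 return $n % 2 == 0 ? $n : undef;
}

my $parser = Parse::RecDescent->new(q{
  even: /\d+/ { ::even_or_undef($item[1]) }
  zero: /\d+/ { 0 }
}) or die "Bad grammar!\n";

my $even = $parser->even('42');   # '42': the action returned a defined value
my $odd  = $parser->even('7');    # undef: the action failed, so the rule failed
my $zero = $parser->zero('5');    # 0: false, but defined, so still a match

print defined $odd  ? "odd matched\n" : "odd rejected\n";
print defined $zero ? "zero matched with $zero\n" : "zero rejected\n";
```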

A subrule that should match one or more times is marked with the (s) modifier:


One or more tokens in a production

word: letter(s)
letter: /\w/       

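The (?) (zero or one) and (s?) (zero or more) modifiers are also available for optional items. All of these repetition modifiers produce an array reference, which is empty when an optional item is absent.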
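The following sketch exercises the three repetition modifiers. The rule names are made up for the illustration; each action reads the array reference by position, which works whatever key the module uses for repeated subrules in %item.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
  words: word(s)      { join ',', @{$item[1]} }
  maybe: /x/ word(?)  { scalar @{$item[2]} }
  any:   word(s?)     { scalar @{$item[1]} }
  word:  /[a-z]+/
}) or die "Bad grammar!\n";

print $parser->words('a b c'), "\n";   # a,b,c
print $parser->maybe('x'), "\n";       # 0 (the optional word was absent)
print $parser->maybe('x y'), "\n";     # 1
print $parser->any(''), "\n";          # 0 (zero repetitions still match)
print defined $parser->words('') ? "matched\n" : "no match\n";   # no match
```

Note that maybe('x') and any('') return 0, which is false but defined, so both rules still succeed.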

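Anything matched in a production can be accessed through the @item array ($item[position]) or the %item hash ($item{name}). $item[0] holds the name of the rule, so the first thing matched is $item[1]. In the first rule below, by-name addressing is simplest. In the second rule both words have the same name, and $item{word} holds only the last one, so positional addressing must be used. In the third rule the repeated words come back as an array reference; depending on the version of the module, %item stores it under the plain subrule name or under a key that includes the repetition specifier (such as $item{'word(s)'}), so positional access ($item[2]) is the portable choice. Elsewhere, by-name addressing is usually easier and simpler, and it avoids miscounting positions, since every token and subrule, including optional and repeated ones, takes up a slot in @item: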


Using the %item and @item variables

print: /print/i word { ::print_function($item{word}); }
print2: /print2/i word word { ::print_function($item[2], $item[3]); }
print3: /print3/i word(s) { ::print_function(@{$item[2]}); }
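The positions can be confusing at first, so here is a small sketch of what @item and %item hold. To keep it self-contained, a hypothetical print_function() in the main program just records its arguments; the rules last_of_two and rule_name are also made up for the illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

our @printed;   # arguments received by the stand-in print_function()
sub print_function { @printed = @_; return 1 }

# In print2, $item[1] is the matched /print2/i text, so the words are 2 and 3.
# In last_of_two, the repeated subrule name means $item{word} keeps the last match.
# In rule_name, $item[0] is the name of the rule itself.
my $parser = Parse::RecDescent->new(q{
  print2: /print2/i word word { ::print_function($item[2], $item[3]) }
  last_of_two: word word { $item{word} }
  rule_name: word { $item[0] }
  word: /\w+/
}) or die "Bad grammar!\n";

$parser->print2('print2 alpha beta');
print "@printed\n";                              # alpha beta
print $parser->last_of_two('alpha beta'), "\n";  # beta
print $parser->rule_name('alpha'), "\n";         # rule_name
```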

For more help on this topic, look carefully at the Parse::RecDescent perldoc page and the tutorials that come with the module.

Why Parse::RecDescent is a good user interface engine

  • Flexibility: rules can be added or removed easily and don't require tweaking other rules.
  • Power: rules can invoke any code and can recognize any text pattern.
  • Ease of use: it takes 5 minutes to put together a simple grammar.
  • Any front-end will work: the parser can be accessed as a regular Perl function, and can access other regular Perl functions and modules.
  • Internationalization: this is an often-overlooked UI design issue. Internationalization is easy when the parsing grammar is intended to easily accept multiple versions of a command.
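To make the last two points concrete, here is a minimal sketch of rules that accept abbreviations and translated spellings of the same commands. The German and Spanish words are illustrative assumptions only, and each rule returns a command name instead of performing it.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Each alternation lists the English word, its abbreviation and translations;
# \b keeps "l" from matching the start of a longer word such as "liste".
my $parser = Parse::RecDescent->new(q{
  command: quit | list | <error>
  quit: /(quit|q|beenden|salir)\b/i { 'quit' }
  list: /(list|l|liste|listar)\b/i  { 'list' }
}) or die "Bad grammar!\n";

for my $input ('q', 'Beenden', 'LISTAR', 'liste')
{
 print "$input -> ", $parser->command($input), "\n";
}
```

Supporting another language then means adding alternatives to a pattern, without touching the code that acts on the command.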

Why Parse::RecDescent may not be a good UI engine

  • Speed: startup and parsing speed is not as good as that of a straightforward matching algorithm. This may improve with future releases of the module, and should be weighed carefully against the time saved in prototyping, development, and release.
  • Module availability: Parse::RecDescent may not be available, due to OS or system administration problems. Talk to your local Perl guru.

A simple user interface with Parse::RecDescent

This script extends the simple event loop with switches to use Parse::RecDescent as the parsing engine. The most noteworthy advantage of this script is that it no longer needs a chain of if/elsif matching statements. Instead, the grammar determines both the format of the user input and the actions to take upon encountering the input. The usage() function is also simpler, because it no longer has to handle two separate invocation modes.

Note how the command-line arguments are passed directly to the parsing engine. This means the Getopt::Std module is not needed anymore, because Parse::RecDescent will do that job just fine. Parse::RecDescent could be similarly adapted to parse configuration files if they were sufficiently complex (the AppConfig CPAN module does a wonderful job for simple-to-medium complexity configuration files).

In the next section we will further extend the simple UI we have created. Note that the extension is easy to understand and modify, and that it can do the job just as well as the previous, non-parsing example (see the Event loop with command-line switches).

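Most of the actions below end with '1;' because the value of an action's last statement determines whether the action succeeded: undef means failure, and any defined value means success. Actions are quite similar to functions in this respect. If an action fails, the whole production fails. So ending the action with '1;' ensures success, whatever the functions it calls happen to return. (The helpquit and quit actions don't need it, because they exit the program.)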
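The difference the trailing '1;' makes can be seen in isolation. In this sketch, quiet() is a hypothetical helper that returns nothing; the same call fails as the last statement of one action and succeeds when followed by '1;'.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical main-program function that returns nothing (undef in scalar context).
sub quiet { return }

my $parser = Parse::RecDescent->new(q{
  without_one: /go/ { ::quiet() }
  with_one:    /go/ { ::quiet(); 1; }
}) or die "Bad grammar!\n";

print defined $parser->without_one('go') ? "matched\n" : "failed\n";  # failed
print defined $parser->with_one('go')    ? "matched\n" : "failed\n";  # matched
```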


A simple UI with Parse::RecDescent

#!/usr/bin/perl -w

use strict; # always use strict, it's a good habit
use Parse::RecDescent; # see "perldoc Parse::RecDescent"

my $global_grammar = q{
  input: help | helpquit | quit | listlong | list | fileoption |
         <error>

  help: /help|h/i { ::usage(); 1; }

  helpquit: /-h/i { ::usage(); exit(0); }

  list: /list|l/i { system('/bin/ls'); 1; }

  listlong: /-l|listlong|ll/i { system('/bin/ls -l'); 1; }

  fileoption: /-f/i word { print "Configuration file is $item{word}\n"; 1; }

  quit: /quit|q/i { exit(0) }

  word: /\S+/
};

{ # this is a static scope!  do not remove!
 # $parse is only initialized once...
 my $parse = Parse::RecDescent->new($global_grammar) or die "Bad grammar!\n";

 sub process_line
 {
  # get the input that was passed from above
  my $input = shift @_ || '';
  # return undef if the input is undef, or was not parsed correctly  
  $parse->input($input)
   or return undef; 
  # return 1 if everything went OK
  return 1;
 }
}

# first, process command-line arguments
if (@ARGV)			       
{
 process_line(join ' ', @ARGV);
}

do
{
 print "Type 'help' for help, or 'quit' to quit\n-> ";
 my $input = <STDIN>; # a variable to hold user input 
 print "You entered $input\n"; # let the user know what we got
 
 process_line($input);
} while (1); # only 'quit' or ^C can exit the loop

exit; # implicit exit here anyway

# print out the help and exit
sub usage
{
 print <<EOHIPPUS;
first.pl [-l] [-h] [-f FILENAME]

The -l switch lists the files in the current directory, using /bin/ls -l.
The -f switch selects a different configuration file.  The -h
switch prints this help.  Without the -l or -h arguments, will show
a command prompt.

Commands you can use at the prompt:

 list | l           : list the files in the current directory
 listlong | ll | -l : list the files in the current directory in long format
 help | h           : print out this help
 quit | q           : quit the program
 
EOHIPPUS
}      
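One practical benefit of putting the dispatch in the grammar is that it can be exercised without a terminal. The sketch below is a hypothetical test-only variant of the grammar above: same rules in the same order, but each action returns a label instead of calling /bin/ls, usage(), or exit().

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
  input: help | helpquit | quit | listlong | list | fileoption |
         <error>
  help: /help|h/i { 'help' }
  helpquit: /-h/i { 'helpquit' }
  list: /list|l/i { 'list' }
  listlong: /-l|listlong|ll/i { 'listlong' }
  fileoption: /-f/i word { "file: $item{word}" }
  quit: /quit|q/i { 'quit' }
  word: /\S+/
}) or die "Bad grammar!\n";

for my $line ('ll', 'l', '-h', 'HELP', '-f my.conf')
{
 my $label = $parser->input($line);
 print "$line -> ", (defined $label ? $label : 'no match'), "\n";
}
```

The input rule does not require the whole line to be consumed, so if list were tried before listlong, ll would be taken as list followed by ignored text. That is why listlong comes first.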


A complex user interface with Parse::RecDescent

We will now showcase specific abilities of the Parse::RecDescent grammar by adding to the UI features of the simple event loop and the simple Parse::RecDescent user interface. The new features we will be looking at are optional command parameters, variable actions based on parameters, and internal grammar state variables.

Notice that comments are placed inside the grammar. This is perfectly fine, as long as the comments follow the Perl convention (everything from a lone '#' to the end of the line is a comment).

The set_type rule sets the $last_type variable to be equal to its parameter. It will not match unless "set type" or "st" is followed by a word.
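The state handling can be tried on its own. In this trimmed-down, hypothetical variant of the grammar below, show_last_type returns the stored word (or none) instead of calling a helper. The start-up action before the first rule runs once, when the parser is built, so $last_type keeps its value between calls.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
  { my $last_type; }   # start-up action, run once when the parser is built

  input: show_last_type | set_type | <error>

  show_last_type: /show|s/i /last|l/i /type|t/i { $last_type || 'none' }

  set_type: /set|s/i /type|t/i word { $last_type = $item{word} }

  word: /\S+/
}) or die "Bad grammar!\n";

print $parser->input('slt'), "\n";                # none
print $parser->input('set type espresso'), "\n";  # espresso
print $parser->input('show last type'), "\n";     # espresso
print $parser->input('st latte'), "\n";           # latte
print $parser->input('s l t'), "\n";              # latte
```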

The optional parameters to the list command mean that it can list specific files or all the files, depending on how the command was invoked. Since we pass the dereferenced array of parameter words to '/bin/ls' as separate arguments of system(), it is not a problem if the array is empty; ls then lists the current directory. Passing a list to system() also means Perl runs /bin/ls directly instead of through the shell. Still, particular care should be taken with this approach (and with any approach that uses the system() function, backticks, or user-provided input to do file operations). Running Perl with the -T (taint) option is highly recommended. If there is any possibility that user input may be passed directly to the shell, you must examine it carefully for potential security breaches. See the perlsec page ('perldoc perlsec') for more information on this.
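As one way to be careful here, the words could be filtered before they reach /bin/ls. This sketch rests on a stated assumption: only plain file names made of word characters, dots, slashes, and hyphens are acceptable. safe_words() and list_files() are hypothetical helpers, not part of the program below, and the filter is not a substitute for -T, which only accepts data explicitly untainted through a regular expression capture.

```perl
use strict;
use warnings;

# Hypothetical filter: keep only words that look like plain file names
# (word characters, dots, slashes, hyphens) and do not start with '-'.
sub safe_words
{
 return grep { m{\A[\w./-]+\z} && !/\A-/ } @_;
}

# Hypothetical replacement for the list action. The list form of system()
# runs /bin/ls directly, without a shell, and '--' ends option parsing.
sub list_files
{
 my @names = safe_words(@_);
 return system('/bin/ls', '--', @names) == 0;
}

print join(' ', safe_words('notes.txt', '-rf', 'a;rm', 'dir/file-1')), "\n";
# notes.txt dir/file-1
```

A path such as ../x still passes this filter, so a real program may need a stricter policy.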

The order and order_dairy commands illustrate alternate versions of a command, selected by the parameters the command is given. Because order_dairy comes before order in the input rule, it is tried first; otherwise order would also match every dairy order, since a dairy product is also a word. Keep the sequence of commands in mind when you design a complex grammar. Note also how numbers are optionally detected through a new rule, number, used with the (?) modifier. In this case, the grammar has condensed two versions of a command (with and without a number) into one version that works both ways. The order and order_dairy commands could also have been merged here by specifying that the parameter be either a dairy_product or a word.
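The merged form mentioned at the end of that paragraph could look like the following sketch. The product rule is a hypothetical name introduced here; it tries dairy_product first and falls back to word, so the ordering concern moves inside a single rule.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
  order: /order/i number(?) product
         { join ' ', 'Order:', @{$item[2]}, $item{product} }

  product: dairy_product { "$item[1] (dairy)" }
         | word

  dairy_product: /milk/i | /yogurt/i | /frappe|shake/i
  word: /\S+/
  number: /\d+/
}) or die "Bad grammar!\n";

print $parser->order('order 2 milk'), "\n";   # Order: 2 milk (dairy)
print $parser->order('order bagel'), "\n";    # Order: bagel
```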

In Boston, what most other English speakers call (milk) shakes are called "frappes."


A complex UI with Parse::RecDescent

#!/usr/bin/perl -w

use strict; # always use strict, it's a good habit
use Parse::RecDescent; # see "perldoc Parse::RecDescent"

my $global_grammar = q{
  
  { my $last_type = undef; } # this action is executed when the 
                             # grammar is created

  input: help | helpquit | quit | listlong | list | fileoption |
         show_last_type | set_type | order_dairy | order |
         <error>

  help: /help|h/i { ::usage(); 1; }

  helpquit: /-h/i { ::usage(); exit(0); }

  # $item[2] is the array reference built by word(s?)
  list: /list|l/i word(s?) { system('/bin/ls', @{$item[2]}); 1; }

  listlong: /-l|listlong|ll/i { system('/bin/ls -l'); 1; }

  fileoption: /-f/i word { print "Configuration file is $item{word}\n"; 1; }

  show_last_type: /show|s/i /last|l/i /type|t/i { ::show_last_type($last_type); 1; }
  
  set_type: /set|s/i /type|t/i word { $last_type = $item{word}; 1; }

  # $item[2] is the array reference built by number(?)
  order_dairy: /order/i number(?) dairy_product
                { print "Dairy Order: @{$item[2]} $item{dairy_product}\n"; 1; }

  order: /order/i number(?) word
                { print "Order: @{$item[2]} $item{word}\n"; 1; }

  # come to Boston and try our frappes...
  dairy_product: /milk/i | /yogurt/i | /frappe|shake/i

  quit: /quit|q/i { exit(0) }

  word: /\S+/

  number: /\d+/
};

{ # this is a static scope!  do not remove!
 # $parse is only initialized once...
 my $parse = Parse::RecDescent->new($global_grammar) or die "Bad grammar!\n";

 sub process_line
 {
  # get the input that was passed from above
  my $input = shift @_ || '';
  # return undef if the input is undef, or was not parsed correctly  
  $parse->input($input)
   or return undef; 
  # return 1 if everything went OK
  return 1;
 }
}

# first, process command-line arguments
if (@ARGV)			       
{
 process_line(join ' ', @ARGV);
}

do
{
 print "Type 'help' for help, or 'quit' to quit\n-> ";
 my $input = <STDIN>; # a variable to hold user input 
 print "You entered $input\n"; # let the user know what we got
 
 process_line($input);
} while (1); # only 'quit' or ^C can exit the loop

exit; # implicit exit here anyway

# print out the help and exit
sub usage
{
 print <<EOHIPPUS;
first.pl [-l] [-h] [-f FILENAME]

The -l switch lists the files in the current directory, using /bin/ls -l.
The -f switch selects a different configuration file.  The -h
switch prints this help.  Without the -l or -h arguments, will show
a command prompt.

Commands you can use at the prompt:

 order [number] product: order a product, either dairy (milk, yogurt, 
 frappe, shake), or anything else.
 set|s type|t word     : set the current type word
 show|s last|l type|t  : show the current type word
 list | l              : list the files in the current directory
 listlong | ll | -l    : list the files in the current directory in long format
 help | h              : print out this help
 quit | q              : quit the program
 
EOHIPPUS
}

sub show_last_type
{
 my $type = shift;

 return unless defined $type; # do nothing for an undef type word

 print "The last type selected was $type\n";
}
       


Parse::RecDescent: powerful, easy, and adaptable

The parsing capabilities of Parse::RecDescent are endlessly adaptable. Here they were shown to be capable of creating a UI parsing engine with several significant advantages over homegrown approaches. As with all tools as powerful as Parse::RecDescent, speed may be a concern. But the time you save in development and testing may significantly balance this out.

Parse::RecDescent greatly simplifies complex parameter lists and parsing of user input. Alternate versions of commands are easily accepted, which has the benefit of allowing for abbreviations and internationalization, among other things.

A GUI can also use a Parse::RecDescent parser in the background. If you are designing a GUI like this, you can easily translate menu commands into grammar rules, especially since menus already have a tree-like structure that keeps commands from overlapping. User input from the command line or from a separate field (an "expert" mode, perhaps) can then be fed to the same parser, making such a GUI even better from a usability and customization standpoint.

A Parse::RecDescent grammar is easy to understand. You don't need to know much to understand and extend the grammar, which can be very helpful when you're tackling large projects. Multiple parsers, with different grammars and purposes, can be used in a single program. The grammar itself is just a text string, so it can be embedded in the program, as in the examples here, or read from a file.

Parse::RecDescent should always be treated as a power tool. In a small program it may be too slow and unwieldy to be worth the trouble. But for even moderately complex user input, the benefits are immediately visible in the better organization of code and functionality. Porting existing interfaces (command-line switches or home-brewed parsing functions) to Parse::RecDescent is easy, and writing new grammars is even easier. Every UI builder should find this power tool useful.


Resources

  • Read Ted's other Perl articles in the "Cultured Perl" series on developerWorks.

  • Visit CPAN for all the Perl modules you ever wanted. The Comprehensive Perl Archive Network is the best source of Perl modules. Automatic installation is also supported, so that modules can be added quickly and efficiently.

  • Go to PERL.com for Perl information. Start here if you are interested in the language, the people behind it, training, hiring, or Perl news.

  • Programming Perl 3rd Edition, by Larry Wall, Tom Christiansen, and Jon Orwant (O'Reilly & Associates, 2000), is the best guide to Perl today, newly updated with 5.005 and 5.6.0 information. The third edition just came out and is a wonderful book, highly recommended.

  • Refer to the perldoc pages: Getopt::Std, Getopt::Long. Type 'perldoc Getopt::Std' or 'perldoc Getopt::Long' at your prompt to retrieve the documentation for these modules, which make parsing command-line arguments easy.

  • See the Parse::RecDescent documentation. The Parse::RecDescent module is complex and powerful. This article and my previous one on Parse::RecDescent present enough information to get you started with Parse::RecDescent. If you want to use the full features of the module, however, you must read the documentation and understand how those features may be harnessed.

  • Visit the home page of Larry Wall, the original Perl guru.

  • Meet other Perl Mongers at the Republic of Perl home page.

  • Go to the home page for the Perl Conference to find out about Perl gatherings and conferences.

