There is no such thing as too much documentation. Being clear often means repeating yourself. Think of your code as something you present to the world. There are a lot of people in the world. The one comment you thought was redundant could make someone's day. It could even be your day, five years from now, when you are adding a new feature.
Use good planning when writing your programs. You don't have to determine every detail in advance, but you should break up the program into component parts, and use comments to fill in the gaps.
The following is my personal coding style. You may not like it, but try to look at it objectively and see what you can use for yourself or for your team.
First of all, think of the intended audience for the comments. Try to make the comments clear enough for a third-party consultant to follow. The more complex the code, the more comments you should add to clarify intent. Don't leave comments for later; make them a part of your thought process: problem, solution, comment, then debug. Especially important is creating comments before debugging. The comments in your own code will help you debug better and faster.
It is helpful sometimes to state not only the solution to the problem, but also the problem itself. For example:
Listing 1
# function: do_hosts
#
# purpose: to process every host in the /etc/hosts table and see if it
# resolves to a valid IP
#
# solution: read the list of hosts as keys in a hash, then go through
# the list of keys (hosts) and store the IP address for each host as
# the value for that key, or undef() if it doesn't resolve properly.
# Return a reference to the hash, or undef if the /etc/hosts file was
# not accessible.
Some prefer brevity:
Listing 2
# function: do_hosts: process every host in the /etc/hosts table and see if it
# resolves to a valid IP; return a reference to the hash (key=host,
# value=IP or undef), or undef if the /etc/hosts file was not accessible.
And here's another way:
Listing 3
# do_hosts: returns a ref to hash of hosts (key=host, value=IP/undef)
# from /etc/hosts
All of the above ways are valid, depending on the complexity of do_hosts(). If the function is two lines, don't waste your time writing three paragraphs of comments. If it's several pages, however, don't be frugal with explanations.
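To see how such a header sits on top of real code, here is a minimal sketch of what do_hosts might look like. It is only one possible implementation, not the definitive one: the two optional parameters ($hosts_file and $resolver) are assumptions added so the sketch can be tried on any file, and the default resolver relies on inet_aton and inet_ntoa from the standard Socket module.

```perl
use strict;                             # be strict - pragma for the interpreter
use warnings;                           # same effect as the -w flag
use Socket qw(inet_aton inet_ntoa);     # for resolving host names

# function: do_hosts
#
# purpose: to process every host in the hosts table and see if it
# resolves to a valid IP
#
# solution: read the list of hosts as keys in a hash, then store the IP
# address for each host as the value for that key, or undef if it
# doesn't resolve. Return a reference to the hash, or undef if the
# hosts file was not accessible.
#
# assumptions of this sketch: both parameters are optional additions;
# the file defaults to /etc/hosts and the resolver to a Socket lookup
sub do_hosts
{
    my ($hosts_file, $resolver) = @_;
    $hosts_file = '/etc/hosts' unless defined $hosts_file;
    $resolver ||= sub {
        my $packed = inet_aton($_[0]);  # undef if the name doesn't resolve
        return defined $packed ? inet_ntoa($packed) : undef;
    };

    open(my $fh, '<', $hosts_file) or return;

    # pass 1: collect every host name as a key
    my %hosts;
    while (my $line = <$fh>)
    {
        $line =~ s/#.*//;               # strip comments
        my ($ip, @names) = split ' ', $line;
        next unless @names;             # skip blank and comment-only lines
        $hosts{$_} = undef foreach @names;
    }
    close($fh);

    # pass 2: store the IP (or undef) for each host
    $hosts{$_} = $resolver->($_) foreach keys %hosts;

    return \%hosts;
}
```

Calling do_hosts() with no arguments reads /etc/hosts; passing a file name and a code reference makes it easy to try the function on sample data without touching the network.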
Commenting the beginning of the program
The program should begin with a brief explanation of its purpose. Don't make people scroll down several pages to figure out what you were doing. If you are using a version control system such as CVS, place the appropriate headers at the beginning of the file, such as the Id header. Be concise. Two lines, four at most, should be sufficient to describe a program briefly. Give a contact name, e-mail, telephone number, or team contact.
Listing 4
#!/usr/bin/perl -w
# whodunit.pl: A script to solve a murder mystery
# by joe@shmoe.com $Id: whodunit.pl,v 1.92 2000/08/08 19:08:50 joe Exp $
The first line is the standard way on most UNIX systems to indicate which program runs the script (everything after the '!' is taken as the interpreter name). The -w flag turns on warnings, which is always a good idea, even for an experienced programmer.
The second line (first comment line) is a brief description of the program and its purpose. The third line (second comment line) names the author, and gives an Id header that uniquely identifies the release date and version of the file. RCS and CVS specifically use the Id header, which updates automatically upon committing the script. For more on RCS and CVS, see Resources later in this article.
Commenting initialization sections
The initialization sections should be logically and physically separate from the beginning of the program, by virtue of extra comments or being at the beginning of the file, for example. Initialization sections, as opposed to the program's beginning described above, contain actual code that executes when the program starts up. In Perl, the initialization section should consist of the following (preferably in that order):
- Modules and pragmas
- Constants
- BEGIN/END/INIT/CHECK subroutines
- Initialization code
The use keyword in Perl directs the interpreter to either load a module or turn a pragma on ("no pragma" turns a pragma off). Pragmas nudge the interpreter in the right direction. For example, use utf8 tells the interpreter to prepare for UTF-8 encoded data files and streams.
It's good to line up the comments for each module horizontally and to have one comment per module or pragma:
Listing 5
use Data::Dumper;                       # for debugging printouts
use strict;                             # be strict - pragma for the interpreter
use POSIX;                              # use the POSIX functions
After the first time you do this, it's just a matter of copy and paste to get the modules and pragmas into a new program. I recommend the "strict" pragma. Among other things, it will ensure that you are honest about declaring your variables, which in my experience is as much a source of bugs in Perl as memory allocation is in C/C++.
See all module and pragma documentation with the perldoc command. For example, perldoc strict tells all about the strict pragma -- what it does, how to use it, and so on.
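As a small illustration of what the strict pragma catches, the sketch below compiles a snippet under "use strict" and returns the interpreter's complaint. The helper name strict_complaint and the variable names are made up for this example.

```perl
use strict;                             # be strict - pragma for the interpreter
use warnings;                           # same effect as the -w flag

# strict_complaint: compile a snippet under "use strict" and return the
# error message, or an empty string if the snippet compiled cleanly
sub strict_complaint
{
    my ($snippet) = @_;
    my $ok = eval "use strict; $snippet; 1";
    return $ok ? '' : $@;
}

# typo: $totl was never declared, so strict refuses to compile it
print strict_complaint('my $total = 1; $totl = $total + 1');

# spelled correctly, the same code compiles and the message is empty
print strict_complaint('my $total = 1; $total = $total + 1'), "\n";
```

Without strict, the misspelled variable would silently spring into existence as a new global, which is exactly the kind of bug that is hard to spot by reading the code.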
Some editors have the nice ability to always place comments at a certain position (in Emacs, the indent-for-comment command does this automatically). Thoroughly familiarize yourself with your editor's commands. It is time well spent.
Although you can view constants as just another Perl pragma, they deserve their own section. Commenting for them should be like that of modules and pragmas, but it looks nice if the arrows line up as well:
Listing 6
use constant ALPHA => 1;                # alpha code
use constant BETA  => 2;                # beta code
use constant GAMMA => 3;                # gamma code
use constant USER  => 4;                # user ID offset
use constant GROUP => 5;                # group ID offset
use constant DEPT  => 6;                # dept. ID offset
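A short sketch of how such offset constants might be used follows. The record layout and the helper names (ids_from_record, offset_names) are hypothetical; the point is that constants read naturally as array offsets, but need a little care as hash keys, because Perl quotes a bare word inside braces or to the left of =>.

```perl
use strict;                             # be strict - pragma for the interpreter
use warnings;                           # same effect as the -w flag
use constant USER  => 4;                # user ID offset
use constant GROUP => 5;                # group ID offset

# ids_from_record: return the user and group IDs from a record, using
# the offset constants as array indexes
sub ids_from_record
{
    my @record = @_;
    return ($record[USER], $record[GROUP]);
}

# offset_names: return a hash of offset => description; the parentheses
# after USER and GROUP keep => from quoting them as strings
sub offset_names
{
    return (USER() => 'user ID', GROUP() => 'group ID');
}

my %name_for = offset_names();
print $name_for{+USER}, "\n";           # unary plus: look up key 4, not 'USER'
```

Writing $name_for{USER} without the plus sign would look up the string 'USER' instead of the value 4, and a comment next to such a lookup is well worth the space.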
BEGIN/END/INIT/CHECK subroutines
Comment the BEGIN/END/INIT/CHECK subroutines (see perldoc perlmod for more information on them) just like regular subroutines. They can appear anywhere in the file, and each can be defined more than once. I recommend placing them at the beginning or end of the file, where they are easy to find. Note that a one-line BEGIN function does not need extensive commenting.
Listing 7
# BEGIN: executed at startup, assigns 'root' to the USER environment
# variable
BEGIN
{
    $ENV{USER} = 'root';
}
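Because BEGIN blocks can be defined more than once and run before any regular code, a comment that says when each one runs can save a reader some guessing. The following sketch records the order of execution; the @order array and the startup_order helper are names invented for the illustration.

```perl
use strict;                             # be strict - pragma for the interpreter
use warnings;                           # same effect as the -w flag

our @order;                             # records what ran, in the order it ran

push @order, 'regular code';            # runs last, although it is written first

# BEGIN: executed at startup, before any regular code
BEGIN
{
    push @order, 'first BEGIN';
}

# BEGIN: a second block, executed right after the first one
BEGIN
{
    push @order, 'second BEGIN';
}

# startup_order: return the list of steps in the order they ran
sub startup_order
{
    return @order;
}

print join(', ', startup_order()), "\n";
```

The script prints "first BEGIN, second BEGIN, regular code", which is the reason for keeping such blocks together at the beginning or end of the file.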
Last in the initialization section comes the actual code. Again, line the comments up if possible within individual blocks.
Listing 8
$| = 1;                                 # auto-flush the output
$Data::Dumper::Terse = 1;               # produce human-readable Data::Dumper output
# define the configuration variables
# (assumes "use AppConfig qw(:argcount);" in the modules section)
my $config = AppConfig->new();
$config->define(
    # list of undo commands
    'UNDO'     => { ARGCOUNT => ARGCOUNT_LIST },
    # file to log data
    'LOG_FILE' => { ARGCOUNT => ARGCOUNT_ONE },
);
$config->file('whodunit.conf');         # load the whodunit configuration file
This initialization code turns auto-flushing on (so output will show up immediately), then tells the Data::Dumper module to produce human-readable output, and finally creates an AppConfig configuration.
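To show what the Terse setting changes, here is a small sketch that dumps the same value with the setting off and on. The helper name dump_with_terse and the sample data are assumptions made for the example.

```perl
use strict;                             # be strict - pragma for the interpreter
use warnings;                           # same effect as the -w flag
use Data::Dumper;                       # for debugging printouts

# dump_with_terse: return the Data::Dumper output for a value, with the
# Terse setting switched on (1) or off (0) for this call only
sub dump_with_terse
{
    my ($terse, $value) = @_;
    local $Data::Dumper::Terse = $terse;    # restored when the sub returns
    return Dumper($value);
}

my $suspects = [ 'butler', 'gardener' ];    # hypothetical sample data

print dump_with_terse(0, $suspects);    # output starts with "$VAR1 = "
print dump_with_terse(1, $suspects);    # same structure, without that prefix
```

Using local keeps the change from leaking into other code that relies on the default Data::Dumper format.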
Commenting regular code is pretty easy. Just line up the comments when possible, be concise, and don't be afraid to explain things in-depth when they are unclear.
Listing 9
print Dumper \%ENV;                     # print the full ENV hash
# get the environment variable names that begin with USER
my @user_vars = grep(/^USER/, keys %ENV);
# print the values in all the variables that begin with USER, using a
# hash slice
print Dumper @ENV{@user_vars};
print "Done\n";                         # print "done" message
# TODO: find better method of sorting variables
# TODO: use Data::Dumper with variable names
Note that comments begin either at column 0 or column 40. Consistency makes comments more readable. Also, multi-line comments are fine when necessary. You can also use comments to note where functionality is missing, buggy, or incomplete. The "TODO" word is helpful in case you want to look through all your code and see what things are still incomplete -- a quick grep command will print out all the TODO items.
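If grep is not at hand, the same search is a few lines of Perl. This is only a sketch: the find_todos name is made up, and it assumes TODO items are written as "# TODO: ..." the way they are in Listing 9.

```perl
use strict;                             # be strict - pragma for the interpreter
use warnings;                           # same effect as the -w flag

# find_todos: given the lines of a program, return one
# "line-number: text" string for every TODO comment found
sub find_todos
{
    my @lines = @_;
    my @found;

    # go through all the lines, from the first to the last
    foreach my $index (0 .. $#lines)
    {
        # a TODO item is a comment of the form "# TODO: ..."
        next unless $lines[$index] =~ /#\s*TODO:\s*(.*)/;
        push @found, ($index + 1) . ": $1";
    }

    return @found;
}
```

For example, find_todos(<STDIN>) lists the open items of whatever file is piped into the script.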
There's no need to comment every single line of code, but keep in mind that comments are the single best resource when debugging or extending programs. Any other source of programmer documentation is likely to be one step behind the actual code, unless the programmer has been very diligent.
Commenting loops and conditionals
Loops and conditionals should be commented like regular code and functions. Numbering loops to identify them later is somewhat extreme. A better approach is to use a folding editor, which can collapse a whole loop into a single line (the lines between folding marks are hidden but not gone). Think of folding marks as XML/HTML begin/end tags, which can be nested. Your favorite editor may support folding already. (X)Emacs does, with either the Outline or the folding.el mode.
Listing 10
# go through all the numbers between 2 and 200, and print a message
# for each one
foreach my $counter (2 .. 200)
{
    print "Whoa, the counter is $counter!\n";
}
Always state the purpose and bounds of the loop. For example, "count from 2 to 200" is fine, but "process array" is not. If logical conditions affect the bounds, state them as well, but not at the top of the loop. The summary at the top of the loop should not note exceptions to the general iteration, unless they are very important to the loop. Let discretion be your guide.
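The sketch below shows one way to apply this advice: the summary above the loop gives the purpose and the bounds, while the condition that can cut the loop short is explained where it occurs. The collect_squares function and its $limit parameter are invented for the illustration.

```perl
use strict;                             # be strict - pragma for the interpreter
use warnings;                           # same effect as the -w flag

# collect_squares: return the squares of the numbers from 2 to 200 that
# do not exceed the given limit
sub collect_squares
{
    my ($limit) = @_;
    my @squares;

    # go through all the numbers between 2 and 200, and store the
    # square of each one
    foreach my $number (2 .. 200)
    {
        my $square = $number * $number;

        # squares only get bigger, so stop at the first one over the limit
        last if $square > $limit;

        push @squares, $square;
    }

    return @squares;
}

print join(' ', collect_squares(30)), "\n";    # prints "4 9 16 25"
```

The early exit is a detail of the loop body, so its comment stays next to the "last" statement instead of crowding the summary at the top.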
Commenting the final stages of the program
In many ways, the end of the program is the most boring. The work has been done, the data structures have gone to sleep (there is no memory deallocation that you need to worry about in Perl), and now the end is just a few lines away. Don't let this fool you -- the finishing lines of a program can be just as perilous as the rest. Comment even the most trivial lines here, because the first thing a debugging programmer does is look at the program's exit behavior.
Listing 11
# delete old files, warn if they can't be removed
foreach (@myfiles)
{
    unlink $_ or warn "Couldn't remove $_: $!";
}
print "whodunit.pl is done!\n";         # tell the user we're done
exit;                                   # exit peacefully
Writing POD documentation and help for the program
Plain old documentation (POD) is a way to document a Perl script inside the script itself. The perldoc perlpod command will tell you more about POD documentation and its syntax. Good POD documentation means that users can access help for your program quickly and efficiently. Take the time to learn the POD syntax; writing manuals will be much easier. In addition, POD is compatible with a variety of manual formatters, so you can generate a plain text file, a UNIX-style man page, and a professional-looking LaTeX file from the same documentation. POD is a fairly limited format, but perfectly sufficient for most documentation needs.
Generally, the following sections should be present in POD documentation: NAME, SYNOPSIS, DESCRIPTION, OPTIONS, RETURN VALUE, ERRORS, DIAGNOSTICS, EXAMPLES, ENVIRONMENT, FILES, CAVEATS/WARNINGS, BUGS, RESTRICTIONS, NOTES, SEE ALSO, AUTHORS, HISTORY (from perldoc pod2man, where you can find more information on each section; keep in mind that these are suggestions rather than imperatives).
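As a starting point, here is a sketch of a POD skeleton with a handful of those sections filled in for whodunit.pl. The wording of the DESCRIPTION and the whodunit placeholder subroutine are assumptions; the options are the ones that the program's help text lists.

```perl
# whodunit.pl: A script to solve a murder mystery

use strict;                             # be strict - pragma for the interpreter
use warnings;                           # same effect as the -w flag

=head1 NAME

whodunit.pl - a script to solve a murder mystery

=head1 SYNOPSIS

  whodunit.pl -h
  whodunit.pl -show suspects
  whodunit.pl -quiet

=head1 DESCRIPTION

(Hypothetical wording.) Works through the clues and names the killer.

=head1 OPTIONS

=over 4

=item B<-h>

Print the help and exit.

=item B<-show>

Show the suspects, victims, or detectives (all of them if no second
argument is specified).

=item B<-quiet>

Print no information other than the killer's name.

=back

=head1 AUTHORS

joe@shmoe.com

=cut

# the interpreter resumes here, after the =cut line

# whodunit: placeholder for the real work of the program (hypothetical)
sub whodunit
{
    return 'the butler';
}

print whodunit(), "\n";
```

Running perldoc on a file like this shows the formatted manual, while the Perl interpreter skips everything between the first =head1 and the =cut line.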
Some programmers make the -h switch to their programs invoke perldoc on the program, so the POD documentation is printed out as if the user had typed perldoc whodunit.pl. The problem here is that a user doesn't want that much information from the -h switch, just the synopsis and the list of options. Thus, it is better to write a separate help handler for the -h switch:
Listing 12
# print_help: help handler, prints out help for whodunit.pl and exits
sub print_help
{
    # print the help itself
    print <<EOHIPPUS;
This is help for the whodunit.pl program.

You can pass options to whodunit.pl as command-line arguments. For
example:

..../whodunit.pl -h
..../whodunit.pl -show suspects

List of options:

-h     : print this help
-show  : show the suspects, victims, or detectives (all of them if no
         second argument is specified)
-quiet : print no information other than the killer's name
EOHIPPUS

    exit;                               # do nothing else, just exit quietly
}
Note the documentation of print_help itself. Also, the appearance of the POD documentation and other online help is important. The first place a user goes is not the manual. It's much more convenient to use the -h flag, or look at the POD documentation. Note the alignment of the colons, the spacing between lines, and the overall neatness. Outward appearances do matter, often more than the actual functionality provided by the program. Well-written programs should have good documentation first and foremost.
Some programmers like to include POD documentation in their program instead of regular comments. Such POD comments begin with =pod on a line by itself (there are other options, explained in the perlpod documentation), and end with =cut on a line by itself. The =pod line tells the Perl compiler to stop interpreting everything until the =cut line, in effect excluding that block of text from the script itself. This is fine if your users are also programmers, but may confuse normal users who just want to look at the documentation for the script, not the comments for the code itself. This approach also scatters documentation throughout the code. Use it with restraint.
Resources
- Read the previous chapters of The road to better programming.
- Find out more about RCS at the RCS home page.
- Learn more about CVS at the CVS home page.
- Read about folding editors.
- Type "perldoc perlpod" at the command prompt for the perldoc perlpod page.
- Although the Java Code Conventions pertain to Java, these conventions (especially Chapter 5) are applicable in spirit to Perl as well. The Javadoc format is very effective in documenting code for other programmers, but is no substitute for user documentation. POD is well rounded but lacks API documenting features.
- Take a look at the CPAN documentation modules of CPAN.org.
- The Lip::Pod module and Literate Programming concept mix documentation and code freely in a self-documenting fashion. It has parallels to both POD and Javadoc documentation.
- The comp.programming.literate FAQ, by David B. Thompson, gives a basic description of literate programming and how the application of literate programming principles can improve the resulting code. He also presents a list of tools available to literate programmers.
- Here's some LP propaganda including an overview, examples, cool stuff, and history.
- Visit XEmacs LP mode for some original thoughts and observations on LP as well.
- Find links to all sorts of helpful resources at the literate programming site.
- Read Teodor's other Perl articles in the developerWorks "Cultured Perl" series:
  - A programmer's Linux-oriented setup
  - Application configuration with Perl
  - Automating UNIX system administration with Perl
  - Debugging Perl with ease
  - The elegance of JAPH
  - Genetic algorithms applied with Perl
  - One-liners 101
  - Parsing with Perl modules
  - Perl 5.6 for C and Java programmers
  - Reading and writing Excel files with Perl
  - Review of Programming Perl, Third Edition
  - Small observations about the big picture
  - Writing Perl programs that speak English
- Browse more Linux resources on developerWorks.
- Browse more Open source resources on developerWorks.
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, 3-tier client-server database architectures, UNIX system administration, CORBA, and project management.





