Clicking on a hypertext link while viewing a PDF file shouldn't be a security problem as long as you trust the viewer it invokes. But users of xpdf version 0.90 discovered that this assumption was an extremely bad one.
When an xpdf user clicked on a hypertext link, xpdf started up a viewer (Netscape by default) and sent the URL to the viewer. So far, so good. But the xpdf developers decided to start up the viewer by using the system() call. That was the bad idea.
In Unix-like systems, system() invokes the command shell, which then interprets the text sent to it. This means an attacker could then create a PDF file with bogus URLs that are interpreted specially by the shell. Then, if an attacker can convince someone to click on the hypertext link, they can make the person run any program they choose -- say to erase all files or to e-mail everything they can read to a foreign mail drop.
The solution: Calling out to secure components.
Some notes on reusing components
Application programs are practically never self-contained. They typically make calls to other components, components that might include the following:
- The underlying operating system
- Database systems
- Reusable libraries (including dynamically loaded libraries)
- Internet services (like DNS)
- Web services, and so on
Most of today's programming languages include large built-in libraries, some of which are self-contained and others that might be specifically designed to call out to external programs or services.
The reuse issue is absolutely critical -- it would be insane to write everything from scratch for each application, because it would take too long to redevelop each component. And there are potential security advantages to reuse -- developers can concentrate on making sure that their component doesn't have security vulnerabilities, thereby passing the benefit of a secure component onto many more systems. Even if vulnerabilities are found later, they can be fixed once and the fix applies to every application that uses the component.
There is, however, a dark side to making calls to other components: attackers might be able to exploit those calls.
The Open Web Application Security Project's (OWASP) "Top Ten Most Critical Web Application Vulnerabilities" identifies injection flaws as one of the most critical vulnerabilities. OWASP observes that Web applications pass parameters to external systems or the local operating system, and warns:
If an attacker can embed malicious commands in these parameters, the external system may execute those commands on behalf of the ... application.
To help plan carefully when reusing components, here's a brief list of things to consider:
- Use only secure components and only in secure ways.
- Pass only valid data to a component and be sure that it will be interpreted as you expect. In particular, watch out for meta-characters (the cause of SQL injection, shell meta-character injection, format string, and Perl open() attacks).
- Check return values and handle exceptions.
- Protect data as it goes between your application and the component.
Let's examine these considerations in more detail.
Secure components, secure ways
The first step before reusing anything, even a small function, is to check its documentation to see if there's a security warning about it. For example, the C standard includes a gets() function, but the documentation about it will warn you that gets() is dangerous. The problem is that gets() doesn't protect itself from buffer overflows; an attacker can simply send more data than the buffer passed to gets() can store. That will let the attacker control internal data and possibly take over the program.
Here's another example: nearly every language has a function that returns a "random" value that is actually predictable. These functions are helpful for simulations (so you can re-run the simulation), but they are extremely dangerous if you're using them to create secret keys for security. Why? They make it easy for an attacker to determine the key.
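As a hedged illustration of that split, here is a minimal Python sketch. Python's random module is a seeded, reproducible generator suited to simulations, while its secrets module draws from the operating system's cryptographic random source. The function names are hypothetical and chosen only for this example.

```python
import random
import secrets

def simulation_samples(seed, count):
    """Reproducible pseudo-random values: fine for simulations, not for keys."""
    rng = random.Random(seed)          # same seed -> same sequence on every run
    return [rng.randrange(256) for _ in range(count)]

def new_secret_key(num_bytes=32):
    """Key material drawn from the OS cryptographic random source."""
    return secrets.token_bytes(num_bytes)

# Anyone who learns or guesses the seed can regenerate the whole sequence.
first_run = simulation_samples(1234, 8)
second_run = simulation_samples(1234, 8)
key = new_secret_key()
```

The two simulation runs are identical by design, which is exactly why such a generator must never produce a key.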
If you're using a widely-used component, consider doing a Web search to see if there are known security issues with it or common ways to unintentionally misuse it.
Be especially careful if you're trying to reuse an entire program intended for human interaction -- a good rule is that you shouldn't do that. Programs intended for human interaction try to be "helpful" and guess the user's intent. This "help" can also aid an attacker to create data to mislead the program.
Programs intended for human interaction often undergo subtle changes over time that are nice for users, but can confuse programs that try to call them. Also, these programs often have functions that let the user invoke other programs on their behalf. For example, text-editing programs such as vim (vi), emacs, and the ancient "ed" program include functions that let a user call out to the operating system command shell, creating an easy target for attacker exploitation. When text editing on a Unix-like system, it is safer to use one of the many commands intended for use by programs such as sed, tr, gawk, or perl.
The same is true for word processing, spreadsheet programs, and so on. Generally, human-interactive programs should be written so that they build on a set of other components that can be called by programs. That way, it's much easier to fix the user interface and it's easier for other programs to reuse the capabilities of the program as well.
Passing valid data avoids meta-character problems
If a component is being reused, it normally won't be tailored for your particular application. Instead, the reused component will sport a nice general interface with lots of capabilities. Attackers may try to provide data to your application that, when passed on to the reused component, will do what you don't expect.
It's important to know exactly what data is allowed to be sent to anything you reuse and make sure that you only send valid data. If there's any doubt at all, check the data before you send it to the component (in fact, it's not a bad idea to check anyway). If a number should be between 0 and 100, check that. Be sure you use the same datatypes as the component will use. If the component expects signed integers, make sure that you convert your data to signed integers and then check. In particular, if the component expects a signed integer with the smallest value of 0, actually check for that. In nearly all languages, an unsigned integer that's very large will be changed to a negative number if changed to a signed integer, because the sign bit is normally the most significant bit in a signed value.
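A small sketch of both points, assuming a component that takes a 32-bit signed integer and a value documented to lie between 0 and 100. The helper names are hypothetical; ctypes is used only to show what a C-style reinterpretation does to a large unsigned value.

```python
import ctypes

def as_signed32(value):
    """Show what a 32-bit unsigned value becomes when reinterpreted as signed."""
    return ctypes.c_int32(value).value

def checked_percentage(raw):
    """Convert to int first, then enforce the documented range 0..100."""
    number = int(raw)                  # raises ValueError on non-numeric text
    if not 0 <= number <= 100:
        raise ValueError(f"out of range: {number}")
    return number

wrapped = as_signed32(4294967295)      # largest 32-bit unsigned value
valid = checked_percentage("100")
```

Checking after the conversion matters: a test done on the unsigned value would pass, and the component would still receive a negative number.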
When possible, make sure that the data you send won't be controlled by an attacker. In fact, where possible, make it a constant. The often-serious format string attack is based on the idea that an attacker can control the format used to display data. However, if the format is a constant in the program, there's nothing for the attacker to control. The gcc compiler option -Wformat-security will warn you of some cases where the code may be vulnerable to format string attacks.
You'll need to make sure that the component will interpret the data you send the way you expect, even if an attacker can influence the value of that data. This brings us to a very common problem -- meta-character vulnerability. Many sophisticated reusable components implement a complex command language of their own. An attacker can try to insert text that will be interpreted by the component to do something that you (the developer) didn't intend. Two of the most common components where this happens are the Unix command shell (/bin/sh) and SQL commands to database systems, so let's examine them.
The command shells of Unix-like systems, in particular the standard shell /bin/sh, are extremely useful. They make it easy to combine different programs together to do useful actions. It's not at all unusual to find that a single text line to a shell can do the same work as a hundred-line program.
The standard shell is a full programming language with many built-in capabilities designed to make it easy to integrate other programs. And many programming languages include built-in capabilities to invoke the standard shell -- C's system(3) and popen(3) functions, as well as Perl's backtick (`) operator invoke the standard shell. Users of only Windows and MS-DOS are often surprised how often shells are used in Unix-like systems, because COMMAND.COM (the Win/DOS equivalent) is quite limited and not capable of the same functions. But with power comes responsibility.
In particular, many characters have a special meaning to the shell, so a common attack trick is to try to get a program to re-transmit these special characters to the shell if it's ever called.
The best solution is to not call the shell directly from inside a program. Since it's easy to make a mistake, just don't do it. This means avoid using system() and popen() in secure programs, especially when what's being sent to these calls isn't constant. Don't write a setuid/setgid shell script; it won't work on Linux-based systems at all and it's a security vulnerability on some other Unix-like systems.
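One way to follow this advice, sketched in Python: pass the program and its arguments as a list so that no shell is ever started. The child process below is the Python interpreter itself (so the sketch runs anywhere), and the hostile string is a made-up example.

```python
import subprocess
import sys

def run_without_shell(argv):
    """Run a program directly; no shell ever parses the arguments."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout

# Hypothetical hostile "filename" full of shell meta-characters.
hostile = "x ; echo INJECTED"

# The child receives the hostile text as one literal argument and prints it back.
output = run_without_shell(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile]
)
```

Because the semicolon is never seen by a shell, it is just another character in the argument.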
If you're careful, you can write shell scripts that aren't setuid/setgid. Shell scripts aren't as bad as other uses of the shell, because the script itself is static inside the file. Even this use of the shell has its disadvantages, though, because it's easy to do things in the shell that let an attacker control the program.
If you're writing a shell script, it's wise to write the full pathname of each program so that even if an attacker can manipulate the PATH, it's harder to assail. If there's a setuid/setgid program that calls it, make sure it cleans up the environment first (including setting a reasonable PATH value).
If you insist on calling the shell directly from a program, make sure that the environment is set up securely. Attackers like to play tricks with PATH, IFS, and other environment variables to cause trouble. Invoke commands with their full pathnames.
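A minimal sketch of building a known-good environment instead of passing along whatever the caller supplied. The PATH value and the list of variables to keep are assumptions for illustration; choose values that suit your system.

```python
SAFE_PATH = "/usr/bin:/bin"            # assumed value for illustration
KEEP = ("TZ",)                         # assumed whitelist of variables to pass on

def clean_environment(inherited):
    """Build a minimal environment: fixed PATH and IFS plus whitelisted extras."""
    env = {name: inherited[name] for name in KEEP if name in inherited}
    env["PATH"] = SAFE_PATH
    env["IFS"] = " \t\n"
    return env

# Simulated environment as a hostile caller might set it up.
hostile = {"PATH": "/tmp/evil:/usr/bin", "IFS": "/",
           "LD_PRELOAD": "/tmp/x.so", "TZ": "UTC"}
cleaned = clean_environment(hostile)
```

The resulting dictionary can then be handed to the call that starts the child, for example as the env argument of subprocess.run, together with the command's full pathname.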
When invoking the shell directly, the biggest problem is if you send the shell data that is derived from untrusted data. If an attacker can send characters that have special meaning to the shell, there will be trouble. And there are a lot of such characters.
In the shell, double-quote (") ends another double-quote, semicolon (;) ends a command and starts a new one, and so on. For example, imagine that you have a C++ program with this code:
Listing 1. Vulnerable C++ code
#include <cstdlib>
#include <string>
using namespace std;
...
  // DON'T DO THIS: filename goes to the shell unchecked
  string command = "md5sum < " + filename + " > ./my-output";
  system(command.c_str());
}
The question to ask is can an attacker control the value of the variable filename? If the attacker can, there's a problem.
Let's look at an example of how code like this could be exploited. SLOCCount is a program I wrote that counts the number of source lines of code (hence, SLOC) in programs (see Resources for a link). Suppose that I add code to SLOCCount that looks like the previous code. It's important to realize, though, that some people use SLOCCount to measure programs they didn't write. In fact, the program they're measuring might be written by an attacker. Since SLOCCount can be sent a program written by an attacker specifically to harm SLOCCount users, SLOCCount should try to defend its users from those attackers.
Now imagine what happens if an attacker names one of their files x ; rm -fr ~ -- such filenames are possible. That means that this system command will send the shell the following command:
md5sum < x ; rm -fr ~ > ./my-output
This would erase the entire home directory of the person running SLOCCount. Ugh. Notice that although this program has no network interfaces, we still have to worry about security, because an attacker can provide some of its inputs.
There are other potential problems with this code because of how it stores its results. What happens if there's more than one program writing to my-output? Can an attacker manipulate the file, its directory, or any of its ancestor directories?
A simple solution to the meta-character problem is to restrict input to only characters that you know aren't meta-characters, such as A-Z, a-z, and 0-9. Sometimes you can do that.
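Such a whitelist check can be sketched as follows. The 64-character cap and the function name are assumptions for this example; the important detail is matching the whole string, so a trailing newline or other stray character cannot slip through.

```python
import re

# Only ASCII letters and digits, 1 to 64 characters (the cap is an assumption).
SAFE_NAME = re.compile(r"[A-Za-z0-9]{1,64}")

def is_safe_name(text):
    """True only if the whole string consists of whitelisted characters."""
    return SAFE_NAME.fullmatch(text) is not None

accepted = is_safe_name("report2004")
rejected = is_safe_name("x ; rm -fr ~")
```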
But if you must use the shell and you can't limit the data that comes in, you need to escape every potential meta-character before sending it in a command to the shell. For the shell, insert the "\" character in front of every character that you aren't sure is okay to send to the shell as part of a command. In addition, don't allow NUL characters through at all; some shells don't handle them well as command input.
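In Python, the standard library's shlex.quote does this job for POSIX shells. Note that it is a swapped-in technique: it wraps the text in single quotes rather than inserting backslashes, but the goal is the same. The wrapper function below is hypothetical and adds the NUL check.

```python
import shlex

def quote_for_shell(text):
    """Quote text so a POSIX shell treats it as one literal word."""
    if "\0" in text:
        raise ValueError("NUL characters are not allowed in shell arguments")
    return shlex.quote(text)

hostile = "x ; rm -fr ~"
command = "md5sum < " + quote_for_shell(hostile)

# shlex.split parses the way a POSIX shell splits words, so it can confirm
# that the hostile text survives as a single token.
tokens = shlex.split(command)
```

Even with correct quoting, this remains the fallback; not involving the shell at all is still the safer choice.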
SQL injection is essentially the same problem as the shell meta-character one, but with an SQL interpreter instead of the shell.
In an SQL injection attack, a program creates an SQL command and sends it to an SQL interpreter. The program allows an attacker to include characters that change the meaning of that SQL command. The results of these attacks are wide-ranging -- they can reveal data you want to keep confidential, allow changing of arbitrary data, permit authenticating where it shouldn't, and tie up the database so that the system becomes unavailable.
SQL injection is a vulnerability that tends to hurt high-value sites. If your site has enough data to justify use of an SQL database, then that data is probably valuable to someone for nefarious purposes. And while it's often easy to rewrite calls to the shell so that the shell isn't even involved, it's not practical to avoid SQL requests. Usually the whole point of the program is to work with the data stored in an SQL database.
Vulnerable code is easy to spot once you know what to look for. The wrong way to invoke SQL is to simply perform string concatenation (or interpolation) to create your SQL command using data that hasn't undergone extremely narrow screening. This Perl code is an example of the problem: $cmd = "SELECT salary,lastname,firstname FROM employee WHERE eid=" . $eid; followed by commands to prepare() and execute() the SQL command. Notice that we have simple string concatenation here ("." is the concatenation operation in Perl). If the variable $eid has a simple number, this should work as expected, but if an attacker arranges for $eid to have the value "5 OR 1=1", then the query result will end up containing the selected columns from the entire table.
Other tricks include embedding ";" to run other commands, inserting an extra double-quote (") to escape out of a quoted string early, and so on. There are seemingly endless variations on this theme for attacks. Basically, if you're using simple string concatenation or string substitution to create SQL queries, it's likely you're headed for trouble.
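To make the "5 OR 1=1" attack concrete, here is a self-contained sketch using Python's built-in sqlite3 module and an in-memory database. The table contents and function names are invented for the demonstration; the query mirrors the concatenation pattern described above.

```python
import sqlite3

def make_demo_db():
    """In-memory database with a hypothetical employee table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employee (eid INTEGER, lastname TEXT, "
               "firstname TEXT, salary INTEGER)")
    db.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                   [(5, "Smith", "Ann", 100), (6, "Jones", "Bob", 200),
                    (7, "Brown", "Cy", 300)])
    return db

def lookup_unsafe(db, eid):
    """DON'T DO THIS: attacker-supplied text becomes part of the SQL command."""
    cmd = "SELECT salary,lastname,firstname FROM employee WHERE eid=" + eid
    return db.execute(cmd).fetchall()

db = make_demo_db()
honest = lookup_unsafe(db, "5")            # one row, as intended
leaked = lookup_unsafe(db, "5 OR 1=1")     # every row in the table
```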
There are two prongs of attack to check this potential trouble:
- Check any data before accepting it.
- Use routines that aren't vulnerable.
Before accepting any data, define a regular expression describing the format you want and reject anything that doesn't meet that format. If possible, make sure your format doesn't accept any characters that have syntactic meaning in SQL (such as double-quote and semicolon). If you can, limit the values to letters and digits. By not including whitespace characters and punctuation in the legal list, you can prevent potential avenues of manipulation.
All too often though, you have to accept data that if not carefully handled will cause problems; sometimes data gets through that shouldn't. With that in mind, never just concatenate text to create an SQL command. Instead, look in your library routines to find higher-level routines that will automatically protect you from such mistakes.
Different languages call these different things, such as "bound" or "placeholder" or "prepared" or "parameterized" SQL statements. For example, PHP has a bind_param method for correctly substituting user data. These routines should check to make sure the user-provided data is in the expected format (so that if you say it's a number, the routine won't accept anything else); then insert the data with the proper escaping so that it won't be misinterpreted by the SQL interpreter.
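The same idea in Python's sqlite3 module uses a "?" placeholder. This is a sketch with an invented table; note that this particular placeholder mechanism binds the value as data but does not itself verify that it is a number, so format checks remain your job.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (eid INTEGER, lastname TEXT, salary INTEGER)")
db.executemany("INSERT INTO employee VALUES (?, ?, ?)",
               [(5, "Smith", 100), (6, "Jones", 200)])

def lookup(db, eid):
    """The ? placeholder keeps eid as data; it is never parsed as SQL."""
    query = "SELECT salary, lastname FROM employee WHERE eid = ?"
    return db.execute(query, (eid,)).fetchall()

good = lookup(db, 5)                   # the one matching row
attack = lookup(db, "5 OR 1=1")        # compared as a plain value: no match
```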
If this level of functionality is not available in the language of your choice, consider creating such routines or at least create specialized routines that check that values match expected formats before combining them with other strings to create the query. If you have to write these routines yourself, make sure that any data gets quoted correctly.
Following are some common examples to illustrate the vulnerabilities we've been discussing.
Unfortunately, passing data from attackers to libraries that interpret that data is common. In C, a common mistake is to pass attacker data into format string parameters (such as the first parameter of printf(3)). printf format strings can also write data (using the %n directive) and reveal arbitrary data, making this a critical vulnerability. Following is an example of this mistake:
printf(bad); /* DON'T DO THIS if attacker can control 'bad' */
Format string attacks are usually targeted for C/C++ programs, but other languages have format strings, and for them you still need to make sure the attacker doesn't control them. For example, Python has a built-in "%" operator that does formatting (the argument before "%" is the format), so make sure the attacker can't control the format, say by making it a constant.
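A short sketch of the Python case, with a made-up record and function names. When the attacker chooses the format, a "%(name)s" directive can pull out any field of the mapping being formatted; with a constant format, the same text is just printed.

```python
def render_unsafe(fmt, record):
    """DON'T DO THIS if an attacker can control fmt."""
    return fmt % record

def render_safe(user_text, record):
    """Constant format; attacker text is only ever an argument."""
    return "%s (account %s)" % (user_text, record["id"])

record = {"id": 17, "password": "hunter2"}         # hypothetical data
leaked = render_unsafe("%(password)s", record)     # attacker picks the field
harmless = render_safe("%(password)s", record)     # printed literally
```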
In Perl, a common mistake is to use the open() function in a way that gives the attacker complete control over the system. Perl's open() function is often far too loose if you actually want security -- for example, if an attacker can arrange for the filename to begin with a pipe symbol or begin/end with spaces, trouble often ensues because open() interprets certain characters specially.
Perl's built-in function also strips off beginning and ending whitespace regardless of whether or not it should. Generally, you're better off using sysopen() instead of open() in Perl (the manual pages perlopentut and perlfunc contain more information).
Be careful when invoking command-line programs. Most Unix-like programs use a leading dash (-) to indicate options. Windows programs usually use slash (/) but sometimes use dash to indicate options. If an attacker can get a leading dash or slash into something that will be passed down to a command-line program, it might get misinterpreted as an option.
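Two common defenses on Unix-like systems can be sketched as follows: prefix a name that starts with a dash with "./" so it reads as a path, and place "--" before file operands. Whether a given tool honors "--" is an assumption to verify per tool, and the helper names are hypothetical. The sketch does not cover the Windows slash convention.

```python
def as_file_operand(name):
    """Make a filename that a command-line program cannot read as an option."""
    if "\0" in name:
        raise ValueError("NUL not allowed in filenames")
    if name.startswith("-"):
        return "./" + name             # "./-rf" is a path, not an option
    return name

def build_argv(program, filename):
    """'--' ends option parsing in most POSIX-style tools (verify per tool)."""
    return [program, "--", as_file_operand(filename)]

argv = build_argv("/usr/bin/md5sum", "-rf")
```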
Most languages other than C, including C++, can easily store strings with an embedded NUL character (character number zero). In contrast, the C language uses the NUL character to mark the end of a string. That means that most C routines can't handle a string with a NUL character in the middle. This difference seems minor, but consider this: Many libraries (including operating system calls) use C conventions, because practically anything can use the C conventions. So, if a string with an embedded NUL character is passed to a library using a C convention, the string will suddenly be "chopped off" at that first NUL. Attackers can sometimes use this and similar tricks to ensure that what the application program sees and what the called routine sees are different.
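The mismatch is easy to see from Python, whose strings may contain NUL, by using ctypes to view the same bytes the way a C routine would. The filename is a made-up example and the helper names are hypothetical.

```python
import ctypes

def c_view(data):
    """Return what a C routine sees: the bytes before the first NUL."""
    return ctypes.create_string_buffer(data).value

def reject_nul(text):
    """Refuse strings that would be silently truncated by C conventions."""
    if "\0" in text:
        raise ValueError("embedded NUL character")
    return text

name = b"report.txt\0.jpg"             # the application sees all 15 bytes
seen_by_c = c_view(name)               # a C routine stops at the NUL
```

An application that checks the suffix would see ".jpg", while a C library opening the file would see "report.txt".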
Sending data correctly isn't enough. You also need to make sure that you handle whatever comes back correctly too.
Correctly handling responses is actually one of the biggest problems when writing secure code in C (C also makes it hard to avoid buffer overflows). C doesn't include a built-in capability to handle exceptions, so by default it ignores any function returns if you don't do anything with them. Every time you call a function that can return an error, you need to be careful to check that your request produced the result you expected (or handle the error if it didn't). If an attacker could cause it to happen, plan on it occurring.
For example, read(2) can read fewer bytes than requested and write(2) can write fewer bytes than requested. Although some don't like it, I heartily recommend that gcc users heed -Wunused-result (on by default), which complains when the return value of a function declared with the warn_unused_result attribute is thrown away. Many compilers and lint-style checkers let you explicitly throw away a value by prefixing the call with "(void)", but think carefully before you do it.
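The same short-read and short-write behavior is visible through Python's os.read and os.write, which are thin wrappers over the system calls. A minimal sketch of loops that check every return value (function names are hypothetical):

```python
import os

def write_all(fd, data):
    """Keep calling os.write() until every byte has been written."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        written = os.write(fd, view[total:])   # may write fewer bytes than asked
        total += written
    return total

def read_exactly(fd, count):
    """Keep calling os.read() until count bytes arrive; fail on early EOF."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = os.read(fd, remaining)         # may return fewer bytes than asked
        if not chunk:                          # empty result means end-of-file
            raise EOFError(f"expected {remaining} more bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

For example, writing to one end of an os.pipe() with write_all and reading the other end with read_exactly returns the same bytes, and a read past the closed writer raises EOFError instead of silently returning less.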
This can be a pain at first -- some programmers don't realize that printf() and many other functions actually return a value. If you're worried about security, you should do it. It's not at all unusual for a truly secure program in C to be mostly error-handling code with a small amount of code to handle the "normal" processing.
Thankfully, most other programming languages include exception handling, which can make it somewhat easier to handle error cases. But don't be complacent -- you still have to know what exceptions can be thrown by what you're calling and how to handle them appropriately. In particular, make sure you release any locks held and avoid crashing an entire application due to data sent by an attacker.
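A small Python sketch of both points, with hypothetical function names: the "with" statement releases the lock even when an exception passes through it, and the caller catches the specific exception it knows can be raised, so bad input produces an error reply rather than a crash.

```python
import threading

state_lock = threading.Lock()

def parse_quantity(text):
    """Hypothetical parser; int() raises ValueError on bad input."""
    return int(text)

def update_total(text):
    """Do the work while holding the lock."""
    with state_lock:                   # released even if an exception is raised
        return parse_quantity(text)

def handle_request(text):
    """Bad input yields an error reply instead of crashing the application."""
    try:
        return ("ok", update_total(text))
    except ValueError:
        return ("error", "invalid quantity")

good = handle_request("12")
bad = handle_request("12; DROP TABLE")
```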
Be careful if an attacker could influence the data that's returned. For example, if you call on a DNS resolver to get information (such as the canonical name of an IP address), remember that this data might be directly provided by an attacker. Treat it carefully -- don't assume it will be any particular format or size and don't trust it unless there's a reason to trust it.
Be careful when you handle arbitrary data, just as though it were input (because it is). It might have the NUL character, invalid characters, or anything else that could cause problems.
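As one hedged example, a name returned by a resolver can be checked before it is logged, stored, or passed on. The rules below (letters, digits, and hyphens per label; 63 characters per label; 253 overall) are a conservative assumption for this sketch; some real DNS names, such as those containing underscores, would be rejected.

```python
import re

MAX_NAME = 253                         # assumed overall length limit
LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_plausible_hostname(name):
    """Accept only names built from letter-digit-hyphen labels of sane length."""
    if not isinstance(name, str) or not 0 < len(name) <= MAX_NAME:
        return False
    if name.endswith("."):             # allow one trailing root dot
        name = name[:-1]
    return all(LABEL.fullmatch(label) for label in name.split("."))

ok = is_plausible_hostname("www.example.com")
hostile = is_plausible_hostname("evil.example.com; rm -fr ~")
```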
Protect data between application and component
Ensure that an attacker can't interfere with or read the data going between your application and the component you're invoking. This isn't usually a problem when making a system call to the underlying operating system (at least directly).
In particular, if you run a setuid/setgid program, the mechanisms that let users redirect library calls are disabled -- and as we already discussed, setuid/setgid programs need to erase their environments so that whatever they call will be protected.
But those mechanisms won't handle the wilds of the Internet. If you're invoking a call to a Web service, even if you trust the Web service you need to make sure that an attacker can't interfere with the communication between your application and the service you're using. Ask yourself if you want to prevent an attacker from reading (confidentiality), changing (integrity), and/or halting (availability) the service.
Typical solutions for maintaining confidentiality and integrity involve using pre-existing secure protocols and cryptographic algorithms. Don't invent your own protocols and algorithms. Common secure protocols include TLS/SSL, SSH (as implemented by OpenSSH), and IPsec. They typically employ cryptographic algorithms such as RSA, SHA-1, AES, and Triple-DES.
Create a secure connection to the service that encrypts and authenticates, so you at least have a way to get confidentiality and integrity for the connection to the service. The first step is knowing that you need these protocols and algorithms; setting them up is for another article.
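Relying on an existing implementation usually means asking a library for its vetted defaults rather than assembling the pieces yourself. As a rough pointer only, this Python sketch creates a TLS client context from the standard ssl module; the function name and the choice of TLS 1.2 as a floor are assumptions for the example.

```python
import ssl

def make_client_context():
    """TLS client settings based on the library's defaults."""
    context = ssl.create_default_context()       # verifies certificates by default
    context.minimum_version = ssl.TLSVersion.TLSv1_2   # assumed floor
    return context

context = make_client_context()
```

A context like this verifies the server's certificate and hostname; it would then be passed to the call that opens the connection, for example context.wrap_socket(sock, server_hostname=host).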
Clearly, you can't have a secure program if you're depending on other components in an insecure way. With the information in this article, I've provided ways in which you can counter many common attacks. But even when you invoke other components securely, the way you send output back can cause the whole house of cards to topple.
Not surprisingly, attackers have found ways to exploit program outputs. We'll see how to counter such attacks in the next article.
Resources
- Read David's other Secure programmer columns on developerWorks.
- David's book Secure Programming for Linux and Unix HOWTO (Wheeler, March 2003) gives a detailed account on how to develop secure software, including a whole chapter on safely calling out to other programs.
- CVE entry CVE-2000-0727 contains more information about the xpdf vulnerability.
- The Open Web Application Security Project (OWASP) has a list of what it believes are the "Top Ten Most Critical Web Application Vulnerabilities," including "injection flaws" (the kind discussed in this article).
- For a description of how to bind SQL statements in PHP using the bind_param method and related statements, see the PHP manual section on mysqli_stmt_bind_param.
- The whitepaper SQL Injection: Are Your Web Applications Vulnerable? by SPI Dynamics has much more information about SQL injection.
- SLOCCount counts the physical source lines of code in a program.
- Building secure software: Selecting technologies (developerWorks, February 2002) includes a list of common choices to guarantee security from the start of a program design.
- The IBM Research Security group site provides a roadmap to the many various security projects undertaken by IBM.
- Developing secure programs (developerWorks, August 2003) explains how to write secure applications.
- Server clinic: Practical Linux security (developerWorks, October 2002) outlines a number of ways to keep user accounts clean and safe.
- Software security principles (developerWorks, October 2000) offers 10 important points to keep in mind when designing and building a secure system.

David A. Wheeler is an expert in computer security and has long worked in improving development techniques for large and high-risk software systems. Mr. Wheeler is the author of the book Secure Programming for Linux and Unix HOWTO and is a validator for the Common Criteria. He also wrote the article "Why Open Source Software/Free Software? Look at the Numbers!" and the Springer-Verlag book Ada 95: The Lovelace Tutorial, and is the co-author and lead editor of the IEEE book Software Inspection: An Industry Best Practice. This article presents the opinions of the author and does not necessarily represent the position of the Institute for Defense Analyses. You can contact David at dwheelerNOSPAM@dwheeler.com (after removing "NOSPAM").