In this second part of a five-part series, we continue to introduce the new features of PHP V5.2, this time focusing on input filtering.
Accepting user input or any other data from an untrusted source is one of the most common risks a PHP developer takes when developing applications. You often need to bring in data from unknown sources to make your applications work, but this presents an opportunity for attackers to insert arbitrary code or otherwise exploit your application. From PHP V5.2 onward, the input filtering extension is enabled by default to make it easier to take measures against this type of activity. The extension provides a set of functions to parse and check input before you use it in your own functions.
We will look at reasons why you would use these functions for parsing and checking input instead of hand-coding, and we cover some basic examples of how to use the new functions.
Input is what most applications are all about. Our applications take in massive amounts of information and crunch it with the power of the microprocessor. We can control where the data comes into our applications, but we cannot control the intent of the user who inputs this data. PHP was originally developed as an easy way to develop scripts to collect input from HTML forms. It has grown from this seed into much more, but the fact remains that we use it to gather and manipulate data from many untrusted sources. Even the lonely data-entry clerk must be treated as an untrusted source, because he may unintentionally send our application some troublesome data.
Security is a hot issue that grows hotter with each passing revision of hardware and software. We routinely pay lip service to it, but because the matter is complex, we tend to fail in protecting our systems where it matters most: at the points of input. Methodologies are numerous and complex to implement, and we have tight deadlines from our project managers to get our applications completed.
One way to protect yourself from malicious code is to make sure that what you expect as input is actually what you receive as input. In this article, we will look at an example that stems from a user's input of inappropriate JavaScript and show you how to strip those tags out of the input, returning exactly what you expect.
The input filtering extension was previously available for PHP V5 as a separate PECL extension, intended to make security easier for you to include. PHP V5.2 marks the first time that it is enabled by default and ready to use out of the box.
You may have heard the term input validation in your programming experience. It is a critical part of the application development process to make sure the incoming data is correct in context and content. Programmers have the power of regular expressions and tests to see if a value meets certain criteria, such as when you require an e-mail address and need to make sure the input is a properly formatted e-mail address. This can also help if you want to exclude free e-mail addresses like Hotmail or Gmail. You would use a regular expression to block any e-mail address that contained "hotmail.com" or "gmail.com."
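The hand-coded approach described above might look roughly like the following sketch. The function name isAcceptableEmail() is hypothetical, and the first pattern is a deliberately rough shape check chosen for illustration; it is an assumption, not a complete e-mail grammar.

```php
<?php
// Hypothetical hand-written validator, for illustration only.
// The shape check is intentionally simple and does not cover
// every address that the e-mail standards allow.
function isAcceptableEmail($email)
{
    // Rough shape check: local-part@domain.tld with no whitespace.
    if (!preg_match('/^[^@\s]+@[^@\s]+\.[a-z]{2,}$/i', $email)) {
        return false;
    }

    // Reject the free e-mail domains named in the text.
    if (preg_match('/@(hotmail|gmail)\.com$/i', $email)) {
        return false;
    }

    return true;
}

var_dump(isAcceptableEmail('arthur@example.org')); // well formed, not blocked
var_dump(isAcceptableEmail('arthur@gmail.com'));   // well formed, but blocked
var_dump(isAcceptableEmail('not an address'));     // fails the shape check
```

Every rule here is written and maintained by hand, which is exactly where an overlooked case or an incomplete pattern tends to creep in.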
This type of basic checking, which makes sure the data fits certain criteria, is normally considered validation. In this case, you do not expect to clean up the data; you only need to confirm that you are actually getting the data you are looking for. The goal of validation is to avoid simple errors, such as trying to put a NULL into a field that doesn't accept one.
Errors caused by incorrect or maliciously crafted data can have devastating consequences. You will need to focus on a more complete way of filtering this information that allows you to cover all of the possible tests per data type, without monotony or repetitious tasks. All too often, you might overlook a test or write an incomplete regular expression. The filter extension helps provide more complete assessment of input while reducing repetitive code writing efforts. In this case, filtering distinguishes itself from validation by being more complete with a security focus.
Filter types and how to choose
Let's take a closer look at the details of the extension. The filter extension has two types of filters: sanitizing and logical.
Sanitizing filters simply allow or disallow characters in a string and return a cleansed string back as a result. No matter what data format you put into these functions, they will always return a string. For certain types of exploits, this is crucial, as you can stop a user from sending inappropriate input and causing an unexpected result. For instance, a user could spot that the input for a text block is being echoed on the following page and take advantage of that echo. If you sanitize the input, you would remove all the hazardous portions of the input.
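As a minimal sketch of what sanitizing looks like in practice, the snippet below runs one made-up string through two of the extension's sanitizing filters. The sample string is an assumption chosen for illustration; in a real script the value would come from a form.

```php
<?php
// Illustrative input only; not taken from the article's form.
$raw = 'Order #42 <b>now</b> & "soon"';

// FILTER_SANITIZE_NUMBER_INT keeps only digits and plus/minus signs.
$digits = filter_var($raw, FILTER_SANITIZE_NUMBER_INT);

// FILTER_SANITIZE_SPECIAL_CHARS HTML-encodes characters such as
// < > & and quotes, so a browser displays them instead of parsing them.
$encoded = filter_var($raw, FILTER_SANITIZE_SPECIAL_CHARS);

var_dump($digits);  // the cleansed value is still a string
var_dump($encoded);
```

Both calls hand back a string, so the caller can echo or store the result without caring what was stripped or encoded along the way.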
Logical filters perform tests on the variables. With filter_var(), a value that passes the test is returned, and a value that fails yields FALSE. You can then use the result to decide what to do with the data or how to address the user. A simple example is validating an age against an allowed range. A logical filter can also test a value against a Perl-compatible regular expression.
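The two logical tests mentioned above can be sketched as follows. The age bounds (18 to 120) and the five-digit code pattern are assumptions picked for illustration; filter_var() hands back the validated value when the test passes and false when it fails.

```php
<?php
// Age must be a whole number within an assumed range of 18 to 120.
$ageOptions = array(
    'options' => array('min_range' => 18, 'max_range' => 120),
);

$goodAge  = filter_var('42', FILTER_VALIDATE_INT, $ageOptions);    // int(42)
$tooYoung = filter_var('7', FILTER_VALIDATE_INT, $ageOptions);     // false
$notAnInt = filter_var('forty', FILTER_VALIDATE_INT, $ageOptions); // false

// Perl-compatible regular expression: a hypothetical five-digit code.
$codeOptions = array(
    'options' => array('regexp' => '/^\d{5}$/'),
);

$goodCode = filter_var('90210', FILTER_VALIDATE_REGEXP, $codeOptions); // "90210"
$badCode  = filter_var('9021', FILTER_VALIDATE_REGEXP, $codeOptions);  // false

var_dump($goodAge, $tooYoung, $notAnInt, $goodCode, $badCode);
```

Note that a passing integer comes back converted to an int, so compare against false with === rather than relying on truthiness.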
Let's take a look at some ways you can use these functions, in both the filtering and validation scenarios, to get an idea of how filtering can be included in your applications.
Let's start by using a cleansing application of the filter extension to filter out unwanted code blocks.
In this example application, you have a simple form that accepts three answers to three questions. The form itself has no validation or filtering employed. While looking at the source code, an evil user believes this is true and decides to test the form by writing a text block that includes a JavaScript call that throws an alert. This will prove him correct if the alert pops up, telling him that anything he enters will indeed be accepted and used by the application with no filtering.
Listing 1. A simple input form
<html>
<body>
<p>What is your name?</p>
<form name="form1" method="get" action="filteringexample1a.php">
<p>
<input name="1" type="text" id="1">
</p>
<p>What is your favorite color?</p>
<p>
<input name="2" type="text" id="2">
</p>
<p>What is the airspeed of an unladen swallow?</p>
<p>
<textarea name="3" id="3"></textarea>
</p>
<p>
<input type="submit" name="Submit" value="Submit">
</p>
</form>
</body>
</html>
Listing 1 is a common HTML form that accepts user input. You ask the user for some basic information and simply repeat it back, without any other operations. In Figure 1, you can see that the user is already inputting some information and getting ready to send the JavaScript code, which will send an alert if no sanitization is done.
Figure 1. The input form
Next, you will create a script to catch the form variables and print them out in another page.
Listing 2. The PHP form
<body>
<?php
echo "You are " . $_GET['1'] . ".<br>\n";
echo "Your favorite color is " . $_GET['2'] . ".<br>\n";
echo "The airspeed of an unladen swallow is " . $_GET['3'] . ".<br>\n";
?>
</body>
Now you can see what will happen when you simply regurgitate this input without using filtering of any kind.
Figure 2. The output from the form
As you can see, the evil user has executed some arbitrary code. Granted, in this demonstration it ran only on the client side, in the evil user's own browser. But because the form uses GET, the same payload could be embedded in a link and run in another visitor's browser, and in any case it is certainly not valid input.
Listing 3 illustrates a simple example of sanitization using the FILTER_SANITIZE_STRING option for filter_var().
Listing 3. Additions to the receiving PHP script
<?php
echo "You are " . filter_var($_GET['1'], FILTER_SANITIZE_STRING) . ".<br>\n";
echo "Your favorite color is " . filter_var($_GET['2'], FILTER_SANITIZE_STRING) . ".<br>\n";
echo "The airspeed of an unladen swallow is " . filter_var($_GET['3'], FILTER_SANITIZE_STRING) . ".<br>\n";
?>
As you can see, you begin to use the filter_var() function to clean up the input and make it usable and safe. In this case, you use the option FILTER_SANITIZE_STRING, which takes input and removes any HTML tags and optionally encodes or removes special characters.
Because the filter strips the HTML tags, the attempt to run JavaScript fails, and you get a much more appropriate result from the script.
Figure 3. Sanitized output from the revised PHP script
The filter_var() function greatly reduces the amount of code you need compared with doing this manually with regular expressions and string functions. By hand, you would have to write several more lines of code and an explicit regular expression that excises only HTML code blocks, and then parse the entire block of entered text to ensure that no HTML tags slipped past. It is far easier to use the provided function.
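When several fields need the same treatment, the extension's filter_var_array() function can process them in one call. The sketch below is an illustration under stated assumptions: it uses a plain array as a stand-in for $_GET so it can run from the command line, the field names name, color, and airspeed are hypothetical (the article's form uses 1, 2, and 3), and it swaps in FILTER_SANITIZE_SPECIAL_CHARS, which encodes markup rather than stripping it as the listing above does.

```php
<?php
// Stand-in for $_GET; the keys are hypothetical field names.
$input = array(
    'name'     => 'Arthur',
    'color'    => 'Blue',
    'airspeed' => '<script>alert("gotcha");</script>',
);

// One filter per expected field. Fields missing from $input
// come back as null in the result.
$definition = array(
    'name'     => FILTER_SANITIZE_SPECIAL_CHARS,
    'color'    => FILTER_SANITIZE_SPECIAL_CHARS,
    'airspeed' => FILTER_SANITIZE_SPECIAL_CHARS,
);

$clean = filter_var_array($input, $definition);

echo "You are " . $clean['name'] . ".<br>\n";
echo "Your favorite color is " . $clean['color'] . ".<br>\n";
echo "The airspeed of an unladen swallow is " . $clean['airspeed'] . ".<br>\n";
```

Listing the expected fields in one definition array also documents, in a single place, which inputs the script accepts and how each is treated.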
User input -- Highly illogical
Some input is numeric or has a specified pattern. In these cases, you can use the logical filtering options and check data against an expression or other logical tests. If the data passes the test, it will come through the filter. If the data fails the test, it will not come through the filter. First we will set up a simple logical test that checks to make sure the speed of a swallow falls within a specified range, since the airspeed of a swallow may vary. Once the test is set up, we will add some code to tell the user if the answer was incorrect. A correct answer should simply be repeated.
Listing 4. Addition to the receiving PHP script
<?php
echo "You are " . $_GET['1'] . ".<br>\n";
echo "Your favorite color is " . $_GET['2'] . ".\n\n";
$minSpeed = 12;
$maxSpeed = 13;
$airspeed = filter_var($_GET['3'], FILTER_VALIDATE_INT, array("options" =>
array("min_range"=>$minSpeed, "max_range"=>$maxSpeed)));
echo "The airspeed of an unladen swallow is " . $airspeed . ".\n\n";
?>
Listing 4 checks the answer to the unladen swallow question against the minimum and maximum range of speeds. The filter could simply have confirmed that the value is an integer, but we also supplied a range through $minSpeed and $maxSpeed. If the user's answer is not in the range of acceptable values, the result is FALSE. This can then be tested and the proper answer returned to the user. In this case, we will be completely blunt.
Listing 5. Addition to the receiving PHP script
<?php
echo "You are " . $_GET['1'] . ".<br>\n";
echo "Your favorite color is " . $_GET['2'] . ".\n\n";
$minSpeed = 12;
$maxSpeed = 13;
$airspeed = filter_var($_GET['3'], FILTER_VALIDATE_INT, array("options" =>
array("min_range"=>$minSpeed, "max_range"=>$maxSpeed)));
if ($airspeed === FALSE){
echo "<h1>WRONG!</h1>";
}else{
echo "The airspeed of an unladen swallow is " . $airspeed . ".\n\n";
}
?>
We have successfully tested the incoming data for compliance with the logical test and have either shown the user the correct result or some negative feedback. Either way, we have used the filters to ensure that the correct response is given to the input.
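One way to make this check reusable and easy to exercise outside a web request is to wrap it in a function. The helper name describeAirspeed() is hypothetical and not part of the article's script; the range defaults mirror the $minSpeed and $maxSpeed values used in the listings.

```php
<?php
// Hypothetical helper wrapping the range check from the listing above.
function describeAirspeed($answer, $minSpeed = 12, $maxSpeed = 13)
{
    $airspeed = filter_var($answer, FILTER_VALIDATE_INT, array(
        'options' => array(
            'min_range' => $minSpeed,
            'max_range' => $maxSpeed,
        ),
    ));

    // Strict comparison matters: with a range that allowed 0,
    // a loose check would mistake a valid 0 for a failure.
    if ($airspeed === false) {
        return '<h1>WRONG!</h1>';
    }

    return 'The airspeed of an unladen swallow is ' . $airspeed . '.';
}

echo describeAirspeed('12') . "\n";       // inside the range
echo describeAirspeed('42') . "\n";       // an integer, but out of range
echo describeAirspeed('<script>') . "\n"; // not an integer at all
```

Because anything that is not an in-range integer is rejected outright, markup in this field never reaches the output.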
Part 2 of this five-part "What's new in PHP V5.2" series focused on input filtering. We looked at the difference between simple input validation, which catches simple errors that don't require any cleaning up, and input filtering, which is designed to catch potentially more devastating input errors and problems. We then looked at the two types of filters, sanitizing and logical, and walked through examples of both. In "What's new in PHP V5.2, Part 3," we look at the new JavaScript Object Notation (JSON) extension, which provides PHP developers with better support for Ajax applications using JSON.