I know there is a big need for one: customers are always asking "How long does the scanner usually run?", "What hardware do I need?", "What is the best product?" But can you actually do it?
Well, let's take a look at what we need for a solid benchmark. First, let's consider that there are a whole lot of challenges associated with benchmarks in general; the Wikipedia article on the subject summarizes them well.
Some industry standards exist to mitigate benchmarking challenges. The wiki article lists BAPCo, EEMBC, SPEC and TPC; however, after looking at each, I couldn't find one that fully covers Web Application Security Scanners. I then turned to OWASP (The Open Web Application Security Project) and WASC (Web Application Security Consortium), but I could not find any standardized benchmarking guidelines on their sites either. The closest thing was the Web Application Security Scanner Evaluation Criteria on the WASC site; however, that document is more a list of features to check for when purchasing a scanner, and it doesn't contain any scoring or testing methodology instructions.
It seems that the field of benchmarking Web Application Security Scanners is uncharted, kind of like the Wild West at the moment. I wonder why such a benchmarking standard has not yet been put together. Was it because it was hard to achieve, or because nobody had time to do it? Maybe it's both.
Let’s discuss some of the challenges that a benchmark standard is trying to resolve from the standpoint of Web Application Security Scanning.
Identifying the right measurements
When using a scanner, "Coverage" is the first thing that comes to my mind. Security is a very sensitive field: you want to make sure that you scan all the pages and that you catch all the vulnerabilities.
While I don't see a problem with identifying this aspect, I do see a big challenge with measuring it. Although OWASP provides a list of the top ten web application security issues, that list is very high level, and it carries the following disclaimer: "This document is first and foremost an education piece, not a standard".
What I am looking for is a detailed, official list of all the types of vulnerabilities out there; I dare say, down to the variant level. To my knowledge such a list has not yet been put together.
Next in line is "Performance". Working in the support team for Rational AppScan two years ago, I would often hear the question "How long does a scan usually take?", so I know that the question is impossible to answer. It all depends on the number of pages, the number of parameters and cookies, the response time from the server and many other factors. In the end, even provided that you get all the variables, it is very difficult to produce an accurate estimate. And while one scanner might finish faster than another, you can't really use that alone as a measure of its quality.
I'm sure there are many other measurements one could add; "Accuracy" and "Ease of use" come to mind. However, "Coverage" and "Performance" are, in my opinion, the first and most important things to test for.
Getting a correct test bed
Then there’s the problem of creating a test application that resembles a real world application. I will give a small example here.
One of our customers had created a little test page for evaluating security products. On this page he had skilfully inserted an SQL Injection vulnerability: the page would return 1 if the attack was successful and 0 if it wasn't. After testing his page with our product, the customer told us that our product could not find the issue. That was indeed true, because the difference between the responses was only one character.
On real-world sites, regular responses from the same page often vary by more than one byte (imagine a page that displays the current time), so our product uses a similarity factor in its comparisons to avoid false positives. Since the responses in question were 99.9% similar, the vulnerability was not reported. Of course, configuring the tool with a similarity factor of 100% allowed us to find the vulnerability; however, it would have caused a lot of false results once our customer started to test actual sites.
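To illustrate why the one-byte difference slipped through, here is a rough Python sketch of a similarity-based comparison. The `responses_differ` helper and the threshold values are my own assumptions for illustration, not our product's actual algorithm:

```python
# Hypothetical sketch of how a scanner's "similarity factor" might work.
from difflib import SequenceMatcher

def responses_differ(baseline: str, attacked: str, similarity_factor: float) -> bool:
    """Flag a potential issue only when the two responses are less
    similar than the configured threshold."""
    ratio = SequenceMatcher(None, baseline, attacked).ratio()
    return ratio < similarity_factor

# A 2000-character page where the injection flips a single "0" to "1":
baseline = "x" * 1999 + "0"
attacked = "x" * 1999 + "1"

# With a tolerant threshold the one-byte change is invisible...
print(responses_differ(baseline, attacked, 0.98))   # False -> issue missed
# ...while demanding 100% similarity catches it, at the cost of false
# positives on pages with dynamic content (timestamps, ads, etc.).
print(responses_differ(baseline, attacked, 1.0))    # True -> issue flagged
```

The two responses above are 99.95% similar, so any threshold below that ratio hides the vulnerability, which is exactly the trade-off the customer's test page ran into.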
So this is how using the wrong test application can create an incorrect evaluation.
You also need to create test applications that are vulnerable to every single variant out there in order to properly test for the most important measurement: Coverage. Such an endeavour, I imagine, is very difficult, and impossible at the moment, since a list of all the vulnerabilities doesn't exist.
Following a scientific method
In my previous example I discussed configuration. The evaluator did not realize how challenging the vulnerability was to discover and was not aware of the configuration necessary to find it. That's why, when engaging in a benchmarking effort, you need to be very familiar with the aspects of web application security testing and have intimate knowledge of all the products used.
However, can an impartial observer be familiar enough with each product to know for sure that they have created the optimal configuration? To be fair, you would need to make sure that each vendor is involved in the benchmarking effort, or to ensure the evaluator is a certified user of each product.
The problems here are that vendors are not usually invited to benchmarking efforts, due to concerns that they will influence the results, and that certifying an evaluator on every product is rarely practical.
A scanner is like a car: to perform well, it needs a good driver. However, while most cars work alike because they are regulated by standards, security scanners are not all alike, since there's no standard on how a scanner should function. So you need a good driver for each scanner.
Assuming that you did acquire good drivers for the benchmark, you now need a scoring card and a methodology for each test that you perform.
So what can you do?
Make sure that the scanner works on your site and in your environment: no matter how well a scanner performs on a benchmarking site, it might completely fail to scan your web applications properly.
Work closely with the vendor when configuring your scans. If you are seeing the product for the first time, you probably don't know how to use it, so it's good to get advice from the experts. Not only will you have conducted a fair evaluation, but you also get to learn the product in the process.
If you don't have enough time to evaluate all the products out there, select the market leaders. Many other people have tested these products before you, and the market leaders have won thousands of evaluations on real-world applications. In the end, that's the real benchmark.