User experience, not metrics part 6: What is an outlier and how do I account for one

Scott Barber, Performance testing consultant, AuthenTec

Currently, Scott Barber serves as the lead Systems Test Engineer for AuthenTec. AuthenTec is the leading semiconductor provider of fingerprint sensors for PCs, wireless devices, PDAs, embedded access control devices and automotive markets. He is also a member of the Technical Advisory Board for Stanley-Reid Consulting, Inc.

With a background in consulting, training, network architecture, systems design, database design and administration, programming, and management, Scott has become a recognized thought leader in the field of performance testing and analysis. Before joining AuthenTec, he was a software testing consultant, a company commander in the United States Army and a government contractor in the transportation industry.

Scott is a co-founder of WOPR (the Workshop on Performance and Reliability), a semi-annual gathering of performance testing experts from around the world, a member of the Context-Driven School of Software Testing and a signatory of the Agile Manifesto. He is a discussion facilitator for the Performance and VU Testing forum on Rational DeveloperWorks and a moderator for the performance testing and Rational TestStudio related forums on QAForums.com. Scott speaks regularly at a variety of venues about relevant and timely testing topics. Scott's Web site complements this series and contains much of the rest of his public work. You can address questions/comments to him on either forum or contact him directly via e-mail.

Summary:  If a Web page that normally loads quickly takes longer than expected during your performance test, it will give you an unrepresentative measurement known as an outlier that can skew your results. This article shows you how to identify outliers and eliminate them from your test results.

Date:  30 Jun 2005
Level:  Introductory

This article was originally published in December 2001 and refers to Rational Suite TestStudio.

Everyone who uses the Internet on a regular basis knows that once in a while a Web page that normally loads quickly takes longer than expected. If this happens during your performance test, it will give you a single unrepresentative measurement known as an outlier that can skew your results and make them inaccurate. This article shows you how to use Rational TestStudio to identify outliers and how to eliminate them from your test results and recreate performance report output in Microsoft Excel.

This is the sixth article in the "User experience, not metrics" series, which focuses on correlating customer satisfaction with your Web site application's performance as experienced by users. Here's what the series has covered so far:

Part 5, on placing timers in scripts and understanding what those timers measure, was the first of three articles that focus on capturing and interpreting Web site response times during a load test and interpreting the patterns formed by those times. Starting with Part 8, we'll discuss how to organize these time measurements and their interpretations into easy-to-understand reports for project stakeholders. The concepts and methods discussed in this series are based on Noblestar's years of experience and have been derived through dozens of performance engineering projects that all included actual performance tuning of systems.

Before reading this article, you should know how to create scripts with delays and timers to represent your user community model. This article is intended for all levels of IBM® Rational Suite® TestStudio® users and may also prove useful to managers of projects where performance testing will occur.

What's an outlier?

If we were to ask statisticians what an outlier is, they would tell us that any measurement that falls more than three standard deviations from the mean is an outlier; in a normal distribution, the range within three standard deviations covers about 99.7% of all collected measurements. One standard deviation is the distance on either side of the mean that encompasses about 68% of normally distributed measurements. For example, if your sample of 100 measurements has an average of 10 seconds, and if 68% of all measurements fall between 7 seconds and 13 seconds, the sample has a standard deviation of 3 seconds. Three standard deviations in this case would encompass a range of 1 second to 19 seconds, and any measurement of less than 1 second or greater than 19 seconds would be an outlier.
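
As a quick check on those percentages, here is a minimal Python sketch. Python is not part of the TestStudio workflow described in this article; it is used here only for illustration, and the helper name coverage_within is made up for this sketch. It asks the standard library how much of a normal distribution lies within one and three standard deviations of the mean, then computes the three-standard-deviation range for the 10-second example.

```python
from statistics import NormalDist


def coverage_within(num_std_devs):
    """Share of a normal distribution within +/- num_std_devs of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(num_std_devs) - standard_normal.cdf(-num_std_devs)


# Example from the text: average of 10 seconds, standard deviation of 3 seconds.
average_s, std_dev_s = 10, 3
lower_bound_s = average_s - 3 * std_dev_s
upper_bound_s = average_s + 3 * std_dev_s

print(f"within 1 standard deviation:  {coverage_within(1):.1%}")
print(f"within 3 standard deviations: {coverage_within(3):.1%}")
print(f"three-standard-deviation range: {lower_bound_s} s to {upper_bound_s} s")
```

The coverage figures hold only for normally distributed data, which is exactly the assumption questioned in the next paragraph.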

The problem with this definition is that it assumes that your collected measurements are distributed normally, which is rarely the case for Web-based page response times. A more useful definition of an outlier for our purposes can be found in the StatSoft Inc. Statistics Glossary:

"Outliers are atypical (by definition), infrequent observations; data points which do not appear to follow the characteristic distribution of the rest of the data. These may reflect genuine properties of the underlying phenomenon (variable), or be due to measurement errors or other anomalies which should not be modeled."

This definition is illustrated by the examples in Figure 1. The datapoints depicted in red that don't fall along the trend lines are outliers by this definition.

Figure 1: Visual examples of outliers

There's still a problem with this definition for our purposes, and it has to do with statistical significance. In the first graph in Figure 1, we see that three of eleven datapoints qualify as outliers. If you've ever taken a statistics course, you'll realize that we can't dismiss as outliers nearly 30% of all collected datapoints. To determine how many datapoints we can reasonably dismiss as "atypical infrequent observations," we must first determine if the sample of measurements is statistically significant.

In Part 8 of this series we'll discuss in detail how to ensure statistical significance, but for now we can take a common-sense approach. Because it's easy to add iterations to our tests to increase the total number of measurements collected, we can ensure that we have at least 100 measurements of each timer before attempting to determine if there are outliers. Then, if we see evidence of outliers, we can re-execute the tests and compare them. If the majority of the measurements are the same, plus or minus the outliers, the results are likely to be statistically significant.
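
One way to make the "compare two executions" check concrete is sketched below in Python. This is an illustration only, not a formal test of statistical significance: the function name runs_agree, the choice of the median as the comparison value, and the 10% tolerance are all assumptions made for this sketch. The median is used because a single extreme measurement barely moves it.

```python
from statistics import median


def runs_agree(run_a_ms, run_b_ms, tolerance=0.10):
    """Return True if the medians of two runs differ by at most `tolerance`.

    The tolerance is a fraction of the larger median (0.10 means 10%).
    Both the median and the 10% default are assumptions for illustration.
    """
    median_a = median(run_a_ms)
    median_b = median(run_b_ms)
    return abs(median_a - median_b) <= tolerance * max(median_a, median_b)


# Hypothetical response times in milliseconds; the first run has one extreme value.
first_run = [4000, 4100, 4200, 250000]
second_run = [4050, 4150, 4250, 4300]
print(runs_agree(first_run, second_run))
```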

Now, assuming we have a statistically significant sample of measurements, we can continue our discussion on outliers. I submit that there's no set number of outliers that can be dismissed but rather a maximum percentage of the total observations. If we apply the spirit of the two definitions that we've discussed, we come to the conclusion that up to 1% of the total measurements, or those that are beyond the third standard deviation, are significantly outside the rest of the measurements and can be considered outliers.
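
The working definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed algorithm: the function names and the sample data are hypothetical. The sketch combines the two ideas from the text by dismissing only measurements that lie beyond three standard deviations from the mean, and never more than 1% of the total.

```python
from statistics import mean, stdev


def max_dismissible(total_measurements, max_percent=1):
    """Largest number of measurements we allow ourselves to dismiss."""
    return total_measurements * max_percent // 100


def candidate_outliers(measurements, max_percent=1):
    """Measurements beyond three standard deviations, capped at max_percent."""
    cap = max_dismissible(len(measurements), max_percent)
    average = mean(measurements)
    limit = 3 * stdev(measurements)
    beyond = [m for m in measurements if abs(m - average) > limit]
    # Keep the most extreme values first so the cap drops the least extreme.
    beyond.sort(key=lambda m: abs(m - average), reverse=True)
    return beyond[:cap]


# Hypothetical sample: 99 ordinary response times (ms) and one very slow one.
sample_ms = [4000 + 10 * i for i in range(99)] + [250000]
print(max_dismissible(len(sample_ms)))
print(candidate_outliers(sample_ms))
```

With fewer than 100 measurements the cap works out to zero, which matches the earlier advice to collect at least 100 measurements per timer before looking for outliers.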


Identifying outliers in performance test results

Now that we have a working definition of an outlier, let's discuss how to identify them. Creating a scatter chart -- one variety of the Response vs. Time charts available in IBM® Rational Suite® TestManager® -- makes it easy to identify outliers. To display our measurements in this kind of chart, we execute a test and then click the Resp vs. Time button. Once a chart is displayed, we choose View > Settings from the menu bar to get the window shown in Figure 2. We click the Scatter radio button in the Graph Type list and then click OK to display our scatter chart.

Figure 2: Choosing a scatter chart in TestManager

The vertical axis of the scatter chart in Figure 3 measures response time in milliseconds, and the horizontal axis measures the number of milliseconds since the start of the test run. Each red dot represents a command ID or timer measurement. This example shows the results from a performance test consisting of 100 measurements against the Noblestar.com Web site taken by each of two timers (Home Page and Page1), which you may recall from Part 5.

Figure 3: Response vs. Time scatter chart in TestManager

According to our definition of an outlier, we can identify up to 1% of our total measurements, which in our example case amounts to one measurement, as outliers. And indeed, there is one measurement on the chart that's "significantly outside the rest of the measurements." With all other transactions taking less than 15 seconds, the one that took nearly 250 seconds definitely qualifies as an outlier.


Eliminating outliers from test results

Now that we've identified an outlier in our performance test results, we want to eliminate this outlier so it doesn't skew the results. TestManager doesn't give us the ability to delete individual measurements from our results, so instead we need to copy the raw data into Microsoft Excel, delete the outlier, and recreate our graphs and tables there. If you're familiar with Excel, the instructions that follow may be obvious to you. If you're new to Excel, I encourage you to follow along with results of your own. We'll be expanding on the use of Excel in later articles when we discuss reporting. If you use a spreadsheet program other than Excel, the concepts in this section will still apply, even if the specific steps don't.

Copying data into Excel and deleting the outlier

Before we put our data into Excel, we need to decide exactly which data we want from TestManager. Since we're only interested in the data related to timers and not all of the individual command IDs, we'll limit the results in the table to just timers.

  1. Choose View > Settings from the TestManager menu bar and click the Select Command IDs tab. By default, all timers and command IDs appear in the Selected list on the right-hand side.
  2. Click the << button to move all of the timers and command IDs to the Available list on the left-hand side.
  3. Select each timer we want to include, in our case Home Page and Page1, and click the > button to move them back to the Selected list (see Figure 4). Then click OK.
Figure 4: Selecting our timers in TestManager

Now we return to the scatter chart, where we see that only the timers are included in the table below the chart (see Figure 5). The three columns that we need to copy in order to recreate this chart are Cmd ID (the timer or command ID name), Ending TS (ending time stamp, or the point in time when the response has been completely received), and Response (the total response time measured by the timer).

Figure 5: Our scatter chart data for timers only

To copy the data, we proceed as follows:

  1. Highlight the first three columns of the table, right-click, and choose Edit > Copy from the menu bar.
  2. Close TestManager and launch Excel.
  3. Select the top left cell in the open Excel worksheet, right-click, and choose Edit > Paste from the menu bar.

We now have our data in an Excel worksheet. The easiest way to find the outlier is to sort the data by response time.

  1. Highlight the three columns and choose Data > Sort from the menu bar.
  2. Choose Column C from the "Sort by" pull-down list and click the Descending radio button, as shown in Figure 6. Then click OK.
Figure 6: Choosing to sort by column C in descending order

When we execute this sort in our example case (see Figure 7), we immediately find the outlier that's so apparent in the scatter chart shown in Figure 3. Then we simply delete the row containing the outlier, and we're ready to recreate our charts.

Figure 7: Example data sorted in Excel, outlier highlighted
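
For readers who would rather script this step than sort by hand, the same sort-and-delete idea can be sketched in Python. The rows below are hypothetical stand-ins for the three copied columns (Cmd ID, Ending TS, Response), not the data from our example run, and the sketch assumes we have already decided from the scatter chart that the single largest response time is the outlier.

```python
# Each row mirrors the copied columns: (Cmd ID, Ending TS in ms, Response in ms).
# These values are hypothetical and exist only to make the sketch runnable.
rows = [
    ("Home Page", 12000, 4300),
    ("Page1", 15500, 3900),
    ("Home Page", 262000, 249000),
    ("Page1", 270100, 4100),
]

# Equivalent of Data > Sort on column C, descending.
by_response = sorted(rows, key=lambda row: row[2], reverse=True)

# The outlier is now the first row; deleting it leaves the cleaned data.
outlier = by_response[0]
cleaned = by_response[1:]

print("removed:", outlier)
print("remaining rows:", len(cleaned))
```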

Recreating the scatter chart

Recreating the scatter chart in Excel is really quite easy.

  1. Highlight columns B and C, then choose Insert > Chart from the menu bar.
  2. On the Standard Types tab, in the "Chart types" list, select XY (Scatter), and click Finish.

Figure 8 is the scatter chart created with our example data.

Figure 8: Scatter chart of our data without the outlier

Comparing this chart to Figure 3, you see that the scale on the left-hand side only goes up to 12,000 milliseconds (12 seconds) instead of 250,000 milliseconds (250 seconds), the dots are blue instead of red, and there's a box on the right-hand side that says "Series1." Otherwise the two charts are the same. In future articles, you'll learn how to clean this chart up and make it easier to understand, as shown in Figure 9.

Figure 9: Scatter chart customized

Recalculating performance report values

Figure 10 shows the performance report output from our example test run as it appears in TestManager before eliminating the outlier. (I modified the settings of this chart to include just the two timers we've been discussing in the same way the scatter chart settings were modified.) Note that the standard deviation for Home Page is greater than the average, a sign that the sample is mathematically skewed and deserves a closer look to determine validity. We're going to recalculate the average, standard deviation, minimum, maximum, and percentile values in Excel now that we've eliminated the outlier.

Figure 10: Performance report output in TestManager

Since we haven't eliminated any outliers from the Page1 measurements, we won't be doing any calculations with those measurements, but we'll later transfer the calculations for them shown in Figure 10 to our new table. To calculate the new Home Page values, we use the values that we've already pasted into Excel. First we need to separate the Home Page measurements from the Page1 measurements.

  1. Highlight the three columns of data in the open worksheet, choose Edit > Copy, open a new worksheet, and paste the values into the top left cell.
  2. Highlight the three columns of data in the new worksheet and choose Data > Sort.
  3. Choose Column A from the "Sort by" pull-down list and click the Ascending radio button. Then click OK.
  4. Scroll down the page, highlight all of the rows that start with Page1, and delete them, since we won't be using them for this set of calculations.
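
The separation step above amounts to grouping response times by timer name. A minimal Python sketch of that idea follows; the function name responses_by_timer and the sample rows are hypothetical, and the row layout assumes the same three columns we copied from TestManager.

```python
from collections import defaultdict


def responses_by_timer(rows):
    """Group response times (ms) by the Cmd ID in the first column."""
    grouped = defaultdict(list)
    for cmd_id, _ending_ts, response_ms in rows:
        grouped[cmd_id].append(response_ms)
    return dict(grouped)


# Hypothetical rows: (Cmd ID, Ending TS in ms, Response in ms).
rows = [
    ("Home Page", 12000, 4300),
    ("Page1", 15500, 3900),
    ("Home Page", 31000, 4700),
    ("Page1", 36200, 4100),
]

home_page_ms = responses_by_timer(rows)["Home Page"]
print(home_page_ms)
```

Grouping keeps the Page1 values available instead of deleting them, which is a small departure from the worksheet steps but leaves the Home Page list identical.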

It will help us stay organized if we add column heads and row labels before starting the calculations. Once we've done this, the first 10 rows of the worksheet for our example test run will look like Figure 11.

Figure 11: First 10 rows of Excel worksheet

Now for the calculations. We follow the same basic procedure for determining all 10 values.

  1. Place the cursor in the cell next to Home Page below NUM.
  2. Choose Insert > Function from the menu bar.
  3. Select Statistical from the "Function category" list.
  4. The first function we'll need is COUNT, so select this from the "Function name" list, as shown in Figure 12. Then click OK.
Figure 12: Choosing a function for our calculations

When the next screen appears (Figure 13), all we have to do is highlight all of the values in column C and click Enter or OK.

Figure 13: Specifying values for the COUNT calculation

We follow exactly the same procedure for calculating the MEAN, STD DEV, MIN, and MAX. The function names for finding those values are AVERAGE, STDEV, MIN, and MAX, respectively.

All of the values in the worksheet are in milliseconds. To convert to seconds as in the TestManager output, we simply need to divide the results by 1000.

  1. Put the cursor in the cell under the MEAN heading. In the formula bar, the function (in our case, "=AVERAGE(C1:C99)") will appear rather than the value.
  2. Add "/1000" to the formula, so that in our case it looks like "=AVERAGE(C1:C99)/1000", and click Enter.
  3. Make this same modification to every function other than COUNT.
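
For comparison, the five calculations above (COUNT, AVERAGE, STDEV, MIN, and MAX, with the millisecond-to-second conversion) can be sketched in Python. The function name summarize_seconds and the sample values are hypothetical. Python's statistics.stdev computes the sample standard deviation, the same quantity Excel's STDEV returns.

```python
from statistics import mean, stdev


def summarize_seconds(responses_ms):
    """Summary values for one timer, converted from milliseconds to seconds."""
    return {
        "NUM": len(responses_ms),                  # COUNT: not a time, so no /1000
        "MEAN": mean(responses_ms) / 1000,         # AVERAGE(...)/1000
        "STD DEV": stdev(responses_ms) / 1000,     # STDEV(...)/1000 (sample std dev)
        "MIN": min(responses_ms) / 1000,           # MIN(...)/1000
        "MAX": max(responses_ms) / 1000,           # MAX(...)/1000
    }


# Hypothetical Home Page response times in milliseconds.
summary = summarize_seconds([4000, 5000, 6000])
for label, value in summary.items():
    print(f"{label}: {value}")
```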

To calculate the percentile values, we follow much the same procedure, except this time we use the PERCENTILE function. (In case you're not familiar with what the percentile calculations represent, here's an example: If we have 100 measurements ordered from greatest to least and we count down the 5 largest measurements, the next largest measurement is in the 95th percentile of those measurements. For our purposes, this would be read as "95% of all users would experience a response time of this value or less under the same conditions as the test execution.")

  1. Place the cursor in the cell below 50th.
  2. Choose Insert > Function from the menu bar.
  3. Select Statistical from the "Function category" list.
  4. Select PERCENTILE from the "Function name" list, then click OK.
  5. On the next screen (see Figure 14), with the cursor in the Array field, highlight the response time values once again (in our case C1:C99), and in the K field enter the decimal number representing the percentile we want to calculate (for example, the K value for the 50th percentile is 0.5). Then click OK.
Figure 14: Selecting values for the PERCENTILE calculations
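
To show what a percentile calculation of this kind does, here is a minimal Python sketch. It assumes the common "inclusive" method, which sorts the values and interpolates linearly between the two nearest ones; this is our understanding of how Excel's PERCENTILE behaves, but treat that as an assumption and compare against your own worksheet. The function name percentile_inclusive and the sample values are hypothetical.

```python
def percentile_inclusive(values, k):
    """Return the k-th percentile (k between 0 and 1) by linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    ordered = sorted(values)
    # Fractional position of the requested percentile within the sorted list.
    position = k * (len(ordered) - 1)
    lower_index = int(position)
    fraction = position - lower_index
    if lower_index + 1 >= len(ordered):
        return float(ordered[-1])
    lower_value = ordered[lower_index]
    upper_value = ordered[lower_index + 1]
    return lower_value + fraction * (upper_value - lower_value)


# Hypothetical response times in milliseconds.
responses_ms = [5000, 1000, 3000, 2000, 4000]
for k in (0.5, 0.9, 0.95):
    print(f"{int(k * 100)}th percentile: {percentile_inclusive(responses_ms, k) / 1000:.2f} s")
```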

Now we can copy the results for Page1 directly from the performance report output table in TestManager and paste them into our new worksheet. To complete our worksheet, all we have left to do is format the cells to the proper number of decimal places (in our case, two).

  1. Highlight all of the cells with times in them, not cells with labels or in the NUM column, and choose Format > Cells from the menu bar.
  2. In the Format Cells window, under Category, select Number and ensure the number 2 is in the "Decimal places" field, as shown in Figure 15.
Figure 15: Formatting cells with two decimal places

The resulting table is shown in Figure 16. Notice what a difference it makes in the values for the Home Page timer to eliminate just one value. For example, the average is now 4.53 seconds, where it was 6.93 before, and the standard deviation is 1.47 seconds instead of 23.92. If we were to return to the statistician who gave us the definition of an outlier involving three standard deviations, that statistician would tell us that we now have a statistically valid sample.

Figure 16: Performance results output table in Excel

Recreating the performance response chart

The last step in our journey to recreate the output automatically generated by TestManager is to recreate the graph that's the top portion of Figure 10.

  1. Copy to a new Excel worksheet the corrected performance results output table, but without the NUM, MEAN, and STD DEV columns.
  2. Highlight this entire table and choose Insert > Chart from the menu bar.
  3. On the Standard Types tab, in the "Chart types" list, select Column Chart, and click Finish.

For our example test run, this results in the chart in Figure 17.

Figure 17: Performance report output chart from our test run

Again, aside from different colors and values (based on the removal of the outlier), the only difference from the TestManager output that we see is the box on the right-hand side with a key to the colors.

If you're interested, you can look at the Excel spreadsheet I created while writing this article.


Now you try it

To try the method of eliminating outliers from performance test results outlined in this article, simply look at any scatter chart from a previous test execution and see if it shows any outliers. When you find a chart that does (and if your testing experience is anything like mine, it won't take long), follow the steps in this article to eliminate the outlier and recreate the tables and graphs in Excel.


Summing it up

You've seen just how much a single unrepresentative measurement can skew the results of a performance test. To avoid this problem, you could execute the test repeatedly until you got lucky enough to capture measurements with no outliers whatsoever, but I've shown you an easier way. After creating a scatter chart in IBM® Rational Suite® TestManager® to help you identify outliers, you can copy your data into an Excel worksheet to actually eliminate the outlier and recreate the performance report output.


Related Resources

Mathematical equations for traditional statistical analysis are available from the Web site posted by the faculty of health sciences of the University of Ottawa, Canada, for a biomedical data analysis and measurements course.

