When stress test runs are analyzed, the main metric currently measured is the average (hereafter called the mean) transaction rate per second. While this metric has some value in assessing the success of a stress test run, it tells only part of the story of the larger test run.
You can enhance your understanding of the data by applying these statistical methodologies:
- Modeling the distribution of the dataset.
- Measuring the dispersion of the dataset.
- Applying the median and the lower and upper quartiles as measures of location.
These methodologies are explained in detail below.
Modeling the distribution of the dataset: Modeling the dataset gives you a high-level view of how your data is structured. Such a view is highly desirable because it enables you to:
- Determine how the data is distributed
- Examine the levels of dispersion
- Identify potential skew and outliers.
In an ideal situation, the majority of the data would be located around a central point, with little spread away from that point. This distribution is typically referred to as the normal distribution.
Measuring the dispersion: The dispersion gives us a measure of the range, or spread, of data within a given dataset. Ideally, the majority of the data lies as close to a central location (for example, the mean or median) as possible. In the example below, we use a boxplot graph to measure the level of dispersion within a dataset. Under ideal circumstances we see the following:
- The spread of data is quite short (from 1 to 1.8 secs).
- The box length is quite short, indicating little dispersion.
- The central line (median) is almost centrally located within the box indicating a very small degree of skew.
- The graph contains no outliers (values far from most others in the dataset).
The dispersion boxplot in Figure 1 includes other important measurements: the lower and upper quartiles (if the median splits the data in half, the quartiles split it into quarters) and the lower and upper adjacent values (observations that lie within 1.5 times the interquartile range of the lower and upper quartiles).
Figure 1. Boxplot showing a low level of dispersion
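The quantities a boxplot is drawn from (the median, the quartiles, the adjacent values, and the outliers) can be computed directly. Below is a minimal sketch using Python's standard `statistics` module; the function name and sample data are illustrative, not taken from the article's dataset.

```python
import statistics

def five_number_summary(data):
    """Return the values a boxplot is built from: the median, the lower
    and upper quartiles, the adjacent values, and any outliers."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # quartiles split the data into quarters
    iqr = q3 - q1
    # points more than 1.5 * IQR beyond the quartiles are treated as outliers
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # adjacent values: the most extreme observations still inside the fences
        "lower_adjacent": min(inside),
        "upper_adjacent": max(inside),
        "outliers": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }
```

For example, `five_number_summary([1, 2, 3, 4, 5, 6, 100])` reports quartiles of 2, 4, and 6 and flags 100 as an outlier.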
Under less ideal conditions, we see that the spread of data varies from 1 second to 2.8 seconds, and that the box is of moderate length, indicating some level of dispersion. In addition, the median is located to the left of the box, indicating some degree of skew, and the graph contains two outliers, at 2.9 and 3.2 seconds.
Figure 2. Boxplot showing a moderate level of dispersion
Using the median, lower and upper quartiles as measures of location: The mean and standard deviation are the customary measures of location within a dataset. However, these parameters are not resistant to the presence of outliers: if outliers are present in the dataset, the mean and standard deviation are pulled toward them accordingly.
Using the median and interquartile range as measures of location can help provide a more robust measure of the location of our data. These metrics are more resistant to the presence of outliers.
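This difference in robustness is easy to demonstrate. In the small sketch below (with invented response times, using Python's standard `statistics` module), a single 5-second outlier drags the mean far more than it drags the median:

```python
import statistics

# seven well-behaved response times (seconds), then the same data plus one outlier
clean = [0.10, 0.12, 0.14, 0.15, 0.16, 0.18, 0.20]
with_outlier = clean + [5.0]

mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)

# the single outlier moves the mean by roughly 0.6 seconds,
# but moves the median by only a few milliseconds
```

This is exactly the resistance property the article relies on: the median barely notices the outlier, while the mean is pulled strongly toward it.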
Before we start the modeling process, it is important to make clear that our aim is not to find a model that is an exact fit. Instead, the goal is to create a model that best represents our data. As you embark on the modeling process, there are a number of factors to consider.
Our first task is to decide which type of model to use. Because we are measuring the duration of each transaction in seconds to a high level of precision (two decimal places), and because our measurements span five days, it is reasonable to use a continuous model.
Now that we have chosen a continuous model, we need to decide which continuous distribution to use. Before we can answer this question, a visual inspection of the data is required. Figure 3 below shows a histogram of our dataset.
Figure 3. Histogram of transaction response times.
Looking at the histogram, we can see that the majority of the data is located around a central peak of 0.14 seconds. However, there is some notable right skew.
Because the dataset is not symmetrical, we can rule out the normal distribution as a suitable choice. We will now turn our attention to other continuous distributions.
A distribution from the “heavy-tailed” family of distributions may make a suitable choice due to the long right tail.
Figure 4. Histogram of transaction response times with log normal curve fitted.
In fact, Figure 4 shows a histogram of our dataset fitted against a three-parameter lognormal distribution curve. While the curve does not fit the data exactly, the fit is reasonable; however, a number of outliers are still present in the dataset. Note: for clarity, only response times from 0 to 1 second are shown.
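A lognormal fit can be sketched without specialist tooling: the maximum-likelihood estimates for a two-parameter lognormal are simply the mean and standard deviation of the logged data. The sketch below uses synthetic data from Python's standard library; the article's own fit used a three-parameter lognormal, which additionally estimates a location offset, omitted here for simplicity.

```python
import math
import random
import statistics

random.seed(42)
# synthetic "response times" with a lognormal shape; illustrative only
times = [random.lognormvariate(mu=-2.0, sigma=0.5) for _ in range(10_000)]

# MLE for a two-parameter lognormal: fit a normal distribution to log(x)
logs = [math.log(t) for t in times]
mu_hat = statistics.mean(logs)        # recovers roughly -2.0
sigma_hat = statistics.pstdev(logs)   # recovers roughly 0.5
```

Comparing the fitted curve against the histogram, as in Figure 4, is then a visual check that the chosen distribution is plausible.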
In conclusion, our dataset is asymmetrical and contains a considerable degree of right skew, due to the presence of outliers. When the dataset was fitted against a three-parameter lognormal distribution curve, the fit was reasonable.
The next stage of our analysis is to determine the dispersion of our dataset. When we talk about dispersion we are interested in how the data is spread throughout the dataset. Wide dispersion is indicative of a large spread of values, whereas small dispersion indicates data grouped around a central tendency.
There are a number of ways to measure dispersion:
- Range
- Variance
- Standard deviation
There is a good chance that you have encountered all of these methods in previous analysis work. However, none of these three methods is particularly resistant to outliers. An alternative approach is to use the median and interquartile range, which are more resistant to outliers.
The interquartile range is a measure of dispersion in a dataset. It is the difference between the upper and lower quartiles. The upper and lower quartiles are defined as follows: if the median is the middle data point (50% of the way through the dataset), then the lower quartile is the first-quarter data point (at 25% through the dataset) and the upper quartile is the third-quarter data point (at 75% through the dataset).
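In code, the quartiles and interquartile range described above can be read straight off with Python's standard `statistics.quantiles` function; the sample times below are invented for illustration:

```python
import statistics

# illustrative response times in seconds, sorted for readability
times = [0.10, 0.12, 0.13, 0.15, 0.18, 0.22, 0.30]

# n=4 splits the data into quarters: the 25%, 50% (median), and 75% points
q1, median, q3 = statistics.quantiles(times, n=4)
iqr = q3 - q1  # the interquartile range: upper minus lower quartile
```

Here `q1` is 0.12, `median` is 0.15, and `q3` is 0.22, giving an IQR of 0.10 seconds.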
The best way to illustrate the interquartile range is to create a boxplot graph of our dataset.
Figure 5. Boxplot of the full dataset
Figure 5 shows a boxplot graph of our dataset. Note that the slim grey box on the left-hand side is the actual boxplot; the majority of the graph is composed of outliers that range from 0.33 seconds to 5 seconds. The graph displays these values as outliers because the calculations behind the scenes deem them too distant from the majority of the data. Why is this the case?
Looking at our full dataset, we have roughly two million observations in total: 1.8 million observations (90%) are under 0.33 seconds, while the remaining two hundred thousand observations (10%) are above 0.33 seconds. Overall, our data is highly dispersed, with values ranging from 0.03 seconds to 5 seconds.
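The 90/10 split described above is a one-line calculation once a cut-off has been chosen. The 0.33-second threshold comes from the boxplot; the helper function below is our own illustration:

```python
def split_by_threshold(times, threshold=0.33):
    """Return the fraction of transactions at or below the threshold
    and the fraction above it (the candidate outliers)."""
    slow = sum(1 for t in times if t > threshold)
    n = len(times)
    return (n - slow) / n, slow / n
```

For a dataset where 9 out of 10 observations are fast, this returns `(0.9, 0.1)`, matching the kind of split reported for the full dataset.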
So what conclusions can we draw from the 10% of observations that are outliers? Over the duration of the workload, not all transactions performed as expected, and these poorly performing transactions have added unwanted skew to our distribution. Further examination of the transactions within this 10% group should be undertaken to determine whether there are any underlying trends or patterns at work.
Now let’s turn our attention to the majority of our transactions and determine how this data is distributed.
Figure 6. Boxplot of our dataset with outliers removed
As mentioned previously, a boxplot is a graphical method of illustrating the dispersion in a dataset. Figure 6 shows a boxplot of our data with the outliers (that is, all values greater than 0.33 seconds) removed. Because we have determined that there are a number of outliers in our dataset, for clarity we have removed them from all subsequent graphs.
In terms of dispersion, there are three points to note about this graph:
- The interquartile range (IQR): The IQR is defined as the distance from the first to the third quartile; in other words, it is the width of the box. As we can see, the box ranges from 0.11 to 0.20 seconds. This is quite a short span (only 0.09 seconds) and gives us firm evidence of low dispersion in the dataset when the outliers are removed.
- Spread of the adjacent values: The upper and lower adjacent values are defined as the most extreme values that lie within 1.5 times the IQR of the upper and lower ends of the box. These adjacent values show us the spread of values from 0 to 24% and from 76 to 100%; in summary, they cover all values outside the IQR that are not deemed outliers. The lower adjacent value spans a much shorter range (from 0.03 to 0.11 seconds) than the upper adjacent value (from 0.20 to 0.33 seconds). Because the upper adjacent range is longer than the lower, there is visual evidence that more values are located in the upper end of the dataset. The position and size of the upper adjacent value is an indicator of skew, in particular right skew.
- Observing skew: Finally, we look at how symmetrical the dataset is. As we saw when we graphed the distribution of the dataset (see Figure 5) and when we modeled the dataset, we did not find evidence of symmetry. Further evidence for this hypothesis can be obtained by examining the position of the median in the boxplot: the median is not in the center of the box; in fact, it is positioned slightly to the left-hand side, indicating that there is right skew in the dataset. The actual skew has been calculated as 0.65, which confirms this observation. (Note: when outliers are included, the skew was found to be 6.09.)
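The skew figures quoted above (0.65 with outliers removed, 6.09 with them included) come from a standard moment-based skewness calculation, which can be sketched as follows; the function below is our own illustration, not the article's tooling:

```python
import statistics

def sample_skewness(data):
    """Population skewness: the mean of the cubed standardized deviations.
    Zero means symmetric; positive values indicate a right (upper) tail."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)
```

A symmetric sample such as `[1, 2, 3, 4, 5]` scores essentially zero, while a sample with one large value scores strongly positive, which is the signature of right skew seen in our boxplots.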
In summary, we have measured the dispersion of our dataset by a graphical method called a boxplot. The boxplot of the entire dataset showed significant right skew, a high level of dispersion, and a sizable quantity of outliers. Further analysis of these outliers revealed a significant number of poorly performing transactions. Further investigation into why these transactions performed slowly, coupled with remedial action, is warranted.
When the outliers were excluded, a new boxplot was created. After studying the position of the interquartile range and the adjacent values, we confirmed both visually and through data analysis that there was low dispersion in this subset of data. After studying the symmetry of the boxplot, we found only a small amount of right skew in the dataset.
In our final section, we look at the current measures of location, the mean and standard deviation, and discuss how susceptible they are to dispersion. We also introduce alternative measures of location, the median and quartiles, and discuss how they are more resistant to dispersion.
Currently in our workload analysis, one of the measures of success is whether the mean transaction rate met a specified target. However, there is a disadvantage to relying on the mean and/or the standard deviation as measures of location: as we have seen, our dataset contains a degree of dispersion due to the number of outliers present.
So how do these outliers and this dispersion affect the values of the mean and standard deviation? Table 1 below shows the effect that outliers in a dataset have on both.
Table 1. Mean and standard deviation calculations
Looking at the calculations in Table 1, we can see that the mean including outliers is 60% greater than the mean with outliers excluded. For the standard deviation the effect is even more pronounced: it is roughly six times greater when outliers are included. Clearly, the presence of outliers impacts these measures of location.
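The same effect is easy to reproduce. In the sketch below (with invented data, using Python's standard `statistics` module), adding a handful of slow outliers inflates the mean noticeably and the standard deviation far more dramatically:

```python
import statistics

# 1,000 well-behaved response times plus 30 slow outliers (illustrative data)
core = [0.10, 0.12, 0.15, 0.18, 0.20] * 200
outliers = [3.0, 4.0, 5.0] * 10

mean_core, mean_all = statistics.mean(core), statistics.mean(core + outliers)
std_core, std_all = statistics.pstdev(core), statistics.pstdev(core + outliers)

# the outliers pull the mean up noticeably and multiply the
# standard deviation many times over
```

Comparing `mean_core` with `mean_all` (and likewise the two standard deviations) reproduces in miniature the kind of contrast shown in Table 1.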
Table 2. Median and IQR calculations
| Dataset | Observations | Median | Interquartile Range |
Looking at the calculations in Table 2, we can see that the median for both datasets is almost identical; likewise, the interquartile range values are very similar. What conclusions can we draw from this? There is no doubt that both the mean and standard deviation are affected by the presence of outliers, while the median and IQR are much less impacted.
In datasets where outliers are evident, the mean and standard deviation should not be used exclusively as measures of location. Instead, the median and interquartile range should also be calculated, as they are more resistant measures of location.
The purpose of this article was to introduce and apply some simple statistical analysis methods to the results of a workload run. By using a combination of graphical and numerical methodologies, we are able to make additional qualitative assessments of the dataset.
In our first section we discussed the idea of how to fit a distribution model to our dataset. This gives us a high level view of how our data is distributed.
When our dataset was plotted, we observed that it was asymmetrical and contained a long right-hand tail, indicating positive skew. A lognormal distribution curve was fitted against the dataset; while the fit was not exact, it was reasonable.
In the second section, we measured the level of dispersion of the dataset by graphical means. By using a boxplot graph, we were able to illustrate a significant level of right (positive) skew in the dataset. The right skew was composed of a number of outliers; in real terms, a number of poorly performing transactions. Additional investigation into why these transactions performed so badly relative to the majority of the dataset is required.
In the final section we showed by example that both the mean and standard deviation are less resistant to dispersion within a dataset. We introduced two additional measures of location, the median and inter quartile range.
By applying both the median and interquartile range as measures of location, we can see that they are more resistant to dispersion. Ideally, all four measures of location should be used when analyzing a dataset with known dispersion.
In an ideal situation, we would hope that our data is gathered around a central point (that is, the mean and median are equal) with low levels of dispersion and minimal skew. As we have shown from the analysis of our dataset, this was not the case.
- Explore the Rational Performance Tester page on IBM® developerWorks® for links to technical information for software developers and testers and to related IBM software.
- Explore the Rational Performance Tester Information Center.
- Learn more on the Rational Performance Tester product page.
- Learn about other applications in the IBM Rational Software Delivery Platform, including collaboration tools for parallel development and geographically dispersed teams, plus specialized software for architecture management, asset management, change and release management, integrated requirements management, process and portfolio management, and quality management.
- Visit the Rational software area on developerWorks for technical resources and best practices for Rational Software Delivery Platform products.
- Explore Rational computer-based, web-based, and instructor-led online courses. Hone your skills and learn more about Rational tools with these courses, which range from introductory to advanced. The courses on this catalog are available for purchase through computer-based training or web-based training. Additionally, some "Getting Started" courses are available free of charge.
- Subscribe to the Rational Edge newsletter for articles on the concepts behind effective software development.
- Subscribe to the IBM developerWorks newsletter, a weekly update on the best of developerWorks tutorials, articles, downloads, community activities, webcasts and events.
Get products and technologies
- Download the fully enabled, free trial version of IBM Rational Performance Tester.
- Download trial versions of IBM Rational software.
- Download these IBM product evaluation versions and get your hands on application development tools and middleware products from DB2®, Lotus®, Tivoli®, and WebSphere®.
- Join the Performance Testing forum, where you can share your questions and knowledge about IBM performance testing products, including IBM Rational Performance Tester (now integrated with IBM Performance Optimization Toolkit). General performance testing, VU scripting, and load testing topics are also discussed in this forum.
- Check out developerWorks blogs and get involved in the developerWorks community.
Jonathan Dunne has 7 years of experience working in Dublin's System Test Team, using the IBM Rational Performance Tester solution as part of the system test infrastructure. Dunne's career includes significant Java 2 Platform releases, including Workplace Collaborative Learning, LotusLive Engage, IBM Lotus Quickr, and IBM Lotus Connections. In addition, Dunne has worked with the National University of Ireland, Maynooth on network impairment research projects, and he is currently studying statistics and probability with the Open University.
Morten Kristiansen is the technical lead on the SVT team responsible for Lotus Connections. He started his career at IBM in 2000 with a senior position in the Lotus Customer Support Organization. Morten's other interests include automation and virtualization. He works to implement process improvements in both his own direct team and his extended team. When not busy with the challenges of being a technical leader in IBM, he is the proud father of two boys.
Amarenda Darisa has 4 years of experience working in Dublin's System Test Team, using the IBM Rational Performance Tester solution as part of the system test infrastructure. Darisa's career includes significant Java 2 Platform releases, including Workplace Collaborative Learning, IBM Lotus Connections, WebSphere Portlet Factory, and Industry Models.