Comment lines: Joey Bernal: Revisiting performance fundamentals

Level: Intermediate

Anthony (Joey) Bernal (abernal@us.ibm.com), Sr. Consulting IT Specialist, IBM 

22 Aug 2007

There are basics you need to know before you start testing or optimizing for performance. Here's a primer for beginners, and a refresher for everyone else.

From IBM WebSphere Developer Technical Journal.

Thinking about performance

I have been focusing lately on portal performance and on helping new consultants on our team understand how to approach and think about performance. In my opinion, this involves thinking end-to-end in terms of how performance measurement and capacity planning can be handled for portal projects and environments. In so doing, it is just as important to think about the overall methodology and approach as it is to understand how to solve specific problems. Of course, when you are under pressure to solve a high-profile problem, all bets are off.

In this column, I will talk about a few topics I have been examining that might give you a better understanding of what goes on during the performance testing process. Even though I will only be scratching the surface here, this might provide a good primer on some fundamental performance topics for beginners, and a timely review of the basics for everybody else.





Performance terminology

Performance Analysis for Java Web Sites is one of my favorite resources for understanding and explaining performance measurement basics. Someone recently asked me whether I thought this book was dated, since it has been out for several years. I don't: the information is as valid today as it was when the book came out. The basics haven't changed (the tools have, of course), and having the right foundation is important for communicating with our clients and among ourselves.

I try to be consistent with my terms, and tend to repeat them over and over within specific performance situations. Usually within a day or two, everyone starts using the same terminology. Without oversimplifying (there is much more than I can describe here), three specific measurements can help cement your thinking with the right vocabulary. People new to performance terminology often confuse these or have different ideas about what they mean. Let's clear it up:

  • Load is the amount of pressure against a Web site. This always makes me think of a water hose, either turned down to a trickle or turned all the way up to full blast. With Web sites, we talk about load in terms of concurrent users. A common misconception is that this means every user is requesting a page at the exact same moment; it does not. It is better to think about load over time: for example, the number of users accessing the site within a specific time frame, such as five minutes or an hour.

  • Response time is the time it takes for the portal or site to respond to a request. This is really end-to-end time from the browser's perspective, and it does not normally include time spent by the browser generating or displaying the page. Response time will generally increase as the load against the site increases, potentially to the point where it becomes unacceptable to users. Response time gets a lot of attention, and your ultimate goal is to tune your portal to provide a consistent range of response times at the expected volume of user load. Response time goals are typically chosen to follow industry standards; for example, a goal for the site might be to respond to 95% of page requests within five seconds.

  • Throughput is the rate at which a portal can respond to requests. Generally, we think of this as either the hit rate or page rate of the system, with the page rate measurement being more consistent. Throughput, coupled with response time and a model of your users' activity, can help you determine how many users your system can handle (load) within a given timeframe. Throughput is often measured in relation to load, determining where the boundaries of the system might be as user load continues to grow.
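A response time goal like "95% of page requests within five seconds" is a percentile, not an average. As a minimal sketch, assuming you can export per-page response times in seconds from your load tool, a goal like this could be checked in Python as follows. The function names and sample times are hypothetical, and nearest-rank is only one of several percentile definitions in common use:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no response time samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_goal(samples, pct=95, limit_s=5.0):
    """True if at least pct percent of page response times are <= limit_s seconds."""
    if not samples:
        raise ValueError("no response time samples")
    within = sum(1 for t in samples if t <= limit_s)
    # Integer comparison avoids floating-point rounding right at the boundary.
    return within * 100 >= pct * len(samples)


# Hypothetical response times (seconds) for ten page requests.
times = [1.2, 0.8, 2.5, 3.1, 4.9, 6.2, 1.1, 0.9, 2.2, 1.7]
print(percentile(times, 95))  # 6.2
print(meets_goal(times))      # False: only 9 of 10 pages (90%) are within 5 s
```

The average of these ten samples is about 2.5 seconds, comfortably under five, yet the goal still fails because of a single slow page.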

These three terms, working together, will help you begin to understand how your portal might be performing. (See Resources for more information on these and other performance terms.)





Modeling user behavior

User behavior is modeled using a script that is run by the load generator. The load generator initiates virtual users (vusers), with each vuser running through the script as written. Determining how users will access and use the site is probably one of the most challenging parts of the performance evaluation process. Often you simply do not know what users will do or how often they will access the site, and so sometimes it ends up really being a best guess.

When a new portal is being built to replace an existing Web site, you can often look at existing analytics to try to understand how users might behave. However, even this is not enough, since you expect usage to increase with your new portal; you hope that users will flock to the portal like bees to honey once they see the value it brings to their daily work lives. Sometimes the information from your existing analytics is enough to start with, but worst-case scenarios are often considered to ensure that the portal can handle any expected load. Actually, I'm sometimes grateful when the expected user load has been overestimated, because the site will survive easily when it goes live (although getting the site to reach those load values often requires a serious effort). It is when the load has been underestimated, or when no testing has been performed at all, that things can get ugly.

Spend time thinking about what users might do on your site and model worst-case scenarios to inject into the testing process. If you were part of the design team, it is natural to become accustomed to the portal and lose sight of what an end user might think is important. User Acceptance Testing is one part of the project lifecycle where a fresh viewpoint can be injected into the project, but it is sometimes conducted too late in the lifecycle to make a meaningful difference. Consider soliciting input from outsiders early in the design cycle to help you understand what choices users might make. For worst-case scenarios, be sure to include common occurrences such as users not logging out of the site, large downloads or uploads, complex search queries, and complex report generation, as they apply to your site.

Think time is another aspect of modeling user behavior. In practice, users tend to take a few seconds between page requests, whether to read the information on the page, fill out a form, decide what to do next, and so on. You might therefore model some users to move quickly to a specific page and then take more time to perform some action that is part of their job, or have users linger on the home page for a few seconds before entering a search request. A load generator takes think time into account as it runs virtual users through its script. As a general rule, increasing think time decreases the load a generator places on the system; decreasing think time shortens the time needed to complete a user script and therefore increases overall load.
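To make the think time rule concrete, here is a small Python sketch of how much load a fixed pool of virtual users generates for a given script. It assumes, for simplicity, that response time stays constant; in practice response time tends to rise with load, so the real effect of cutting think time will be smaller. The script, timings, and function names are hypothetical:

```python
def script_cycle_seconds(think_times_s, avg_response_s):
    """Seconds for one virtual user to run the script once: each page costs
    one (assumed constant) response time plus that page's think time."""
    return sum(avg_response_s + think for think in think_times_s)


def generated_page_rate(vusers, think_times_s, avg_response_s):
    """Pages per second driven by vusers looping over the script back to back."""
    pages = len(think_times_s)
    return vusers * pages / script_cycle_seconds(think_times_s, avg_response_s)


# Hypothetical three-page script: linger 5 s on the home page, 10 s filling
# in a form, 3 s on the results page; assume a 2 s average response time.
slow = [5, 10, 3]
fast = [t / 2 for t in slow]  # the same script with think time halved
print(generated_page_rate(50, slow, 2.0))  # 6.25 pages per second
print(generated_page_rate(50, fast, 2.0))  # 10.0 pages per second
```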





Simple calculations

I mentioned throughput and how page rate provides a consistent measurement for many of the performance tests that you will run. Other ways of looking at throughput, such as hit rate or request rate, can be confusing because people often think about them in different ways. A page often consists of dozens of requests (hits) as the browser downloads image after image, along with other static components embedded within the page. End users don't really care about hits: getting half the page doesn't count, and users are satisfied only when they get the entire page. Measuring the page rate abstracts you from some of this confusion and gives you a clear picture of what is going on with your portal.

Most of these measurements are simple calculations, but often they depend upon several factors that can change based on the load and response time of your server. Let's begin with just the base page per second rate calculation and then look at an example. The basic calculation is:


Figure 1. Generic pages per second request rate formula

To use this calculation, you need to know a few key values. First is the number of users that you expect will access the server within a given time period; this is the load you are putting on the portal. For example, suppose you have 45 users (or vusers, as the case may be) that can each run through the script within a 5 minute time period. This tells you that:

45 users in 5 minutes = 45 * 12 (the number of 5 minute time periods in an hour) = 540 users in one hour

If each virtual user goes through a script that accesses 8 pages, then you can determine the page rate:


Figure 2. Example pages per second rate
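Figures 1 and 2 are not reproduced here. Working backward from the numbers in the text, the generic formula appears to be pages per second = (users per hour × pages per user) / 3,600; treat that as an assumption rather than the exact figure. A minimal Python sketch of the calculation, with hypothetical function names:

```python
SECONDS_PER_HOUR = 3600


def users_per_hour(users, window_minutes):
    """Scale users who each finish the script within window_minutes to an hourly count."""
    return users * 60 / window_minutes


def pages_per_second(users_per_hr, pages_per_user):
    """Average page request rate when users_per_hr users each view pages_per_user pages."""
    return users_per_hr * pages_per_user / SECONDS_PER_HOUR


hourly = users_per_hour(45, 5)      # 45 * 12 = 540.0 users per hour
print(hourly)
print(pages_per_second(hourly, 8))  # 540 * 8 / 3600 = 1.2 pages per second
```

With 540 users per hour each viewing 8 pages, this comes to 4,320 pages per hour, or 1.2 pages per second.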

How do you determine think time? Often it depends on the task the user is going to perform. You want to model a user's behavior as closely as possible in the test, but sometimes you will need to adjust for technical limitations. For example, if you expect 100 users on the system but only 50 virtual users are available in your load generator, you can often adjust the test or decrease think time so that the smaller number of virtual users applies the required load. Based on the number of virtual users available and the average known response time of the site, you can see that:


Figure 3. Generic think time calculation

To see how to calculate the think time needed to drive sufficient load against the site, assume the following values:

  • Average response time per page request: 3 seconds
  • Throughput: 21 pages per second
  • Available virtual users: 120

The results:


Figure 4. Example think time calculation

To drive this load, you would need an average of 2.7 seconds of think time between page requests. Notice an interesting limit here: even with zero think time, each of the 120 virtual users can request at most one page every 3 seconds, so with a 3 second response time you cannot test for a rate of more than 40 pages per second (120 / 3).
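Figures 3 and 4 are not reproduced here either. The sketch below assumes the generic relationship is virtual users = page rate × (response time + think time), which is Little's Law applied to a closed loop of virtual users; under that assumption it reproduces both the 2.7 second think time and the 40 pages per second ceiling described above. Function names are hypothetical:

```python
def required_think_time(vusers, target_pages_per_sec, avg_response_s):
    """Average think time per page (seconds) so that vusers drive the target rate.

    Rearranges vusers = rate * (response time + think time), the closed-loop
    form of Little's Law: every vuser is either waiting for a page or thinking.
    """
    think = vusers / target_pages_per_sec - avg_response_s
    if think < 0:
        raise ValueError("target rate is above what these vusers can drive")
    return think


def max_page_rate(vusers, avg_response_s):
    """Ceiling on the page rate: every vuser requests again with zero think time."""
    return vusers / avg_response_s


print(round(required_think_time(120, 21, 3), 1))  # 2.7 seconds
print(max_page_rate(120, 3))                       # 40.0 pages per second
```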

Another approach is to get an estimate of the number of users a system can handle based on the available page per second rate of the system, if known.


Figure 5. Generic user load formula

Consider a site that averages 25 page requests per second. After some consideration, you have estimated that each user will access 14 pages when they enter the portal. From there, you can determine the number of users that the system can handle within a given time frame:


Figure 6. Example user load calculation
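Figures 5 and 6 are not reproduced here. Assuming the user load formula is simply the pages-per-second formula solved for users (users per hour = pages per second × 3,600 / pages per user), a Python sketch of the calculation looks like this; the results are plain arithmetic on the example values, not numbers taken from the missing figure, and the function name is hypothetical:

```python
SECONDS_PER_HOUR = 3600


def users_supported_per_hour(pages_per_sec, pages_per_user):
    """Users per hour the site can serve at pages_per_sec if each views pages_per_user pages."""
    return pages_per_sec * SECONDS_PER_HOUR / pages_per_user


hourly = users_supported_per_hour(25, 14)
print(int(hourly))       # 6428 users per hour (90,000 pages / 14)
print(int(hourly / 60))  # 107 users per minute
```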

Hopefully, you can see how these and other calculations come into play when determining some of the parameters for your testing scenarios. It can be pretty easy to use some of these formulas to derive other values that you might also need. (Remember to convert your data between seconds and hours, depending upon the audience for your results.)





The thing about garbage collection

As you can imagine, running out of memory is a bad thing, and its effect on your system is easy to see: everything stops. When servers run out of memory, a memory leak is often the first suspect, but it is quite possible that your application simply uses a lot of memory. A memory leak is really identified when a system cannot recover used memory after the load is removed. A simple test is to leave the system running after the last test of the day; if it has recovered to its original state by the next day, you can usually rule out a leak. If you make this a habit during your testing cycle, you can see whether code changes introduce leaks over time that you might not otherwise be testing for.
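The overnight leak check can be reduced to a simple comparison. This Python sketch assumes you record used heap (from verbose GC output or your monitoring tool) before the first test and again each morning after the system has sat idle; the function name, readings, and the 5% tolerance are all hypothetical:

```python
def leak_suspects(baseline_mb, recovered_by_day, tolerance_pct=5.0):
    """Return the days on which heap usage, measured after the system sat idle
    overnight, had not come back to within tolerance_pct of the pre-test baseline."""
    limit = baseline_mb * (1 + tolerance_pct / 100)
    return [day for day, used_mb in recovered_by_day.items() if used_mb > limit]


# Hypothetical morning readings of used heap (MB) after each night's idle period.
readings = {"Mon": 410, "Tue": 455, "Wed": 520}
print(leak_suspects(400, readings))  # ['Tue', 'Wed']
```

A single day above the line may just be a cache that has not been refreshed yet; a baseline that keeps climbing across several code drops is the pattern worth investigating.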

But what if it's not a leak and your application is using up a lot of memory?

Garbage collection (GC) usually kicks in when a memory allocation failure occurs, and GC activity is one of the most important indicators of memory usage within a JVM. Although the data is very simple, it is one of the key measurements you can obtain on a server. (To gather information about memory usage and garbage collection, turn on verbose GC logging in your application server.) However, too much garbage collection can and will hurt your application's performance. Dividing the time spent in garbage collection by the total duration of your test gives you a rough idea of the proportion of time spent in GC. For example, if analysis of your verbose GC log shows total GC time of 1,402,119 ms within a 1 hour period, then:

1,402 sec / 3,600 sec ≈ 39% (total time spent in GC within a 1 hour period)
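The GC percentage calculation is easy to script once you have the GC durations. Verbose GC output differs between JVM vendors and versions, so this Python sketch does not parse the log; it assumes you have already extracted per-collection durations in milliseconds. The function names and the 10% default are illustrative:

```python
def gc_time_percent(gc_pause_ms, window_s):
    """Share of the test window (window_s seconds) spent in GC, as a percentage."""
    return sum(gc_pause_ms) / 1000 / window_s * 100


def above_guideline(gc_percent, guideline_pct=10.0):
    """Compare against a rough target; 10% is a guideline, not a hard rule."""
    return gc_percent > guideline_pct


# The example in the text gives only the total, so pass it as a single entry.
pct = gc_time_percent([1402119], 3600)
print(round(pct))            # 39
print(above_guideline(pct))  # True
```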

The lower the resulting number, the better optimized your system should be. A good guideline might be to shoot for about 10% of your time spent in garbage collection -- but don't let this number become "law" within your environment, because there are other factors to consider:

  • Heavy garbage collection

    Knowing how much time your system spends in GC is all well and good, but how does more time spent doing garbage collection actually affect the performance of your system? Here is an example that shows the effect heavy garbage collection has in a real situation.

    Figure 7 shows a system under load and the garbage collection results during the test. The test is pretty simple: the script simulates three pages that are requested within a few seconds of each other, and the load generator iterates through several thousand users over the space of about an hour.



    Figure 7. Heavy GC example

    The script ramps up slowly, but right away it starts consuming memory and generating lots of transient objects. As new requests come in, memory allocation failures occur, and GC kicks in frequently to free space for the new requests.

    The results from the test are shown in Figure 8. Overall average response time was 1.4 seconds, with a standard deviation under 2 seconds. Maximum response time was over 27 seconds, which would be painful for any user who had to wait that long. The test requested a total of 95,000 pages.



    Figure 8. Heavy GC page summary

    The individual page response breakdown is shown in Figure 9, with data that is pretty consistent with the overall summary above. The test script made these page requests:

    1. Load the initial logon page.
    2. Logon and display the home page.
    3. Request the MyWorkplace page.


    Figure 9. Heavy GC performance summary

    In the application tested, the home page and MyWorkplace page each display a number of custom portlets that pull data from a back-end database. You might notice that there is no logoff action in this script. This is common in many testing scenarios: it leaves user data in memory until the user session times out or until any caches in use are refreshed. This often gives a more realistic picture of production, where users may simply close their browser instead of formally logging off.

  • Adding some breathing room

    Now, let's take a look at another run of the exact same test. Here, you can see the overall GC percentage is much, much lower than before. This can result from one of several actions:

    • Perhaps you were able to increase the available memory within the JVM. This is often a good place to start, but at some point it becomes non-productive -- or just not possible, if you max out the JVM.
    • Another approach might be to perform some optimizations within the code itself to reduce the amount of memory that the application uses.


    Figure 10. Light GC example

    (In the actual test on which this information is based, I may have actually been able to go a little higher, but I think I was getting CPU bound on the load generation servers. This is definitely something you would want to watch for during a test run.)

    In Figure 11, the overall average response time dropped by slightly more than half, with the standard deviation well under 1 second. This is a significant improvement, and it shows plainly how too much GC within the JVM can hurt performance.



    Figure 11. Light GC page summary

    Clearly, the individual page times have also improved. Not only have the averages decreased, but the standard deviation has decreased to below 1 second.



    Figure 12. Light GC performance summary

    One additional thing to notice in Figure 12 is the "per second" rate for the run: it is about 3 pages per second higher than in the earlier test. This small gain, scaled over several servers, can translate into dozens (or hundreds) more users, which is a real indication of how much excessive GC can affect performance.





What's the deal with standard deviation?

I mentioned standard deviation a few times above with regard to garbage collection. It is often overlooked when evaluating performance results; a colleague and I recently discussed how important it is and how often it is simply ignored. The standard deviation, generally denoted by the lowercase Greek letter sigma (σ), can help you understand the bigger picture of what is going on within a test. It tells you how widely individual results are spread above and below the test average (the arithmetic mean).

For normally distributed data, about 68% of the values in the sample fall within 1 standard deviation of the mean. Widening the range to additional standard deviations covers a higher percentage of the values, giving you more confidence in the results of your analysis. The percentages for the first three are:

Standard deviations from the mean    Share of values within range
±1σ                                  68.26894921371%
±2σ                                  95.44997361036%
±3σ                                  99.73002039367%

2σ is generally close enough considering the load you are probably generating during your test runs.
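The percentages in the table come from the normal distribution: the share of values within k standard deviations of the mean is erf(k/√2). Python's standard library exposes the error function as math.erf, so the table can be reproduced with a few lines (the function name is illustrative):

```python
import math


def fraction_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"{k} sigma: {fraction_within(k):.4%}")
# 1 sigma: 68.2689%
# 2 sigma: 95.4500%
# 3 sigma: 99.7300%
```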

As an example, look again at the values for Logon in Figure 12. Using the average and the standard deviation, you can interpret that about 68% of the attempts took between 0.2 and 1.3 seconds. This is important because it shows that looking at the average alone is not enough, especially when the standard deviation is large: the average is not really the response time many of your users are getting. Some receive much faster replies, but roughly as many receive much slower ones, which may be above your expected value.

Calculating standard deviation by hand can be a little tedious, but luckily many performance tools do it for you. Even if your tool does not, you can often export the raw data into a spreadsheet and calculate it with a formula. Looking at two standard deviations gives you a much clearer picture of your results, and although many tools do not report this directly, interpreting it is, fortunately, much easier than calculating it.
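If your tool only gives you raw data, the same summary can be computed in a few lines of Python with the standard statistics module (stdev computes the sample standard deviation; pstdev is the population version). The response times below are hypothetical, and the function name is illustrative. The sketch also reports how many samples actually fall inside each range, because response times are often skewed rather than normally distributed:

```python
import statistics


def summarize(response_times_s):
    """Mean, sample standard deviation, the 1-sigma and 2-sigma ranges, and the
    share of samples that actually fall inside each range."""
    mean = statistics.mean(response_times_s)
    sd = statistics.stdev(response_times_s)  # needs at least two samples
    n = len(response_times_s)

    def share_within(k):
        lo, hi = mean - k * sd, mean + k * sd
        return sum(1 for t in response_times_s if lo <= t <= hi) / n

    return {
        "mean": mean,
        "stdev": sd,
        "1sigma": (mean - sd, mean + sd),
        "2sigma": (mean - 2 * sd, mean + 2 * sd),
        "within_1sigma": share_within(1),
        "within_2sigma": share_within(2),
    }


# Hypothetical raw response times (seconds) exported from a load tool.
result = summarize([0.4, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 4.2])
print(result["mean"], result["stdev"])
print(result["within_1sigma"], result["within_2sigma"])
```

In this sample, one slow request pulls the mean up to about 1.25 seconds even though most requests finish in about a second or less, and the lower bound of the 2σ range is negative, a reminder that the normal-distribution percentages are only an approximation for response time data.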





Conclusion

As performance testing tools get smarter, we are relieved from doing much of the manual work that used to be required to obtain performance data. Still, it is important to understand the performance basics to help you grow your knowledge base, and to help you better understand and communicate the results that are provided. If your tool does not provide the results you need and some manual intervention is necessary, then hopefully this information has provided some good starting points.

I do want to mention that I do not consider myself a performance expert. Maybe I know a little more than the other guy, but compared with folks who make performance their specialty, I have a long way to go. Honestly, I am not sure I would even want to specialize in performance, with all that goes on in this space. The point is that a basic understanding of performance testing and analysis is important to the day-to-day practitioner who wants to improve the projects that they work on and lead. We should all strive to deepen our skills in this area and continue to improve the general body of knowledge.





Acknowledgements

Thanks to Aaron Lieber, Mike Cunningham, and Stacy Joines for teaching me pretty much everything I know, and for making sure I don't make a complete fool of myself most of the time.



Resources

Learn

Discuss


About the author


Anthony (Joey) Bernal is a Sr. Consulting IT Specialist with IBM Software Services for Lotus (ISSL). Having worked with WebSphere Portal since the initial 1.1 release, he has an extensive background in the design and development of portal applications, and has led the implementation of many projects using IBM WebSphere Portal. Joey is an accomplished author, speaker, and instructor in various topics concerning WebSphere Portal and related technologies. He is a co-author of the book Programming Portlets.



