Using engine room and kernel IO data in performance test analysis

Performance testing has multiple phases: requirement analysis, workload design, creating and running test scripts, and then the most important phase, analysis of the results. Prashant Chaudhary describes some of the common decisions in using IBM® Rational® Performance Tester software and the value of the information and insights that you can gain from engine room and kernel IO data.

Prashant Chaudhary (prashantchaudhary@dc.ibm.com), Senior System Analyst, IBM

Prashant Chaudhary is a senior system analyst on the IBM Global Process Services Delivery team, in India. He is responsible for enterprise application performance testing using IBM Rational Performance Tester, as well as functional testing and functional automation using IBM Rational Functional Tester tools. He has experience in network sniffing and analysis of web traffic, web log analysis, web logic server administration, and performance tuning using Quantify. Prashant is an active member of various forums and communities related to performance testing. He holds IBM professional certification for both Rational Functional Tester for Java and Rational Performance Tester V8.



17 April 2013

Purposes and decisions in performance testing

There are several decisions that anyone using IBM® Rational® Performance Tester software must make:

  • How to set different workload models for performance testing
  • How to determine the virtual user count that the workbench machine can support for test scripts
  • How to avoid out-of-memory errors caused by big test scripts
  • How to analyze performance test reports to find performance bottlenecks

The sections that follow describe ways to resolve those dilemmas.


Plan the workload

Design workload models according to the needs of your project. Usually, there is a baseline test plus workload models for load, stress, and endurance tests. Other than that, you can add new workload models based on your need. These are the basic models:

Load test
A load test applies the load that the application is expected to handle, to confirm that the application can sustain the load that it was designed for. For example, when you add n users, what TPS (transactions per second) and throughput does the application reach?
 
Stress test
A stress test puts more user load on the application than expected, to observe how it responds under excessive load. In stress testing, we usually apply 150% of the expected load to try to find the system's breaking point and, at the same time, observe the behavior of several important parameters (response time, server throughput, and so forth).
 
Endurance test
An endurance test analyzes application response during a test run of long duration (usually several days) when server CPU utilization reaches a high value. It is carried out to understand how the application server and database server behave under a continuous load during a long test run. At the same time, we monitor server resources so that we can analyze which operating system parameters behave abnormally.
 
Baseline test
This involves running a script with a single user and multiple iterations (usually 10 iterations), just to get an idea of the average response time of a web page when the page is accessed 10 times. We perform baseline testing to get a reference value for load testing, so that we can draw conclusions by comparing load test results against the single-user baseline data.
 
Spike test
Spike testing is done by suddenly increasing the number of users, or the load generated by users, by a very large amount and observing the behavior of the system. The goal is to determine whether performance will suffer, the system will fail, or it can handle dramatic or sudden changes in load. A good example is a portal application for a company that sells something. When the company announces a discount, there is a sudden increase in user traffic. Some systems crash during such sudden increases. In spike testing, we also decrease the user load and analyze the system's behavior during a sudden decrease.
 
Failover test
Basically, failover is switching to a backup or standby when a previously active application, system, hardware component, or network fails. The goal is to find out how it recovers and handles user requests.
 
Negative test
This tests application response and behavior when abnormal changes are made to the application's configuration at the application server or database server level.
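
As a rough illustration of how the baseline, load, and stress models relate to one another, the following sketch derives virtual user counts from an expected load. The names `WorkloadModel` and `plan_workloads` are hypothetical and are not part of Rational Performance Tester. The 150% stress factor and the single-user, 10-iteration baseline come from the descriptions above; everything else is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadModel:
    name: str
    virtual_users: int
    note: str


def plan_workloads(expected_users: int, stress_percent: int = 150) -> list[WorkloadModel]:
    """Derive user counts for the basic workload models from the expected load."""
    if expected_users < 1:
        raise ValueError("expected_users must be at least 1")
    # Round the stress load up so that it is never below the requested percentage.
    stress_users = (expected_users * stress_percent + 99) // 100
    return [
        WorkloadModel("baseline", 1, "single user, usually 10 iterations"),
        WorkloadModel("load", expected_users, "the load the application is expected to handle"),
        WorkloadModel("stress", stress_users, f"{stress_percent}% of the expected load"),
    ]


for model in plan_workloads(200):
    print(f"{model.name:<9}{model.virtual_users:>6}  {model.note}")
```

Endurance, spike, failover, and negative tests are left out of the sketch because they depend on duration, load shape, or configuration changes rather than on a single user count.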

Create the test script and test schedule

Test script creation is key to how the performance test gets executed. Therefore, it is important to know the checklist items to take care of while creating test scripts, how to optimize the scripts for test playback, and so forth.

When creating test scripts, always keep in mind that if the scope of a scenario is big, it is best to break it into smaller chunks. Otherwise, it will impact the test schedule and can result in an out-of-memory error.

Edit the test script and create the schedule:

  1. After recording is finished, review the recorded script and edit it if necessary.
  2. Check for data pooling, and set the data pooling according to the requirement.
  3. Create a schedule for a warm-up run.

Do a warm-up run to calculate memory use of scripts

When your scripts are ready, run them with 10 users. After the run finishes, follow these steps:

  1. Go to the deployment_root directory (under the Workspace folder).

    Note: If you specify "Run this group on the following remote locations," the kernelio.dat file will be under the deployment_root directory in the remote location that you use.
  2. Search for the kernelio.dat file for the warm-up run.
  3. Open the kernelio.dat file in Notepad.
  4. Check the maximum heap value. Depending on the heap value, you can analyze your workstation's capacity for the performance test run for that particular script.

    Example: If your warm-up run is for 10 users and the maximum heap value is 13285376 bytes, you can estimate the number of virtual users that your local (workbench) machine can support from the amount of RAM available on it.

See Figure 1 for various parameters that the kernelio.dat file captures during a test run.

Figure 1. Kernelio.dat files with various attributes
capture of screen output
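
The capacity estimate from the warm-up run can be sketched as simple arithmetic. This is only an approximation under stated assumptions: it assumes that heap use grows roughly linearly with the number of virtual users, and the `safety_fraction` parameter and the 2 GiB figure in the usage line are hypothetical values chosen for illustration, not recommendations from the tool.

```python
def estimate_max_virtual_users(
    max_heap_bytes: int,
    warmup_users: int,
    available_bytes: int,
    safety_fraction: float = 0.8,
) -> int:
    """Estimate how many virtual users fit in the memory available for the run.

    Assumes heap use scales linearly with the user count, which a warm-up
    run can only approximate.
    """
    if max_heap_bytes <= 0 or warmup_users <= 0:
        raise ValueError("warm-up heap size and user count must be positive")
    if not 0 < safety_fraction <= 1:
        raise ValueError("safety_fraction must be in the range (0, 1]")
    # Keep part of the memory in reserve (hypothetical safety margin).
    usable_bytes = int(available_bytes * safety_fraction)
    # Integer form of usable_bytes / (max_heap_bytes / warmup_users).
    return (usable_bytes * warmup_users) // max_heap_bytes


# Warm-up figures from the example above: 10 users, 13285376 bytes maximum heap.
print(estimate_max_virtual_users(13285376, 10, 2 * 1024**3))
```

With the example figures, each virtual user accounts for roughly 1.3 MB of heap, so 2 GiB of memory with a 20% reserve works out to about 1,293 users. Treat the result as a starting point and confirm it with a larger trial run.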

Do a quick configuration check before running the schedule

  1. Before every test run, clean the cache memory and deployment root files.
  2. Turn on the Show Heap Status option on the host controller: Window > Preferences > General > Show heap status. You can use this option to keep track of Java virtual machine (JVM) heap use and, if you find that it is increasing, you can free memory by clicking the garbage collector icon (see Figure 2).
Figure 2. Garbage collector icon
Icon looks like garbage can
  3. Make sure that there are no connections in the CLOSE_WAIT or TIME_WAIT state. You can check by using the netstat command before you start the tests from the agent.
  4. For a long test run, set a high value for the statistics sample interval. Setting a low value can make the operating system and JVM use more memory to retrieve test run statistics.
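
The connection check in the list above can be automated with a short script. The following is a minimal sketch that counts lines of `netstat -an` output containing the CLOSE_WAIT or TIME_WAIT state. The function names are hypothetical, and the sketch assumes that `netstat` is on the PATH of the agent machine.

```python
import subprocess
from collections import Counter

WATCHED_STATES = ("CLOSE_WAIT", "TIME_WAIT")


def count_waiting_connections(netstat_output: str) -> Counter:
    """Count connections in the watched states in `netstat -an` style output."""
    counts = Counter({state: 0 for state in WATCHED_STATES})
    for line in netstat_output.splitlines():
        fields = line.split()
        for state in WATCHED_STATES:
            # The state appears as its own whitespace-separated column.
            if state in fields:
                counts[state] += 1
    return counts


def check_agent_connections() -> Counter:
    """Run netstat on the local machine and count the lingering connections."""
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return count_waiting_connections(result.stdout)
```

Calling `check_agent_connections()` on the agent before a run returns the two counts. Nonzero values suggest waiting for the connections to clear before you start the schedule.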

Tip:
We have seen that Rational Performance Tester performs optimally with a 10-second sampling interval for a 1-hour test run duration. You can use this as a guide for computing the sampling interval while configuring schedules.

Figure 3. Statistics showing sample interval
Schedule Element Details, Statistics tab
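
One way to use the tip as a guide is to scale the interval with the run duration so that the number of samples per run stays roughly constant. The linear scaling is an assumption made for this sketch, not a documented rule, and `suggest_sample_interval` is a hypothetical helper.

```python
REFERENCE_INTERVAL_S = 10    # interval from the tip above
REFERENCE_DURATION_S = 3600  # 1-hour test run from the tip above


def suggest_sample_interval(duration_seconds: int) -> int:
    """Scale the 10-seconds-per-hour guideline linearly with run duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Ceiling division in integers, so that a longer run never gets a shorter interval.
    scaled = (
        duration_seconds * REFERENCE_INTERVAL_S + REFERENCE_DURATION_S - 1
    ) // REFERENCE_DURATION_S
    return max(1, scaled)


for hours in (1, 8, 72):
    print(f"{hours:>3} h run: sample every {suggest_sample_interval(hours * 3600)} s")
```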

For a test run with a large number of virtual users, distribute the user load across separate agent machines rather than putting a large load on a single workstation.


Schedule and run the test

After you have finished all checks and configurations, start running the schedule and monitor the performance tests. If you find any error notices, stop the run and check the configuration again according to what the error description reports.

While running a test with a large number of virtual users from multiple agent machines, you can also see statistics for the different agents. This is known as engine room data. It provides a different set of data that helps you analyze agent machine performance statistics.

To view the engine room data:

  1. Run a schedule that includes agents.
  2. After the schedule starts, open a browser on any of the agent systems and point to http://port_number/. For this example, we used 1903 as the port (http://1903/), but that might differ on your system. To check the port number, go to the deployment root folder, and open the rptport file in Notepad. The rptport file contains the number of the port used for communication.
Figure 4. The rptport file, where the port number is mentioned
rptport file in directory, DAT file type
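
If you prefer to read the port number with a script rather than in Notepad, a sketch such as the following might help. It assumes that the rptport file stores the port number as plain text, which the article does not confirm, so check the file's contents first. The function names are hypothetical, and the path to the file is passed in by the caller.

```python
import re
from pathlib import Path


def parse_rptport(text: str) -> int:
    """Return the first integer found in the text, validated as a TCP port."""
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError("no port number found")
    port = int(match.group())
    if not 0 < port < 65536:
        raise ValueError(f"{port} is not a valid TCP port")
    return port


def read_rptport(rptport_path: str) -> int:
    """Read the port number from an rptport file (assumed to be plain text)."""
    return parse_rptport(Path(rptport_path).read_text())
```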

As Figure 5 shows, the engine room data is displayed in four sections:

  • Engine Counters (various counters, such as CPU, JVM Heap, I/O, bytes sent or received, KB in use, KB per user, virtual tester count)
  • Subsystems
  • Runner
  • Actions
Figure 5. Engine room data
links to the 4 sections and Engine Counters detail

You can collect information about the current number of users and the state of engine threads. Each thread should be in the OK, WORKING, or IDLE state. Deadlocked threads can be the underlying reason for agents stopping during a run.
The current states of all active actions are also displayed. For larger runs, you can focus on one of the users and track any type of unexpected behavior by using engine room statistics.

Analyze reports

Now comes the most important part of performance testing: analysis of test results. Rational Performance Tester generates several reports for scheduled runs. You can also create customized reports. The key is to know which reports and counters to look at to reach conclusions.

Figure 6 shows the page performance report metrics, which are often used for analysis.

Figure 6. Sample page with mean and standard deviation data
IBM Rational Performance Tester sample page performance metrics

While analyzing response time data about the web pages that you tested, focus on these points:

  • Check for normally distributed data (mean and standard deviation > 3). To check for a normal distribution, find the arithmetic mean, or average, in the metrics columns under the Page Performance tab (in Figure 6, the column heading is Response Time [ms] Average [for Run]) and the corresponding standard deviation in the adjacent column.
  • Analyze the performance within time ranges (steady state data only).
  • Use percentile reports for HTTP page response times. By default, Rational Performance Tester shows the 85th, 90th, and 95th percentile reports. The following example will help you understand the concept of percentile reports.

Percentiles example

Suppose that the 95th percentile report for a scenario shows a response time of 1200 ms. This means that 95% of the measured response times are less than or equal to 1200 ms. You can interpret other percentile reports similarly.
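
To make the percentile idea concrete, the following sketch computes percentiles from raw response times by using the nearest-rank method. This is one common definition of a percentile. The article does not state which method Rational Performance Tester uses, so values computed this way might differ slightly from the tool's reports. The sample data is made up for illustration.

```python
def percentile_nearest_rank(samples_ms: list[float], percent: int) -> float:
    """Return the smallest sample such that at least `percent`% of samples are <= it."""
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    ordered = sorted(samples_ms)
    # Ceiling of percent * n / 100, computed with integers to avoid rounding error.
    rank = -(-percent * len(ordered) // 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds.
response_times_ms = [640, 700, 720, 750, 810, 830, 900, 950, 1200, 2600]
for percent in (85, 90, 95):
    value = percentile_nearest_rank(response_times_ms, percent)
    print(f"{percent}th percentile: {value} ms")
```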

Web pages whose data points follow a normal distribution are considered good performers, but web pages with high standard deviations are weak.

For example, suppose that one of your requirements is to find the user count at which the server stops responding. The easiest way to do this is to plot the active user count together with the hit rate on one axis and the time on the other, and then find the time at which the hit rate flattens. By reading the number of active users at that time, you can find the active user count at which the server stops responding to client requests.

In Figure 7, the active user counter is added in the Page Throughput report, so we can analyze page attempt rate and hit rate with respect to active users.

Figure 7. Merging user counter results in a graph generated by Rational Performance Tester
Page hit rate graph with additional counter
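
If you export the hit rate and active user counters, the point where the hit rate flattens can also be located with a few lines of code. The sketch below uses a simple heuristic that is an assumption of this example, not a feature of the tool: it returns the active user count at the first sample whose hit rate comes within a tolerance of the highest hit rate observed. The sample data and the 5% tolerance are hypothetical.

```python
def find_saturation_users(
    samples: list[tuple[int, float]], tolerance: float = 0.05
) -> int:
    """Return the active user count at which the hit rate stops growing.

    `samples` is a time-ordered list of (active_users, hit_rate) pairs.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    peak_rate = max(rate for _, rate in samples)
    threshold = peak_rate * (1 - tolerance)
    for active_users, rate in samples:
        # The first sample that is close to the peak marks the start of the plateau.
        if rate >= threshold:
            return active_users
    return samples[-1][0]  # not reached, because the peak sample always qualifies


# Hypothetical ramp-up: the hit rate stops growing at about 30 users.
run = [(10, 50.0), (20, 100.0), (30, 150.0), (40, 152.0), (50, 151.0)]
print(find_saturation_users(run))
```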

You can add different counters in other graphs, too, for better analysis of test runs.

When we analyze the results of a performance test run, we usually analyze the average response time. Data sets whose standard deviation is high, or that do not follow a normal distribution, usually indicate poor performers.
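
As a minimal sketch of this kind of check, the following code computes the mean and standard deviation of response times per page and flags pages whose standard deviation is large relative to the mean. The ratio used here (the coefficient of variation) and the 0.5 threshold are assumptions chosen for illustration and are not values from Rational Performance Tester. The page names and timings are made up.

```python
import statistics


def summarize_page(samples_ms: list[float]) -> tuple[float, float, float]:
    """Return (mean, standard deviation, standard deviation / mean) for one page."""
    mean = statistics.fmean(samples_ms)
    stdev = statistics.stdev(samples_ms)  # sample standard deviation, needs 2+ values
    ratio = stdev / mean if mean else float("inf")
    return mean, stdev, ratio


def flag_weak_pages(pages: dict[str, list[float]], max_ratio: float = 0.5) -> list[str]:
    """List pages whose response times vary widely around the mean."""
    return [
        name
        for name, samples in pages.items()
        if summarize_page(samples)[2] > max_ratio
    ]


# Hypothetical response times in milliseconds.
pages = {
    "login": [1000, 1020, 980, 1010],
    "search": [200, 3000, 150, 2500],
}
print(flag_weak_pages(pages))
```

A page that is flagged this way is a candidate for a closer look in the percentile reports, not proof of a bottleneck on its own.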

Apart from the response time analysis, we usually analyze server throughput, such as the server-page throughput, during analysis of reports. Also, reports are customizable, and within a report you can add many counters to relate the different data sets to predict or analyze results.

Analysis of reports in performance testing is a broad concept, because the performance of an application has several dependencies, ranging from application code, the server operating system, and the network to client scripts (slowness in JavaScript, for example). Therefore, when we see any bottleneck or other issue, we do isolation testing to reproduce the issue.

Summary

While doing performance testing, always plan testing according to proven workload models, and optimize memory use by observing certain key features and files. By using the kernelio.dat file and engine room data, you can use your workbench and agent machine memory optimally.

You also need to know what to analyze in performance reports and graphs. Using normal distribution and merging of different counters in existing graphs provides more insight.
