The purpose of this document is to help performance evaluators design a successful performance, scalability, and stability test with IBM Cognos Business Intelligence (BI). It takes a very high-level approach, outlining some key principles of performance, scalability, and stability testing.
This guide will refer to IBM Cognos BI but the concepts can be applied to many multi-user, multi-server software applications. It should be taken from the perspective of a potential customer who may be evaluating IBM Cognos BI as the vendor of choice for their BI needs.
Exclusions and Exceptions
This document is only meant to be a guide. Factors such as time constraints, coverage “creep”, and hardware issues can, and likely will, affect the scope of what is actually tested.
It should also be stated that the examples used in this document do not refer to the historical or current scalability characteristics of the IBM Cognos BI product and are purely there for illustrative purposes.
This document assumes that hardware stability is not an issue.
Validating Software Performance
For many organizations consuming business intelligence, software performance can be thought of as validating that the IBM Cognos BI product meets certain performance criteria. Performance criteria may include things such as 'execute 1000 specific scheduled reports in less than an hour', or 'maintain an average transaction time of under 5 seconds at 100 simultaneous users', or something as simple as 'be as fast as the legacy BI software'.
The most prominent criteria are typically dependent on the function a person performs within an organization:
- A Business Analyst will typically be concerned with how much time certain tasks take.
- A Server Administrator will typically be concerned with how much server resources the software requires.
One User Performance Testing
From the Business Analyst perspective, this is typically the easiest test to run and requires the fewest external tools. The point of this test is to measure how responsive the software is to end-user actions.
The Business Analyst running the test should have the following tools available to them:
- A test plan outlining the user gestures to measure and the expected result or success criteria.
- A method to accurately measure the user gestures in seconds (see Appendix A).
- A place to accurately record and distribute the test results.
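As a rough illustration of measuring a user gesture in seconds, the sketch below times a callable several times and compares the average against a success criterion. The names `time_gesture`, `summarize`, and `example_gesture` are hypothetical, and the gesture itself is a stand-in for a real action such as opening a report.

```python
import statistics
import time


def time_gesture(gesture, repetitions=3):
    """Run a callable several times and return each elapsed time in seconds."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        gesture()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings, target_seconds):
    """Compare the average timing against a success criterion from the test plan."""
    average = statistics.mean(timings)
    return {
        "average_s": round(average, 3),
        "slowest_s": round(max(timings), 3),
        "meets_target": average <= target_seconds,
    }


def example_gesture():
    # Stand-in for a real user action (assumption for illustration only).
    time.sleep(0.01)


if __name__ == "__main__":
    print(summarize(time_gesture(example_gesture), target_seconds=5.0))
```

Repeating each gesture a few times and recording both the average and the slowest run makes it easier to spot one-off outliers in the recorded results.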
A Server Administrator will typically use one-user testing to gather a baseline of the server resources the application consumes when it is relatively quiet. The Server Administrator should have server resource monitoring running on the servers while the Business Analyst's test is running. The server resources of most interest are the “Big 3” of memory, central processing unit (CPU), and network utilization. See Appendix B for a short list of useful tools for monitoring server resources.
Multi User Performance Testing
Multi-user testing requires a significant amount of planning and having the right set of tools for the job. The Performance Team executing the test should have the following tools available to them:
- A test plan outlining the user gestures to measure and the expected result or success criteria.
- A method to accurately measure the user gestures in seconds. This will require a user load generating tool (see Appendix A).
- A process to accurately record and distribute the test results.
- Subject Matter Experts (SMEs) on hand to help diagnose complex issues. For example, a database administrator for SQL query tuning or an expert in application server tuning.
Server Administrators should either give the performance team access to run server resource monitoring or run it themselves while the multi-user tests are running. The server resources of most interest are the “Big 3” of memory, CPU, and network utilization. For multi-user tests, the Server Administrator should work directly with the performance testers to ensure the correct commands for monitoring server resources are executed while the tests are running. The resource monitoring should be started and stopped in step with the multi-user tests; otherwise, correlating the multi-user tests with resource usage can become difficult. See Appendix B for a short list of useful tools for monitoring server resources.
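To show why timestamps matter when correlating a test run with resource usage, here is a minimal sketch that keeps only the monitoring samples recorded between the start and end of a run. The sample layout (`time`, `cpu_pct`) and all values are assumptions for illustration, not the output format of any particular monitoring tool.

```python
from datetime import datetime


def samples_in_window(samples, start, end):
    """Return the resource samples recorded while a test run was active."""
    return [s for s in samples if start <= s["time"] <= end]


def average_cpu(samples):
    """Average CPU utilization (percent) over a list of samples."""
    if not samples:
        return None
    return sum(s["cpu_pct"] for s in samples) / len(samples)


# Hypothetical monitoring output: one sample per minute.
samples = [
    {"time": datetime(2024, 1, 1, 9, 0), "cpu_pct": 5.0},
    {"time": datetime(2024, 1, 1, 9, 1), "cpu_pct": 60.0},
    {"time": datetime(2024, 1, 1, 9, 2), "cpu_pct": 80.0},
    {"time": datetime(2024, 1, 1, 9, 3), "cpu_pct": 10.0},
]
run_start = datetime(2024, 1, 1, 9, 1)
run_end = datetime(2024, 1, 1, 9, 2)

during_run = samples_in_window(samples, run_start, run_end)
print(len(during_run), average_cpu(during_run))
```

If monitoring is started and stopped together with each test, this filtering step becomes unnecessary because every sample in the file already belongs to the run.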
The question may be asked: “That’s fine, monitor memory, CPU, and network utilization, but what should I be looking for?” Generally speaking, server resources can approach their limits, but they should not reach the point where product performance suffers. For example:
- Say the CPU is approaching 99-100% utilization. This may not be an issue if: the server is dedicated to the software currently being tested, the CPU Queue Length is not growing, and the user end experience is well within the success criteria.
High CPU utilization, if not balanced across all BI Application Tier servers, may also point to a load distribution imbalance through the BI dispatchers which can be solved with tweaks in BI process capacity.
- Assume the physical memory is totally exhausted and virtual memory usage is extremely high. This can point to a severe issue if at the same time memory is exhausted the end user transaction times suffer. If extra memory is available, adding it to the system can often solve this problem. Also, shutting off unnecessary programs on the server can free up extra memory resources as well.
- Say the network utilization is at 100%. This is an issue since the network has now become a bottleneck in the system.
As a final note, the computers generating the load can also run out of memory, CPU, and network. Keeping a lightweight resource monitor on them is useful as well to ensure the load generator does not become a bottleneck.
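The rules of thumb above can be sketched as a small function. This is only an illustration of the reasoning, not a diagnostic tool; the function name, parameters, and the 99% and 100% cut-offs are assumptions taken from the examples in this section.

```python
def assess_resources(cpu_pct, cpu_queue_growing, dedicated_server,
                     memory_exhausted, network_pct, response_within_criteria):
    """Return a list of findings based on the rules of thumb described above."""
    findings = []
    if cpu_pct >= 99:
        # High CPU is tolerable only when all three conditions hold.
        cpu_ok = (dedicated_server and not cpu_queue_growing
                  and response_within_criteria)
        findings.append("CPU busy but acceptable" if cpu_ok
                        else "CPU saturation needs investigation")
    if memory_exhausted and not response_within_criteria:
        findings.append("memory exhaustion is hurting transaction times")
    if network_pct >= 100:
        findings.append("network is a bottleneck")
    return findings


print(assess_resources(cpu_pct=99.5, cpu_queue_growing=False,
                       dedicated_server=True, memory_exhausted=False,
                       network_pct=40, response_within_criteria=True))
```

The same checks can be applied to the load-generating computers to confirm they are not the bottleneck.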
Multi User Sensitivity Testing
Sensitivity testing is all about how work being done by one part of the BI system or community affects another part of the BI system or community. The same tools used for performance testing apply but the intent is different.
For example, make the assumption that the BI community works 24 hours a day and that non-interactive or batch reporting is executed periodically during the work day on the same servers used by interactive users.
The question arises: what effect does the non-interactive reporting have on the performance of interactive reporting users, and vice versa? Both may have important deadlines and cannot be delayed by the other. Sensitivity testing can address this.
The key here is to first run each aspect alone to get an accurate baseline. That means that, using this example, the interactive users test is run in a vacuum. Similarly, the non-interactive tests are run without interference from the interactive reporting users.
Subsequent runs then include both interactive and non-interactive tasks to measure the impact of each on the other. The proportion of interactive and non-interactive work can be varied to make trends more apparent. Take the following sensitivity test cases and measure the performance:
- Full Non-Interactive baseline
- Full Interactive baseline
- Full Non-Interactive with Full Interactive
- 50% Non-Interactive with Full Interactive
- Full Non-Interactive with 50% Interactive
While the percentages can be changed, sensitivity testing of this kind can provide indications of how different BI activities can affect each other from a performance point of view.
Sensitivity testing can be very different depending on the goal, but the principles are the same: test each part separately, then bring them together.
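One simple way to express the outcome of the sensitivity test cases is the percentage change of each combined run against its baseline. The sketch below assumes average interactive response times in seconds; all of the numbers are hypothetical and only illustrate the arithmetic.

```python
def percent_change(baseline, combined):
    """Percentage change of a combined run relative to its baseline run."""
    return (combined - baseline) / baseline * 100.0


# Hypothetical average interactive response times, in seconds.
interactive_baseline_s = 4.0
combined_runs_s = {
    "Full Non-Interactive with Full Interactive": 6.0,
    "50% Non-Interactive with Full Interactive": 5.0,
}

for name, seconds in combined_runs_s.items():
    change = percent_change(interactive_baseline_s, seconds)
    print(f"{name}: {change:+.1f}% versus the interactive baseline")
```

The same calculation can be repeated from the non-interactive side, using batch completion time as the measure.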
Validating Software Scalability
Another aspect of solution testing is scalability testing. Scalability tests often answer the questions people ask after an initial performance test has been completed. Things such as:
- What happens when I add more users to this system? How does that affect the performance of my current users?
- What happens if I add more servers to the system to host the software? Do my users see any improvement?
- If I want to double the user community, can I expect they will experience the same performance if I simply double my hardware supporting the Report Service?
Methods to test these questions can be summed up in the following three sections which outline three principles of scalability.
Predictable scalability can be defined as the ability to add additional users to a static hardware environment and experience a predictable, proportional, and linear change in performance. As an example, if a customer doubles the number of users on the system they may find user end performance to degrade by 50%. Inversely, if they reduce the number of users on the system by half they may experience user end performance gains of 50%.
General instructions to test predictable scalability:
- Configure the software application on specific hardware.
- Run a load test at a “small” load for the system for a given timeframe, say 20 users for a smaller server. Record the end-user responsiveness and server resources information.
- Run another load test at a “medium” load for the system for a given timeframe, say 40 users for a smaller server. Record the end-user responsiveness and server resources information.
- Run a third set of data points at a “high” load for the system for a given timeframe, say 60 users for a smaller server. Record the end-user responsiveness and server resources information.
- Review and collate the data. The end-user responsiveness and server resources should grow in a predictable fashion.
This is not to say that IBM Cognos BI scales linearly as users are added to a static system. There are tuning settings that can help. However, if the system is already bottlenecked on a specific and static resource then adding or reducing users to the system may result in a measurable change to the legacy user experience.
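When collating the three data points, one way to judge whether growth is "predictable" is to compare the response time added per additional user between consecutive load levels. The sketch below is one possible check, not a formal definition; the function names, the 25% tolerance, and the sample data are assumptions.

```python
def per_user_increments(points):
    """Seconds of response time added per extra user between load levels.

    points is a list of (users, average_response_seconds) sorted by users.
    """
    return [(t1 - t0) / (u1 - u0)
            for (u0, t0), (u1, t1) in zip(points, points[1:])]


def grows_predictably(points, tolerance=0.25):
    """True when the per-user increments stay within a relative tolerance."""
    increments = per_user_increments(points)
    low, high = min(increments), max(increments)
    largest = max(abs(low), abs(high))
    if largest == 0:
        return True
    return (high - low) <= tolerance * largest


# Hypothetical results for the small, medium, and high load tests.
results = [(20, 5.0), (40, 7.0), (60, 9.0)]
print(per_user_increments(results), grows_predictably(results))
```

A result that fails this check suggests a resource or queue is saturating somewhere between two load levels, which is worth investigating before projecting further growth.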
Predictable Scalability Example
Make the assumption that a BI community is expected to grow by three times over the next two years. The chart below displays the average report execution times as the Y axis and the active user community as the X axis. For 20 active users the average report execution time is at 5.5 seconds. As 20 and 40 more users are added to the community, the average report execution time increases to 9.5 seconds and 18 seconds respectively. This is also illustrated by the following image.
Figure 1 Bar chart displaying the increase in report execution times as users are added to the system
Although the current service level projections may be within acceptable parameters, a degradation of approximately threefold may not go over well with the current user community. Further analysis of the resource utilization data from this test shows that the servers still have ample CPU and memory available. This finding allows for the assumption that the IBM Cognos Report Service is queuing requests as more users are added, which in turn causes the slower report execution times.
The chart below displays the average report execution times as the Y axis and the active user community as the X axis after the addition of more IBM Cognos Report Services to the environment. For 20 active users the average report execution time is at 5.5 seconds. As 20 and 40 more users are added to the community, the average report execution time increases to 6.5 seconds and 10 seconds respectively. This is also illustrated by the following image.
Figure 2 Bar chart displaying a decrease in report execution times as users are added to the system with additional IBM Cognos Report Services
While server resource usage is higher on the system because the additional Report Services consume more memory and CPU, the tradeoff is that the growing user community has less impact on the legacy users.
In either situation, this predictable scalability testing allows one to quantify how adding users to a static system will affect the overall end user performance.
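The two sets of illustrative figures above can be compared by expressing each result as a multiple of the 20-user time. The sketch below uses the example values from this section; the function and variable names are hypothetical.

```python
def degradation_factor(baseline_s, loaded_s):
    """How many times slower a loaded run is compared with the baseline."""
    return loaded_s / baseline_s


# Average report execution times (seconds) keyed by active users,
# taken from the illustrative example in this section.
before_tuning = {20: 5.5, 40: 9.5, 60: 18.0}
after_adding_report_services = {20: 5.5, 40: 6.5, 60: 10.0}

for label, results in (("before", before_tuning),
                       ("after", after_adding_report_services)):
    factor = degradation_factor(results[20], results[60])
    print(f"{label}: 60 users are {factor:.2f}x slower than 20 users")
```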
Horizontal scalability can be defined as the ability to add hardware to a production environment, keep the number of system users constant, and expect to see improved performance. For example, if a customer doubles the available hardware while keeping the number of system users constant they could expect a 50% improvement in performance.
General instructions to test horizontal scalability:
- Configure the software application on a specific “small” hardware configuration, say a single server. Have 2 more identical servers configured but sitting idle.
- Run a load test on the single server system for a given timeframe, say 80 users for a single server. Record the end-user responsiveness and server resources information.
- Run the same load test with the same number of users (80 in this example) on a two server system for a given timeframe. Record the end-user responsiveness and server resources information.
- Run the same load test with the same number of users (80 in this example) on a three server system for a given timeframe. Record the end-user responsiveness and server resources information.
- Review and collate the data. The end-user response times and server resource utilization should decrease in a predictable fashion.
Horizontal Scalability Example
Make the assumption that a BI community is starting to grow unsatisfied with BI performance as a set amount of new users are now utilizing the BI application. Adding more hardware will likely help the problem but to what degree?
A test is executed for this scenario following the horizontal scalability suggestions. The simplified results below show a bar chart with the average report execution time for the Y axis and the number of application servers on the X axis. The average report execution time for 1 application server is 30.3 seconds. As one and two additional application servers are added to the environment, the average report execution times drop to 17.1 and 9.5 seconds respectively. As the bar graph displays linear improvement, the hardware addition looks like it may provide the necessary capacity to improve the end-user experience to a more than acceptable level.
Figure 3 Bar graph displaying a linear decrease in report execution times as more application servers are added
The test results show that when the user community is kept constant and the hardware resources available to the IBM Cognos BI application tier increase, users experience a predictable level of performance gain.
Thus horizontal scalability testing allows one to quantify how adding hardware to a static user community will affect the overall end user performance.
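A common way to quantify results like these is speedup (baseline time divided by scaled time) and efficiency (speedup divided by the number of servers). The sketch below applies both to the illustrative figures from this example; the function names are hypothetical.

```python
def speedup(baseline_s, scaled_s):
    """How many times faster the scaled configuration is than the baseline."""
    return baseline_s / scaled_s


def efficiency(baseline_s, scaled_s, servers):
    """Speedup divided by server count; 1.0 means perfectly proportional."""
    return speedup(baseline_s, scaled_s) / servers


# Average report execution times (seconds) keyed by application servers,
# taken from the illustrative example in this section.
times_by_servers = {1: 30.3, 2: 17.1, 3: 9.5}
baseline = times_by_servers[1]

for servers, seconds in times_by_servers.items():
    print(f"{servers} server(s): "
          f"speedup {speedup(baseline, seconds):.2f}x, "
          f"efficiency {efficiency(baseline, seconds, servers):.0%}")
```

Efficiency well below 100% as servers are added would suggest that something other than application tier hardware, such as the database or network, is limiting the gain.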
Vertical scalability can be defined as the ability to add hardware to a production environment, add a proportionate number of users to the environment, and expect the same level of performance. For example, if a customer doubles the hardware they might expect to double the users and maintain the same performance.
General instructions to test vertical scalability:
- Configure the software application on a specific “small” hardware configuration, say a single server. Have 2 more identical servers configured but sitting idle.
- Run a load test on the single server system for a given time frame, say 40 users for a single server. Record the end-user responsiveness and server resources information.
- Run another load test but increase the user load and servers proportionally, say 80 users for two identical servers. Record the end-user responsiveness and server resources information.
- Run a final load test and again increase the user load and servers proportionally, say 120 users for three identical servers. Record the end-user responsiveness and server resources information.
- Review and collate the data. The end-user responsiveness and server resources should remain relatively flat and stable.
Vertical Scalability Example
Make the assumption that a BI community is going to triple in size in the near future. The current community has a Service Level Agreement (SLA) in place stating that end user performance times cannot degrade. This may pose a problem given that the Predictable Scalability testing demonstrated a linear increase in execution times as users were added.
Tripling the hardware to match the tripling of the user community may alleviate the issue. A test is executed to model this scenario following the vertical scalability suggestions above.
The bar chart below has the average report execution times on the Y axis. The three bars, from left to right, indicate that 40 users using one application server has a report execution time of 30.3 seconds, 80 users using two application servers has a report execution time of 32.1 seconds, and 120 users using three application servers has a report execution time of 30.7 seconds. These simplified results show near identical performance characteristics, and the hardware addition looks like it may provide the necessary capacity to maintain the end-user experience at a more than acceptable level.
Figure 4 Bar chart displaying a consistent report execution time as the number of users and applications servers increase
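Whether results are "relatively flat" can be checked by measuring how far each run deviates from the first. The sketch below uses the illustrative figures from this example; the 10% tolerance is an assumption and should come from the actual SLA, and the function names are hypothetical.

```python
def max_deviation_pct(times):
    """Largest percentage deviation of any run from the first (baseline) run."""
    baseline = times[0]
    return max(abs(t - baseline) / baseline * 100.0 for t in times)


def is_flat(times, tolerance_pct=10.0):
    """True when every run stays within the tolerance of the baseline."""
    return max_deviation_pct(times) <= tolerance_pct


# Average report execution times (seconds) for 40 users on one server,
# 80 users on two, and 120 users on three, from the example above.
times = [30.3, 32.1, 30.7]
print(f"max deviation: {max_deviation_pct(times):.1f}%, flat: {is_flat(times)}")
```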
Validating Software Stability
Of equal importance to product performance and scalability is the principle of software stability. Software stability is the measurement of the software's ability to support a given workload for an extended period of time without any downtime, interruptions to the user community, or intervention by an administrator.
Stability tests often answer the questions people ask after performance and scalability tests have been completed. When the BI system is under load, these include questions such as:
- Does the BI system maintain end user responsiveness and throughput?
- Does the BI system exhibit any significant memory growth (such as by memory leaks or memory fragmentation)?
- Does the BI system balance the load efficiently over time?
- Do any BI processes unexpectedly abort?
- Does the user community start experiencing unexpected behaviour such as errors?
Knowing the Typical User Community
One of the keys to designing a good stability test is knowing the characteristics of the user community and how a 'typical day' plays out. In the following table the typical user community is broken down into Report Consumers, Lightweight Reporting, Heavyweight Reporting and Data Analysts.
The Report Consumers group views saved report outputs and comprises 25 percent of the overall community. This group typically will consume reports executed overnight in a scheduled or batch mode.
The Lightweight Reporting group interactively runs reports for business needs and comprises 40 percent of the total community. This group requires up to date reports on data that may be changing throughout the work day.
The Heavyweight Reporting group runs reports for higher business level needs and makes up 30 percent of the total user community. This group typically consumes large amounts of data as they require information about the business as a whole.
The final 5 percent of the user community is made up of the Data Analysts group, who do ad hoc business data analysis. This group usually consists of advanced users who delve into the data on an ad hoc basis, using tools such as Analysis Studio or Business Insight Advanced, to deep dive for specific trends in their company's data.
This breakdown is also presented in the following table.
Table 1 Table displaying the makeup of a typical BI community
|User Group|Activity|Percentage of Community|
|Report Consumers|View Saved Report Output|25%|
|Lightweight Reporting|Interactively Running Reports for Business Needs|40%|
|Heavyweight Reporting|Interactively Running Reports for High Level Business Needs|30%|
|Data Analysts|Ad Hoc Business Data Analysis|5%|
Once this is known, planning representative test cases based on the four groups as well as accounting for the scheduled/batch reporting can provide useful information about the stability of the BI system under load.
Design the Stability Test Based on the User Community
To define typical test cases that map to the roles in the typical BI community described previously, test cases would need to be created in which:
- Report Consumers would view saved report output.
- Lightweight Reporting users would interactively execute lightweight reports in Cognos Viewer or Business Insight.
- Heavyweight Reporting users would interactively execute heavyweight reports in Cognos Viewer or Business Insight.
- Data Analysts would interact with data using BI tools such as Analysis Studio, Query Studio, and Business Insight Advanced.
These would then be added all together into a single test scenario based on the defined percentages. Along with a representative amount of scheduled reporting executing non-interactively, this forms the backbone of the stability test.
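Turning the community percentages into a number of simulated users per group is simple arithmetic, sketched below. The percentages come from Table 1; the function name and the 200-user total are assumptions, and leftover users from rounding are handed to the groups with the largest remainders so the total always matches.

```python
# Community percentages from Table 1.
COMMUNITY_MIX = {
    "Report Consumers": 25,
    "Lightweight Reporting": 40,
    "Heavyweight Reporting": 30,
    "Data Analysts": 5,
}


def allocate_users(total_users, mix=COMMUNITY_MIX):
    """Split a total user count across groups according to their percentages."""
    shares = {group: total_users * pct for group, pct in mix.items()}
    allocation = {group: share // 100 for group, share in shares.items()}
    # Hand out any users lost to rounding, largest remainder first.
    leftover = total_users - sum(allocation.values())
    by_remainder = sorted(shares, key=lambda g: shares[g] % 100, reverse=True)
    for group in by_remainder[:leftover]:
        allocation[group] += 1
    return allocation


print(allocate_users(200))
```

The scheduled or batch reporting load is not part of this split and would be configured separately alongside the interactive users.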
How long to run the stability test depends on many factors. If time is available, the test can be scheduled to run for days or weeks. If time is not available, then trends in the test artifacts will have to be analyzed closely to look for issues that, if continued, may spell problems for the BI user community. The bottom line, though, is that the longer the stability test executes, the greater the chance it will uncover a potential issue.
Other factors that can affect whether issues are uncovered are user load and load balancing. If the system is not load balanced, or if not enough user load is placed on the respective servers to adequately exercise them, it will be more difficult to uncover potential issues. If servers are pushed close to capacity, potential issues are more likely to be uncovered.
Monitoring the Stability Test
Monitoring the stability test is easy in some respects and more difficult in others. The good news is that it runs for a long time so it makes it easier to look for issues such as:
- Memory leaks or fragmentation
- Load balancing problems
- Network bandwidth bottlenecks
- Potential deterioration in end user performance
- Occurrences of end user errors over time as certain hardware or software limits are met or exceeded
To gather this information, the performance analysts must have run-time access to historical views of:
- End user transactional performance data
- System throughput (bytes) and network utilization data
- End user experienced errors
- System resource utilization, including BI processes and processes of software that supports BI such as web servers and databases
- Software logs
Once armed with this information, your performance team can illustrate observations and make predictions for the BI community based on what was uncovered during the test or what trends were seen that may cause issues to the user community if left unchecked (for example, an imbalance in load or excessive memory growth).
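As an illustration of turning a trend into a prediction, the sketch below fits a least-squares line through periodic memory readings for a process and estimates how long it would take to reach a limit if the growth continued. The sample readings, the limit, and the function names are all hypothetical.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hour, memory_mb) samples, in MB per hour."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def hours_until_limit(samples, limit_mb):
    """Hours after the last sample until the limit is hit, or None if not growing."""
    slope = slope_per_hour(samples)
    if slope <= 0:
        return None
    _, last_mb = samples[-1]
    return (limit_mb - last_mb) / slope


# Hypothetical hourly memory readings (hour, MB) for one process.
readings = [(0, 1000), (1, 1010), (2, 1020), (3, 1030)]
print(slope_per_hour(readings), hours_until_limit(readings, limit_mb=2030))
```

A steady positive slope over a long run is a prompt for further investigation rather than proof of a leak, since caches and connection pools can also grow before levelling off.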
General Testing Tips
The following are general and simple tips for testing that can benefit a team designing and executing a multi-user test cycle.
Change One Variable at a Time
Try to be as scientific as possible. While it may seem time consuming to change one variable at a time in a large software environment, it will save time in the long run. Nothing is worse than changing five things, watching the performance change significantly, and then having to backtrack through all the changes to find the exact cause. Make sure every change is clearly documented and includes the reason for the change.
After a change, it is recommended that at least two data points be revisited. Ideally these data points would be a high user and low user test for a specific test case, the reasoning being that some changes may assist users on a very busy system but inversely slow down users on a less busy system or vice versa. This is good information to note prior to recommending any changes, especially to a production environment.
Reset as Many Things as Possible Prior to Executing a Test
Since test results are compared to one another, ensure that each test starts at the same point. For example, it is recommended that each performance or scalability test be run after the entire software system has been restarted, software logs backed up, and server resource monitoring restarted.
Otherwise, the previous test can influence the current test since certain database connections may already be established, data may be cached, and processes may be using more memory since they have yet to clean up all the stored information from the previous sessions. IBM Cognos BI products can easily be started and stopped from the command line so as long as the tester has the correct privileges it can be relatively trivial to reset IBM Cognos products prior to each test.
Execute Baseline Tests in the Same Timeframe as the Current Test
What sometimes occurs is testers will use results from 6 months ago as the baseline for a current test. The hardware and configuration are the same so what could be the problem? What can happen is that in that time frame the server may have been patched, the disk may have become heavily fragmented, or new software may be running on the server that is consuming CPU cycles and memory that wasn’t present 6 months ago. All can affect the scalability and performance numbers.
To minimize this risk always re-run the baseline right before the current test so that the baseline and current test are equally affected by any factors on the server.
Automate as Much of the Rest as Possible
Automation will solve two problems:
- The test can run relatively unattended so off-hours can be utilized for testing. Working hours can then be spent analyzing results rather than running tests.
- Automation will ensure each test or subsequent re-test is run in the exact same fashion.
Typically, the software tools used for performance testing come with a full set of command line tools, so only simple batch programming is required.
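A minimal sketch of such automation is shown below: a fixed list of steps is run in the same order every time, and the run stops at the first failure. The step names and commands are placeholders that only print a message; in a real test cycle they would be replaced by the site's own restart, monitoring, and load test commands, which are not shown here.

```python
import subprocess
import sys
import time


def run_steps(steps):
    """Run each (name, command) step in order and stop at the first failure."""
    results = []
    for name, command in steps:
        started = time.time()
        completed = subprocess.run(command, capture_output=True, text=True)
        results.append({
            "step": name,
            "returncode": completed.returncode,
            "seconds": round(time.time() - started, 2),
        })
        if completed.returncode != 0:
            break
    return results


# Placeholder commands (assumptions): each one only prints a message.
PLACEHOLDER_STEPS = [
    ("restart software", [sys.executable, "-c", "print('restart')"]),
    ("start monitoring", [sys.executable, "-c", "print('monitor')"]),
    ("run load test", [sys.executable, "-c", "print('load test')"]),
]

if __name__ == "__main__":
    for result in run_steps(PLACEHOLDER_STEPS):
        print(result)
```

Keeping the step list in one place means a re-test runs exactly the same sequence as the original test.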
Monopolize the Hardware the Test is Running On
Nothing is worse than having an unexpected result and later finding out someone else was on the server while the test was running and executed some nasty query thinking it wouldn’t affect the ongoing test. In many environments, performance and scalability testers need to share with user acceptance testers. A schedule should be worked out with these other groups so that the performance tests don’t interfere with user acceptance tests and vice versa.
Do Not Wait Until the Test is Over to Analyze the Results
Plan to analyze test results on a daily basis. Performance testers will want to find problems early so that they can be solved. Problems are easier to resolve at the beginning of the test cycle than near or at the end. This will also allow the tester to provide preliminary results to management types who are very interested in the progress of the tests. With these results, plan to send out periodic status updates so all parties involved are aware of the current status of the test.
Creating a Test Plan for the Performance and Scalability Engagement
Testing without a plan is a recipe for trouble. These are common guidelines but they are useful to review. Ensure that all the stakeholders read and sign that they agree to the test plan prior to testing.
Ensure the following sections are in the test plan.
- Purpose: The purpose of the test. It should be direct and short such as “The purpose of this performance and scalability test is to compare specific report execution times between IBM Cognos BI 8.4 and the current production running version of IBM Cognos BI 10.1. Specific end-user experience cases and server resource consumption during those cases will be measured.”
- Goals: Define the performance success criteria for the test. Be as specific as possible. It is extremely helpful to know the success criteria before testing.
- Test Case Description: A full description of the specific test cases for the performance test. Write out step-by-step what each test does. For example, for Business Analyst test cases write out each click the user or performance test tool will perform. If reports are run in IBM Cognos BI, have a description of the type of report run, the number of rows returned, etc. These should relate to the success criteria for the test.
- Test Execution Matrix: Take the general test descriptions from the Goals and Test Case Description sections above and create specific test cases where the test case description, the number of users, and the hardware resources used for that test case are clearly defined. These should relate to the success criteria for the test.
- Deliverables: A clear description of what deliverables will result from the test and their due dates.
- Schedule: Generate a schedule that can fit into the time frame of the test. Include script development, automation development, software configuration, etc. in the schedule. Save at least one day for every week of the test cycle for documentation and final analysis of the test results. Have all parties agree to the schedule. Place a caveat into the schedule stating that if issues arise that negatively affect the schedule, a meeting should be called as soon as possible to discuss remedies such as extending the test cycle or removing some of the test cases.
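The Test Execution Matrix described in the list above can be generated by combining each test case with each user load and hardware configuration. The sketch below is one way to do that; the test case names, user loads, and server counts are hypothetical examples.

```python
from itertools import product


def build_matrix(test_cases, user_loads, server_counts):
    """Return one row per combination of test case, user load, and server count."""
    combinations = product(test_cases, user_loads, server_counts)
    return [
        {"id": number, "test_case": case, "users": users, "servers": servers}
        for number, (case, users, servers) in enumerate(combinations, start=1)
    ]


# Hypothetical inputs for illustration.
matrix = build_matrix(
    test_cases=["Run lightweight report", "Run heavyweight report"],
    user_loads=[20, 40, 60],
    server_counts=[1, 2],
)

for row in matrix:
    print(row)
```

A full cross product grows quickly, so the generated rows are a starting point to prune against the schedule and the success criteria.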
Finally, after the test is completed, review the test plan to see what could be improved. Make note of any issues and incorporate the solutions into subsequent tests.
Depending on the exact specifics of the testing more sections may be needed but this section forms a good base.
Delivering the Results
While it is more exciting just to jump in and start testing, take the time in the initial part of the test planning phase to determine the deliverables for the test. Sometimes going back later to gather missing information is not an option.
Here is a short list of items to consider when it comes to deliverables:
- How will the data be delivered to various audiences? For example, executives may prefer a formal PowerPoint type presentation while the Business Analysts and Server Administrators may prefer to see granular data in an Excel format or an IBM Cognos BI report.
- What information is required for each audience? For example, executives may want to see the performance characteristics summarized. Analysts may want to see how much time certain tasks can take. Server Administrators may want to see how much server resources are consumed by various tests. Discuss this ahead of time with the various stakeholders so nothing major is missed.
- Learn how to correlate the end-user experience with the server resources in the same report or graph. These are very powerful graphs in which data can be derived to see how the end-user experience correlates to the resource consumption on the server. For example, one test may show big performance degradation in end-user times. However, when this graph is correlated to a graph showing memory consumption it may show that the server has insufficient memory to run this many users.
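Besides plotting the two series on the same graph, the relationship between end-user times and a server resource can be summarized with a Pearson correlation coefficient. The sketch below computes it by hand for paired readings taken at the same points in a test; the data and function name are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired readings taken at the same points in a test.
response_times_s = [5.0, 6.0, 8.0, 12.0]
memory_used_pct = [40.0, 55.0, 70.0, 95.0]

print(f"correlation: {pearson(response_times_s, memory_used_pct):.2f}")
```

A value close to 1 indicates the two measures rise together, but it does not by itself show which one causes the other, so the underlying graphs should still be reviewed.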
Appendix A: Useful Software for Measuring End-User Transactions and Generating Load for the Performance and Scalability Test
The following software, while not an exhaustive list by any stretch, are strong tools for driving load and measuring certain aspects of performance and scalability. They are merely suggestions and not necessarily an endorsement of one over the other. Investigate these tools and evaluate them for use in your organization’s test suite.
IBM Rational Performance Tester (Single and Multi-user)
Rational Performance Tester (RPT) is a performance testing tool offered by IBM (http://www-01.ibm.com/software/awdtools/tester/performance/).
LoadRunner (Single and Multi-user)
LoadRunner is a performance testing tool offered by Hewlett Packard.
Fiddler (Single user)
Fiddler is a free, browser-based, single user performance tool available online at http://www.fiddler2.com/fiddler2/. It works with Internet Explorer, Firefox, and Opera among others.
Appendix B: Useful Software for Measuring Server Resources for the Performance and Scalability Test
The following software can be used for measuring server resource usage during performance and scalability tests. These are merely suggestions and not necessarily an endorsement of one over the other.
Microsoft’s Sysinternals Tools (Windows)
There are a multitude of useful tools such as PsList that can output process and server related information to a file. That file can then be parsed to retrieve relevant information.
Microsoft’s Performance Monitor (Windows)
This is the default Windows performance monitoring tool and comes with the operating system.
NMon (AIX and Linux)
This system performance monitoring tool is available for download and installation online.
‘top’ and ‘ps’ (Linux and UNIX)
These tools come with the operating system. Review the man pages for both commands for the correct arguments to gather the necessary server and process information for products running on these operating systems.