Performance analysis report on Remote Management Agent and Tivoli Enterprise Console

This article is a performance analysis report for IBM® Remote Management Agent Version 1.0, Build 543 (provided with IBM WebSphere® Remote Server 5.1.2.1) and IBM Tivoli® Enterprise Console Version 3.9 with Fixpak 3.


Ryan Brown (ryanab@us.ibm.com), Software Engineer, IBM

Ryan Brown is a Staff Software Engineer at the IBM Raleigh Lab in North Carolina. He has worked on a variety of retail solutions and online commerce offerings at IBM. He has also worked in system verification testing and scalability and performance testing.



15 March 2006

Introduction

IBM Remote Management Agent is a component of the IBM Store Integration Framework that simplifies the delivery of new consumer-facing devices and services in stores. Remote Management Agent monitors and manages a multitude of in-store IT devices to improve system availability, which in turn helps customer satisfaction, employee productivity, and revenue. IBM Tivoli Enterprise Console provides automated problem diagnosis and resolution to improve system performance and to reduce support costs. Tivoli Enterprise Console also provides event management and root cause analysis and resolution.

The objectives of the performance analysis report were to:

  • Understand the performance capabilities and limitations of the Remote Management Agent in a high stress, simulated store environment.
  • Understand the performance capabilities and limitations of the Tivoli Enterprise Console and provide recommendations for improving performance.

Strategy scenario

To test the Remote Management Agent server performance, a custom tool was developed that created a configurable number of General Agents (GAs). The GAs then generated events and method invocations over a specified length of time. The questions were:

  • Can the Master Agent handle a large number of General Agents (100 or more)?
  • Can the Master Agent handle a large number of virtual MBeans (5000 or more)?
  • What is the maximum frequency of events that a Master Agent can handle?

For the Tivoli Enterprise Console performance testing, the same custom tool was used to generate events, which were then forwarded to the Tivoli Enterprise Console to measure its performance limits. We wanted to find out:

  • What quantity of events can Tivoli Enterprise Console process properly?
  • What can be done to improve the performance of Tivoli Enterprise Console?

Hardware and software details

The following lists hardware and software used for this test effort.

Tivoli Enterprise Console 3.9 server

  • Dual Intel® Xeon® 2.4 GHz processor
  • 2 GB RAM
  • Novell® SuSE® SLES 8 operating system, SMP kernel version 2.4.21-295
  • 100 Mbit LAN
  • IBM DB2® v8.1, FixPak 4
  • Tivoli Configuration Manager v4.2.3
  • Tivoli Enterprise Console v3.9, FixPak 3

Remote Management Agent server

  • Intel Xeon 2.4 GHz processor
  • 2 GB RAM
  • SuSE SLES 9 operating system, kernel version 2.6.8-24.14
  • 100 Mbit LAN
  • Remote Management Agent v1.0, Build 543

Measurement tools

The following Unix® system tools were used:

  • top
  • vmstat
  • netstat
  • ntop

The top command provides an ongoing look at processor activity in real time. It displays a listing of the most CPU-intensive tasks on the system. It can sort the tasks by CPU usage, memory usage, and runtime.

The vmstat command measures and records CPU and virtual memory usage during the tests. It calculates statistical averages for memory, paging, and CPU activity over a sampling interval specified by the user. The vmstat command is typically run in a separate xterm window in the foreground with the output going to the terminal. You can also run it in the background while redirecting output to a file. The primary argument is the sample interval expressed in seconds. The following command runs vmstat with a sample interval of 5 seconds and redirects the output to a file:

vmstat 5 > vmstat.log

This command creates a file called vmstat.log. The important fields from the vmstat output are shown below.

Memory:

  • swpd: Amount of virtual memory used.
  • free: Amount of idle memory.
  • buff: Amount of memory used as buffers.
  • cached: Amount of memory used as cache.

CPU:

  • us: Time spent running non-kernel code.
  • sy: Time spent running kernel code.
  • id: Time spent idle.
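A hedged sketch of post-processing such a log (the helper name is hypothetical; the column positions assume the classic 2.4/2.6-era vmstat layout used on these SLES systems, where us, sy, and id are the last three columns):

```python
# Sketch: compute average CPU utilization from a vmstat log.
# Assumes the classic 2.4/2.6-era vmstat layout where us, sy, id
# are the last three columns; if your vmstat adds a "wa" column,
# adjust the indices accordingly.

def average_cpu(lines):
    """Average the us/sy/id columns over all sample lines."""
    us = sy = idle = samples = 0
    for line in lines:
        fields = line.split()
        # Skip the repeating header lines (first field is not numeric).
        if not fields or not fields[0].isdigit():
            continue
        us += int(fields[-3])
        sy += int(fields[-2])
        idle += int(fields[-1])
        samples += 1
    return (us / samples, sy / samples, idle / samples)
```

For example, `average_cpu(open("vmstat.log"))` returns the (user, system, idle) averages for a run captured with the command above.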

The netstat command provides many options and shows various information about the network interfaces on the system. Relevant output includes network connections, routing tables, and interface statistics.

The ntop command displays additional network information. You can view its output in a browser, such as Mozilla™ Firefox®, by accessing http://localhost:3000. The ntop command displays a list of hosts that are currently using the network and reports information about the traffic generated and received by each host.


Machine configuration

For testing the Remote Management Agent capabilities and limitations, the server was set up on a SLES 9 machine with 2 GB of RAM and a 2.4 GHz processor. It was connected via a private 100 Mbit LAN to several additional Intel boxes which ran the custom Remote Management Agent load test tool. Each of these Intel boxes could run many (50-100) General Agents.

For testing Tivoli Enterprise Console's capabilities and limitations, the setup was similar. The SNMP trap mapper was started on the Remote Management Agent server so that events were forwarded to the Tivoli Enterprise Console. All machines remained connected on a private 100 Mbit LAN.


Tuning recommendations

This section describes recommendations for improving performance with the Remote Management Agent and the Tivoli Enterprise Console server.


Remote Management Agent server

The Remote Management Agent server is CPU-bound, which means the speed or quantity of CPUs directly affects the throughput of the server. If the Remote Management Agent server is expected to receive more than 1000 events per minute or 6000 invocations per minute, adding an additional CPU is the best solution. Otherwise, try to limit the number of events and invocations at the Remote Management Agent server. The purpose of this article is not to provide an in-depth tuning guide. There are IBM Redbooks that contain additional recommendations on improving overall system performance, such as Highly Available Architectures and Capacity Planning with WebSphere Remote Server V5.1.2.1.

The Remote Management Agent server's memory usage was also monitored and determined to be insignificant. A machine with 2 GB of RAM is more than enough for this event rate. Also, network traffic with 1400 events per minute seemed reasonable for a LAN environment. Details of the testing are in Remote Management Agent test results.


Tivoli Enterprise Console server

The purpose of the testing was to verify the Tivoli Enterprise Console server's event processing capabilities and limitations. Like the Remote Management Agent server, the Tivoli Enterprise Console server is CPU-bound, and increasing the number or speed of its CPUs gives the largest improvement in throughput. Total events per second at the server is the main consideration when calculating required hardware. The best way to save on processing power is to limit the number of incoming events before they reach the server.

The following tips are taken from a Tivoli Enterprise Console 3.9 performance report:

  • The Tivoli Enterprise Console server is CPU-bound, which means the speed of the CPU directly impacts the event throughput of the server.
  • Tivoli Enterprise Console can take advantage of multiprocessors. When deciding between deploying on a single 4-processor machine, or two 2-processor machines of equal speed (with the Tivoli Enterprise Console server on one, and RIM_host/RDBMS on another), note that there is a drop in throughput for the two machine environment due to network latency. For a simple or moderately complicated rulebase, choose a slower 2-processor machine over a faster 1-processor. When using a complex rulebase, consider a faster 1-processor machine because the rule engine will be the bottleneck.
  • Use a 100 Mbit or better network connection between the server and RDBMS.
  • Set Maximum number of event messages buffered in memory higher than the peak expected burst size. This prevents the server from doing added CPU and disk work if the buffer overflows.
  • Tune the database bufferpools and check that the log spaces are of sufficient size. Database overflow can introduce latency into the event processing. Details on how to do this are discussed in Sample DB2 configuration updates for Tivoli Enterprise Console.
  • Plan on additional CPU capacity for programs executed by the rulebase.
  • Plan on additional CPU capacity for an excessive number of Java™ consoles or Web consoles. Both access the RDBMS directly and run extra RIM processes.
  • The Tivoli Enterprise Console server is not memory-intensive. 2 GB is sufficient, even with RIM and the RDBMS on the same machine. More might be needed for complex rulebases and external programs executed by the rulebase.
  • Event throughput is bound by the server speed. The number of gateways and endpoints has little effect. Therefore, total events per second is the major design criterion.
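As a back-of-the-envelope illustration of the buffer-sizing tip above (the event rate, stall window, and headroom factor here are illustrative assumptions, not figures from the report):

```python
# Sketch: size the in-memory event buffer above the worst expected
# burst, e.g. the backlog that accumulates while the server is
# briefly unable to drain events.
def buffer_size(events_per_min, stall_minutes, headroom=1.5):
    """Peak burst plus a safety margin (the headroom is an assumption)."""
    return int(events_per_min * stall_minutes * headroom)

# 1,200 events/min with a 5-minute stall -> buffer of 9,000 events
print(buffer_size(1200, 5))
```

Sizing the buffer above this figure avoids the extra CPU and disk work the server incurs when the buffer overflows.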

Remote Management Agent test results

Two separate load test tools were used during the testing of the Remote Management Agent server. The basic design of both tools was identical, but the implementation differed. The first tool had several drawbacks, such as using a staggered start time for the General Agents. An improved load test tool was created to address these issues. However, there are some useful results from the initial tests. Specifically, long runs of up to 2 hours with a high rate of events were completed and network usage for some tests was recorded. Additionally, the testing that was completed with the first test tool attempted to simulate a more realistic customer environment, with machine configurations defined for small and medium stores. Testing with the second test tool focused on reaching the limits of the server's ability to handle events and method invocations.

Both Remote Management Agent load test tools were created to run stress tests to see how many General Agents, MBeans, JMX events, and method invocations the Remote Management Agent server can handle. To test this, three components were created:

  1. A General Agent simulator that can create MBeans that can send events.
  2. An event listener that counts events coming from each General Agent.
  3. A tool for the Remote Management Agent that invokes methods on MBeans proxied from the General Agents.

Component 1 runs on a "General Agent machine" (any machine will do); components 2 and 3 run on a "Master Agent machine" (any machine with Remote Management Agent installed).

Machine hardware and software configurations used for this test are noted in Hardware and software details.

Test results: Load test tool #1

Machine configurations used in Remote Management Agent testing were as follows:

  • Initial test: 1 General Agent, 1 MBean, 10 events per minute.
  • Small store: 2 General Agents, 30 MBeans per GA, 60 events per minute per GA, 10 method invocations per minute per GA.
  • Medium store: 25 General Agents, 30 MBeans per GA, 60 events per minute per GA, 10 method invocations per minute per GA.

The custom Remote Management Agent load test tool was used to generate the General Agents, MBeans, events, and invocations using one to four Intel machines, depending on the required load. Initial testing included a predetermined number of events and method invocations during 5 minute, 30 minute, and 2 hour runs in both a small store and a medium store environment. Load tests were repeated twice for each setup, and the results were recorded in the following tables.

The tables below contain the following column headings:

  • Time: Total time of the load test run.
  • CPU usage: Average CPU usage. This information was taken by examining top and vmstat logs.
  • Network usage: Descriptions of network traffic on the Remote Management Agent server.
  • Additional notes: Any additional notes regarding the specific load test run.
Table 1. Initial test environment: 1 GA, 1 MBean, 10 events per minute
Time | CPU usage | Network usage | Notes
5 min | Avg - 3% | Peak - 28 Kbps, Avg - 6 Kbps | No noticeable impact on system.
5 min | Avg - 5% | Peak - 12.8 Kbps, Avg - 4.2 Kbps | No noticeable impact on system.
Table 2. Small store environment: 2 GAs, 30 MBeans per GA, 60 events per minute per GA, 10 method invocations per minute per GA
Time | CPU usage | Network usage | Notes
5 min | Avg - 5% | Total - 1.57 MB | 120 total invocations, 600 total events
5 min | Avg - 5% | Total - 1.57 MB, Avg - 41.9 Kbps | 120 total invocations, 600 total events
30 min | Avg - 13% | Total - 5.2 MB, Avg - 23 Kbps | 600 total invocations, 3600 total events
30 min | Avg - 15% | Total - 5.4 MB, Avg - 23.9 Kbps | 600 total invocations, 3600 total events
2 hrs | Avg - 18% | Total - 18.9 MB, Avg - 21.7 Kbps | 2400 total invocations, 14400 total events
2 hrs | Avg - 20% | Total - 18.9 MB, Avg - 21.7 Kbps | 2400 total invocations, 14400 total events
Table 3. Medium store environment: 25 GAs, 30 MBeans per GA, 60 events per minute per GA, 10 method invocations per minute per GA
Time | CPU usage | Network usage | Notes
5 min | Avg - 75% | Total - 18.5 MB, Avg - 176.2 Kbps | Each GA runs for 5 minutes, but their starts are staggered, so the total run takes about 14 minutes (840 seconds). This explains the average network usage of ((18,500/840)*8) = 176.2 Kbps. Average of about 330 events per GA; total around 8250.
5 min | Avg - 75% | Total - 18 MB, Avg - 171.4 Kbps | Same stagger as the previous run: ((18,000/840)*8) = 171.4 Kbps. Average of about 330 events per GA; total around 8250.
30 min | Avg - 100% | N/A | 15750 total invocations, 44715 total events
2 hrs | Avg - 100% | N/A | 179693 sent / 175858 received + 63000 invokes. Lost 3835 events; loss caused by the way the custom load test tool processed events.
2 hrs | Avg - 100% | N/A | 179782 sent / 174767 received + 63750 invokes. Lost 5015 events; loss caused by the way the custom load test tool processed events.

Although network usage was not monitored for the 30 minute and 2 hour tests in the "medium store" configuration, you can calculate a rough estimate from the 5 minute test (which actually took 14 minutes to complete). 8250 total events in 14 minutes translates to about 590 events per minute. Assuming the tested Remote Management Agent server can handle roughly 1400 events per minute, you can expect network traffic to reach about 2.4 times 171.4 Kbps, or roughly 400 Kbps. In a LAN environment (100 Mbit), this is not a significant factor.
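The extrapolation can be reproduced directly from the 5-minute run's totals:

```python
# Reproduce the network-usage extrapolation from the 5-minute
# "medium store" run (which took ~14 minutes of wall time because
# of the staggered General Agent starts).
total_kbytes = 18_000        # Total - 18 MB transferred
run_seconds = 840            # 14 minutes
events = 8250                # total events observed

avg_kbps = total_kbytes * 8 / run_seconds    # kilobytes -> kilobits/s
events_per_min = events / 14                 # ~590 events per minute
scale = 1400 / events_per_min                # scale up to the server limit
projected_kbps = avg_kbps * scale            # rough projection at the limit

print(round(avg_kbps, 1), round(events_per_min), round(projected_kbps))
```

This prints roughly 171.4 Kbps average, 589 events per minute, and a projection around 407 Kbps at the server's event ceiling; rounding choices shift the projection somewhat, but any value in this range is negligible on a 100 Mbit LAN.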

The 2 hour runs on the "medium store" environment did lose some events. This was caused by the way in which the custom load test tool was created. Events were queued up; however, when the General Agents disconnected before events were processed, the events were lost. Note that although the Remote Management Agent server was stressed for a 2 hour time period, it did not crash, and it continued to process events. If the number of events had been slightly lower, or the CPU slightly faster, 100% of the events would have processed successfully.

Test results: Load test tool #2

The improved version of the Remote Management Agent load test tool is described in this section.

First, it counts all of the General Agents discovered, then tells them to create their MBeans (using the number specified as input to the create GAs script). After the General Agents finish creating their MBeans, the tool waits until all of the proxied MBeans appear. The test does not continue until the specified number of proxied MBeans is detected; if that number is never reached, the tool waits in an infinite loop.

After all of the proxied MBeans have been detected, the Master Agent tool tells the General Agents to start sending events. The General Agent MBeans start new threads that emit events. To make sure all of the threads are started before events are sent, the threads are forced to wait 30 seconds. This does not count against the running time. After all of the threads are started, plus a 30 second wait, the timer is started. At this point, method invocations are initiated, if you have specified a number greater than 0.

After the specified time limit, the Master Agent tallies up the total number of events received, and also events received per General Agent. Then, the Master Agent tool sends a signal to the General Agents that causes them to go offline.
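The tool itself is not published with the article, so the control flow above can only be sketched. The following is a hedged Python approximation of the wait-for-proxies step (all names are hypothetical; the real tool drives remote JMX General Agents rather than in-process stubs):

```python
import time

# Hypothetical sketch of the orchestration step described above:
# poll until the expected number of proxied MBeans is visible on
# the Master Agent before the timed event run begins.
def wait_for_proxies(count_proxies, expected, poll_secs=0.01, timeout=5.0):
    """Poll count_proxies() until it reaches 'expected'.

    The real tool loops forever at this point; a timeout is added
    here so the sketch cannot hang if the count is never reached.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if count_proxies() >= expected:
            return True
        time.sleep(poll_secs)
    return False

# Stand-in for querying the Master Agent's MBean server.
registered = []
def create_mbeans(n):
    registered.extend(range(n))

create_mbeans(50)
print(wait_for_proxies(lambda: len(registered), 50))
print(wait_for_proxies(lambda: len(registered), 51, timeout=0.05))
```

The timeout is a deliberate departure from the real tool, which waits indefinitely; capping the wait keeps the sketch from hanging when fewer proxies appear than expected.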

Below are the test results using the second Remote Management Agent load test tool. All runs lasted 3 minutes. The table below contains the following column headings:

  • # GAs: Total number of General Agents.
  • MBeans/GA: Number of MBeans created for each General Agent.
  • Events/MBeans/min: Number of events per MBean per minute.
  • MBean total: Total number of MBeans in the test.
  • Events/min: Number of events per minute.
  • Events/min/GA: Number of events per minute per General Agent.
  • CPU avg: Average CPU usage during the test.
  • CPU max: Peak CPU usage during the test.
Table 4. Load test tool #2 results (all runs lasted 3 minutes)
# GAs | MBeans/GA | Events/MBeans/min | MBean total | Events/min | Events/min/GA | CPU avg | CPU max
1 | 10 | 10 | 10 | 100 | 100 | 9 | 14
1 | 20 | 10 | 20 | 200 | 200 | 16 | 30
1 | 10 | 20 | 10 | 200 | 200 | 16 | 32
2 | 10 | 10 | 20 | 200 | 100 | 15 | 38
5 | 5 | 10 | 25 | 250 | 50 | 20 | 35
1 | 20 | 20 | 20 | 400 | 400 | 30 | 48
2 | 20 | 10 | 40 | 400 | 200 | 30 | 44
2 | 10 | 20 | 20 | 400 | 200 | 28 | 50
1 | 50 | 10 | 50 | 500 | 500 | 38 | 50
5 | 10 | 10 | 50 | 500 | 100 | 34 | 50
10 | 5 | 10 | 50 | 500 | 50 | 35 | 62
10 | 50 | 1 | 500 | 500 | 50 | 36 | 54
1 | 20 | 30 | 20 | 600 | 600 | 40 | 47
2 | 20 | 20 | 40 | 800 | 400 | 57 | 71
1 | 30 | 30 | 30 | 900 | 900 | 60 | 79
1 | 50 | 20 | 50 | 1,000 | 1,000 | 71 | 85
1 | 100 | 10 | 100 | 1,000 | 1,000 | 71 | 87
2 | 50 | 10 | 100 | 1,000 | 500 | 72 | 82
10 | 10 | 10 | 100 | 1,000 | 100 | 67 | 85
1 | 20 | 50 | 20 | 1,000 | 1,000 | 67 | 78
5 | 5 | 40 | 25 | 1,000 | 200 | 61 | 82
5 | 10 | 20 | 50 | 1,000 | 200 | 64 | 82
5 | 20 | 10 | 100 | 1,000 | 200 | 65 | 83
5 | 40 | 5 | 200 | 1,000 | 200 | 67 | 87
10 | 5 | 20 | 50 | 1,000 | 100 | 67 | 88
10 | 20 | 5 | 200 | 1,000 | 100 | 59 | 68
1 | 200 | 5 | 200 | 1,000 | 1,000 | 69 | 100
1 | 500 | 2 | 500 | 1,000 | 1,000 | 78 | 100
1 | 1,000 | 1 | 1,000 | 1,000 | 1,000 | 99 | 100
20 | 25 | 2 | 500 | 1,000 | 50 | 67 | 83
1 | 10 | 120 | 10 | 1,200 | 1,200 | 77 | 87
1 | 40 | 30 | 40 | 1,200 | 1,200 | 79 | 85
1 | 25 | 50 | 25 | 1,250 | 1,250 | 83 | 94
1 | 100 | 20 | 100 | 2,000 | 2,000 | 100 | 100
2 | 50 | 20 | 100 | 2,000 | 1,000 | 100 | 100

When the average CPU usage is graphed against events per minute, you can see that the relationship is linear (see Figure 1). This further indicates that the Remote Management Agent is CPU-bound. Also note that the event distribution, that is, the number of General Agents generating the events, does not affect CPU usage. What matters to the Remote Management Agent server is the total event rate.

Figure 1. CPU usage increases linearly with an increase in events per minute
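You can confirm the linearity yourself from a handful of non-saturated Table 4 rows (one row per event rate, CPU below 100%); a plain least-squares fit needs no external libraries:

```python
# Least-squares fit of average CPU (%) against events/min, using
# rows of Table 4 where the CPU had not yet saturated at 100%.
points = [(100, 9), (200, 16), (400, 30), (500, 36), (600, 40), (1000, 71)]

n = len(points)
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n
sxy = sum((x - mx) * (y - my) for x, y in points)
sxx = sum((x - mx) ** 2 for x, _ in points)
syy = sum((y - my) ** 2 for _, y in points)

slope = sxy / sxx                 # ~0.07 CPU% per event/min
r = sxy / (sxx * syy) ** 0.5      # correlation coefficient, ~0.998
print(round(slope, 3), round(r, 3))
```

The fitted slope of roughly 0.07% average CPU per event per minute is consistent with the observed ceiling of about 1400 events per minute on this hardware.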

Additional test results

The following table provides results of the Remote Management Agent testing for a variety of configurations.

Table 5. Results of testing for variety of configurations
# GAs | MBeans/GA | MBeans | Events/min | Invokes/min | Minutes | Notes
25 | 30 | 750 | 1,500 | 750 | 30 | 44715 events, 15750 invocations, CPU 100%
25 | 30 | 750 | 7,500 | 750 | 5 | 8121 events received, 2250 invocations, CPU 100% (many events lost)
200 | 1 | 200 | 200 | 200 | 3 | 520 events, 426 invocations
275 | 1 | 275 | 275 | 275 | 5 | 1316 events, 1122 invocations
10 | 1,000 | 10,000 | 1,000 | 0 | 15 | 14604 events
10 | 1,000 | 10,000 | 0 | 10,000 | 15 | 92087 invocations
380 | 4 | 1,600 | 800 | 1,600 | 30 | 23724 events, 35088 invocations
100 | 120 | 12,000 | 960 | 12,000 | 30 | 28696 events, 115706 invocations
100 | 240 | 24,000 | 960 | 24,000 | 30 | 28497 events, 138005 invocations
200 | 20 | 4,000 | 800 | 4,000 | 120 | 95613 events, 335627 invocations
300 | 10 | 3,000 | 800 | 0 | 120 | 95495 events

From this testing, the following limits were discovered for the Remote Management Agent server with the hardware and software configuration described previously:

  • 1400 events per minute or 6000 method invocations per minute.
  • 79000 MBeans.
  • 400 General Agents (there was no apparent limit to this, but 400 is as high as was tested and the limit of what is probably realistic).
  • Hybrid rate: 1000 events per minute with 4500 method invocations per minute.

Note that these are the limits of what the Remote Management Agent server can handle while dedicating most of its processing power to handling events and invocations. Obviously, a server will have additional tasks to handle. Careful planning determines whether the machine's hardware is sufficient.


Tivoli Enterprise Console test results

Tivoli Enterprise Console testing was fairly cursory because there are already several documents and IBM Redbooks that discuss performance and tuning recommendations. These recommendations were mentioned in the previous section, Tuning recommendations. Testing for this project included using one Remote Management Agent server to forward events to Tivoli Enterprise Console, and to discover the event processing limitations. After tuning the database properly and setting up a simplified rulebase, the following tests were performed successfully:

  • 800 events in 4 minutes
  • 1600 events in 4 minutes
  • 3159 events in 4 minutes, 11 seconds
  • 1200 events in 1 minute

As stated previously, this test was performed on a machine with dual Intel Xeon 2.4 GHz processors and the majority of the processing power is used to handle the events. If the anticipated throughput is higher, additional hardware would be required. A better solution is to limit the incoming events by only forwarding critical ones to Tivoli Enterprise Console. There is no limit on the number of devices (Remote Management Agent servers) that can be connected. The only real limit is the event rate.


Sample DB2 configuration updates for Tivoli Enterprise Console

You can run the following commands from a DB2 command prompt to improve the performance of the Tivoli Enterprise Console:

-- buffer pool for small tables and indexes
CREATE BUFFERPOOL BP_4K_small SIZE 40960 PAGESIZE 4K
-- buffer pool for large tables
CREATE BUFFERPOOL BP_16K_data SIZE 10240 PAGESIZE 16K
-- buffer pool for long column tables
CREATE BUFFERPOOL BP_32K_long SIZE 2560 PAGESIZE 32K
-- (seqdetect is an important parameter but defaults to the correct value of 'yes' for Tivoli Enterprise Console)
-- maxlocks is the percentage of locks held in the locklist before lock escalation occurs.
update db cfg for tec using maxlocks 70
-- locklist is the number of 4K pages to maintain the list of locks held
update db cfg for tec using locklist 200
-- locktimeout is the # of seconds to wait to obtain a lock
update db cfg for tec using locktimeout 60
-- the number of threads available to write changed pages in the buffer to disk
update db cfg for tec using num_iocleaners 16
-- the number of threads available to prefetch pages into the buffer pool
update db cfg for tec using num_ioservers 12
-- specifies how many frequency values to collect for columns for runstats processing
update db cfg for tec using num_freqvalues 40
-- space in 4K pages to save table descriptor settings for triggers/procedures
update db cfg for tec using catalogcache_sz 50
-- size in 4K pages to set aside for static and dynamic SQL compilations
update db cfg for tec using pckcachesz 75
-- space in 4K pages to buffer before writing changes to the transaction log
update db cfg for tec using logbufsz 20
-- size in 4K pages to make the private/shared sort areas
update db cfg for tec using sortheap 350
-- the number of commit requests to bundle if requests occur in less than 1 second
update db cfg for tec using mincommit 10

You can do additional database management, such as performing a reorg on Tivoli Enterprise Console tables, when the tables undergo numerous changes. Additions, deletions, and updates of information in the tables cause the data to become disorganized, and retrieving disorganized data is more time-consuming. The reorg command reorganizes the tables by reconstructing the rows to eliminate fragmented data. You also improve performance by compacting the information.


Conclusion

This article has given examples of performance results that you may see when running the Remote Management Agent and Tivoli Enterprise Console software in a retail environment. A key point is that both the Remote Management Agent and Tivoli Enterprise Console are typically CPU-bound, and you can often obtain the best results either by limiting the number of events that are forwarded to the enterprise or by increasing your processing power. The article also provided configuration recommendations and DB2 tuning tips for improving performance.
