IBM Remote Management Agent is a component of the IBM Store Integration Framework that simplifies the rollout of new consumer-facing devices in stores to support service delivery. Remote Management Agent monitors and manages a multitude of in-store IT devices to improve system availability, which in turn helps customer satisfaction, employee output, and revenues. IBM Tivoli Enterprise Console provides automated problem diagnosis and resolution to improve system performance and to reduce support costs. Tivoli Enterprise Console also provides event management and root cause analysis and resolution.
The objectives of the performance analysis report were to:
- Understand the performance capabilities and limitations of the Remote Management Agent in a high stress, simulated store environment.
- Understand the performance capabilities and limitations of the Tivoli Enterprise Console and provide recommendations for improving performance.
To test the Remote Management Agent server performance, a custom tool was developed that created a configurable number of General Agents (GAs). The GAs then generated events and method invocations over a specified length of time. The questions were:
- Can the Master Agent handle a large number of General Agents (100 or more)?
- Can the Master Agent handle a large number of virtual MBeans (5000 or more)?
- What is the maximum frequency of events that a Master Agent can handle?
For the Tivoli Enterprise Console performance testing, the same custom tool was used to generate events. Events were then forwarded on to the Tivoli Enterprise Console to measure the performance limitations. We wanted to find out:
- What quantity of events can Tivoli Enterprise Console process properly?
- What can be done to improve the performance of Tivoli Enterprise Console?
Hardware and software details
The following hardware and software were used for this test effort.
Tivoli Enterprise Console 3.9 server
- Dual Intel® Xeon® 2.4 GHz processor
- 2 GB RAM
- Novell® SuSE® SLES 8, kernel version SMP - 2.4.21-295 operating system
- 100 Mbit LAN
- IBM DB2® v8.1, FixPak 4
- Tivoli Configuration Manager v4.2.3
- Tivoli Enterprise Console v3.9, FixPak 3
Remote Management Agent server
- Intel Xeon 2.4 GHz processor
- 2 GB RAM
- SuSE SLES 9, kernel version 2.6.8-24.14 operating system
- 100 Mbit LAN
- Remote Management Agent v1.0, Build 543
The following Unix® system tools were used:
- top
- vmstat
- netstat
- ntop
The top command provides an ongoing look at processor activity in real time. It displays a listing of the most CPU-intensive tasks on the system. It can sort the tasks by CPU usage, memory usage, and runtime.
The vmstat command was used to measure and record CPU and virtual memory usage during the tests. It reports averages for memory, paging, and CPU activity over a sampling period whose length you specify. The vmstat command is typically run in a separate xterm window in the foreground with the output going to the terminal. You can also run it in the background while redirecting output to a file. The primary argument is the sample interval expressed in seconds. The following command runs vmstat with a sample interval of 5 seconds and redirects the output to a file:
vmstat 5 > vmstat.log
This command creates a file called vmstat.log. The important fields from the vmstat output are shown below.
Memory:
- swpd: Amount of virtual memory used.
- free: Amount of idle memory.
- buff: Amount of memory used as buffers.
- cached: Amount of memory used as cache.
CPU:
- us: Time spent running non-kernel code.
- sy: Time spent running kernel code.
- id: Time spent idle.
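As a rough illustration of how the CPU fields in a vmstat log can be turned into the averages reported later, the following Python sketch parses vmstat-style output by column name. The function name `average_cpu` and the two sample rows in `SAMPLE` are hypothetical and only there to make the sketch runnable; they are not taken from the actual test logs.

```python
# Hypothetical sample in the usual Linux vmstat layout (not real test data).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 812340  40212 512300    0    0     5     3  110  220 10  2 88  0
 2  0      0 810100  40220 512400    0    0     0     8  150  400 70  6 24  0
"""


def average_cpu(vmstat_text):
    """Return the mean us/sy/id percentages across all samples in a vmstat log."""
    columns = None
    totals = {"us": 0, "sy": 0, "id": 0}
    samples = 0
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "us" in fields and "id" in fields:
            # Column-name header line; vmstat repeats it periodically.
            columns = fields
            continue
        if columns is None or len(fields) != len(columns) or not fields[0].isdigit():
            # Skip the "procs ---memory---" banner and any malformed line.
            continue
        row = dict(zip(columns, fields))
        for key in totals:
            totals[key] += int(row[key])
        samples += 1
    if samples == 0:
        raise ValueError("no vmstat samples found")
    return {key: totals[key] / samples for key in totals}


print(average_cpu(SAMPLE))
```

To apply this to a real log, pass the contents of vmstat.log to `average_cpu`. Keep in mind that the first sample vmstat prints reflects averages since boot, so you may want to drop it before averaging.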
The netstat command provides many options and shows various information about the network interfaces on the system. Relevant output includes network connections, routing tables, and interface statistics.
The ntop command displays additional network information. You can view it in a browser, such as Mozilla™ Firefox®, by accessing http://localhost:3000. The ntop command displays a list of hosts that are currently using the network and reports information about the traffic generated and received by each host.
For testing the Remote Management Agent capabilities and limitations, the server was set up on a SLES 9 machine with 2 GB of RAM and a 2.4 GHz processor. It was connected via a private 100 Mbit LAN to several additional Intel boxes which ran the custom Remote Management Agent load test tool. Each of these Intel boxes could run many (50-100) General Agents.
For testing Tivoli Enterprise Console's capabilities and limitations, the setup is similar. The SNMP trap mapper is started on the Remote Management Agent server so that events are forwarded on to the Tivoli Enterprise Console. The machines are still all connected on a private 100 Mbit LAN.
Tuning recommendations
This section describes recommendations for improving performance with the Remote Management Agent and the Tivoli Enterprise Console server.
Remote Management Agent server
The Remote Management Agent server is CPU-bound, which means the speed or quantity of CPUs directly affects the throughput of the server. If the Remote Management Agent server is expected to receive more than 1000 events per minute or 6000 invocations per minute, adding a CPU is the best solution. Otherwise, try to limit the number of events and invocations at the Remote Management Agent server. This article is not intended as an in-depth tuning guide. There are IBM Redbooks that contain additional recommendations on improving overall system performance, such as Highly Available Architectures and Capacity Planning with WebSphere Remote Server V5.1.2.1.
The Remote Management Agent server's memory usage was also monitored and determined to be insignificant. A machine with 2 GB of RAM is more than enough for this event rate. Also, network traffic with 1400 events per minute seemed reasonable for a LAN environment. Details of the testing are in Remote Management Agent test results.
Tivoli Enterprise Console server
The purpose of the testing was to verify the Tivoli Enterprise Console server's event processing capabilities and limitations. Like the Remote Management Agent server, the Tivoli Enterprise Console server is CPU-bound, and increasing the number or speed of CPUs gives the largest improvement in throughput. Total events per second at the server is the main consideration when calculating required hardware. The best way to save on processing power is to limit the number of incoming events before they reach the server.
The following tips are taken from a Tivoli Enterprise Console 3.9 performance report:
- The Tivoli Enterprise Console server is CPU-bound, which means the speed of the CPU directly impacts the event throughput of the server.
- Tivoli Enterprise Console can take advantage of multiprocessors. When deciding between deploying on a single 4-processor machine, or two 2-processor machines of equal speed (with the Tivoli Enterprise Console server on one, and RIM_host/RDBMS on another), note that there is a drop in throughput for the two machine environment due to network latency. For a simple or moderately complicated rulebase, choose a slower 2-processor machine over a faster 1-processor. When using a complex rulebase, consider a faster 1-processor machine because the rule engine will be the bottleneck.
- Use a 100 Mbit or better network connection between the server and RDBMS.
- Set Maximum number of event messages buffered in memory higher than the peak expected burst size. This prevents the server from doing added CPU and disk work if the buffer overflows.
- Tune the database bufferpools and check that the log spaces are of sufficient size. Database overflow can introduce latency into the event processing. Details on how to do this are discussed in Sample DB2 configuration updates for Tivoli Enterprise Console.
- Plan on additional CPU capacity for programs executed by the rulebase.
- Plan on additional CPU capacity for an excessive number of Java™ consoles or Web consoles. Both access the RDBMS directly and run extra RIM processes.
- The Tivoli Enterprise Console server is not memory-intensive. 2 GB is sufficient, even with RIM and the RDBMS on the same machine. More might be needed for complex rulebases and external programs executed by the rulebase.
- Event throughput is bound by the server speed. The number of gateways and endpoints has little effect. Therefore, total events per second is the major design criterion.
Remote Management Agent test results
Two separate load test tools were used during the testing of the Remote Management Agent server. The basic design of both tools was identical, but the implementation differed. The first tool had several drawbacks, such as using a staggered start time for the General Agents. An improved load test tool was created to address these issues. However, there are some useful results from the initial tests. Specifically, long runs of up to 2 hours with a high rate of events were completed and network usage for some tests was recorded. Additionally, the testing that was completed with the first test tool attempted to simulate a more realistic customer environment, with machine configurations defined for small and medium stores. Testing with the second test tool focused on reaching the limits of the server's ability to handle events and method invocations.
Both Remote Management Agent load test tools were created to run stress tests to see how many General Agents, MBeans, JMX events, and method invocations the Remote Management Agent server can handle. To test this, three components were created:
- A General Agent simulator that can create MBeans that can send events.
- An event listener that counts events coming from each General Agent.
- A tool for the Remote Management Agent that invokes methods on MBeans proxied from the General Agents.
Component 1 runs on a "General Agent machine" (any machine will do); components 2 and 3 run on a "Master Agent machine" (any machine with Remote Management Agent installed).
Machine hardware and software configurations used for this test are noted in Hardware and software details.
Test results: Load test tool #1
Machine configurations used in Remote Management Agent testing were as follows:
- Initial test: 1 General Agent, 1 MBean, 10 events per minute.
- Small store: 2 General Agents, 30 MBeans per GA, 60 events per minute per GA, 10 method invocations per minute per GA.
- Medium store: 25 General Agents, 30 MBeans per GA, 60 events per minute per GA, 10 method invocations per minute per GA.
The custom Remote Management Agent load test tool was used to generate the General Agents, MBeans, events, and invocations using one to four Intel machines, depending on the required load. Initial testing covered a predetermined number of events and method invocations during 5 minute, 30 minute, and 2 hour runs in both the small store and medium store environments. Load tests were run twice for each setup, and the results are recorded in the following tables.
The tables below contain the following column headings:
- Time: Total time of the load test run.
- CPU usage: Average CPU usage. This information was taken by examining top and vmstat logs.
- Network usage: Descriptions of network traffic on the Remote Management Agent server.
- Additional notes: Any additional notes regarding the specific load test run.
Table 1. Initial test environment: 1 GA, 1 MBean, 10 events per minute
| Time | CPU usage | Network usage | Notes |
|---|---|---|---|
| 5 min | Avg - 3% | Peak - 28 Kbps Avg - 6 Kbps | No noticeable impact on system. |
| 5 min | Avg - 5% | Peak - 12.8 Kbps Avg - 4.2 Kbps | No noticeable impact on system. |
Table 2. Small store environment: 2 GAs, 30 MBeans per GA, 60 events per minute per GA, 10 method invocations per minute per GA
| Time | CPU usage | Network usage | Notes |
|---|---|---|---|
| 5 min | Avg - 5% | Total - 1.57 MB | 120 total invocations, 600 total events |
| 5 min | Avg - 5% | Total - 1.57 MB Avg - 41.9 Kbps | 120 total invocations, 600 total events |
| 30 min | Avg - 13% | Total - 5.2 MB Avg - 23 Kbps | 600 total invocations, 3600 total events |
| 30 min | Avg - 15% | Total - 5.4 MB Avg - 23.9 Kbps | 600 total invocations, 3600 total events |
| 2 hrs | Avg - 18% | Total - 18.9 MB Avg - 21.7 Kbps | 2400 total invocations, 14400 total events |
| 2 hrs | Avg - 20% | Total - 18.9 MB Avg - 21.7 Kbps | 2400 total invocations, 14400 total events |
Table 3. Medium store environment: 25 GAs, 30 MBeans per GA, 60 events per minute per GA, 10 method invocations per minute per GA
| Time | CPU usage | Network usage | Notes |
|---|---|---|---|
| 5 min | Avg - 75% | Total - 18.5 MB Avg - 176.2 Kbps | Each GA runs for 5 minutes, but there is a stagger in their starting, so the total run takes about 14 minutes (840 seconds). This explains the network avg usage of ((18,500/840)*8)=176.2. Average of about 330 events per GA. Total around 8250. |
| 5 min | Avg - 75% | Total - 18 MB Avg - 171.4 Kbps | Each GA runs for 5 minutes, but there is a stagger in their starting, so the total run takes about 14 minutes (840 seconds). This explains the network avg usage of ((18,000/840)*8)=171.4. Average of about 330 events per GA. Total around 8250. |
| 30 min | Avg - 100% | N/A | 15750 total invocations, 44715 total events |
| 2 hrs | Avg - 100% | N/A | 179693 sent/175858 recv + 63000 invokes. Lost 3835 events. Loss caused by the way the custom load test tool processed events. |
| 2 hrs | Avg - 100% | N/A | 179782 sent/174767 recv + 63750 invokes. Lost 5015 events. Loss caused by the way the custom load test tool processed events. |
Although network usage was not monitored for the 30 minute and 2 hour tests in the "medium store" configuration, you can calculate a rough estimate from the 5 minute test (which actually took 14 minutes to complete). 8250 total events in 14 minutes translate to about 590 events per minute. Assuming the tested Remote Management Agent server can handle roughly 1400 events per minute, which is about 2.4 times that rate, you can expect network traffic to reach about 2.4 times 171.4 Kbps, or roughly 400 Kbps. In a LAN environment (100 Mbit), this is not a significant factor.
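The estimate above can be reproduced with a few lines of arithmetic. The following Python sketch uses only figures quoted in this section (18,000 KB transferred in 840 seconds, 8250 events in 14 minutes, and a target of 1400 events per minute); the function names are illustrative, and the linear scaling of traffic with event rate is an assumption rather than a measured result.

```python
def avg_kbps(total_kilobytes, seconds):
    """Average rate in kilobits per second for a transfer of total_kilobytes."""
    return total_kilobytes / seconds * 8


def scaled_kbps(measured_kbps, measured_events_per_min, target_events_per_min):
    """Scale a measured rate to another event rate, assuming traffic grows linearly."""
    return measured_kbps * target_events_per_min / measured_events_per_min


# Second 5 minute run of Table 3: 18 MB over a 14 minute (840 second) window.
measured = avg_kbps(18_000, 840)
events_per_min = 8250 / 14
estimate = scaled_kbps(measured, events_per_min, 1400)

# Share of a 100 Mbit (100,000 Kbit) link consumed by the estimated traffic.
link_share = estimate / 100_000

print(f"measured: {measured:.1f} Kbps at {events_per_min:.0f} events/min")
print(f"estimate at 1400 events/min: {estimate:.0f} Kbps ({link_share:.2%} of the link)")
```

However the scale factor is rounded, the estimated traffic stays well under 1% of a 100 Mbit link, which is why network bandwidth is not treated as a limiting factor here.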
The 2 hour runs on the "medium store" environment did lose some events. This was caused by the way in which the custom load test tool was created. Events were queued up; however, when the General Agents disconnected before events were processed, the events were lost. Note that although the Remote Management Agent server was stressed for a 2 hour time period, it did not crash, and it continued to process events. If the number of events had been slightly lower, or the CPU slightly faster, 100% of the events would have processed successfully.
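To put the lost events in proportion, the following Python sketch computes the loss for the two 2 hour runs from the sent and received counts in Table 3. The function name is illustrative; the counts come straight from the table.

```python
def loss_summary(sent, received):
    """Return (events lost, fraction of sent events lost)."""
    lost = sent - received
    return lost, lost / sent


# (sent, received) event counts for the two 2 hour medium store runs.
two_hour_runs = [(179693, 175858), (179782, 174767)]

for sent, received in two_hour_runs:
    lost, fraction = loss_summary(sent, received)
    print(f"sent {sent}, received {received}: lost {lost} ({fraction:.2%})")
```

In both runs the loss is under 3% of the events sent, even with the server held at 100% CPU for the full 2 hours.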
Test results: Load test tool #2
The improved version of the Remote Management Agent load test tool is described in this section.
First, the Master Agent tool counts all of the General Agents discovered, then tells them to create their MBeans (using the number specified as input to the create GAs script). After each General Agent has finished creating its MBeans, the tool waits until all of the proxied MBeans are created. The test does not continue until the specified number of proxied MBeans is detected. If that number is never detected, the tool waits indefinitely.
After all of the proxied MBeans have been detected, the Master Agent tool tells the General Agents to start sending events. The General Agent MBeans start new threads that emit events. To make sure all of the threads are started before events are sent, the threads are forced to wait 30 seconds. This does not count against the running time. After all of the threads are started, plus a 30 second wait, the timer is started. At this point, method invocations are initiated, if you have specified a number greater than 0.
After the specified time limit, the Master Agent tallies up the total number of events received, and also events received per General Agent. Then, the Master Agent tool sends a signal to the General Agents that causes them to go offline.
Below are the test results using the second Remote Management Agent load test tool. All runs lasted 3 minutes. The table below contains the following column headings:
- #GAs: Total number of General Agents.
- MBeans/GA: Number of MBeans created for each General Agent.
- Events/MBeans/min: Number of events per MBean per minute.
- MBeans total: Total number of MBeans in the test.
- Events/min: Number of events per minute.
- Events/min/GA: Number of events per minute per General Agent.
- CPU avg: Average CPU usage during the test.
- CPU max: Peak CPU usage during the test.
Table 4. Load test tool #2 results: 3 minute runs with varying numbers of GAs, MBeans, and event rates
| #GAs | MBeans/GA | Events/MBeans/min | MBeans total | Events/min | Events/min/GA | CPU avg | CPU max |
|---|---|---|---|---|---|---|---|
| 1 | 10 | 10 | 10 | 100 | 100 | 9 | 14 |
| 1 | 20 | 10 | 20 | 200 | 200 | 16 | 30 |
| 1 | 10 | 20 | 10 | 200 | 200 | 16 | 32 |
| 2 | 10 | 10 | 20 | 200 | 100 | 15 | 38 |
| 5 | 5 | 10 | 25 | 250 | 50 | 20 | 35 |
| 1 | 20 | 20 | 20 | 400 | 400 | 30 | 48 |
| 2 | 20 | 10 | 40 | 400 | 200 | 30 | 44 |
| 2 | 10 | 20 | 20 | 400 | 200 | 28 | 50 |
| 1 | 50 | 10 | 50 | 500 | 500 | 38 | 50 |
| 5 | 10 | 10 | 50 | 500 | 100 | 34 | 50 |
| 10 | 5 | 10 | 50 | 500 | 50 | 35 | 62 |
| 10 | 50 | 1 | 500 | 500 | 50 | 36 | 54 |
| 1 | 20 | 30 | 20 | 600 | 600 | 40 | 47 |
| 2 | 20 | 20 | 40 | 800 | 400 | 57 | 71 |
| 1 | 30 | 30 | 30 | 900 | 900 | 60 | 79 |
| 1 | 50 | 20 | 50 | 1,000 | 1,000 | 71 | 85 |
| 1 | 100 | 10 | 100 | 1,000 | 1,000 | 71 | 87 |
| 2 | 50 | 10 | 100 | 1,000 | 500 | 72 | 82 |
| 10 | 10 | 10 | 100 | 1,000 | 100 | 67 | 85 |
| 1 | 20 | 50 | 20 | 1,000 | 1,000 | 67 | 78 |
| 5 | 5 | 40 | 25 | 1,000 | 200 | 61 | 82 |
| 5 | 10 | 20 | 50 | 1,000 | 200 | 64 | 82 |
| 5 | 20 | 10 | 100 | 1,000 | 200 | 65 | 83 |
| 5 | 40 | 5 | 200 | 1,000 | 200 | 67 | 87 |
| 10 | 5 | 20 | 50 | 1,000 | 100 | 67 | 88 |
| 10 | 20 | 5 | 200 | 1,000 | 100 | 59 | 68 |
| 1 | 200 | 5 | 200 | 1,000 | 1,000 | 69 | 100 |
| 1 | 500 | 2 | 500 | 1,000 | 1,000 | 78 | 100 |
| 1 | 1,000 | 1 | 1,000 | 1,000 | 1,000 | 99 | 100 |
| 20 | 25 | 2 | 500 | 1,000 | 50 | 67 | 83 |
| 1 | 10 | 120 | 10 | 1,200 | 1,200 | 77 | 87 |
| 1 | 40 | 30 | 40 | 1,200 | 1,200 | 79 | 85 |
| 1 | 25 | 50 | 25 | 1,250 | 1,250 | 83 | 94 |
| 1 | 100 | 20 | 100 | 2,000 | 2,000 | 100 | 100 |
| 2 | 50 | 20 | 100 | 2,000 | 1,000 | 100 | 100 |
When average CPU usage is plotted against events per minute, the relationship is linear (see Figure 1), which further indicates that the Remote Management Agent is CPU-bound. Note also that event distribution, that is, the number of General Agents generating the events, does not affect CPU usage. What matters to the Remote Management Agent server is the total event rate.
Figure 1. CPU usage increases linearly with an increase in events per minute
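One way to check the linear relationship numerically is an ordinary least-squares fit. The Python sketch below uses the first row for each distinct event rate in Table 4, leaving out the 2,000 events per minute rows where the CPU is already saturated at 100%. The choice of rows and the function name `linear_fit` are assumptions made for illustration, not part of the original analysis.

```python
# (events per minute, average CPU %) taken from Table 4, one row per event rate.
POINTS = [
    (100, 9), (200, 16), (250, 20), (400, 30), (500, 38), (600, 40),
    (800, 57), (900, 60), (1000, 71), (1200, 77), (1250, 83),
]


def linear_fit(points):
    """Ordinary least-squares fit y = slope * x + intercept, plus R squared."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_fit(POINTS)
# Event rate at which the fitted line reaches 100% CPU.
saturation = (100 - intercept) / slope

print(f"CPU% ~= {slope:.4f} * events/min + {intercept:.1f} (R^2 = {r_squared:.3f})")
print(f"fitted line reaches 100% CPU near {saturation:.0f} events/min")
```

An R squared value close to 1 supports the linear reading of Figure 1. The point where the fitted line reaches 100% CPU is only a rough extrapolation from 3 minute runs, so treat it as an indication rather than a limit.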

The following table provides results of the Remote Management Agent testing for a variety of configurations.
Table 5. Results of testing for variety of configurations
| #GAs | MBeans/GA | MBeans | Events/min | Invokes/min | Minutes | Notes |
|---|---|---|---|---|---|---|
| 25 | 30 | 750 | 1,500 | 750 | 30 | 44715 events, 15750 invocations, CPU 100% |
| 25 | 30 | 750 | 7,500 | 750 | 5 | 8121 events received, 2250 invocations, CPU 100% (many events lost) |
| 200 | 1 | 200 | 200 | 200 | 3 | 520 events, 426 invocations |
| 275 | 1 | 275 | 275 | 275 | 5 | 1316 events, 1122 invocations |
| 10 | 1,000 | 10,000 | 1,000 | 0 | 15 | 14604 events |
| 10 | 1,000 | 10,000 | 0 | 10,000 | 15 | 92087 invocations |
| 380 | 4 | 1,600 | 800 | 1,600 | 30 | 23724 events, 35088 invocations |
| 100 | 120 | 12,000 | 960 | 12,000 | 30 | 28696 events, 115706 invocations |
| 100 | 240 | 24,000 | 960 | 24,000 | 30 | 28497 events, 138005 invocations |
| 200 | 20 | 4,000 | 800 | 4,000 | 120 | 95613 events, 335627 invocations |
| 300 | 10 | 3,000 | 800 | 0 | 120 | 95495 events |
From this testing, the following limits were discovered for the Remote Management Agent server with the hardware and software configuration described previously:
- 1400 events per minute or 6000 method invocations per minute.
- 79000 MBeans.
- 400 General Agents (there was no apparent limit to this, but 400 is as high as was tested and the limit of what is probably realistic).
- Hybrid rate: 1000 events per minute with 4500 method invocations per minute.
Note that these are the limits of what the Remote Management Agent server can handle while dedicating most of its processing power to handling events and invocations. Obviously, a server will have additional tasks to handle. Careful planning determines whether the machine's hardware is sufficient.
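As a simple planning aid, the following Python sketch compares the expected load of a store configuration against the event and invocation limits listed above. It is a minimal sketch under stated assumptions: the function names are illustrative, and the two limits are checked independently, which is optimistic when events and invocations run together (the hybrid rate above is lower than either limit on its own).

```python
EVENT_LIMIT_PER_MIN = 1400    # events per minute, from the limits above
INVOKE_LIMIT_PER_MIN = 6000   # method invocations per minute, from the limits above


def store_load(general_agents, events_per_ga, invokes_per_ga):
    """Total (events per minute, invocations per minute) for a store."""
    return general_agents * events_per_ga, general_agents * invokes_per_ga


def within_limits(events_per_min, invokes_per_min):
    """True if each rate, taken on its own, is at or below the measured limit."""
    return (events_per_min <= EVENT_LIMIT_PER_MIN
            and invokes_per_min <= INVOKE_LIMIT_PER_MIN)


# Small and medium store configurations from the load test tool #1 runs.
stores = {"small": (2, 60, 10), "medium": (25, 60, 10)}

for name, config in stores.items():
    events, invokes = store_load(*config)
    verdict = "within limits" if within_limits(events, invokes) else "exceeds limits"
    print(f"{name}: {events} events/min, {invokes} invocations/min -> {verdict}")
```

With these figures the medium store configuration asks for 1500 events per minute, slightly above the 1400 events per minute limit, which is consistent with the 100% CPU usage seen in the longer medium store runs.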
Tivoli Enterprise Console test results
Tivoli Enterprise Console testing was fairly cursory because there are already several documents and IBM Redbooks that discuss performance and tuning recommendations. These recommendations were mentioned in the previous section, Tuning recommendations. Testing for this project included using one Remote Management Agent server to forward events to Tivoli Enterprise Console, and to discover the event processing limitations. After tuning the database properly and setting up a simplified rulebase, the following tests were performed successfully:
- 800 events in 4 minutes
- 1600 events in 4 minutes
- 3159 events in 4 minutes, 11 seconds
- 1200 events in 1 minute
As stated previously, this test was performed on a machine with dual Intel Xeon 2.4 GHz processors and the majority of the processing power is used to handle the events. If the anticipated throughput is higher, additional hardware would be required. A better solution is to limit the incoming events by only forwarding critical ones to Tivoli Enterprise Console. There is no limit on the number of devices (Remote Management Agent servers) that can be connected. The only real limit is the event rate.
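Because the event rate is the limiting factor, it helps to express the four successful runs listed above as rates. The following Python sketch converts each run to events per minute and events per second; the function name is illustrative, and the event counts and durations are the ones given in the list.

```python
def events_per_minute(events, seconds):
    """Average event rate per minute over a run lasting the given seconds."""
    return events * 60 / seconds


# (events, duration in seconds) for the four successful test runs.
runs = [
    (800, 4 * 60),
    (1600, 4 * 60),
    (3159, 4 * 60 + 11),
    (1200, 60),
]

rates = [events_per_minute(events, seconds) for events, seconds in runs]

for (events, seconds), rate in zip(runs, rates):
    print(f"{events} events in {seconds} s: {rate:.0f} events/min ({rate / 60:.1f} events/s)")
```

The highest sustained rate among these runs is the 1 minute test, at 1200 events per minute or 20 events per second.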
Sample DB2 configuration updates for Tivoli Enterprise Console
You can run the following commands from a DB2 command prompt to improve the performance of the Tivoli Enterprise Console:
-- buffer pool for small tables and indexes
CREATE BUFFERPOOL BP_4K_small SIZE 40960 PAGESIZE 4K
-- buffer pool for large tables
CREATE BUFFERPOOL BP_16K_data SIZE 10240 PAGESIZE 16K
-- buffer pool for long column tables
CREATE BUFFERPOOL BP_32K_long SIZE 2560 PAGESIZE 32K
-- (seqdetect is an important parameter but defaults to the correct value of 'yes' for Tivoli Enterprise Console)
-- maxlocks is the percentage of locks held in the locklist before lock escalation occurs.
update db cfg for tec using maxlocks 70
-- locklist is the number of 4K pages to maintain the list of locks held
update db cfg for tec using locklist 200
-- locktimeout is the # of seconds to wait to obtain a lock
update db cfg for tec using locktimeout 60
-- the number of threads available to write changed pages in the buffer to disk
update db cfg for tec using num_iocleaners 16
-- the number of threads available to prefetch pages into the buffer pool
update db cfg for tec using num_ioservers 12
-- specifies how many frequency values to collect for columns for runstats processing
update db cfg for tec using num_freqvalues 40
-- space in 4K pages to save table descriptor settings for triggers/procedures
update db cfg for tec using catalogcache_sz 50
-- size in 4K pages to set aside for static and dynamic SQL compilations
update db cfg for tec using pckcachesz 75
-- space in 4K pages to buffer before writing changes to the transaction log
update db cfg for tec using logbufsz 20
-- size in 4K pages to make the private/shared sort areas
update db cfg for tec using sortheap 350
-- the number of commit requests to bundle if requests occur in less than 1 second
update db cfg for tec using mincommit 10
You can do additional database management, such as performing a reorg on Tivoli Enterprise Console tables, when the tables undergo numerous changes. Additions, deletions, and updates of information in the tables cause the data to become disorganized, and retrieving disorganized data is more time-consuming. The reorg command reorganizes the tables by reconstructing the rows to eliminate fragmented data. You also improve performance by compacting the information.
This article has given examples of performance results that you may see when running the Remote Management Agent and Tivoli Enterprise Console software in a retail environment. One of the key concepts is that both the Remote Management Agent and Tivoli Enterprise Console are typically CPU-bound and you can often obtain the best results by either limiting the number of events that are forwarded to the enterprise, or by increasing your processing power. The article also provided recommendations for improving performance through configuration recommendations and DB2 tuning tips.
- Highly Available Architectures and Capacity Planning with WebSphere Remote Server V5.1.2.1
- Tivoli Enterprise Performance Tuning Guide
- Event Management and Best Practices
- Enable the On Demand Store with IBM Store Integration Framework





