Technical Blog Post
Collection of data for troubleshooting a performance issue involving IBM Business Process Manager
IBM Business Process Manager (BPM) supports powerful, high-performance business process management while providing a simple way to model business processes. BPM is based on WebSphere Application Server and includes many components, so troubleshooting can be complicated when you hit a performance issue: the problem can originate in any of those components. This post describes the information and data you need to troubleshoot a performance issue, whether you work on it yourself or with the IBM Support team.
Benchmark your application before putting it into production
Before we go into the details, we want to emphasize the importance of load testing and a staging environment. A load test applies a workload designed to stress your system. The usual goal is to understand how the system behaves under various conditions, but there are other worthwhile reasons for running load tests, such as reproducing a particular system state or preparing a migration to new hardware. A load test lets you create artificial conditions that go beyond the real conditions you can observe. With load tests you can:
- Validate your assumptions about the system and check whether they are realistic.
- Reproduce undesirable behavior that you are trying to eliminate from the system.
- Measure how your application currently performs. If you don't know how fast it currently runs, you can't be sure any changes you make are helpful. You can also use historical benchmark results to diagnose problems you didn't foresee.
- Simulate a higher load than your production systems handle, to identify the first scalability bottleneck you will encounter as the load grows.
- Plan for growth. Benchmarks can help you estimate how much hardware, network capacity, and other resources you'll need for your projected future load. This can help reduce risk during upgrades or major application changes.
- Test your application's ability to tolerate a changing environment. For example, you can find out how your application performs during a sporadic peak in concurrency or with a different configuration of servers, or you can see how it handles a different data distribution.
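To make the benchmarking idea concrete, here is a minimal Python sketch of a load driver. It assumes the action under test can be wrapped in a plain Python callable (for example, a hypothetical function that sends one HTTP request to your staging environment); it runs that callable many times from a thread pool and reports latency statistics. It is only an illustration, not an IBM tool, and a dedicated load-testing product is usually a better fit for real benchmarks.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_load_test(action: Callable[[], None], total_requests: int,
                  concurrency: int) -> Dict[str, float]:
    """Call `action` total_requests times from `concurrency` worker threads
    and return latency statistics in milliseconds."""
    if total_requests < 1 or concurrency < 1:
        raise ValueError("total_requests and concurrency must be >= 1")

    def timed_call() -> float:
        start = time.perf_counter()
        action()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call) for _ in range(total_requests)]
        # f.result() re-raises any exception thrown by `action`.
        latencies: List[float] = sorted(f.result() for f in futures)

    def percentile(p: float) -> float:
        # Nearest-rank percentile over the sorted latencies.
        rank = math.ceil(p / 100.0 * len(latencies))
        return latencies[max(rank - 1, 0)]

    return {
        "requests": float(len(latencies)),
        "mean_ms": statistics.mean(latencies),
        "p50_ms": percentile(50),
        "p95_ms": percentile(95),
        "max_ms": latencies[-1],
    }


if __name__ == "__main__":
    # Hypothetical stand-in for one user action; replace it with a real call
    # against your staging environment.
    def simulated_action() -> None:
        time.sleep(0.01)

    print(run_load_test(simulated_action, total_requests=50, concurrency=5))
```

Running the same action with increasing `concurrency` values and keeping the results gives you a baseline to compare against after configuration or hardware changes.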
Clarify the performance problem
Try to answer the following questions when you encounter a performance issue:
- Where are you seeing performance problems? Process Server? Process Center? Process Designer?
- How did you identify the performance problem? For example, is it slow when launching a BPD (business process definition), moving to the next coach in a flow, or executing a particular part of your custom code?
- Does it impact all areas?
- Can you quantify the performance issue?
- How long does a specific action take?
- Do you have any logging that shows where the time is being spent?
- Do you have any performance tools to profile the server and analyze the data?
- Is this a new problem or have you always had it in this environment?
- Do you see the same problem in other environments on the network? Is the performance problem consistent or does it change?
- Is performance worse at certain times of the day? Is performance bad under load?
- Does performance improve after a restart of the application server?
- Do you have a workaround?
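To quantify a slow action and see where the time goes, it helps to log the duration of each step. Whatever language your custom code is written in, the pattern is the same; the following Python sketch only illustrates it. The `timed_step` context manager and the step names are hypothetical: each step's duration is logged with a timestamp, and totals are accumulated so the slowest steps stand out.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("timing")

# Accumulated elapsed time per named step, in milliseconds.
step_totals: Dict[str, float] = {}


@contextmanager
def timed_step(name: str) -> Iterator[None]:
    """Log and accumulate how long the enclosed block takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        step_totals[name] = step_totals.get(name, 0.0) + elapsed_ms
        log.info("step=%s elapsed_ms=%.1f", name, elapsed_ms)


def slowest_steps(totals: Dict[str, float], top: int = 3) -> List[Tuple[str, float]]:
    """Return the `top` steps with the largest accumulated time."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical steps of one user action; substitute your own code.
    with timed_step("load_data"):
        time.sleep(0.05)
    with timed_step("render"):
        time.sleep(0.01)
    print(slowest_steps(step_totals))
```

Because each log line carries a timestamp, the same output also helps answer whether performance is worse at certain times of the day.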
Performance issues can be caused by configuration problems or by product defects, and most can be resolved by tuning the configuration. If you have detailed answers to the above questions when the performance issue occurs, you might already have a clue about the culprit.
Gather general diagnostic data
When you open a performance-related PMR (problem management record) or SR (service request), IBM Support normally requests the performance MustGather data for your platform:
- MustGather: Performance, hang, or high CPU issues with WebSphere Application Server on AIX
- MustGather: Performance, hang, or high CPU issues with WebSphere Application Server on Linux
- MustGather: Performance, hang, or high CPU issues with WebSphere Application Server on Windows
The MustGather documents above include useful tools. The steps below give more guidance on collecting the required data.
1. Collecting the required data:
- a) If you have not already done so, enable verbose garbage collection (verboseGC) and restart the problematic server(s).
- b) At the time of the problem, run the data collection script from the MustGather for your platform. For example, with the linperf.sh script from the Linux MustGather, run the following command:
./linperf.sh JVMPID
The script creates a file named linperf_RESULTS.tar.gz and three javacores. Run it as the root user. As with any script, you may need to add execute permission (for example, chmod +x linperf.sh) before running it.
In the above command, JVMPID is the process ID of the problematic JVM. To collect data from multiple JVMs, specify their process IDs separated by spaces.
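linperf.sh already takes javacores for you. If you want additional javacores on Linux or AIX, sending signal 3 (SIGQUIT, as with kill -3 JVMPID) to an IBM JVM writes a javacore while the JVM keeps running. The Python sketch below repeats that for one or more PIDs; the `take_javacores` name and its default count and interval are assumptions, not MustGather values, and it does not collect the CPU and operating system data that linperf.sh gathers, so it does not replace the script.

```python
import os
import signal
import time
from typing import Callable, List, Sequence


def take_javacores(pids: Sequence[int], count: int = 3, interval_s: float = 120.0,
                   send_signal: Callable[[int, int], None] = os.kill,
                   sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Send SIGQUIT to each PID `count` times, `interval_s` seconds apart.

    On IBM JVMs, SIGQUIT (the same signal as `kill -3`) writes a javacore and
    the JVM keeps running. Returns the time.time() value of each round.
    Linux/AIX only: signal.SIGQUIT does not exist on Windows.
    """
    rounds: List[float] = []
    for i in range(count):
        rounds.append(time.time())
        for pid in pids:
            send_signal(pid, signal.SIGQUIT)
        if i < count - 1:  # no need to wait after the last round
            sleep(interval_s)
    return rounds
```

For example, `take_javacores([12345, 23456])` (hypothetical PIDs) signals both JVMs three times, two minutes apart. Run it as a user allowed to signal those processes; the `send_signal` and `sleep` parameters exist only so the function can be exercised without a live JVM.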
2. Collecting log files:
- Collect the server logs (SystemOut.log, native_stderr.log,...) from the problematic server(s):
profile_root/logs/server_name/*
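If you collect logs often, a small script can bundle that directory into one archive to attach to the PMR. This is a minimal Python sketch: `profile_root` and `server_name` are the same placeholders as in the path above, while the function name and the archive naming scheme are hypothetical. It assumes it runs on the host where the server writes its logs.

```python
import tarfile
import time
from pathlib import Path


def archive_server_logs(profile_root: str, server_name: str, out_dir: str = ".") -> Path:
    """Bundle profile_root/logs/server_name/* into a timestamped .tar.gz file."""
    log_dir = Path(profile_root) / "logs" / server_name
    if not log_dir.is_dir():
        raise FileNotFoundError(f"log directory not found: {log_dir}")
    stamp = time.strftime("%Y%m%d_%H%M%S")
    archive = Path(out_dir) / f"{server_name}_logs_{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for path in sorted(log_dir.iterdir()):
            # Store entries under the server name, e.g. server1/SystemOut.log;
            # subdirectories are added recursively.
            tar.add(str(path), arcname=f"{server_name}/{path.name}")
    return archive
```

Write the archive outside the log directory (the default is the current directory) so it does not include itself.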
3. Gather a database performance report:
- Oracle - Automatic Workload Repository (AWR) report
- Microsoft SQL Server - Dashboard report (v2005, v2008)
- IBM DB2 - Run the following commands against the Process Server database:
db2 get db cfg for <dbname>
db2support . -d <dbname> -c -g -s -o db2support_primary.zip
db2support . -d <dbname> -g -s -o db2support_standby.zip
db2evmon -path /tmp > sqltrace.txt
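To keep the DB2 output together, the commands can be scripted. The Python sketch below is hypothetical: it assumes it runs as the DB2 instance owner with the DB2 environment loaded (db2profile sourced), so `db2`, `db2support` and `db2evmon` are on PATH; a missing command raises FileNotFoundError. It saves each command's console output to a file in one directory. The db2support_standby.zip command is left out because, as its name suggests, it targets a standby database and is presumably run separately where that database lives.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def collect_db2_output(dbname: str, out_dir: str = ".",
                       dry_run: bool = False) -> Dict[str, List[str]]:
    """Run DB2 diagnostic commands, saving each command's console output to a file.

    Returns a mapping of output file name -> command. With dry_run=True the
    commands are only returned, not executed.
    """
    plan: Dict[str, List[str]] = {
        # "for <dbname>" lets the CLP read the configuration without a prior CONNECT.
        "db_cfg.txt": ["db2", "get", "db", "cfg", "for", dbname],
        # db2support writes its own zip into "." (the working directory, set below).
        "db2support_primary.log": ["db2support", ".", "-d", dbname, "-c", "-g", "-s",
                                   "-o", "db2support_primary.zip"],
        "sqltrace.txt": ["db2evmon", "-path", "/tmp"],
    }
    if dry_run:
        return plan
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, cmd in plan.items():
        with open(out / filename, "w") as fh:
            # stderr is merged into the same file so error messages are kept.
            subprocess.run(cmd, stdout=fh, stderr=subprocess.STDOUT,
                           cwd=str(out), check=False)
    return plan
```

Calling it with `dry_run=True` first, for example `collect_db2_output("BPMDB", dry_run=True)` with a hypothetical database name, lets you review the exact commands before running them.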
The above data should give you a general idea of where the problem lies. Performance issues are often complicated, and you may need to gather the data several times to find the root cause. If you identify a particular component as the source of the problem, you may need to gather more specific data for that component.