Troubleshooting
Problem
Diagnosing The Problem
General tuning versus specific issues
If you need assistance with general tuning, environment health checks, or application tuning, consider engaging IBM Services. The following Redbooks publication also provides guidance on many aspects of Business Automation Workflow / Business Process Manager performance tuning:
"IBM Business Process Manager V8.5 Performance Tuning and Best Practices"
If you have a specific product issue or a question related to product behavior, technical support can assist with that. We will need a detailed use case for the issue, the delay times you are seeing, or your question, and why you consider this a product performance issue.
Overview of Performance diagnostic information
Gather the following information and files. See the steps below for more detailed information:
- Detailed problem description including use case, questions, and concerns
- Detailed environment description and topology
- <profile_root>/logs directory
- <profile_root>/config directory
- verbose GC logging
- Javacores captured during the "delay" timeframe
- Feedback on whether any OS or Database resource is being exhausted
- (As needed) PI or TWX export of the application involved in the delay
- (As needed) Tracing specific to the product area of the problem
- (As needed) Logging of the HTTP traffic involved in the issue
Detailed Performance diagnostic information
The following steps describe how to gather each type of information that may be required for performance issues. When you capture a set of data, ensure that it is complete and covers the timeframe of the occurrence that you report.
1: Detailed problem description
- Provide as much information as possible.
The main questions that need to be answered:
- What product use case or application path has performance issues? For example, logging in to Process Portal, starting a task, or starting the BAW server.
- What delay times in seconds are you seeing?
- Why do you consider this a product performance issue?
Other questions to consider:
- Was there a time when the same scenario did not have a performance concern?
- Does this always occur for a particular action? If not, how often does it occur?
- Is it specific to a particular application or piece of an application?
- Was there any change before the issue started to occur? For example, a configuration change, a newly deployed application, or increased load.
- Is there anything you have found that helps work around the issue?
- If the related action is not known, how often does the issue occur? Do you notice any similarities, such as high user load, a particular time of day, or other common factors?
- How does this impact your business?
- Provide the output of the following command for the involved profile:
versionInfo -fixpacks -ifixes
Also, ensure the following information is provided:
- Description of the topology/environment, including the clusters and nodes used
- Details about the database and other third-party products involved in the behavior
- Are there multiple networks involved in the communications? If so, do they span large distances (for example, BPM and database servers that are not collocated)?
- If the issue is browser-related, do you see the same behavior in Internet Explorer, Firefox, and Chrome?
- After the issue occurs, provide the <profile_root>/logs directory of the profile that contains the impacted servers.
Also, be sure to provide timestamps of when the issue starts and ends so that we know the scope of what should be reviewed in the logs.
- Provide the <profile_root>/config directory from the deployment manager.
You may leave out the <profile_root>/config/cells/<Cell_Name>/applications folder to reduce the archive file size.
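One way to package the config directory while leaving out the applications folder is sketched below in Python. This is a minimal illustration and not an IBM-provided tool: the function name `archive_config` and the archive name are hypothetical, and the sketch assumes the directory passed in is the `<profile_root>/config` directory of the deployment manager.

```python
import tarfile
from pathlib import Path


def archive_config(config_dir, archive_path):
    """Write config_dir to a gzip tar archive, skipping cells/<Cell_Name>/applications."""

    def skip_applications(tarinfo):
        # Member names look like "config/cells/<Cell_Name>/applications/...".
        parts = Path(tarinfo.name).parts
        if len(parts) >= 4 and parts[1] == "cells" and parts[3] == "applications":
            return None  # returning None drops this member and everything below it
        return tarinfo

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(Path(config_dir), arcname="config", filter=skip_applications)
    return archive_path
```

For example, `archive_config("<profile_root>/config", "dmgr_config.tar.gz")`, with `<profile_root>` replaced by the actual profile path, would produce an archive that omits each cell's applications folder.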
- JVM memory usage is a common cause of performance problems, so it is always good to enable verbose GC logging. It is also very lightweight and generally does not impact performance if it is left on.
See Enabling VerboseGC for specific steps to enable this.
- Try to generate 3-5 javacores spaced evenly over the period when the delay is experienced, or one every minute for hangs that do not return. The javacores must be gathered while the issue is occurring, although it can also be helpful to get one just before the delay for comparison. Take them on the JVM where the delays occur, which is often a member of the application cluster.
Javacores can be triggered with kill -3 on UNIX-like operating systems or with the WAS commands. This does not "kill" the process; it sends the process signal 3 (SIGQUIT), which causes the JVM to write a javacore and, depending on the JVM dump settings, a heapdump.
Example: kill -3 <java process ID>
For more details on the WAS commands or on triggering this on Windows, see this article on generating javacores for Windows.
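The evenly spaced kill -3 requests can also be scripted. The following Python sketch is one possible approach for UNIX-like systems only. The function name `trigger_javacores` and its parameters are hypothetical, and the `send` and `sleep` arguments exist only so that the loop can be exercised without signalling a real process.

```python
import os
import signal
import time


def trigger_javacores(pid, count=3, interval_seconds=60, send=os.kill, sleep=time.sleep):
    """Send SIGQUIT (signal 3) to pid `count` times, pausing between sends."""
    sent_at = []
    for i in range(count):
        send(pid, signal.SIGQUIT)  # same request as: kill -3 <java process ID>
        sent_at.append(time.time())  # record when each javacore was requested
        if i < count - 1:
            sleep(interval_seconds)  # no pause is needed after the last request
    return sent_at
```

For example, `trigger_javacores(12345, count=4, interval_seconds=30)` would request four javacores 30 seconds apart from a process with the (hypothetical) ID 12345. The returned timestamps can be included with the data to help correlate the javacores with the delay.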
- Have your system administrator review the OS and database resources while the issue is occurring. Let us know if anything, such as disk I/O, memory, or network bandwidth, is being exhausted to the point that it causes performance issues. Also, if you are using a virtualized environment, be sure to review the underlying physical resources.
- If the issue is related to a particular application then we will need more details about that application. Let us know the name of the module or BPD involved and any specific activities or application paths that are related to the issue.
For SCA modules, provide a Project Interchange (PI) export from IBM Integration Designer.
For Process Applications, provide a TWX export of the involved snapshot from the Workflow Center.
- Tracing can impact performance, so it is generally best to get an initial set of data without a trace. After the initial problem analysis, it is often useful to get a trace specific to a particular component area. Trace strings for specific areas can be found in the various product MustGathers.
Although the best trace depends on the problem type, WAS.clientinfopluslogging=all can be a good general trace because it provides information on SQL queries and EJB method boundaries.
- If the performance issue involves client/server processing, then capturing the network traffic can help narrow the scope of the issue. The data capture should include the URL or operation, headers, request/response data, and the timing. Fiddler and Wireshark are useful tools for this. For more information, see Collect an HTTP traffic capture with Fiddler or your web browser, or use a capture taken via browser plugins.
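Before sending a capture, it can help to see which requests account for the delay. The following Python sketch assumes that the capture was saved in HAR (HTTP Archive) format, which browser developer tools can export. The function name `slowest_requests` is hypothetical, and the sketch reads only the standard `time`, `request`, and `response` fields of each HAR entry.

```python
import json


def slowest_requests(har_path, limit=5):
    """Return the `limit` slowest HAR entries as (milliseconds, method, status, url)."""
    # "utf-8-sig" reads files with or without a leading byte order mark.
    with open(har_path, encoding="utf-8-sig") as handle:
        entries = json.load(handle)["log"]["entries"]
    rows = [
        (entry["time"], entry["request"]["method"], entry["response"]["status"], entry["request"]["url"])
        for entry in entries
    ]
    # "time" is the total elapsed time of the request in milliseconds.
    return sorted(rows, key=lambda row: row[0], reverse=True)[:limit]
```

Listing the slowest URLs alongside the delay times from the problem description makes it easier to tell whether the time is spent in a single server call or spread across many requests.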
This covers general information useful for most performance issues, although each issue could require additional information specific to an area or problem. Review the component-specific MustGathers for more details on the data to collect for a particular area. Performance issues may also require multiple collections of data as the issue is narrowed down.
What to do next
- Review the log files and traces at the time of the problem to try to determine the source of the problem.
- Use Business Automation Workflow documentation or the support site to search for known problems.
- Once you have gathered all the needed information and diagnostics, you can add them to your case. Alternatively, you can upload files to ECuRep. For more information, see Enhanced Customer Data Repository (ECuRep) - Overview.
Product Synonym
BAW, BPM
Document Information
Modified date:
14 September 2022
UID
swg21611603