Troubleshooting BPM issues can be complex without the correct tools and techniques. This article describes a set of tools and techniques suggested by the IBM® BPM SWAT and Support team to help you with problem determination for IBM Business Process Manager. This content is part of the IBM Business Process Management Journal.

Allen Chan (avchan@ca.ibm.com), STSM, BPM SWAT Technical Lead and Lead Architect, IBM

Allen Chan is a Senior Technical Staff Member at IBM and is currently the BPM SWAT Technical Lead and the Lead Architect for BPM Installation, Configuration and Migration. In his role as the BPM SWAT Technical Lead, he leads a team of experts to help ensure customer success in key BPM deployment and rollout scenarios. Prior to his current role, he was the architect for BPM application lifecycle, ESB and integration, as well as BPM tools.



Timothy Brantner (timmy@us.ibm.com), Manager, BPM, BPM Cloud, WebSphere Process Server and Blueworks Live Technical Support, IBM

Tim Brantner is a Technical Support Manager of BPM, BPM Cloud, WebSphere Process Server and Blueworks Live. In his current role, Tim leads and coordinates a worldwide team to deliver the best possible support for BPM customers. Prior to his current role, he specialized in delivering customer programs (skills transfer education, beta programs, early design programs, and customer acceleration programs) and customer support. Originally from Austin, Texas, Tim left development in the dot-com, start-up world to join IBM in 2001 in Raleigh, North Carolina, where he still resides.



Jing You (youjing@cn.ibm.com), Advisory Software Engineer, IBM

Jing You has more than 10 years of experience in software development and industry solutions. He is currently an Advisory Software Engineer on the WebSphere BPM Runtime Development team, where he works on the development and support for SCA and web service bindings. He is also involved in BPM-related customer support. Prior to his current role, he was an architect with the Banking Industry Solution Lab in IBM and led the design and delivery of the first Virtual Teller Machine (VTM) project in China.



Sun Guang Da (sundag@cn.ibm.com), Staff Software Engineer, WebSphere Process Server and BPM Level 3 Support, IBM

Da Guang Sun is a Staff Software Engineer on the BPM Level 3 Support team, currently providing L3 support for WebSphere Process Server and BPM product issues related to SCA, business objects, bindings and web services.



Dave Spriet (spriet@ca.ibm.com), Senior Software Architect for BPM and Customer Deployment, IBM

Dave Spriet is a Senior Software Architect for IBM business process management and works with customers and business partners in his BPM SWAT role, which involves BPM architecture reviews, migration, upgrade and customer enablement. Dave has been with IBM since 1998 and has focused primarily on BPM, SOA and connectivity throughout his career. He has a Bachelor's degree (with Honors) in Computer Science and Statistics from McMaster University, Canada.



04 December 2013

Introduction

IBM Business Process Manager supports powerful, high-performance business process management while providing a simple way to model business processes. It is based on Java™ technology and a WebSphere® Application Server infrastructure. This article outlines a set of tools and techniques to help you with problem determination of IBM BPM production issues.


Handling sensitive data

In the majority of cases, logging, tracing, monitoring, and probing involve capturing information in a file for analysis by a subject matter expert (SME). Some of the collected data is sensitive and you may not want to expose it outside of your organization. The following table lists the different levels of sensitivity and how to de-sensitize each of them:

Table 1. Sensitive data types
Information | How to de-sensitize
Network / infrastructure data – This information includes the host name, port, IP address, and so on. | You can replace sensitive infrastructure information with aliases. However, it's important to use different aliases for different hosts, ports, and so on.
Process data – This information pertains to the definition and structure of the business process, such as the task name, business process definition (BPD) name, and service name. However, it does not contain business data. | You can replace sensitive process information with aliases. However, it's important to use different aliases for different process constructs.
Business data – This information reflects the business payload. This information may be needed to help with troubleshooting. | You can replace sensitive business information with dummy information. However, sometimes it's important to retain data formatting or characteristics. For example, you can replace business data with an XML message that has a similar structure, but with example data.

In the sections that follow, we call out the areas where you may need to take process or business data sensitivity into consideration.
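As an illustration of the alias approach in Table 1, the following minimal Python sketch replaces each distinct IPv4 address in a log excerpt with its own alias, so that repeated occurrences of the same host stay recognizable. The function name `alias_addresses`, the `HOST_n` alias scheme, and the sample log line are assumptions made for this example; host names and ports would need additional patterns.

```python
import re

# Matches dotted-quad IPv4 addresses only; host names need their own pattern.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def alias_addresses(text, prefix="HOST"):
    """Replace each distinct IPv4 address with its own stable alias."""
    aliases = {}

    def substitute(match):
        address = match.group(0)
        # Reuse the alias for a repeated address; create a new one otherwise.
        if address not in aliases:
            aliases[address] = f"{prefix}_{len(aliases) + 1}"
        return aliases[address]

    return IPV4.sub(substitute, text), aliases

# Hypothetical log excerpt, not taken from a real BPM log.
log = "10.0.0.5 called 10.0.0.9; retry from 10.0.0.5 failed"
scrubbed, mapping = alias_addresses(log)
print(scrubbed)
```

Keep the returned mapping inside your organization so that findings reported against an alias can be traced back to the real host.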


Troubleshooting checklist documentation

For troubleshooting purposes, it's important to have a good problem description up front. The more effort you spend here, the faster the problem is resolved. For example, record what the user was doing, the exact timestamps, the error codes, and so on.

One of the most valuable troubleshooting practices is to have a system for recording or documenting all of the changes that are made to individual environments (such as fixes installed). This one activity provides the best insight into troubleshooting problems. Be prepared to share this information with IBM.

The IBM Business Process Manager Information Centers have a wealth of information on troubleshooting and support.


System data monitoring and collection

If you have an issue with your BPM system, you should first check the system status and eliminate any potential issues with the operating system, JVM or database. In this section, we'll describe what you should check at the system level, how to collect system data, and introduce some of the monitoring tools.

BPM and database monitoring

Java™ virtual machine (JVM) and system health are fundamental to problem determination, especially for performance tuning issues. Following are examples of scenarios you should monitor for:

  • The JVM heap reaches 70% of maximum and stays there for over 5 minutes. This scenario is usually characteristic of an impending out-of-memory error.
  • The CPU remains at over 90% for 5 minutes.
  • The hard disk usage reaches 70%, or whatever limit is not acceptable in your environment.
  • The hard disk usage increases by 20% in less than 5 minutes. Rapid consumption of hard disk space usually indicates an issue is occurring that produces a lot of log entries and uses up disk space.
  • Garbage collection happens frequently without much success in freeing up memory. This scenario usually means that there is more demand for memory than is available, which creates conditions for a possible out-of-memory error.
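Several of the scenarios above are "sustained threshold" conditions: a metric matters only if it stays high for a period of time, not if it spikes once. The following Python sketch shows one way to check such a condition against samples you have already collected. The function name `sustained_above`, the sample format, and the numbers are assumptions for illustration; they are not part of any IBM BPM tool.

```python
def sustained_above(samples, threshold, min_seconds):
    """Return True if the metric stays above threshold for at least min_seconds.

    samples is a time-ordered list of (timestamp_seconds, value) pairs.
    """
    run_start = None
    for timestamp, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = timestamp  # a new run above the threshold begins
            if timestamp - run_start >= min_seconds:
                return True
        else:
            run_start = None  # the run was interrupted
    return False

# Hypothetical heap usage (% of maximum), sampled once a minute.
heap = [(0, 55), (60, 72), (120, 75), (180, 74), (240, 78), (300, 80), (360, 81)]
# Hypothetical CPU usage with two short spikes.
cpu = [(0, 95), (60, 40), (120, 96)]

print(sustained_above(heap, 70, 300))
print(sustained_above(cpu, 90, 300))
```

The same function can be applied to CPU or disk usage samples by changing the threshold and duration.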

You should monitor the following information for the BPM server and database:

  • BPM:
    • JVM: heap size, garbage collection, thread usage, connection pool usage
    • System: CPU usage, disk space, I/O, temp space, memory usage
  • Database: CPU usage, disk space, I/O, cache size

You can use the following tools to monitor and analyze the system health:

Table 2. System health monitoring tools
System/JVM information | Monitoring/analysis tool
Disk space, CPU load, disk I/O, and so on | Operating system tools, Nmon
Java core file | IBM Whole-system Analysis of Idle Time tool (WAIT tool)
Java heap real-time information | JConsole JVM tool
Java thread dump | IBM Support Assistant, IBM Thread and Monitor Dump Analyzer for Java
Java heap dump | IBM Heap Analyzer
Garbage collection, JDBC threads, web threads, servlet threads, and so on | WebSphere Application Server Performance Monitoring Infrastructure (PMI)

In the following sections, we'll describe some of these tools, as well as the data collection methods for different types of system data, including:

  • IBM WAIT tool
  • JConsole JVM tool
  • Java thread dumps
  • IBM Heap Analyzer data interpretation
  • Disk space
  • Garbage collection data interpretation
  • Logs and trace
  • Cross-component trace
  • IBM Support Assistant

IBM WAIT tool

The IBM WAIT tool can help you pinpoint performance bottlenecks in workloads with a central Java component. WAIT analyzes easy-to-collect runtime data, and produces an HTML report that describes the main bottlenecks. WAIT takes as input one or more javacore files from a running IBM or Sun®-based JVM. It provides scripts that make gathering data easy.

WAIT is designed to be zero-install; it requires no monitoring agents, and the report is an interactive web page. You need not restart your application server, and the data collection overhead should be minimal.

You can access the tool at http://wait.ibm.com.

Figure 1. IBM WAIT tool

JConsole JVM tool

JConsole is a JMX-compliant monitoring tool. It uses the extensive JMX instrumentation of the JVM to provide information on performance and resource consumption of applications running on the Java platform.

Because JConsole ships with modern JDKs, it is the most direct way to monitor JVM memory usage, thread details, and class information. By monitoring this information, you can observe whether memory is leaking over time or whether any thread is deadlocked and no longer responding.

The jconsole executable is in JDK_HOME/bin, where JDK_HOME is the directory where the JDK is installed. If this directory is on your system path, you can start the tool by typing jconsole in a command (shell) prompt. Otherwise, you have to type the full path to the executable file.

Figure 2. JConsole JVM tool

Java thread dumps

Thread dumps give you a snapshot in time of what the threads in the JVM are processing at that specific time. For example, you can create 5 thread dumps at 30 second intervals so that you can see, over time, what is happening and if threads are stuck.
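To show why a series of dumps is more useful than a single one, the following Python sketch compares several dumps and reports the threads whose stack did not change between them. It assumes the dumps have already been parsed into a simple dictionary form (thread name mapped to a tuple of stack frames); that representation, the function name `unchanged_threads`, and the thread and frame names are all hypothetical, and real analysis is better done with the tools described in this article.

```python
def unchanged_threads(dumps):
    """Return names of threads whose stack is identical in every dump.

    dumps is a list of dicts mapping thread name -> tuple of stack frames.
    At least two dumps are needed for a meaningful comparison.
    """
    if len(dumps) < 2:
        return []
    first = dumps[0]
    return sorted(
        name
        for name, stack in first.items()
        if all(dump.get(name) == stack for dump in dumps[1:])
    )

# Hypothetical, heavily simplified dumps taken 30 seconds apart.
dumps = [
    {"WebContainer : 1": ("SocketRead", "JdbcQuery"), "WebContainer : 2": ("Render",)},
    {"WebContainer : 1": ("SocketRead", "JdbcQuery"), "WebContainer : 2": ("Parse",)},
    {"WebContainer : 1": ("SocketRead", "JdbcQuery"), "WebContainer : 2": ("Idle",)},
]
print(unchanged_threads(dumps))
```

An unchanged stack is only a hint: idle pool threads also look identical across dumps, so inspect the frames before concluding that a thread is stuck.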

To generate a thread dump, do one of the following:

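After you get the thread dump file, you can use the following tools, which Table 2 lists for Java thread dumps, to analyze it: IBM Support Assistant and IBM Thread and Monitor Dump Analyzer for Java.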

JVM heap analyzer data interpretation

The Java heap holds objects, arrays, and classes. Java heap dumps are snapshots of the Java heap at specific times. By analyzing the trend of the JVM heap size over time, you can find a potential memory leak.
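One simple way to quantify a heap trend is to fit a straight line through the heap usage samples and look at its slope. The Python sketch below does this with an ordinary least-squares fit. The function name `heap_growth_per_hour` and the sample values are assumptions for illustration; a steady positive slope is a reason to investigate further, not proof of a leak.

```python
def heap_growth_per_hour(samples):
    """Least-squares slope of heap usage, in MB per hour.

    samples is a list of (hours_since_start, heap_used_mb) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    covariance = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must span more than one point in time")
    return covariance / variance

# Hypothetical heap usage measured hourly.
growing = [(0, 500), (1, 520), (2, 540), (3, 560)]
stable = [(0, 500), (1, 510), (2, 495), (3, 505)]

print(heap_growth_per_hour(growing))
print(heap_growth_per_hour(stable))
```

For a fair comparison, sample the heap at comparable points, for example right after a garbage collection, so that normal allocation sawtooth patterns do not dominate the slope.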

Figure 3 shows the JVM heap size increasing dramatically after some requests come in. This means you need to pay more attention to how those requests are processed. You also need to take a snapshot of the heap and analyze it to see what kinds of objects exhausted the heap space.

Figure 3. JVM heap size graph

IBM HeapAnalyzer helps you find a possible Java heap leak area through the use of a heuristic search engine and analysis of the Java heap dump in Java applications. HeapAnalyzer analyzes Java heap dumps by parsing the dump, creating directional graphs, transforming them into directional trees, and executing the heuristic search engine.

For more information, refer to Using HeapAnalyzer to diagnose Java heap issues.

Garbage collection data interpretation

Garbage collection can have a big effect on application performance. It's recommended that you monitor the garbage collection (GC) information together with JVM heap size.

You can see clearly if the allocated memory was freed to the memory pool by the GC (Figure 4). When the heap requests are not too great, but the memory usage is increasing, you may need to check the GC algorithm.

Figure 4. JVM heap requests

The Garbage Collection and Memory Visualizer is part of a new tooling suite from IBM that analyzes verbose GC logs to help provide just this sort of insight into memory management issues.
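The condition described earlier, frequent garbage collection that frees little memory, can be expressed as a simple calculation once you have the heap usage before and after each collection. The following Python sketch is a minimal illustration that works on values you have already extracted. The function names, the 5% threshold, and the three-cycle window are assumptions chosen for the example, not recommended tuning values.

```python
def reclaimed_fractions(cycles):
    """For each GC cycle, return the fraction of used heap that was freed.

    cycles is a list of (used_before_mb, used_after_mb) pairs.
    """
    return [(before - after) / before for before, after in cycles if before > 0]

def gc_is_struggling(cycles, min_fraction=0.05, last_n=3):
    """True if each of the last_n cycles freed less than min_fraction of the heap."""
    recent = reclaimed_fractions(cycles)[-last_n:]
    return len(recent) == last_n and all(f < min_fraction for f in recent)

# Hypothetical (before, after) heap usage in MB for consecutive collections.
healthy = [(800, 300), (820, 310), (810, 305)]
struggling = [(900, 600), (980, 960), (990, 975), (995, 985)]

print(gc_is_struggling(healthy))
print(gc_is_struggling(struggling))
```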

Logs and trace

In some cases, you may need to gather information for specific server components. IBM BPM is based on WebSphere Application Server, which provides log and trace capabilities for issue diagnosis. You can turn on different log levels and trace settings to gather additional execution data that will help identify issues or narrow down the scope of the problem. Refer to the following documents to collect log and trace information.

Note: Collected data might contain business data. Refer to Data sensitivity for information on de-sensitizing the sensitive business data.

Cross-Component Trace

The Cross-Component Trace (XCT), available only in IBM BPM Advanced, maps SystemOut.log and trace.log records back to the SCA programming model using WebSphere Integration Developer.

Figure 5. Enable Cross-Component Trace

You can enable Cross-Component Trace using the following approaches:

  • While the server is running, using runtime trace, which takes effect immediately.
  • Persisted over a server restart using a Configuration trace. This approach is useful when you want to find out how messages are being passed between different SCA components so you can better identify which component is causing an issue.

IBM Support Assistant 4.1

IBM Support Assistant is a complimentary software offering that provides you with a workbench to help you with problem determination. With a focus on quickly finding key information, automating repetitive steps, and arming you with a variety of serviceability tools, the Support Assistant enables you to do self-analysis and diagnosis of problems and to achieve faster time to resolution. It also provides access to several different serviceability tools that can assist you in many areas of problem diagnosis, including Java troubleshooting, product configuration analysis, log analysis, and more.

You can download the IBM Support Assistant here.

Figure 6. IBM Support Assistant

Business process data monitoring

If you've finished the problem determination at the system level and still encounter issues on your BPM system, we recommend that you focus on your business processes. Process issues may be caused either by the business process design or the business process execution by a process engine. In this section, we'll introduce some important monitoring tools that are part of various versions of IBM BPM. You should be able to use these out-of-box tools in your corresponding BPM environment.

Process and Instrumentation Monitors

In some cases, what you really care about is the performance of a process from your business perspective. In such cases, you need indicators to tell you about the performance inside the BPM engine, including:

  • How long a step of a process takes
  • Which process is the most complex
  • How many process instances were created for a process

BPM provides a Process Monitor and an Instrumentation Monitor to monitor the performance of a process. The Process Monitor enables the identification of currently running instances of processes and services. It enables you to stop items that are consuming large amounts of resources or stuck in an infinite loop due to process modeling bugs. The Instrumentation Monitor collects and displays detailed instrumentation data.

Note: Collected data may contain business data. Refer to Data sensitivity for information on de-sensitizing the sensitive business data.

Process Monitor

To access the Process Monitor, log into the Process Admin Console (http://<server_ip>:<server_port>/ProcessAdmin) and select Monitoring => Process Monitor. Figure 7 shows the Summary page of the Process Monitor:

Figure 7. Process Monitor Summary page

The Process Monitor has the following pages:

  • Summary – shows you how many active services and processes are currently consuming CPU resources. It also shows which processes and services are most expensive in terms of the total time, total number of instances, and total number of steps needed to execute them.
  • Processes – shows the data for all of the processes in the system.
  • Services – shows the data for all of the services in the system.

Figure 8 shows the Services page of the Process Monitor.

Figure 8. Process Monitor Services page

Instrumentation Monitor

This section tells you how to display and collect instrumentation data. The Instrumentation Monitor is useful for identifying BPMN process instance performance bottlenecks, as well as capturing instrumentation data.

Log into the Process Admin Console and select Monitoring => Instrumentation Monitor. The current instrumentation data is displayed as shown in Figure 9.

Figure 9. Instrumentation Monitor

If you want to reset the instrumentation data to display the latest data, click Refresh. If you want to automatically reset the instrumentation at set intervals, select a time unit from the Automatically refresh every list.

To start logging instrumentation data, click Start Logging. The instrumentation data is saved in a DAT format and the file is placed in the \logs folder. The exact path to the instrumentation log directory is shown on the Instrumentation Monitor page when you start logging. For example: <BPM_HOME>\AppServer\profiles\StandAloneProfile\.\logs\inst001.dat.

By checking the Instrumentation Monitor, you are able to see if system functions like EJB API, caches, and database queries are taking longer than usual.

Instrumentation log files show every command that is issued on the Process Server and how long each command takes to run. You should perform this analysis in conjunction with thread dumps and log files.

Figure 10. Instrumentation log file

For more information, see Reading and decoding instrumentation files for Teamworks, WebSphere Lombardi Edition (WLE), and IBM Business Process Manager (BPM) products.

Process Inspector (BPMN)

If your process system is based on BPMN, the Process Inspector provides information about events and processes across your entire system, including information about such things as general metadata, current activities, timers, message events, orphaned tokens, and the status of tasks. In addition to viewing the detailed activity information about process instances, you can also interactively perform certain troubleshooting and maintenance tasks. For example, from the Process Inspector display, you can delete a process instance, or restart a group of failed process instances.

There are two types of Process Inspector interfaces: a web UI from the Process Admin console and a UI integrated with Process Designer.

From the web UI, you can look into details about a running process instance. The web UI also shows information and details about failing instances and allows you to suspend running instances to help narrow down the issues. Figure 11 shows the web-based Process Inspector.

Figure 11. Web-based Process Inspector

BPM provides the integrated Process Inspector as part of Process Designer. With the integrated Process Inspector, you can debug a process in Process Designer. When you select the Inspector tab on the Process Designer view, it switches to the Process Inspector. Click Start to start a process and simulate it in Process Designer and to monitor the state of the process. If you get an error on the process, you can easily identify the error and the location of it. Figure 12 shows an integrated Process Inspector:

Figure 12. Integrated Process Inspector

Note: Collected data might contain business data. Refer to Data sensitivity for information on de-sensitizing the sensitive business data.

Business Process Choreographer Explorer (BPEL) on BPM Advanced

If you are using BPEL, available only on IBM BPM Advanced, you can use the out-of-the-box, customizable Business Process Choreographer (BPC) Explorer to monitor BPEL processes. The BPEL Explorer provides rich features for those BPEL-based processes, including:

  • Default views for processes and human tasks (templates, running instances, insight view, and visualization)
  • Ability to create custom views to track and monitor specific instances
  • Lifecycle management
  • Ability to review and change process and task states
  • Ability to repair instances by modifying states and runtime variables
  • Additional capabilities such as migrating to new versions and changing ownership

You can access the BPC Explorer at: http://<server_ip>:<server_port>/bpc.

Figure 13. Business Process Choreographer BPEL Explorer

Event Manager Monitor

The Event Manager Monitor displays tasks and activities that were successfully scheduled, initiated, and are running in the Event Manager. It displays processes that are in the queue, running, or paused.

To access the Event Manager Monitor, log into the Process Admin Console and select Event Manager => Monitoring.

Figure 14. Event Manager Monitor

Note: When you enable the Event Manager to collect or monitor event data, the collected data may contain process data. Refer to Data sensitivity for information on de-sensitizing the sensitive business data.

Failed event management

Failed events are administered through the Failed Event Manager application in the BPM Advanced administrative console. It monitors and logs failed events for the following scenarios:

  • Runtime faults of asynchronous SCA/JMS/MQ invocations
  • Long-running BPEL process failures (stopped activities, failed and terminated process instances)
  • Business Flow Manager (BFM) infrastructure failures (hold queue messages are represented as failed events)

Figure 15. Failed Event Manager

Features of the Failed Event Manager include:

  • Management of failed events (SCA, JMS or BPC related)
  • Search (all or by criteria, such as date, component, source or destination, and so on)
  • View failed events (payload, business data, and root cause exception)
  • Access to related components (such as BPC Explorer redirection)
  • Modify content (for example, if incorrect payload content or format is a root cause)
  • Delete or resubmit failed event messages

Notes:

  1. Trace can be enabled on demand for resubmission to further analyze related problems.
  2. Failed events may contain process data. Refer to Data sensitivity for information on de-sensitizing the sensitive business data.

Performance and troubleshooting tools

If you have encountered a performance issue with BPM, determining the cause can be a challenge because the BPM production environment has many components with varying topologies.

For example, in a smaller organization, all the BPM components can be installed on one machine. In a large and growing organization where the BPM-based system is a core system, the topology may be complex in order to achieve high availability and throughput. Typically, a golden topology includes at least two load balancer servers, two BPM servers, and two database servers, with application clusters created on top of them. If there is a performance issue with a complex BPM topology, you may have to check both the network environment and the software environment.

Some of the aspects you may need to consider include: whether the database space has grown and is exhausted; whether there is a memory leak in your customized application; whether the network is down; and even whether the browser is encountering performance issues when executing large JavaScript programs.

This section describes some of the tools and practices you may need to employ to diagnose a performance issue.

WebSphere Performance Monitoring Infrastructure

WebSphere Performance Monitoring Infrastructure (PMI) provides an extensible infrastructure layer that enables applications to collect or view performance data. The Tivoli® Performance Viewer (TPV) is shipped with BPM. You can find it on the Monitoring and Tuning tab in the administrative console. If you have the WebSphere Performance Monitoring Infrastructure feature installed, you can use that to monitor your server performance.

Figure 16. Tivoli Performance Viewer

Java dumps and cores

You may need to generate Java dumps and cores for your server in some performance cases. Log into the administrative console and select Troubleshooting to go to the Java dumps and cores page. From this page, you can generate heap dumps, Java cores and system dumps for specific servers. This data is needed when diagnosing performance or memory issues.

For information on how to analyze this data, refer to the Out-of-memory issues section.

Figure 17. Java dumps and cores

Database traces

Database product-specific traces, such as the AWR report for Oracle® database, enable you to identify the most often executed queries, as well as determine the time spent for the most expensive queries.

Figure 18. Database trace

You can turn on database trace in the IBM Business Process Manager repository by following the instructions in Collect troubleshooting data for database problems for IBM Business Process Manager (BPM).

Note: The collected data may contain process data. Refer to Data sensitivity for information on de-sensitizing the sensitive business data.

Security trace

In general, the IBM Business Process Manager products use the security features in WebSphere Application Server. For more information, see:

These features work for most of the security-related issues on the BPM server, especially BPM Advanced components. You can also use the IBM Business Process Manager security trace: WLE.wle_security=*.

Browser trace

Tools like Firefox® Firebug® and the Internet Explorer® Developer Tool, as well as browser-independent tools like HTTPWatch® and Fiddler, enable you to examine the browser HTTP requests and corresponding response times. They also assist with troubleshooting HTTP request/response issues, such as portal and Coach-based issues, especially if they are related to expected performance, missing web resources, or custom JavaScript issues with Coach views.

Figure 19. Browser trace

Network trace

Network latency is one of the major reasons for slow response time between Process Designer and Process Center. Many tools exist for investigating network issues, and the details are outside the scope of this article. Following are some tools you can use to diagnose network issues in some situations:

  • Use Fiddler for network latency issues to simulate network delay and observe the behavior.
  • Use Wireshark to help analyze network traffic.
  • Use tracert (for Windows) or Traceroute (for UNIX-based operating systems) to understand the packet network pathways and network hops between two machines.
  • Use ping to ensure optimal network connectivity and measure latency in milliseconds.
  • Use telnet to ensure access to specific ports on a remote machine.
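When telnet is not installed on a machine, the same port check can be scripted. The following Python sketch uses only the standard library to test whether a TCP connection to a given host and port can be opened. The function name `port_is_reachable` and the host and port in the example call are assumptions; substitute the values for your own environment.

```python
import socket

def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

if __name__ == "__main__":
    # Example call only; 127.0.0.1 and 9443 are placeholder values.
    print(port_is_reachable("127.0.0.1", 9443, timeout=1.0))
```

A successful connection shows only that the port accepts connections from this machine. It does not show that the application behind it is healthy.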

Note: The collected data may contain business data. Refer to Data sensitivity for information on de-sensitizing the sensitive business data.


Web services issues

If you suspect that there are problems with web services, you can turn on the appropriate web services traces. For more information, see Collect troubleshooting data for web services problems in IBM Business Process Manager.

In addition, you can use tools like SOAPUI or TCPMon to validate the WSDLs and set up a mock web services endpoint or proxy to capture incoming and outgoing messages to make sure they match expected values. Refer to Supported web services standards in the IBM BPM Information Center.
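To illustrate what a mock endpoint does, the following Python sketch starts a small local HTTP server that records every POST body it receives and returns a fixed reply. It is not a replacement for SoapUI or TCPMon. The handler class `CaptureHandler`, the helper `start_mock`, the `/mock/OrderService` path, the `<ok/>` reply, and the sample payload are all hypothetical names made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # (path, body) pairs received by the mock endpoint

class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        captured.append((self.path, body))
        reply = b"<ok/>"  # fixed placeholder response
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass  # suppress the default per-request console output

def start_mock(port=0):
    """Start the mock endpoint on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

server = start_mock()
# Send one sample message, as the system under test would.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
conn.request("POST", "/mock/OrderService", body="<order id='1'/>",
             headers={"Content-Type": "text/xml"})
response = conn.getresponse()
response.read()
conn.close()
server.shutdown()
server.server_close()
print(response.status, captured)
```

In practice you would point the web service import at the mock address and then compare the captured messages with the values you expect.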

Note that web services depend on the web and the network. When you encounter an exception on a web service connection, the first thing you should check is the network status. For example, check that the target service is reachable from your BPM server. If the HTTPS protocol is used, verify that the SSL information has been imported into your server correctly.

IBM Integration Developer provides an integrated, Eclipse-based TCP/IP Monitor tool that enables you to monitor requests and responses to and from your server. To open the tool, select Window => Show View => TCP/IP Monitor. For more information, refer to TCP/IP Monitor view in the WebSphere Application Server Information Center.

Figure 20. The TCP/IP Monitor

Note: The collected data may contain business data. Refer to Data sensitivity for information on de-sensitizing the sensitive business data.


Out-of-memory issues

Out-of-memory issues usually occur suddenly, and post-mortem analysis often does not provide all the scenario information that is required for problem determination. There are many possible causes for out-of-memory issues, including:

  • Working with and uploading large documents
  • Creating a large number of instances and tasks in a very short time
  • Retrieving external data to populate a very large business object, such as one that has:
    • More than 100 member fields
    • Infinite recursive complex child members.

Following are procedures that you can perform to help narrow down the issue if you suspect the out-of-memory issue is due to execution of a particular process instance.

  1. Capture the java core and the associated heap dump resulting from the out-of-memory issue, which indicate the state of the JVM at the time of the failure, as well as possible objects that might have contributed to the issue. For example, they might show the product components that caused or contributed to this failure, or even the individual process application component that caused this failure.
  2. After restarting the server, go into the Web Process Inspector, examine all of the instances, and identify the last few instances that were updated prior to the out-of-memory issue. Those instances are the most likely candidates that caused the out-of-memory issue.
  3. After you narrow things down to the list of candidates, you need to examine the process design to determine the most likely steps that are causing the out-of-memory issue.
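Step 2 above amounts to sorting instances by their last update time and keeping the ones closest to the failure. The following Python sketch shows that selection on data you have already exported. The function name `suspects_before`, the instance IDs, and the timestamps are assumptions for illustration; they do not come from a BPM API.

```python
from datetime import datetime

def suspects_before(instances, failure_time, count=3):
    """Return ids of the `count` instances updated most recently before failure_time.

    instances is a list of (instance_id, last_updated) pairs.
    """
    earlier = [(updated, iid) for iid, updated in instances if updated <= failure_time]
    earlier.sort(reverse=True)  # most recent update first
    return [iid for _, iid in earlier[:count]]

# Hypothetical instance data; the failure is assumed to have happened at 11:00.
instances = [
    (101, datetime(2013, 12, 4, 9, 15)),
    (102, datetime(2013, 12, 4, 10, 58)),
    (103, datetime(2013, 12, 4, 10, 59)),
    (104, datetime(2013, 12, 4, 11, 30)),  # updated after the restart, so excluded
    (105, datetime(2013, 12, 4, 8, 0)),
]
oom_time = datetime(2013, 12, 4, 11, 0)
print(suspects_before(instances, oom_time, count=2))
```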

Figure 21 shows an example heap dump in which a BPM business object (org/jdom/Element) occupies 1.4 GB.

Figure 21. Heap dump analysis out-of-memory sample

Following are links to additional information on out-of-memory issues:


Application logging

It is a good idea to include application logging in the Process App or SCA applications.

For an SCA-based application (developed in the IBM Integration Designer in IBM BPM Advanced), you can turn on Cross-Component Trace and use IBM Integration Designer to analyze the execution path of SCA modules. The log will record:

  • The message process sequence from module to module.
  • The entering time and exiting time of an SCA invocation.

When performance issues occur, this trace shows how much time each module consumes, which helps you identify the most time-consuming module.
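
As a rough sketch of that analysis, the entry and exit times can be paired up and summed per module. The (module, kind, milliseconds) tuple layout and the module names below are hypothetical; real Cross-Component Trace records would first have to be parsed into this shape.

```python
from collections import defaultdict

def total_time_per_module(events):
    """Sum elapsed milliseconds per module from (module, kind, millis) events.

    `kind` is "entry" or "exit". The tuple layout is an assumption made
    for illustration; it is not the Cross-Component Trace format.
    """
    open_entries = defaultdict(list)   # module -> stack of entry times
    totals = defaultdict(int)
    for module, kind, millis in events:
        if kind == "entry":
            open_entries[module].append(millis)
        elif kind == "exit" and open_entries[module]:
            totals[module] += millis - open_entries[module].pop()
    return dict(totals)

# Hypothetical trace: OrderModule calls CreditCheckModule.
events = [
    ("OrderModule", "entry", 0),
    ("CreditCheckModule", "entry", 10),
    ("CreditCheckModule", "exit", 910),
    ("OrderModule", "exit", 1000),
]
totals = total_time_per_module(events)
slowest = max(totals, key=totals.get)
```

The totals are inclusive: time spent in a nested invocation also counts toward the calling module, so a high total for a caller may point to one of its callees.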

For a BPMN-based application (developed in the Process Designer), you can turn on process instrumentation so that you can see which services and processes take the longest time to complete.

Note: Collected data may contain business data. Refer to Data sensitivity for information on de-sensitizing the sensitive business data.


Preventive measures

Preventing issues is always better than fixing them. This section describes recommended measures that help prevent issues from occurring in the first place.

Maintain a clean system

Regular maintenance makes it easier to separate normal behavior from abnormal behavior, which helps you find system errors and determine the affected area or possible cause more quickly. The following table gives some recommendations for ongoing system maintenance procedures and suggested frequencies for those procedures.

Table 3. Maintenance procedures
Frequency | Procedures
On a daily basis
  1. Verify that all error queues are empty.
    1. For SCA-based applications, check the failed event manager for any failed events.
    2. If you defined any exception queue for the connections component used on SCA import and export, check those queues also.
    3. For BPMN-based applications, check the Process Inspector to see if there are any failed instances.
  2. Check the server log files for errors and exceptions. Account for the error if it cannot be eliminated.
  3. Monitor the memory and CPU spikes.
  4. Look for failed instances and identify the root cause of the failure.
  5. Check the FFDC log files.
  6. Monitor the average growth in instances and task count. For BPMN processes, compare the changes in values in the LSW_PRI_KEY table.
On a weekly basis
  1. Check for product maintenance fixes.
  2. Check the product technotes for any issues or exceptions found in the BPM log files.
  3. Check the database log files and follow the recommendations for your database product to maintain a healthy database.
  4. Review the database performance report (take a 4-hour sample).
On a monthly basis
  1. Run regular process data cleanup to remove old or completed information.
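
The daily growth check in Table 3 comes down to simple arithmetic once the totals are recorded. The sketch below assumes you note one instance or task total per day; the sample numbers are hypothetical, and how you obtain them (for example, from the LSW_PRI_KEY values) is left to your environment.

```python
def average_daily_growth(daily_counts):
    """Return the mean day-over-day increase for a list of daily totals."""
    if len(daily_counts) < 2:
        return 0.0
    deltas = [b - a for a, b in zip(daily_counts, daily_counts[1:])]
    return sum(deltas) / len(deltas)

# Hypothetical instance totals recorded once a day for five days.
counts = [1000, 1040, 1100, 1130, 1200]
growth = average_daily_growth(counts)
```

A day whose increase is far above this average is a reasonable trigger for a closer look at what was started that day.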

Document your environment

A well-documented system environment makes problem determination simpler and issues easier to diagnose. When an issue occurs, IBM BPM Support may ask for documentation to get information about your system. It's recommended that you document the following on your system:

  • Architecture (machines, releases and fixes applied, applications, databases, applications deployed, topology diagram, network, and so on)
    • The best way to summarize your environment for releases and fixes applied is to generate a versionInfo file for your BPM installation. The versionInfo script is located in the <BPM_Install_Root>/bin folder. For Windows, it will be versionInfo.bat, and for Linux/Unix it will be versionInfo.sh. It's recommended that you use the -long parameter to generate complete information. This parameter displays the details on fixpacks and ifixes.
    • To obtain application information, zip the entire <Profile_Root>/config folder, which contains all the configuration and application information for your environment.
  • Change logs that describe what is different today that might cause a problem.
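
The two collection steps above can be scripted so that the documentation is refreshed after every change. This is a minimal sketch, not an IBM-provided tool: the function names and the output file names are made up for illustration, and only the versionInfo script, its -long parameter, and the config folder come from the text above.

```python
import shutil
import subprocess
from pathlib import Path

def version_info_command(install_root, windows=False):
    """Build the versionInfo command line with the -long parameter."""
    script = "versionInfo.bat" if windows else "versionInfo.sh"
    return [str(Path(install_root) / "bin" / script), "-long"]

def archive_config(profile_root, output_base):
    """Zip <profile_root>/config and return the path of the archive."""
    return shutil.make_archive(str(output_base), "zip",
                               root_dir=str(Path(profile_root) / "config"))

def document_environment(install_root, profile_root, out_dir):
    """Write versionInfo output and a config archive into out_dir.

    The file names versionInfo.txt and config.zip are arbitrary choices.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(version_info_command(install_root),
                            capture_output=True, text=True, check=True)
    (out_dir / "versionInfo.txt").write_text(result.stdout)
    return archive_config(profile_root, out_dir / "config")
```

Calling document_environment with your own install root, profile root, and an output folder produces both artifacts in one place, ready to attach to a support request.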

Test updates

Testing is mandatory for any system change. It's best to prepare a testing environment and test changes thoroughly before going to production:

  • Test any configuration changes, process changes, and process migration in your pre-production environment
  • Set up a production-like or identical environment to reproduce issues or to perform regression testing.

Tune for performance

Performance tuning is a systematic engineering effort. You must consider the whole picture of your system, including hardware, network, software and any other components. Following are some best practices:

  • Start early. You should consider the performance when you start to design the system.
  • Don't make assumptions about the performance-related data, such as user access per hour, throughput of a single BPM node, and so on. Collect data from the real world or existing projects so that you can design your performance solution based on real data.
  • Tune the entire software stack, the operating system, the network, and especially the database.
  • Plan for the long term, with tasks that are specific to your environment and applications. Things may change over time: your database space may fill up, the number of concurrent requests may grow, or your JVM heap size may no longer be enough for your application. Set up a plan to monitor system performance regularly and tune the system promptly when performance degrades.

Look at the complete picture

Your BPM systems run on hardware, software, and networks. Don't focus only on the software; pay attention to the hardware and the network as well. Otherwise, you may miss obvious issues and waste time and money.

Don't monitor only product-related log files, dumps, and so on. You also need to monitor the operating system, database, and network behavior that might affect the middleware product. Get familiar with system-level metrics and tools (CPU utilization, hard disk paging and I/O, database throughput, network traffic, and so on).

Implement a diagnostic collection plan

Before a BPM system is launched to production, it will be helpful to prepare a diagnostic collection plan listing items you want to collect and how they are collected. This will help you:

  • Enable your team to react quickly when the system is having issues
  • Document and automate diagnostic artifact collection
  • Educate and train the team in data collection, tools, and analysis.
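
One way to document and automate collection at the same time is to keep the plan as data and let a small script act on it. The sketch below is an assumption-laden example: the artifact labels, the glob patterns, and the diag-<timestamp> folder name are hypothetical and must be adapted to where your installation actually writes its log files and dumps.

```python
import shutil
import time
from pathlib import Path

# Hypothetical plan: artifact label -> glob pattern relative to the profile root.
COLLECTION_PLAN = {
    "server logs": "logs/*/SystemOut*.log",
    "ffdc": "logs/ffdc/*",
    "javacores": "javacore*.txt",
}

def collect(profile_root, dest_root, plan=COLLECTION_PLAN):
    """Copy every file matched by the plan into a timestamped folder."""
    dest = Path(dest_root) / time.strftime("diag-%Y%m%d-%H%M%S")
    copied = []
    for label, pattern in plan.items():
        target = dest / label.replace(" ", "_")
        for src in Path(profile_root).glob(pattern):
            if src.is_file():
                target.mkdir(parents=True, exist_ok=True)
                copied.append(shutil.copy2(src, target))
    return copied
```

Because the plan is a plain dictionary, the team can review it like any other document, and adding a new artifact type is a one-line change.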

Use BPM monitoring tools

IBM BPM provides monitoring tools that you should use not only when there are issues, but on a daily basis to help you:

  • Identify runtime problems and performance bottlenecks
  • Predict upcoming problems and be proactive

Back up your data

Back up your important data and configurations often to prevent loss of data when system crashes occur. You should also create a back-up plan before launching the production system. We recommend that you:

  • Create regular, automated backups, either online or offline during maintenance windows.
  • Back up your data before significant system updates or modifications.

Conclusion

This article described the available tools and recommended approaches for different aspects of BPM problem determination, as well as preventive measures and best practices you can follow to help prevent problems from occurring in the first place.


Acknowledgements

The authors would like to thank the following people for their contribution of the content and their experience to this article: Dawn Ahukanna, Susan Herrmann, Meng Wang, Richard Metzger, Sandhya Kapoor, Ming Gao, Todd Deen, Matt Luczkowiak, Ray Tseng, Bill Wentworth, Lawrence Louie, Dave Booz and Don Bourne.


ArticleTitle=Troubleshooting IBM Business Process Manager
publish-date=12042013