Monitoring and tuning a BPM platform at the system level is a long way from collecting process tracking-point data and building nice scoreboards that show process performance. But in the end, if the system doesn't perform well, users will complain about how it affects their own performance and how that impacts the process SLAs. When that happens, the BPM "system" will be seen as the reason for the bad business performance. Besides, no one likes to use a system that is slow and unresponsive.
Talking about performance...the two common suspects
At the core of an IBM BPM Process Server or Process Center is a WebSphere cell, and right underneath it, supporting it on the persistence side, is a database. To understand how a BPM system is running and behaving, it is fundamental that these two core pieces are well monitored and that people with the right skills are around to understand what's happening and tune it.
The database...one of two
On the DB side, depending on which one it is (DB2, Oracle, SQL Server), the tools may vary but the skills are conceptually the same: well-prepared, skilled DBAs who monitor the database, understand what isn't performing as fast as it should, find ways to improve it, and give the development teams useful information on what could be done better. And as databases get more "intelligent", with more complex algorithms and heuristics used for many different things, including execution plans and special features, a skilled and up-to-date DBA is fundamental.
The application server...the other obvious one
On the WebSphere Application Server (WAS) side there are several different ways and tools to monitor how it is running. Probably the best known is the Tivoli Performance Viewer (TPV), which is integrated into the Admin Console. TPV uses what's known as the WebSphere Performance Monitoring Infrastructure (PMI) and displays several reports that can be customized to include different indicators about everything that is going on in each server (JVM) inside the WAS cell.
Monitoring WAS performance with Tivoli Performance Viewer
Basically, you start by selecting the server or servers that you want to monitor and the modules to view on each. There are summary reports and more complete ones covering all kinds of counters, and there's also the ability to save a log for future analysis.
Now, let's assume that you have an IBM BPM Process Server with a four-cluster topology across two nodes, and that you have scaled the AppTarget cluster so that it contains, for instance, 12 servers, as represented in the following diagram:
If we count all the JVMs in the cell (not even counting the deployment manager and the two node agents) we have a total of 18.
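The count of 18 can be reproduced with a little arithmetic. The sketch below is only an illustration: the text fixes the AppTarget size at 12, so the other three clusters (called Messaging, Support and Web here) are assumed to have one member per node.

```python
# Cluster sizes for the example cell. Only the AppTarget figure (12) comes
# from the text; one member per node for the other clusters is an assumption.
NODES = 2
cluster_members = {
    "AppTarget": 12,
    "Messaging": 1 * NODES,  # assumed
    "Support": 1 * NODES,    # assumed
    "Web": 1 * NODES,        # assumed
}

# JVMs that exist in the cell but are left out of the count in the text.
management_jvms = {"deployment manager": 1, "node agents": NODES}

cluster_jvms = sum(cluster_members.values())
all_jvms = cluster_jvms + sum(management_jvms.values())

print(f"Cluster member JVMs: {cluster_jvms}")
print(f"Including dmgr and node agents: {all_jvms}")
```

Under these assumptions the cluster members add up to 18, and the management processes bring the cell to 21 JVMs.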
Now let's imagine that we want to keep an eye on what's going on with these servers and that we'll use PMI through TPV. Our major issue will be that we'll be jumping between web pages, and each of these pages will only report data for a single server. We won't be able to have a single page with data from several different servers.
WebSphere Performance Tuning Toolkit
WebSphere Application Server Performance Tuning Toolkit (PTT) is an Eclipse-based tool specifically designed to help users monitor their system using data analysis and statistical inference technology. Through JMX it collects exactly the same PMI data as TPV, which means that you don't need to install anything on the server side. But it does much more than collect and display data for real-time monitoring: we can define rules suited to our own system that enable performance problem detection and diagnostics.
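To make the idea of rules concrete, here is a minimal sketch of rule-based detection over a set of counters. It does not use PTT's actual rule format: the `Rule` class, the counter names and the thresholds are all hypothetical and only illustrate the concept.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Counters = Dict[str, float]


@dataclass
class Rule:
    """A named condition over one server's counters (hypothetical format)."""
    name: str
    condition: Callable[[Counters], bool]
    advice: str


# Thresholds are made up for illustration; real ones depend on the system.
RULES = [
    Rule("High heap usage",
         lambda c: c["heap_used_pct"] > 85.0,
         "review the heap size and look for leaks"),
    Rule("Thread pool saturated",
         lambda c: c["active_threads"] >= c["max_threads"],
         "review the thread pool maximum size"),
    Rule("Slow JDBC",
         lambda c: c["jdbc_avg_response_ms"] > 200.0,
         "ask the DBA to check the slowest statements"),
]


def evaluate(server: str, counters: Counters, rules: List[Rule]) -> List[str]:
    """Return one alert message per rule whose condition holds."""
    return [f"{server}: {rule.name} ({rule.advice})"
            for rule in rules if rule.condition(counters)]


# Invented sample values for a single server.
sample = {
    "heap_used_pct": 91.0,
    "active_threads": 20.0,
    "max_threads": 50.0,
    "jdbc_avg_response_ms": 350.0,
}

alerts = evaluate("AppTarget.member01", sample, RULES)
for alert in alerts:
    print(alert)
```

Keeping each rule as data (name, condition, advice) makes it easy to adjust thresholds per environment without touching the evaluation logic.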
The following chart, included in the article http://www.ibm.com/developerworks/websphere/downloads/peformtuning.html explains its main functionalities:
An important aspect is that the collected data is stored in a data cube, meaning that you can query it later and compare past behavior with the current one.
How to start
We can use PTT as one of the IBM Support Assistant (ISA) tools or as a standalone tool, which is available for Windows and Linux. After starting it, we add the hosts (actually WAS cells) to monitor. In a BPM environment there could be one for the Development Process Center, another for the QA Process Server and a third for the Production Process Server. Creating each entry is just a matter of specifying a name (e.g. PRD), the IP or host name of the cell's deployment manager, the SOAP port (8879 by default), and a user and password.
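The information needed for each entry is small enough to write down as a simple record. The sketch below is not a PTT file format; the `CellEntry` class, the host names and the user name are placeholders chosen for illustration, and the password is deliberately left out so it can be supplied at connection time.

```python
from dataclasses import dataclass

# Default SOAP connector port of a deployment manager, as mentioned in the text.
DEFAULT_DMGR_SOAP_PORT = 8879


@dataclass(frozen=True)
class CellEntry:
    """What is needed to reach one WAS cell (hypothetical structure)."""
    name: str
    dmgr_host: str
    user: str
    soap_port: int = DEFAULT_DMGR_SOAP_PORT

    def endpoint(self) -> str:
        """Host and port of the deployment manager's SOAP connector."""
        return f"{self.dmgr_host}:{self.soap_port}"


# Placeholder host names; replace them with the real deployment managers.
entries = [
    CellEntry("DEV", "dmgr-dev.example.com", "wasadmin"),
    CellEntry("QA", "dmgr-qa.example.com", "wasadmin"),
    CellEntry("PRD", "dmgr-prd.example.com", "wasadmin"),
]

for entry in entries:
    print(f"{entry.name}: {entry.endpoint()} as {entry.user}")
```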
Once these entries are created and properly configured, we just need to select the host (cell) to monitor and connect to it. PTT automatically retrieves the cell topology and server information.
What do you get?
The following picture shows an example of what you get after connecting to a host:
You can see the list of hosts on the top left area and for the selected one you get the cell topology. That includes the deployment manager and all the node agents as well as all the servers in each of these nodes.
The central area of the previous picture shows a subset of the most critical counters for all the servers in the cell. Checking these counters for every server is just a matter of scrolling down that page. This minor aspect makes a huge difference compared to browsing a different page for each server, especially when the number of servers is high. The refresh interval, which is 1 minute by default, can be customized. At each iteration all the data is collected from the host and stored in the PTT cube.
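The difference between one page per server and a single overview is easy to picture in code. The following sketch is not how PTT is implemented: `fake_collect` is a stand-in that returns made-up values instead of reading PMI over JMX, and the `history` list plays the role of the stored data.

```python
import time
from typing import Callable, Dict, List

REFRESH_SECONDS = 60  # default interval mentioned in the text; not used below


def fake_collect(server: str) -> Dict[str, float]:
    """Stand-in for a real PMI read; returns deterministic, made-up values."""
    seed = sum(ord(ch) for ch in server)
    return {
        "heap_used_pct": 40 + seed % 50,
        "active_threads": seed % 30,
        "jdbc_ms": 20 + seed % 200,
    }


def poll_once(servers: List[str],
              collect: Callable[[str], Dict[str, float]],
              store: List[dict]) -> List[dict]:
    """Collect one snapshot per server, keep it in the store, return the rows."""
    now = time.time()
    rows = []
    for server in servers:
        row = {"timestamp": now, "server": server, **collect(server)}
        store.append(row)
        rows.append(row)
    return rows


def render(rows: List[dict]) -> str:
    """Format all servers as one table, one line per server."""
    lines = [f"{'server':<12}{'heap %':>8}{'threads':>9}{'jdbc ms':>9}"]
    for r in rows:
        lines.append(f"{r['server']:<12}{r['heap_used_pct']:>8.0f}"
                     f"{r['active_threads']:>9.0f}{r['jdbc_ms']:>9.0f}")
    return "\n".join(lines)


servers = [f"member{i:02d}" for i in range(1, 19)]  # 18 JVMs, as in the example
history: List[dict] = []

rows = poll_once(servers, fake_collect, history)
print(render(rows))
```

A real monitor would call `poll_once` every `REFRESH_SECONDS`; a single iteration is run here so the example finishes immediately.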
Graphical view for the selected server
The details for each server can be monitored by double-clicking the server name on the bottom-left panel, or any counter for that server on the overview dashboard. Doing that takes us to the Monitor tab, with several charts for different views.
Some of the tabs do more than show the data collected from PMI. They also provide alerts based on analysis of that data, as well as data correlation and statistics.
Drill-down to detailed data
From the charts on the Monitor tab we can also drill down to detailed data for each of the views. The following picture shows the drill-down on the JDBC Response Time:
Besides having one section with data for the JdbcConnectionPool, there's a special option in the menu that gives us yet further information about each connection.
Review and edit tuning parameters
If the user has the right permissions, another nice feature is the ability to load, and eventually update, a considerable set of important parameters for each server. This is done in the bottom section, called Tuning Parameters, where we can tune heap sizes, several thread pools and other important parameters. Again, comparing these values for several servers on a single page is much easier and more intuitive than browsing a different page for each server. There are two available actions here, Retrieve and Update configurations, as shown in the following picture:
Generate Heap Dumps, Thread Dumps and Enable Traces
After selecting a server, the Operation menu also becomes available. From there we can generate thread dumps and heap dumps, and enable traces.
Generate Performance Report
Another important feature is the ability to generate a complete report, which can also be exported in PDF, PowerPoint or Word format.
Besides offering many more features than just monitoring, the way this tool is designed, and the ability to keep an eye on several critical indicators for every server in a cell, really set it apart from other options like TPV. Monitoring is crucial for the success of any system, and in a complex "stack" like a BPM product, being able to understand and measure what's going on is key. In the end, you can't control what you can't measure, simply because you can't understand it.
Tuning is as much an art as a science, but to apply any kind of change in order to improve a system, we need to:
- be able to monitor it before making the changes - informed decisions are crucial
- be able to monitor it after making the changes - evaluating the results of the tuning actions is also crucial
What this means is that there's no tuning without strong monitoring, and good tools are essential for that.
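As a small illustration of measuring before and after a change, the sketch below compares two sets of response-time samples. The numbers are invented for the example and the helper names are hypothetical; in practice the samples would come from whatever monitoring tool is in use.

```python
import math
from statistics import mean
from typing import Dict, List


def summarize(samples: List[float]) -> Dict[str, float]:
    """Mean, 95th percentile (nearest-rank method) and maximum of the samples."""
    ordered = sorted(samples)
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {"mean": mean(ordered), "p95": ordered[p95_index], "max": ordered[-1]}


def percent_change(before: List[float], after: List[float]) -> Dict[str, float]:
    """Relative change of each statistic, in percent (negative means faster)."""
    b, a = summarize(before), summarize(after)
    return {key: (a[key] - b[key]) / b[key] * 100.0 for key in b}


# Invented response times in milliseconds, for illustration only.
before_tuning = [120, 135, 150, 180, 210, 240, 300, 420, 510, 640]
after_tuning = [100, 110, 115, 130, 140, 150, 170, 200, 230, 260]

change = percent_change(before_tuning, after_tuning)
for statistic, value in change.items():
    print(f"{statistic}: {value:+.1f}%")
```

Looking at the tail (p95 and maximum) as well as the mean matters, because a tuning action can improve the average while leaving the slowest requests untouched.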