IBM Support

Sharing one real case about slowness in BPM Production Environment due to long running query.

Technical Blog Post


Abstract

Sharing one real case about slowness in BPM Production Environment due to long running query.

Body

     

In this article, I want to share one real case on how to resolve slowness on BPM Production Environment due to long running query.

The end user found that task page is not getting loaded after clicking on task in Work List. In Browser console it's showing below error


GET https://<hostname:port>/rest/bpm/wle/v1/task?parts=actions 400 (Bad Request)

We asked customer to collect the browser hosted client side trace and co-incidental server logs together

Note: Here are the steps for Fiddler trace collection:
1) Download the tool from
http://www.telerik.com/download/fiddler and install it
2) Open Fiddler
3) Select Tools > Fiddler Options > HTTPS > Decrypt HTTPS traffic (from all processes)
4) Select File > Capture Traffic (Capture Traffic should be checked)
5) Open web browser, clear browser cache, also clear the old record in Fiddler, and then reproduce the problem
6) After the problem is reproduced, save all the sessions to .saz file and send it to us
 
If your unable to install Fiddler, you can use Chrome developer tools or IE developer tools to capture the browser side trace.
- HAR captured through Chrome Developer tools (F12) network tab or exported from HTTP watch
- XML network traffic captured via IE (F12) developer tools network tab.

Here is a video that demonstrates how to collect the browser debugging trace.
http://www-01.ibm.com/support/docview.wss?uid=swg21666382

 

By checking the provided fiddler trace, we found that

GET https://<hostname:port>/rest/bpm/wle/v1/task/20041741?parts=actions

200 OK (application/json)

took 3:30 second

 

So the root cause of the problem is not "Status: 0", but REST call response is taking a long time to return:

 

=> BPM Server -> SystemOut.log:

000032e9 LoggingUtils I  The elapsed time for the following REST request was 209.96 seconds, which exceeds the warning threshold of 60 seconds: https://<hostname:port>/rest/bpm/wle/v1/task/20041741

 

=> BPM Server -> trace.log:

 

A lot of statements like

select USER_ID,USER_NAME,FULL_NAME,PROVIDER,USER_STATE,LAST_STATE_MODIFIED from LSW_USR_XREF where USER_ID = ?

                 TYPE FORWARD ONLY (1003)

                 CONCUR READ ONLY (1007)

 

select t0.USER_ID,t0.USER_NAME,t0.FULL_NAME,t0.PROVIDER,t0.USER_STATE,t0.LAST_STATE_MODIFIED from LSW_USR_XREF t0 where UPPER(USER_NAME) = UPPER(?)

I asked the user to run all queries listed on following web page. Result output should be in CSV format

https://github.com/stephan-volz/database-scripts/blob/master/DB2_ProcessInstances_Tasks.txt

This will show, if the user is lacking regular housekeeping for their BPM production environment.

Often way to huge data volume in DB is root cause for slow select statements.


So we asked the user to perform housekeeping.

Housekeeping: https://www.ibm.com/developerworks/bpm/bpmjournal/1312_spriet/1312_spriet.html

After doing housekeeping, the user needs to do reindexing of some BPM tables according to https://developer.ibm.com/answers/questions/193035/after-deleting-content-in-ibm-bpm-what-tables-shou/

Apart from that, we also asked the user to apply performance tuning according to below links.


DB Performance Tuning: https://www.ibm.com/developerworks/bpm/bpmjournal/1509_volz1-trs/1509_volz1.html

Search acceleration tools: https://www.ibm.com/support/knowledgecenter/SSFPJS_8.5.7/com.ibm.wbpm.admin.doc/topics/ctuningtools.html

But the user's DBA still found below query to be getting executed frequently in their production database

select t0.USER_ID,t0.USER_NAME,t0.FULL_NAME,t0.PROVIDER,t0.USER_STATE,t0.LAST_STATE_MODIFIED from LSW_USR_XREF t0 where UPPER(USER_NAME) = UPPER(?)

Possible solution:

We found that one known APAR JR58657, which is much matched with this case but there is no ifix feasible for this end user's product version (BPM 8570.2017.06).

Then BPM developer mentioned that the performance should be solvable through an Index on USER_NAME in LSW_USR_XREF.
This user is on BPM 8570.2017.06. Recently in BPM 8600.xxx or newer version we introduced a couple of new indexes.

So finally the execution time of the query was significantly improved in this end user's production environment by creating the following index:

CREATE INDEX @SCHEMA@.IX_UXREF_USERNAME ON @SCHEMA@.LSW_USR_XREF(UPPER(USER_NAME))   

I wish this article can give you some help when you meet this kind of similar issue.

[{"Business Unit":{"code":"BU004","label":"Hybrid Cloud"},"Product":{"code":"","label":""},"Component":"","Platform":[{"code":"","label":""}],"Version":"","Edition":""}]

UID

ibm11080795