How to locate performance bottleneck in component stack IBM InfoSphere MDM Collaboration Server?

Question

I face slow performce in some cases while working with the User Interace of IBM InfoSphere MDM Collaboration Server (MDMCS). How can I locate the performance bottleneck within the product component stack?

Cause

Due to the variety of customization options and the multi-tier architectural configuration of the product, there are many factors that may contribute to slow performance.

Answer

This technote is intended to

Provide an overview of performance impacting components

and

Outline the approach you should take to locate the performance problem within the component stack

This will help to provide the diagnostic data to product support team, if required.

Figure 1 provides an overview of the components that impact performance

Figure 1: Layered view of main performance impacting components

By using a small set of tools and a coordinated reproduction approach, you will be able to identify the layer or component where most of the time is spent. Once identified, more detailed investigations can concentrate solely on the identified component. Figure 2 provides a graphical overview of tools to use for this task

Figure 2: Tools to use to locate the performance bottleneck in the component stack

What to collect in detail

Whenever you face a performance problem, try to exactly define the use case. Each use case needs to be handled separately and you must identify which specific action is slow. For example, when dealing with a slow item display issue, specify whether it is the time taken to load the item in the single edit window after clicking the item in left navigation pane or it is the time taken to expand a multi-occurrence group or the time taken to scroll down to the bottom of an item. In short, you must try to limit diagnostic data collection to specific use case.

In case of an user interface problem, you need to collect following information:

Overall execution time until the page is rendered completely
Use a stop watch to measure the total runtime and track the start and stop time (as close as possible) for the use case by running the “date” command on the MDMCS shell console. This time is used to correlate reproduction scenario with log file contents
Time to receive an http response from MDMCS server on client workstation
Use a http tracing tool like Fiddler or Internet Explorer Developer tools (Developer tools are available starting with IE9) to track when requests are sent from client to server and when responses are received. With Fiddler tool you can easily save the traced data into a file (.saz) for later analysis.
When does iexplore.exe start to peak in CPU usage and for how long?
Monitor it using Windows Task Manager(right mouse click on Windows Task bar -> Start Task Manager). Switch to the "Processes" tab, sort by "Image Name" and locate the iexplore.exe processes(Even if you have opened only one browser window, you might see 2 processes. Monitor the one with the higher memory footprint).
You might need to check the option "Always on top" in the option menu of Windows task manager so you can monitor both, the use case performance in Internet explorer and as well CPU usage of iexplore.exe
Execution time on MDMCS server and SQL runtime footprint
Use pimSupport.sh -t start / stop to capture all log files in debug mode automatically. You may use this script only if it is a "short" reproduction use case, otherwise data volume might be too high)
or
Manually enable debug mode (Technote 1383862) with sufficiently sized appenders and make sure to archive db.log* files immediately when the reproduction use case is finished to avoid log file wrapping
Monitor CPU and memory usage on MDMCS server
Detailed data might not be needed, but you should at least check whether there are peaks in CPU usage. You may use tools such as nmon, vmstat or top to monitor CPU usage on MDMCS server side. Optimally, record usage information in files for later review.
A simple command to track basic performance data is given below:

On Linux platform:
for i in $(seq 10); do echo "Collecting performance monitoring data"; date >>monitor.out; vmstat >>monitor.out; ps -eo pcpu,pid,user,args | sort -k 1 -r | head -5 >>monitor.out; echo "=========" >>monitor.out; sleep 2; done

On AIX platform:
for i in $(seq 2); do echo "Collecting performance monitoring data $seq"; date >>monitor.out; vmstat >>monitor.out; echo "" >>monitor.out; ps avwg | head -1>>monitor.out; ps avwg | sort +10n | tail -5 >>monitor.out; echo "=========" >>monitor.out; sleep 2; done

"seq 10": commands within the for loop are executed 10 times
"sleep 2": it defines to wait for 2 seconds at each loop.
You may change these values according to the duration of the performance use case

Sample Use case

When opening a step (clicking on verify attributes link) within the worklist summary page, it takes a long time for items to render in a multi edit page.

Observations & analysis

After clicking “verify attributes” link, iexplore.exe CPU started to peak after about 10 seconds and stayed high for another 25 seconds. In total, display of items finished after about 35 seconds
In Fiddler the response was received from MDMCS after about 10 seconds

Figure 3: Monitoring as done at client side: User interface, Windows Task Manager, Fiddler

Determination of the total MDMCS server runtime and SQL workload footprint
In a first iteration parse $TOP/appsverver_<hostname>/db.log files using Sumall utility with fromtime and threadlist option to identify the thread that was serving the request. The information gained by running the date command before starting the reproduction is used as input for the fromtime command. In the THREADSUMMARY listing one thread with name ItemListQueryStore.wpc is found, which matches the threadname as seen in Fiddler trace

java -jar Sumall.jar db.log* threadlist fromtime="2014-02-27 14:28:00"

…

992 1828 0.5 GEN_SEC_CMP_COMPANY_GETBYID] SELECT * FROM tsec_cmp_company WHERE cmp_

1566 2304 0.7 OBJECT_ACCESS_FOR_USER_ROLE_ALL_GROUPS] SELECT orm_access, orm_access_

2920 1600 1.8 GEN_WFL_CEH_COLL_ENTRY_HISTORY_GETBYCAENTRY] SELECT * FROM twfl_ceh_co

-----------------------------------------------

sum(ms) cnt avg(ms) QUERY

Monitored timeperiod (MT): 2014-02-27 14:28:35,999 - 2014-02-27 16:04:49,337 : 1.60 h | 96.00 m | 5773.34 s

Statistical values for all queries

Total Count Average Total Total Runtime % of MT

(ms) (cnt) (avg ms) (minutes) (hours) (%)

14282 15754 0 0.2 0.0 0.2

PRINTING THREADSUMMARY

Total Count Average first timestamp last timestamp Thread

(ms) (cnt) (avg ms)

1 1 1.0 2014-02-27 14:28:35 unregistered (formely: jsp_274: QuickLinks.wpc)

30 14 2.1 2014-02-27 15:02:30 2014-02-27 15:02:30 logoutListener:277

69 97 0.7 2014-02-27 14:28:48 2014-02-27 16:04:49 heartbeat_appsvr

124 19 6.5 2014-02-27 14:32:41 2014-02-27 16:02:41 jvm_maintenance_appsvr

5661 9600 0.6 2014-02-27 14:28:59 2014-02-27 16:04:06 Thread-45

8397 6023 1.4 2014-02-27 14:28:36 2014-02-27 14:28:46 jsp_276: ItemListQueryStore.wpc

Total Count Average first timestamp last timestamp Thread

(ms) (cnt) (avg ms)

java -jar Sumall.jar db.log*

threadname

="jsp_276: ItemListQueryStore.wpc"

sum(ms) cnt avg(ms) QUERY

-----------------------------------------------

...

8 2 4.0 GEN_SEC_SCU_USER_GETBYID] SELECT * FROM tsec_scu_user WHERE scu_user_i

47 7 6.7 GEN_WFL_CAE_COLL_AREA_ENTRY_GETBYCASTEPPATHID] SELECT * FROM twfl_cae_

239 400 0.6 Dynamic: [ select distinct csm_spec_id from tctg_csm_category_spec_ma

326 400 0.8 POPULATE_CSM_MAP] SELECT coa_src_container_id FROM twfl_coa_collaborat

349 400 0.9 GEN_CTG_ICM_ITEM_CATEGORY_MAP_GETBYITEMVERSION] SELECT * FROM tctg_icm

366 400 0.9 GEN_CTG_ITM_ITEM_GETBYPRIMARYKEYVERSION] SELECT * FROM tctg_itm_item W

369 400 0.9 GEN_CTG_ITD_ITEM_DETAIL_GETBYIDVERSION] SELECT * FROM tctg_itd_item_de

407 400 1.0 GET_COL_AREA_ENTRY_LOCKED_COUNT_IN_STEP] select count(*) from cae, cel

498 400 1.2 GET_CSM_INFO_FOR_ITEMSET] SELECT distinct csm_category_id, csm_spec_id

529 400 1.3 GET_ITM_INFO_FOR_ITM_LIST] SELECT itm_id, itm_primary_key, itm_version

594 400 1.5 GET_ICM_INFO_FOR_ITEMSET] SELECT * FROM tctg_icm_item_category_map WHE

784 400 2.0 GET_CA_HISTORY_BY_DATES_FOR_PK] SELECT * FROM twfl_ceh_coll_entry_hist

946 400 2.4 GET_ITD_FOR_ITEMSET] SELECT * FROM tctg_itd_item_detail WHERE itd_item

2920 1600 1.8 GEN_WFL_CEH_COLL_ENTRY_HISTORY_GETBYCAENTRY] SELECT * FROM twfl_ceh_co

-----------------------------------------------

sum(ms) cnt avg(ms) QUERY

Monitored timeperiod (MT): 2014-02-27 14:28:36,138 - 2014-02-27 14:28:46,358 : 0.00 h | 0.00 m |

10.22 s

Statistical values for all queries

Total Count Average Total Total Runtime % of MT

(ms) (cnt) (avg ms) (minutes) (hours) (%)

8397

6023 1 0.1 0.0

82.2

Summary
As shown in Figure 4, the main performance bottleneck is located on client side. It took more than 25 seconds to render the page in Internet Explorer, which is over 70% of total runtime. Further detailed investigations would have to concentrate on client side.
Nevertheless it is seen that the SQL workload as well contributes to overall runtime (8s ~ 22%) and its impact can be even higher if the average SQL runtimes would be only slightly longer.

Figure 4: Summary of findings

What to provide to IBM InfoSphere MDM Collaboration Server support when being asked for the outlined use case tracking

Description of the reproduction scenario and use case. Provide screenshots along with the description
Total use case runtime as measured using the stop watch
Start time of the reproduction scenario (output of date command)
Start time and duration at which iexplore.exe started to peak CPU usage
http trace (e.g. Fiddler trace saved as .saz file)
db.log* files from $TOP/logs/appsvr_<hostname>
CPU monitoring data as collected on IBM InfoSphere MDM Collaboration Server server side

Tips

How to locate performance bottleneck in component stack IBM InfoSphere MDM Collaboration Server?

Question & Answer

Question

Cause

Answer

Was this topic helpful?

Document Information

UID

Share your feedback

Need support?