IBM Support

How to locate performance bottleneck in component stack IBM InfoSphere MDM Collaboration Server?

Question & Answer


Question

I face slow performce in some cases while working with the User Interace of IBM InfoSphere MDM Collaboration Server (MDMCS). How can I locate the performance bottleneck within the product component stack?

Cause

Due to the variety of customization options and the multi-tier architectural configuration of the product, there are many factors that may contribute to slow performance.

Answer

This technote is intended to

  • Provide an overview of performance impacting components

and
  • Outline the approach you should take to locate the performance problem within the component stack

This will help to provide the diagnostic data to product support team, if required.

Figure 1 provides an overview of the components that impact performance



Figure 1: Layered view of main performance impacting components

By using a small set of tools and a coordinated reproduction approach, you will be able to identify the layer or component where most of the time is spent. Once identified, more detailed investigations can concentrate solely on the identified component. Figure 2 provides a graphical overview of tools to use for this task


Figure 2: Tools to use to locate the performance bottleneck in the component stack

What to collect in detail


Whenever you face a performance problem, try to exactly define the use case. Each use case needs to be handled separately and you must identify which specific action is slow. For example, when dealing with a slow item display issue, specify whether it is the time taken to load the item in the single edit window after clicking the item in left navigation pane or it is the time taken to expand a multi-occurrence group or the time taken to scroll down to the bottom of an item. In short, you must try to limit diagnostic data collection to specific use case.

In case of an user interface problem, you need to collect following information:
  • Overall execution time until the page is rendered completely
    Use a stop watch to measure the total runtime and track the start and stop time (as close as possible) for the use case by running the “date” command on the MDMCS shell console. This time is used to correlate reproduction scenario with log file contents
  • Time to receive an http response from MDMCS server on client workstation
    Use a http tracing tool like Fiddler or Internet Explorer Developer tools (Developer tools are available starting with IE9) to track when requests are sent from client to server and when responses are received. With Fiddler tool you can easily save the traced data into a file (.saz) for later analysis.
  • When does iexplore.exe start to peak in CPU usage and for how long?
    Monitor it using Windows Task Manager(right mouse click on Windows Task bar -> Start Task Manager). Switch to the "Processes" tab, sort by "Image Name" and locate the iexplore.exe processes(Even if you have opened only one browser window, you might see 2 processes. Monitor the one with the higher memory footprint).
    You might need to check the option "Always on top" in the option menu of Windows task manager so you can monitor both, the use case performance in Internet explorer and as well CPU usage of iexplore.exe
  • Execution time on MDMCS server and SQL runtime footprint
    Use pimSupport.sh -t start / stop to capture all log files in debug mode automatically. You may use this script only if it is a "short" reproduction use case, otherwise data volume might be too high)
    or
    Manually enable debug mode (Technote 1383862) with sufficiently sized appenders and make sure to archive db.log* files immediately when the reproduction use case is finished to avoid log file wrapping
  • Monitor CPU and memory usage on MDMCS server
    Detailed data might not be needed, but you should at least check whether there are peaks in CPU usage. You may use tools such as nmon, vmstat or top to monitor CPU usage on MDMCS server side. Optimally, record usage information in files for later review.
    A simple command to track basic performance data is given below:

    On Linux platform:
    for i in $(seq 10); do echo "Collecting performance monitoring data"; date >>monitor.out; vmstat >>monitor.out; ps -eo pcpu,pid,user,args | sort -k 1 -r | head -5 >>monitor.out; echo "=========" >>monitor.out; sleep 2; done

    On AIX platform:
    for i in $(seq 2); do echo "Collecting performance monitoring data $seq"; date >>monitor.out; vmstat >>monitor.out; echo "" >>monitor.out; ps avwg | head -1>>monitor.out; ps avwg | sort +10n | tail -5 >>monitor.out; echo "=========" >>monitor.out; sleep 2; done

    "seq 10": commands within the for loop are executed 10 times
    "sleep 2": it defines to wait for 2 seconds at each loop.
    You may change these values according to the duration of the performance use case

Sample Use case

When opening a step (clicking on verify attributes link) within the worklist summary page, it takes a long time for items to render in a multi edit page.



Observations & analysis
  • After clicking “verify attributes” link, iexplore.exe CPU started to peak after about 10 seconds and stayed high for another 25 seconds. In total, display of items finished after about 35 seconds
  • In Fiddler the response was received from MDMCS after about 10 seconds

Figure 3: Monitoring as done at client side: User interface, Windows Task Manager, Fiddler

  • Determination of the total MDMCS server runtime and SQL workload footprint
    In a first iteration parse $TOP/appsverver_<hostname>/db.log files using Sumall utility with fromtime and threadlist option to identify the thread that was serving the request. The information gained by running the date command before starting the reproduction is used as input for the fromtime command. In the THREADSUMMARY listing one thread with name ItemListQueryStore.wpc is found, which matches the threadname as seen in Fiddler trace

  • java -jar Sumall.jar db.log* threadlist fromtime="2014-02-27 14:28:00"

          992    1828       0.5   GEN_SEC_CMP_COMPANY_GETBYID] SELECT * FROM tsec_cmp_company WHERE cmp_
         1566    2304       0.7   OBJECT_ACCESS_FOR_USER_ROLE_ALL_GROUPS] SELECT orm_access, orm_access_
         2920    1600       1.8   GEN_WFL_CEH_COLL_ENTRY_HISTORY_GETBYCAENTRY] SELECT * FROM twfl_ceh_co
    -----------------------------------------------
       sum(ms)    cnt   avg(ms)  QUERY
    Monitored timeperiod (MT): 2014-02-27 14:28:35,999  -  2014-02-27 16:04:49,337 :  1.60 h | 96.00 m | 5773.34 s

    Statistical values for all queries
    Total      Count      Average    Total        Total Runtime    % of MT
    (ms)       (cnt)      (avg ms)   (minutes)    (hours)            (%)
        14282   15754         0         0.2         0.0              0.2

    PRINTING THREADSUMMARY
    Total      Count      Average    first timestamp       last timestamp      Thread
    (ms)       (cnt)      (avg ms)
            1       1          1.0   2014-02-27 14:28:35                       unregistered (formely: jsp_274: QuickLinks.wpc)
           30      14          2.1   2014-02-27 15:02:30   2014-02-27 15:02:30 logoutListener:277
           69      97          0.7   2014-02-27 14:28:48   2014-02-27 16:04:49 heartbeat_appsvr
          124      19          6.5   2014-02-27 14:32:41   2014-02-27 16:02:41 jvm_maintenance_appsvr
         5661    9600          0.6   2014-02-27 14:28:59   2014-02-27 16:04:06 Thread-45
         8397    6023          1.4   2014-02-27 14:28:36   2014-02-27 14:28:46 jsp_276: ItemListQueryStore.wpc
    Total      Count      Average    first timestamp       last timestamp      Thread
    (ms)       (cnt)      (avg ms)


    Once the thread is identified, generate the Runtime and SQL workload statistics for the thread:

    java -jar Sumall.jar db.log* threadname="jsp_276: ItemListQueryStore.wpc"
       sum(ms)    cnt   avg(ms)  QUERY
    -----------------------------------------------
     ...
            8       2       4.0   GEN_SEC_SCU_USER_GETBYID] SELECT * FROM tsec_scu_user WHERE scu_user_i
           47       7       6.7   GEN_WFL_CAE_COLL_AREA_ENTRY_GETBYCASTEPPATHID] SELECT * FROM twfl_cae_
          239     400       0.6   Dynamic:  [ select distinct csm_spec_id from tctg_csm_category_spec_ma
          326     400       0.8   POPULATE_CSM_MAP] SELECT coa_src_container_id FROM twfl_coa_collaborat
          349     400       0.9   GEN_CTG_ICM_ITEM_CATEGORY_MAP_GETBYITEMVERSION] SELECT * FROM tctg_icm
          366     400       0.9   GEN_CTG_ITM_ITEM_GETBYPRIMARYKEYVERSION] SELECT * FROM tctg_itm_item W
          369     400       0.9   GEN_CTG_ITD_ITEM_DETAIL_GETBYIDVERSION] SELECT * FROM tctg_itd_item_de
          407     400       1.0   GET_COL_AREA_ENTRY_LOCKED_COUNT_IN_STEP] select count(*) from cae, cel
          498     400       1.2   GET_CSM_INFO_FOR_ITEMSET] SELECT distinct csm_category_id, csm_spec_id
          529     400       1.3   GET_ITM_INFO_FOR_ITM_LIST] SELECT itm_id, itm_primary_key, itm_version
          594     400       1.5   GET_ICM_INFO_FOR_ITEMSET] SELECT * FROM tctg_icm_item_category_map WHE
          784     400       2.0   GET_CA_HISTORY_BY_DATES_FOR_PK] SELECT * FROM twfl_ceh_coll_entry_hist
          946     400       2.4   GET_ITD_FOR_ITEMSET] SELECT * FROM tctg_itd_item_detail WHERE itd_item
         2920    1600       1.8   GEN_WFL_CEH_COLL_ENTRY_HISTORY_GETBYCAENTRY] SELECT * FROM twfl_ceh_co
    -----------------------------------------------
       sum(ms)    cnt   avg(ms)  QUERY

    Monitored timeperiod (MT): 2014-02-27 14:28:36,138  -  2014-02-27 14:28:46,358 :  0.00 h | 0.00 m | 10.22 s

    Statistical values for all queries
    Total      Count      Average    Total        Total Runtime    % of MT
    (ms)       (cnt)      (avg ms)   (minutes)    (hours)            (%)
         8397    6023         1         0.1         0.0             82.2

    Based on this output, the following conclusion can be reached:
    - The thread was running for 10.22 seconds on the MDMCS server side
    - During that time, a large number of 6023 SQL queries were triggered with an accumulated runtime of 8397 milliseconds, which contributed to 82.2% of the total runtime. In case average SQL runtimes would be shorter, the SQL workload and thus its contribution to total use case runtime would be even higher.
Summary
As shown in Figure 4, the main performance bottleneck is located on client side. It took more than 25 seconds to render the page in Internet Explorer, which is over 70% of total runtime. Further detailed investigations would have to concentrate on client side.
Nevertheless it is seen that the SQL workload as well contributes to overall runtime (8s ~ 22%) and its impact can be even higher if the average SQL runtimes would be only slightly longer.

Figure 4: Summary of findings

What to provide to IBM InfoSphere MDM Collaboration Server support when being asked for the outlined use case tracking

  • Description of the reproduction scenario and use case. Provide screenshots along with the description
  • Total use case runtime as measured using the stop watch
  • Start time of the reproduction scenario (output of date command)
  • Start time and duration at which iexplore.exe started to peak CPU usage
  • http trace (e.g. Fiddler trace saved as .saz file)
  • db.log* files from $TOP/logs/appsvr_<hostname>
  • CPU monitoring data as collected on IBM InfoSphere MDM Collaboration Server server side

[{"Product":{"code":"SS2U2U","label":"InfoSphere Master Data Management Collaboration Server"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Component":"--","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF016","label":"Linux"},{"code":"PF027","label":"Solaris"}],"Version":"10.0;10.1;11.0;11.3;11.4","Edition":"","Line of Business":{"code":"","label":""}}]

Document Information

Modified date:
16 June 2018

UID

swg21969721