IBM Support

IBM Engineering Lifecycle Query Engine performance is very poor for query and indexing on Windows servers

Troubleshooting


Problem

IBM Engineering Lifecycle Management applications that use Lifecycle Query Engine (LQE) and Link Indexing (LDX) can be significantly impacted by slow disk writes on Microsoft Windows. The slowdown occurs in the Windows API calls that implement the JRE's force() method.

Symptom

You can experience one or more of the following symptoms when the LQE/LDX system needs to make large writes to disk:
  • Reports in IBM Jazz Reporting Service (JRS) Report Builder take an inconsistent amount of time to complete, or time out and fail to return results.
  • Reports on Engineering Lifecycle Management application dashboards take an inconsistent amount of time to complete, or time out without completing.
  • When LQE or LDX reindexes or updates a data source, the operation is significantly slower in memory-mapped mode than in direct mode.
  • Loading links across applications and from Global Configurations in an ELM application can be slow.
  • The LQE/LDX server might, but does not necessarily, exhibit high CPU usage. Memory and disk usage are not high.

Cause

LQE and LDX use Jena TDB as the data store. Jena is optimized to read and write data through memory-mapped files, so that the operating system manages caching.
On Windows, flushing data to disk uses the FlushFileBuffers API call, which is much slower than the corresponding method on Linux. Reads from the index are fast in memory-mapped mode, but writes are much slower in this mode because of the slow performance of the Windows API when writing to the System File Cache.
For more information about how memory-mapped mode and direct mode work, see the Memory-Mapped I/O page.
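The difference between the two write paths can be illustrated with a small Java sketch. It is not Jena TDB or LQE code: the class name, method names, and block size are hypothetical, and it only contrasts writing through a MappedByteBuffer followed by force() with writing through a FileChannel followed by force(false). Timings depend on the operating system, file system, and storage, so no particular result is implied.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: contrasts two I/O styles; this is not Jena TDB or LQE code. */
public class FlushModeDemo {
    static final int BLOCK_SIZE = 8 * 1024; // hypothetical block size for the demo

    /** Memory-mapped style: copy each block into the mapping, then force() it to disk. */
    public static long writeMapped(Path file, int blocks) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) blocks * BLOCK_SIZE);
            byte[] block = new byte[BLOCK_SIZE];
            long start = System.nanoTime();
            for (int i = 0; i < blocks; i++) {
                map.put(block);
                map.force(); // flush dirty pages of the mapping to the storage device
            }
            return System.nanoTime() - start;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Direct style: write each block through the channel, then force the channel. */
    public static long writeDirect(Path file, int blocks) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE)) {
            ByteBuffer block = ByteBuffer.allocate(BLOCK_SIZE);
            long start = System.nanoTime();
            for (int i = 0; i < blocks; i++) {
                block.clear();
                while (block.hasRemaining()) {
                    ch.write(block);
                }
                ch.force(false); // flush file content, not metadata
            }
            return System.nanoTime() - start;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        int blocks = 2_000; // arbitrary demo size
        Path mapped = Files.createTempFile("flush-mapped", ".dat");
        Path direct = Files.createTempFile("flush-direct", ".dat");
        System.out.printf("mapped: %d ms%n", writeMapped(mapped, blocks) / 1_000_000);
        System.out.printf("direct: %d ms%n", writeDirect(direct, blocks) / 1_000_000);
        // On Windows a mapped file may stay locked until the mapping is garbage collected,
        // so the temporary files are left for the operating system to clean up.
    }
}
```

Running the sketch on the same hardware under Windows and under Linux gives a rough feel for how much of the write cost comes from the flush call itself, but it is not a substitute for measuring the LQE/LDX server.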

Environment

The issue happens on Windows operating systems only.
The issue was reported on both bare metal and virtualized systems.
This issue is more often reported on virtualized systems, but that can be due to higher skill levels or optimal configurations of the infrastructure supporting the virtualization software.
The issue happens on Jena-based systems (LQE, LDX, and DNG in version 6.x).
Note: The issue has not been reported on Linux, which has much better concurrency and multi-threading capability for disk I/O than Windows.

Diagnosing The Problem

  1. Gather 4 to 6 javacores at 30-second intervals from the LQE/LDX application while the reported symptoms are visible.
  2. Check whether the lqe.BatchWriter thread shows the following stack trace in most of the javacores:

    Thread Name: lqe.BatchWriter[TRS 2.0 for xxx Resources]:
    at java/nio/MappedByteBuffer.force0(Native Method)
    at java/nio/MappedByteBuffer.force(MappedByteBuffer.java:216(Compiled Code))
    at com/hp/hpl/jena/tdb/base/file/BlockAccessMapped.flushDirtySegments(BlockAccessMapped.java:247(Compiled Code))
    at com/hp/hpl/jena/tdb/base/file/BlockAccessMapped.force(BlockAccessMapped.java:269(Compiled Code))
    ...
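
The check in step 2 can be automated with a short Java sketch. It assumes raw javacore files in which each thread section begins with a line tagged 3XMTHREADINFO that carries the thread name; the class and method names are hypothetical. If the javacores are viewed through an analyzer tool that reformats them (for example, with a "Thread Name:" prefix as shown above), the header constant would need to be adjusted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Illustrative helper: counts javacores in which lqe.BatchWriter is inside force0. */
public class JavacoreForceScan {
    // Assumption: raw javacore thread sections start with this tag followed by whitespace.
    static final String THREAD_HEADER = "3XMTHREADINFO ";
    static final String THREAD_NAME = "lqe.BatchWriter";
    static final String FORCE_FRAME = "java/nio/MappedByteBuffer.force0";

    /** Returns true if a thread whose header names lqe.BatchWriter has a force0 frame. */
    public static boolean batchWriterInForce(List<String> lines) {
        boolean inBatchWriter = false;
        for (String line : lines) {
            if (line.startsWith(THREAD_HEADER)) {
                // A new thread section begins; remember whether it is a BatchWriter thread.
                inBatchWriter = line.contains(THREAD_NAME);
            } else if (inBatchWriter && line.contains(FORCE_FRAME)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        int hits = 0;
        for (String arg : args) {
            // ISO-8859-1 decodes any byte sequence, so unusual characters do not abort the scan.
            List<String> lines =
                    Files.readAllLines(Paths.get(arg), StandardCharsets.ISO_8859_1);
            boolean hit = batchWriterInForce(lines);
            if (hit) {
                hits++;
            }
            System.out.printf("%s: %s%n", arg, hit ? "force0 in lqe.BatchWriter" : "no match");
        }
        System.out.printf("%d of %d javacores match%n", hits, args.length);
    }
}
```

Pass the javacore file paths as arguments. A match in most of the files corresponds to the condition described in step 2.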

Resolving The Problem

There is no fix for this issue on Windows. IBM strongly recommends migrating LQE/LDX to a Linux platform.
When you observe the symptoms and the diagnostics confirm the issue, follow these best practices to mitigate it:
Report Developer: Improve report performance by following Report Builder Best Practices.
LQE/LDX Server administrator:
  1. Apply all Best Practices for Configuring LQE For Performance and Scalability.
  2. Enable direct mode for reindexing activities when server usage is low, such as during a scheduled maintenance window.
    Remember to switch back to memory-mapped mode after the reindex completes and before users start running reports.
  3. Deploy Lifecycle Query Engine on multiple nodes. This reduces the time that reports take to run while the index is updating.
  4. See jazz.net: Scaling the configuration-aware reporting environment for more details on determining whether the current reporting environment sizing or topology is sufficient to improve LQE availability while reindexing.
  5. Review the LQE Administration panel to identify which data source takes the longest time to update.
    Try to identify which changes in the application result in many updates to the TRS Feed for LQE to process. Can these changes be scheduled to reduce the impact on LQE?
    For example, OSLC integrations performing bulk updates during peak times in Engineering Test Management (ETM).
  6. Use Resource-Intensive scenarios to understand application actions that lead to frequent updates to the TRS Feed for LQE.

If you apply these practices and still experience inadequate performance, migrating to a Linux platform is the only solution.

Document Location

Worldwide

Document Information

Modified date:
26 April 2023

UID

ibm16450437