Troubleshooting
Problem
Diagnosing The Problem
Collecting & Submitting Data
Review the environment-specific MustGather first and attach all requested diagnostics to the IBM Support case. Then collect additional data as outlined in the sections below.
Application server is frozen or unresponsive
Thread dumps (Application Server)
- Identify the pod where slowness or a hang is observed.
- Exec into the pod:
  oc exec -it name-ibm-oms-ent-prod-appserver-om-app-5bb8bb74c6-p5tww /bin/bash
- Inside the pod, list processes to identify the Java PID:
  ps aux
  Sample output:
  USER       PID %CPU %MEM     VSZ     RSS TTY   STAT START   TIME COMMAND
  default      1  5.7 28.0 7458356 2233580 ?     SLsl Sep03 173:25 /opt/ibm/java/jre/bin/java -javaagent:/opt/ibm/wlp/bin/tools/ws-javaagent.jar -Djava.awt.headless=true -Djdk.attach.allowAttachSelf=true -Dvendor=websphere -
  default    858  0.0  0.0   12024    2548 pts/0 Ss+  Sep04   0:00 /bin/bash
  default   1624  0.0  0.0   12024    3184 pts/1 Ss+  14:41   0:00 /bin/bash
  default   1631  0.5  0.0   12024    3300 pts/2 Ss   14:44   0:00 /bin/bash
  default   1637  0.0  0.0   44632    3372 pts/2 R+   14:44   0:00 ps aux
- Generate a Java core (thread dump) 3–4 times at 20–30 second intervals:
  kill -3 <pid>
- The files javacore.date.time.id.txt will be generated under the log directory as configured in values.yaml.
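If preferred, the repeated kill -3 can be scripted from inside the pod. A minimal sketch, assuming the Java PID identified above is substituted for <pid>:
  # Run inside the pod: request four javacores (thread dumps), roughly 30 seconds apart
  for i in 1 2 3 4; do
    kill -3 <pid>   # SIGQUIT makes the IBM JVM write a javacore without stopping the process
    sleep 30
  done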
Statistics data
Capture statistics data while the issue is ongoing. Provide at least one hour of data from regular business hours or the time of the issue.
Logs & tracing
- Provide two sets of logs:
- VERBOSE tracing on the identified Application or API.
- SQLDEBUG tracing on the identified Application or API.
- Refer to the internal video that illustrates how to put components on trace.
Note: VERBOSE trace is not suitable for the production environment. If the issue can only be reproduced in production, perform a single-user test with UserTracing enabled instead.
GC and heap diagnostics
- For slowness or OutOfMemory (OOM) issues, provide:
- Any generated verbose garbage collection (verboseGC) logs.
- Heap dumps.
- If the issue is identified to be due to database locking, upload the requested database locking details.
Mitigation
After collecting diagnostics, restart the application server to mitigate the issue.
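On Red Hat OpenShift this typically means recreating the application server pod; a hedged sketch (the deployment and pod names below are assumptions based on the sample pod name above, not values from this document):
  # Option 1: restart every pod in the application server deployment
  oc rollout restart deployment/<appserver-deployment>
  # Option 2: delete only the affected pod; the ReplicaSet recreates it
  oc delete pod <appserver-pod>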
Optional: HealthCenter data for intermittent issues
If the issue is intermittent, configure HealthCenter to collect periodic diagnostic files:
- In values.yaml, under appserver.jvm.params, add:
  params:
    - -Xhealthcenter:level=headless
    - -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/opt/ibm/wlp/output/defaultServer
    - -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration=15
    - -Dcom.ibm.diagnostics.healthcenter.data.profiling=off
    - -Dcom.ibm.java.diagnostics.healthcenter.allocation.threshold.low=10000000
    - -Dcom.ibm.java.diagnostics.healthcenter.stack.trace.depth=20
    - -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=0
- Upgrade the Helm deployment.
- A healthcenter<ts>.hcd file will be created under the configured output directory every 15 minutes (per -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration).
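The "Upgrade the Helm deployment" step can be performed with a standard helm upgrade; a minimal sketch, where the release name and chart reference are placeholders rather than values from this document:
  # Re-apply the chart with the updated values.yaml so the new JVM arguments take effect
  helm upgrade <release-name> <oms-chart> -f values.yaml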
Agent or Integration server (runtime) is frozen or unresponsive
Thread dumps (Agent / Integration Server)
- Identify the pod where the slowness or hang is observed.
- Exec into the pod:
  oc exec -it name-ibm-oms-ent-prod-scheduleorder-784c8689df-w67gj /bin/bash
- Inside the pod, list processes to identify the Java PID:
  ps aux
  Sample output:
  USER      PID %CPU %MEM     VSZ    RSS TTY STAT START TIME COMMAND
  omsuser     1  0.0  0.0   12024   1772 ?   Ss   03:17 0:00 /bin/sh /opt/ssfs/runtime/bin/agentserver.sh -jvmargs -Xms512m -Xmx1024m -Xms512m -Xmx1024m -Xhealthcenter:level:headless -Dcom.ibm.java.diagnostics.healthce
  omsuser   118  0.0  0.0   12032   1668 ?   S    03:17 0:00 /bin/sh /opt/ssfs/runtime/bin/java_wrapper.sh -Xms512m -Xmx1024m -Djava.io.tmpdir=/opt/ssfs/runtime/tmp -Dfile.encoding=UTF-8 -Dnet.sf.ehcache.skipUpdateChec
  omsuser   138  0.4  6.5 5899152 522580 ?   SLl  03:17 4:20 /opt/ssfs/runtime/jdk/bin/java -d64 -Xms512m -Xmx1024m -Djava.io.tmpdir=/opt/ssfs/runtime/tmp -Dfile.encoding=UTF-8 -Dnet.sf.ehcache.skipUpdateCheck=true -DI
- Generate a Java core (thread dump) 3–4 times at 20–30 second intervals against the Java process (PID 138 in this sample output), not the wrapper shell:
  kill -3 <pid>
- The file javacore.date.time.id.txt will be generated under the log directory configured in values.yaml.
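Before restarting anything, the javacores can be copied off the pod for upload. A minimal sketch with placeholder pod and directory names (use the log directory configured in values.yaml):
  # List the generated javacores inside the pod
  oc exec <agent-pod> -- ls -ltr <log-directory>
  # Copy one of them to the current local directory for attachment to the case
  oc cp <agent-pod>:<log-directory>/javacore.<date>.<time>.<id>.txt ./javacore.txt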
Statistics data
Capture the statistics data while the issue is ongoing. Provide at least one hour of data from regular business hours or the time of the issue.
Logs & tracing
- Provide two sets of logs:
- VERBOSE tracing on the identified Agent or Integration Server.
- SQLDEBUG tracing on the identified Agent or Integration Server.
- Refer to the video that illustrates how to put components on trace for additional details.
Note: VERBOSE trace is not suitable for the production environment. If the issue can only be reproduced in production, use a single-user test with UserTracing enabled instead.
GC and heap diagnostics
- For slowness or OOM issues, provide any generated:
- verboseGC logs.
- Heap dumps.
- If the issue is identified as database locking, upload the database locking details.
Mitigation
After collecting diagnostics, restart the Agent and/or Integration servers to mitigate the issue.
Optional: HealthCenter data for intermittent issues
If the issue is intermittent, configure HealthCenter on the Agent/Integration servers:
- In values.yaml, under omserver.common.jvmArgs or omserver.servers.jvmArgs, set:
  jvmArgs: "-Xms512m -Xmx1024m -Xhealthcenter:level=headless -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/shared/agents/$(OM_POD_NAME)/hcd -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration=15 -Dcom.ibm.diagnostics.healthcenter.data.profiling=off -Dcom.ibm.java.diagnostics.healthcenter.allocation.threshold.low=10000000 -Dcom.ibm.java.diagnostics.healthcenter.stack.trace.depth=20 -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=0"
- Upgrade the Helm deployment.
- A healthcenter<ts>.hcd file will be created under the specified directory every 15 minutes.
Application Server Crash or Shut Down
Out Of Memory (OOM) and subsequent server crash can be due to either Java heap space exhaustion or StackOverflowError.
OOM due to Java heap space
If OOM occurs because of Java heap space:
- Collect and upload:
- verboseGC logs.
- Memory heap dumps.
- Application Server error logs.
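On IBM J9-based JVMs these artifacts are usually produced automatically when the heap is exhausted. If they are not appearing, the dump agents can be declared explicitly; a minimal sketch under appserver.jvm.params, using standard -Xdump syntax as an assumption rather than anything mandated by this document:
  params:
    - -verbose:gc   # see "Generating verboseGC logs" below for directing the log to a file
    - -Xdump:heap+java:events=systhrow,filter=java/lang/OutOfMemoryError,range=1..2   # heap dump + javacore on the first two OOM events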
OOM due to StackOverflow
If OOM occurs due to stack overflow:
- Collect and upload:
- Application Server error logs.
- Thread dumps (if available).
Tracing guidance
- Avoid enabling VERBOSE trace during performance issues, as it introduces significant overhead and can worsen the problem.
- If tracing is needed for functional debugging, keep VERBOSE trace enabled for a short duration only and never under full production load.
- For production-only reproducible issues:
- Prefer single-user tests with UserTracing enabled.
- Use lightweight options such as TIMER or SQLDEBUG for short intervals if tracing is required.
Agent or Integration Server Crash or Shut Down
Out Of Memory (OOM) and server crash can be due to Java heap space or StackOverflow.
OOM due to Java heap space
- If OOM occurs due to Java heap space, collect and upload:
- verboseGC logs.
- Memory heap dumps.
OOM due to StackOverflow
- If OOM occurs due to StackOverflow, collect and upload:
- Server error logs.
- Thread dumps.
Additional logs
- Upload the Application Server VERBOSE logs related to the Agent or Integration Server.
- Refer to the video that illustrates how to put components on trace.
Note: VERBOSE trace is not suitable for production. If you can only reproduce the issue on production, consider running a single-user test with UserTracing enabled.
Agent Process is Slow or Messages Not Getting Consumed
When Agent processes are slow or messages are not being consumed, collect and upload the following diagnostics:
- Apply SQLDEBUG tracing on the related Agent or Integration Server for approximately 5 minutes and upload the logs.
- Capture database performance data:
- Oracle: AWR report.
- DB2: Output of db2collect (see the Database Slowness or Locking section below).
- Collect verboseGC logs for the Agent / Integration JVM.
Database Slowness or Locking
Statistics data
- Capture statistics data while the issue is ongoing.
- Provide at least an hour’s worth of data from regular business hours or during the period of the issue.
DB2
- Run the oms_db2collect_v2.sh script to gather DB2 configuration and performance data. The collection is lightweight and gives a high-level overview.
- Copy oms_db2collect_v2.sh to the DB2 server and set permissions:
  chmod +rwx oms_db2collect_v2.sh
- Execute the script:
  ./oms_db2collect_v2.sh <dbname>
- The script generates a file named db2collect.<timestamp>.zip in the directory where it is run; upload this file.
Oracle
- Generate an AWR report for the time period in question.
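A hedged sketch of producing the report from SQL*Plus; the standard awrrpt.sql script prompts for the snapshot range and output format, and DBA privileges are assumed:
  -- Run as a suitably privileged user; choose snapshots covering the problem window
  @?/rdbms/admin/awrrpt.sql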
Intermittent issues with blocking locks
- Set the following property in customer_overrides.properties:
  yfs.yfs.app.identifyconnection=Y
- Run the SQL provided in Blocking_Lock_SQLs.txt 3–4 times, about 1 minute apart, during the blocking lock.
- These SQL statements provide detailed information on all JVMs contributing to blocking locks.
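The statements to run are the ones supplied in Blocking_Lock_SQLs.txt. Purely as an illustration of the kind of information they return (this is not the content of that file), a generic Oracle query over blocked sessions might look like:
  SELECT sid,
         serial#,
         username,
         program,
         client_info,      -- connection tag; assumed to carry the JVM identification enabled by yfs.app.identifyconnection=Y
         blocking_session,
         seconds_in_wait,
         sql_id
    FROM v$session
   WHERE blocking_session IS NOT NULL;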
References
OMS Statistics Data – sample SQLs
DB2
SELECT statistics_detail_key,
start_time_stamp,
end_time_stamp,
server_name,
server_id,
hostname,
service_name,
service_type,
context_name,
statistic_name,
statistic_value
FROM yfs_statistics_detail
WHERE statistics_detail_key LIKE '2021082514%';
Oracle
SELECT statistics_detail_key,
TO_CHAR(start_time_stamp, 'YYYY-MM-DD HH24:mi:SS'),
TO_CHAR(end_time_stamp, 'YYYY-MM-DD HH24:mi:SS'),
server_name,
server_id,
hostname,
service_name,
service_type,
context_name,
statistic_name,
statistic_value
FROM yfs_statistics_detail
WHERE statistics_detail_key LIKE '2021082514%'
ORDER BY statistics_detail_key;
Generating verboseGC logs
Application Server
Add the following JVM arguments under appserver.jvm.params of values.yaml:
  - -verbose:gc
  - -Xverbosegclog:/shared/logs/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt
  - -Xgcpolicy:gencon
- Update the deployment.
- verboseGC logs will be generated in the path specified in -Xverbosegclog.
Agent Server
Add the following JVM arguments under omserver.common.jvmArgs or omserver.servers.jvmArgs in values.yaml:
  jvmArgs: "-Xms512m -Xmx1024m -verbose:gc -Xverbosegclog:/shared/agents/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt -Xgcpolicy:gencon"
- Update the deployment.
- verboseGC logs will be generated in the directory configured in -Xverbosegclog.
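Before uploading, the logs can be sanity-checked from inside the pod; a minimal sketch, assuming the IBM J9 XML verbose GC format and placeholder paths:
  # Confirm logs are being written and contain GC cycles
  ls -ltr /shared/agents/<pod>/gclogs/
  grep -c "<gc-start" /shared/agents/<pod>/gclogs/<gclog-file>.txt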
Enabling Heap Dump generation on user events
Application Server
Add the following JVM arguments under appserver.jvm.params of values.yaml:
  - -Xdump:heap+java:events=user
  - -XX:HeapDumpPath=$(LOG_DIR)
- Update the deployment.
- Heap dumps will be generated in the path specified by -XX:HeapDumpPath.
Agent Server
Add the following JVM arguments under omserver.common.jvmArgs or omserver.servers.jvmArgs of values.yaml:
  jvmArgs: "-Xdump:heap+java:events=user -XX:HeapDumpPath=/shared/agents/$(OM_POD_NAME)"
- Update the deployment.
- Heap dumps will be generated in the directory specified by -XX:HeapDumpPath.
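With events=user configured, the dump is triggered when the JVM receives SIGQUIT, i.e. the same kill -3 used for javacores. A minimal sketch with placeholder names (use the Java PID identified with ps aux):
  # Send SIGQUIT to the Java process to request the heap dump and javacore
  oc exec <pod> -- kill -3 <pid>
  # Confirm the dumps were written to the configured location
  oc exec <pod> -- ls -ltr <heap-dump-path>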