Troubleshooting
Problem
Diagnosing The Problem
Collecting & Submitting Data
Review the environment-specific MustGather first and attach all requested diagnostics to the IBM Support case. Then collect additional data as outlined in the sections below.
Application server is frozen or unresponsive
Thread dumps (Application Server)
- Identify the pod where slowness or a hang is observed.
- Exec into the pod:
  oc exec -it name-ibm-oms-ent-prod-appserver-om-app-5bb8bb74c6-p5tww /bin/bash
- Inside the pod, list processes to identify the Java PID:
  ps aux
  Sample output:
  USER       PID %CPU %MEM     VSZ     RSS TTY   STAT START   TIME COMMAND
  default      1  5.7 28.0 7458356 2233580 ?     SLsl Sep03 173:25 /opt/ibm/java/jre/bin/java -javaagent:/opt/ibm/wlp/bin/tools/ws-javaagent.jar -Djava.awt.headless=true -Djdk.attach.allowAttachSelf=true -Dvendor=websphere -
  default    858  0.0  0.0   12024    2548 pts/0 Ss+  Sep04   0:00 /bin/bash
  default   1624  0.0  0.0   12024    3184 pts/1 Ss+  14:41   0:00 /bin/bash
  default   1631  0.5  0.0   12024    3300 pts/2 Ss   14:44   0:00 /bin/bash
  default   1637  0.0  0.0   44632    3372 pts/2 R+   14:44   0:00 ps aux
- Generate a Java core (thread dump) 3–4 times at 20–30 second intervals:
  kill -3 <pid>
- The files javacore.date.time.id.txt will be generated under the log directory as configured in values.yaml.
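If preferred, the repeated kill -3 can be scripted from inside the pod. A minimal sketch, assuming the Java PID identified above is substituted for <pid>:
  # Run inside the pod: request four javacores (thread dumps), roughly 30 seconds apart
  for i in 1 2 3 4; do
    kill -3 <pid>   # SIGQUIT makes the IBM JVM write a javacore without stopping the process
    sleep 30
  done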
Statistics data
Capture statistics data while the issue is ongoing. Provide at least one hour of data from regular business hours or the time of the issue.
Logs & tracing
- Provide two sets of logs:
- VERBOSE tracing on the identified Application or API.
- SQLDEBUG tracing on the identified Application or API.
- Refer to the internal video that illustrates how to put components on trace.
Note: VERBOSE trace is not suitable for the production environment. If the issue can only be reproduced in production, perform a single-user test with UserTracing enabled instead.
GC and heap diagnostics
- For slowness or OutOfMemory (OOM) issues, provide:
- Any generated verbose garbage collection (verboseGC) logs.
- Heap dumps.
- If the issue is identified to be due to database locking, upload the requested database locking details.
Mitigation
After collecting diagnostics, restart the application server to mitigate the issue.
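On Red Hat OpenShift this typically means recreating the application server pod; a hedged sketch (the deployment and pod names below are assumptions based on the sample pod name above, not values from this document):
  # Option 1: restart every pod in the application server deployment
  oc rollout restart deployment/<appserver-deployment>
  # Option 2: delete only the affected pod; the ReplicaSet recreates it
  oc delete pod <appserver-pod>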
Optional: HealthCenter data for intermittent issues
If the issue is intermittent, configure HealthCenter to collect periodic diagnostic files:
- In values.yaml, under appserver.jvm.params, add:
  params:
    - -Xhealthcenter:level=headless
    - -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/opt/ibm/wlp/output/defaultServer
    - -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration=15
    - -Dcom.ibm.diagnostics.healthcenter.data.profiling=off
    - -Dcom.ibm.java.diagnostics.healthcenter.allocation.threshold.low=10000000
    - -Dcom.ibm.java.diagnostics.healthcenter.stack.trace.depth=20
    - -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=0
- Upgrade the Helm deployment.
- A healthcenter<ts>.hcd file will be created under the configured output directory every 15 minutes (per -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration).
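The "Upgrade the Helm deployment" step can be performed with a standard helm upgrade; a minimal sketch, where the release name and chart reference are placeholders rather than values from this document:
  # Re-apply the chart with the updated values.yaml so the new JVM arguments take effect
  helm upgrade <release-name> <oms-chart> -f values.yaml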
Agent or Integration server (runtime) is frozen or unresponsive
Thread dumps (Agent / Integration Server)
- Identify the pod where the slowness or hang is observed.
- Exec into the pod:
  oc exec -it name-ibm-oms-ent-prod-scheduleorder-784c8689df-w67gj /bin/bash
- Inside the pod, list processes to identify the Java PID:
  ps aux
  Sample output:
  USER      PID %CPU %MEM     VSZ    RSS TTY STAT START TIME COMMAND
  omsuser     1  0.0  0.0   12024   1772 ?   Ss   03:17 0:00 /bin/sh /opt/ssfs/runtime/bin/agentserver.sh -jvmargs -Xms512m -Xmx1024m -Xms512m -Xmx1024m -Xhealthcenter:level:headless -Dcom.ibm.java.diagnostics.healthce
  omsuser   118  0.0  0.0   12032   1668 ?   S    03:17 0:00 /bin/sh /opt/ssfs/runtime/bin/java_wrapper.sh -Xms512m -Xmx1024m -Djava.io.tmpdir=/opt/ssfs/runtime/tmp -Dfile.encoding=UTF-8 -Dnet.sf.ehcache.skipUpdateChec
  omsuser   138  0.4  6.5 5899152 522580 ?   SLl  03:17 4:20 /opt/ssfs/runtime/jdk/bin/java -d64 -Xms512m -Xmx1024m -Djava.io.tmpdir=/opt/ssfs/runtime/tmp -Dfile.encoding=UTF-8 -Dnet.sf.ehcache.skipUpdateCheck=true -DI
- Generate a Java core (thread dump) 3–4 times at 20–30 second intervals against the Java process (PID 138 in this sample output), not the wrapper shell:
  kill -3 <pid>
- The file javacore.date.time.id.txt will be generated under the log directory configured in values.yaml.
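Before restarting anything, the javacores can be copied off the pod for upload. A minimal sketch with placeholder pod and directory names (use the log directory configured in values.yaml):
  # List the generated javacores inside the pod
  oc exec <agent-pod> -- ls -ltr <log-directory>
  # Copy one of them to the current local directory for attachment to the case
  oc cp <agent-pod>:<log-directory>/javacore.<date>.<time>.<id>.txt ./javacore.txt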
Statistics data
Capture the statistics data while the issue is ongoing. Provide at least one hour of data from regular business hours or the time of the issue.
Logs & tracing
- Provide two sets of logs:
- VERBOSE tracing on the identified Agent or Integration Server.
- SQLDEBUG tracing on the identified Agent or Integration Server.
- Refer to the video that illustrates how to put components on trace for additional details.
Note: VERBOSE trace is not suitable for the production environment. If the issue can only be reproduced in production, use a single-user test with UserTracing enabled instead.
GC and heap diagnostics
- For slowness or OOM issues, provide any generated:
- verboseGC logs.
- Heap dumps.
- If the issue is identified as database locking, upload the database locking details.
Mitigation
After collecting diagnostics, restart the Agent and/or Integration servers to mitigate the issue.
Optional: HealthCenter data for intermittent issues
If the issue is intermittent, configure HealthCenter on the Agent/Integration servers:
- In values.yaml, under omserver.common.jvmArgs or omserver.servers.jvmArgs, set:
  jvmArgs: "-Xms512m -Xmx1024m -Xhealthcenter:level=headless -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/shared/agents/$(OM_POD_NAME)/hcd -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration=15 -Dcom.ibm.diagnostics.healthcenter.data.profiling=off -Dcom.ibm.java.diagnostics.healthcenter.allocation.threshold.low=10000000 -Dcom.ibm.java.diagnostics.healthcenter.stack.trace.depth=20 -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=0"
- Upgrade the Helm deployment.
- A healthcenter<ts>.hcd file will be created under the specified directory every 15 minutes.
Application Server Crash or Shut Down
Out Of Memory (OOM) and subsequent server crash can be due to either Java heap space exhaustion or StackOverflowError.
OOM due to Java heap space
If OOM occurs because of Java heap space:
- Collect and upload:
- verboseGC logs.
- Memory heap dumps.
- Application Server error logs.
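On IBM J9-based JVMs these artifacts are usually produced automatically when the heap is exhausted. If they are not appearing, the dump agents can be declared explicitly; a minimal sketch under appserver.jvm.params, using standard -Xdump syntax as an assumption rather than anything mandated by this document:
  params:
    - -verbose:gc   # see "Generating verboseGC logs" below for directing the log to a file
    - -Xdump:heap+java:events=systhrow,filter=java/lang/OutOfMemoryError,range=1..2   # heap dump + javacore on the first two OOM events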
OOM due to StackOverflow
If OOM occurs due to stack overflow:
- Collect and upload:
- Application Server error logs.
- Thread dumps (if available).
Tracing guidance
- Avoid enabling VERBOSE trace during performance issues, as it introduces significant overhead and can worsen the problem.
- If tracing is needed for functional debugging, keep VERBOSE trace enabled for a short duration only and never under full production load.
- For production-only reproducible issues:
- Prefer single-user tests with UserTracing enabled.
- Use lightweight options such as TIMER or SQLDEBUG for short intervals if tracing is required.
Agent or Integration Server Crash or Shut Down
Out Of Memory (OOM) and server crash can be due to Java heap space or StackOverflow.
OOM due to Java heap space
- If OOM occurs due to Java heap space, collect and upload:
- verboseGC logs.
- Memory heap dumps.
OOM due to StackOverflow
- If OOM occurs due to StackOverflow, collect and upload:
- Server error logs.
- Thread dumps.
Additional logs
- Upload the Application Server VERBOSE logs related to the Agent or Integration Server.
- Refer to the video that illustrates how to put components on trace.
Note: VERBOSE trace is not suitable for production. If you can only reproduce the issue on production, consider running a single-user test with UserTracing enabled.
Agent Process is Slow or Messages Not Getting Consumed
When Agent processes are slow or messages are not being consumed, collect and upload the following diagnostics:
- Apply SQLDEBUG tracing on the related Agent or Integration Server for approximately 5 minutes and upload the logs.
- Capture database performance data:
- Oracle: AWR report.
- DB2: Output of db2collect (see the Database Slowness or Locking section below).
- Collect verboseGC logs for the Agent / Integration JVM.
Database Slowness or Locking
Statistics data
- Capture statistics data while the issue is ongoing.
- Provide at least an hour’s worth of data from regular business hours or during the period of the issue.
DB2
- Run the oms_db2collect_v2.sh script to gather DB2 configuration and performance data. The collection is lightweight and gives a high-level overview.
- Copy oms_db2collect_v2.sh to the DB2 server and set permissions:
  chmod +rwx oms_db2collect_v2.sh
- Execute the script:
  ./oms_db2collect_v2.sh <dbname>
- The script generates a file named db2collect.<timestamp>.zip in the directory where it is run; upload this file.
Oracle
- Generate an AWR report for the time period in question.
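A hedged sketch of producing the report from SQL*Plus; the standard awrrpt.sql script prompts for the snapshot range and output format, and DBA privileges are assumed:
  -- Run as a suitably privileged user; choose snapshots covering the problem window
  @?/rdbms/admin/awrrpt.sql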
Intermittent issues with blocking locks
- Set the following property in customer_overrides.properties:
  yfs.yfs.app.identifyconnection=Y
- Run the SQL provided in Blocking_Lock_SQLs.txt 3–4 times, about 1 minute apart, during the blocking lock.
- These SQL statements provide detailed information on all JVMs contributing to blocking locks.
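The statements to run are the ones supplied in Blocking_Lock_SQLs.txt. Purely as an illustration of the kind of information they return (this is not the content of that file), a generic Oracle query over blocked sessions might look like:
  SELECT sid,
         serial#,
         username,
         program,
         client_info,      -- connection tag; assumed to carry the JVM identification enabled by yfs.app.identifyconnection=Y
         blocking_session,
         seconds_in_wait,
         sql_id
    FROM v$session
   WHERE blocking_session IS NOT NULL;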
References
OMS Statistics Data – sample SQLs
DB2
SELECT statistics_detail_key,
start_time_stamp,
end_time_stamp,
server_name,
server_id,
hostname,
service_name,
service_type,
context_name,
statistic_name,
statistic_value
FROM yfs_statistics_detail
WHERE statistics_detail_key LIKE '2021082514%';
Oracle
SELECT statistics_detail_key,
TO_CHAR(start_time_stamp, 'YYYY-MM-DD HH24:mi:SS'),
TO_CHAR(end_time_stamp, 'YYYY-MM-DD HH24:mi:SS'),
server_name,
server_id,
hostname,
service_name,
service_type,
context_name,
statistic_name,
statistic_value
FROM yfs_statistics_detail
WHERE statistics_detail_key LIKE '2021082514%'
ORDER BY statistics_detail_key;
Generating verboseGC logs
Application Server
Add the following JVM arguments under appserver.jvm.params of values.yaml:
  - -verbose:gc
  - -Xverbosegclog:/shared/logs/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt
  - -Xgcpolicy:gencon
- Update the deployment.
- verboseGC logs will be generated in the path specified in -Xverbosegclog.
Agent Server
Add the following JVM arguments under omserver.common.jvmArgs or omserver.servers.jvmArgs in values.yaml:
  jvmArgs: "-Xms512m -Xmx1024m -verbose:gc -Xverbosegclog:/shared/agents/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt -Xgcpolicy:gencon"
- Update the deployment.
- verboseGC logs will be generated in the directory configured in -Xverbosegclog.
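Before uploading, the logs can be sanity-checked from inside the pod; a minimal sketch, assuming the IBM J9 XML verbose GC format and placeholder paths:
  # Confirm logs are being written and contain GC cycles
  ls -ltr /shared/agents/<pod>/gclogs/
  grep -c "<gc-start" /shared/agents/<pod>/gclogs/<gclog-file>.txt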
Enabling Heap Dump generation on user events
Application Server
Add the following JVM arguments under appserver.jvm.params of values.yaml:
  - -Xdump:heap+java:events=user
  - -XX:HeapDumpPath=$(LOG_DIR)
- Update the deployment.
- Heap dumps will be generated in the path specified by -XX:HeapDumpPath.
Agent Server
Add the following JVM arguments under omserver.common.jvmArgs or omserver.servers.jvmArgs of values.yaml:
  jvmArgs: "-Xdump:heap+java:events=user -XX:HeapDumpPath=/shared/agents/$(OM_POD_NAME)"
- Update the deployment.
- Heap dumps will be generated in the directory specified by -XX:HeapDumpPath.
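With events=user configured, the dump is triggered when the JVM receives SIGQUIT, i.e. the same kill -3 used for javacores. A minimal sketch with placeholder names (use the Java PID identified with ps aux):
  # Send SIGQUIT to the Java process to request the heap dump and javacore
  oc exec <pod> -- kill -3 <pid>
  # Confirm the dumps were written to the configured location
  oc exec <pod> -- ls -ltr <heap-dump-path>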