How To
Summary
Enable the monitor.sh script on IPS/NPS
Objective
Some reported issues require IBM support to collect runtime diagnostic information at regular intervals. For problems such as poor system performance, host-based "Out of memory" errors, or SPU-based "Out of memory" errors, we need to analyze which queries were running, their plan files, and other information to understand the issue.
To simplify log collection, IBM support bundled the collection process into one script that can be run from crontab (at an interval of 2 to 5 minutes) and that collects the relevant information based on the symptom. The script also has an option to archive plan files, for situations where plan files must not be lost.
Environment
Steps
- Check whether it is already enabled with the command:
crontab -e
- If it is enabled, we should see something like:
*/X * * * * /nz/support/bin/monitor.sh -type <> -archive_plans yes -pmr TSXXXXXX
- If it's not enabled, add the following line to crontab -e:
*/<how often to collect> * * * * /nz/support/bin/monitor.sh -type <> -archive_plans yes -pmr <pmr_number>
- For example:
*/5 * * * * /nz/support/bin/monitor.sh -type <> -archive_plans yes -pmr TS0012345678
- With this example entry, the script collects information every 5 minutes.
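The check in the first step can also be scripted. The following is a minimal sketch, not part of monitor.sh or the support tools: the function names list_monitor_entries and monitor_enabled are hypothetical, and the functions read crontab text from standard input so that they never modify the real crontab. It assumes the script is installed at /nz/support/bin/monitor.sh, as in the entries above.

```shell
# Hypothetical helper: print the active (non-comment) crontab lines that
# call monitor.sh. Reads crontab text on stdin.
list_monitor_entries() {
    grep -v '^[[:space:]]*#' | grep '/nz/support/bin/monitor\.sh' || true
}

# Hypothetical helper: succeed only if at least one active monitor.sh
# entry is present in the crontab text on stdin.
monitor_enabled() {
    [ -n "$(list_monitor_entries)" ]
}
```

For example, `crontab -l 2>/dev/null | list_monitor_entries` prints any existing monitor.sh entries without opening the crontab in an editor.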
Detailed description of each -type option:
performance
Collects performance-related information
Crontab entry: */5 * * * * /nz/support/bin/monitor.sh -type performance -archive_plans yes -pmr pmr12345
host_oom
Collects host shared memory (nzsqa memtbl -sys) and a host snapshot of processes (ps -afx)
Crontab entry: */5 * * * * /nz/support/bin/monitor.sh -type host_oom -archive_plans yes -pmr pmr12345
spu_oom
Collects host shared memory (nzsqa memtbl -all)
Crontab entry: */5 * * * * /nz/support/bin/monitor.sh -type spu_oom -archive_plans yes -pmr pmr12345
all
Collects all three sets of information described above (performance, SPU OOM, and host OOM)
Crontab entry: */5 * * * * /nz/support/bin/monitor.sh -type all -archive_plans yes -pmr pmr12345
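To avoid typing mistakes in the -type value, the crontab line can be generated. The sketch below is an illustration only: build_monitor_entry is a hypothetical helper, not something shipped with the support tools. It accepts the -type values named in this document (including hung, which is described further below), and the 1 to 59 minute range check is an assumption based on the */N minute field of crontab.

```shell
# Hypothetical helper: print a crontab line for monitor.sh.
# $1 = interval in minutes, $2 = -type value, $3 = PMR/case number,
# $4 = -archive_plans value (optional, defaults to "yes")
build_monitor_entry() {
    case $1 in
        ''|*[!0-9]*) echo "interval must be a whole number of minutes" >&2; return 1 ;;
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt 59 ]; then
        echo "interval must be between 1 and 59 minutes" >&2
        return 1
    fi
    case $2 in
        performance|host_oom|spu_oom|all|hung) ;;
        *) echo "unknown -type value: $2" >&2; return 1 ;;
    esac
    case ${4:-yes} in
        yes|no) ;;
        *) echo "-archive_plans must be yes or no" >&2; return 1 ;;
    esac
    printf '*/%s * * * * /nz/support/bin/monitor.sh -type %s -archive_plans %s -pmr %s\n' \
        "$1" "$2" "${4:-yes}" "$3"
}
```

Calling `build_monitor_entry 5 performance pmr12345` prints the same line as the performance entry shown above; the output can then be pasted into crontab -e.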
There is one other option, called hung. A system can hang in different ways; this option targets one specific type of hang and triggers automatic log collection when "Unable to send to dispatch" messages start to appear in the dbos.log file.
hung
Customers often report that NPS is hung. A hang can have a number of causes, but the most common pattern we see is the following message in dbos.log: "Unable to send to dispatch"
Crontab entry: */5 * * * * /nz/support/bin/monitor.sh -type hung -archive_plans no -pmr pmr12345
Once these messages start to appear, queries no longer make progress or complete. Each time the script runs (as often as the crontab entry specifies), this option checks dbos.log for the messages above. When they are found, it starts a log collection process that is usually helpful for diagnosing this issue.
Note that once the script has detected a hang, it creates the file /nzscratch/monitor/log/pmrXXXXX/hung_deteched_. The presence of this file prevents the hang detection from running again, so that collection is not triggered multiple times within a 24-hour period. The script also sends an email notification with some details about the detected hang. Edit the config section of the monitor.sh script (the_recipient) to add the appropriate email address.
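The detect-once behavior described above can be pictured with a small sketch. This is not the code inside monitor.sh; check_hung and its two arguments are hypothetical, and the log collection and email steps are only indicated by a comment.

```shell
# Sketch only, not the real monitor.sh logic. All names are hypothetical.
# $1 = path to dbos.log, $2 = marker file recording a detected hang
# Prints "detected", "already-detected", or "clear".
check_hung() {
    if [ -e "$2" ]; then
        # Marker exists: a hang was already handled, so do nothing more.
        echo "already-detected"
        return 0
    fi
    if grep -q -F 'Unable to send to dispatch' "$1" 2>/dev/null; then
        : > "$2"          # create the marker so later runs skip collection
        echo "detected"   # log collection and email would be triggered here
    else
        echo "clear"
    fi
}
```

Because the marker file is checked first, repeated cron runs after a detection return immediately instead of collecting the same logs again.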
Additional Information
Where can I download the monitor.sh script?
monitor.sh is now part of the latest support tools, which are included in the latest NPS software release. The support tools can also be downloaded separately from Fix Central.
Can I leave the monitor.sh script running?
Yes. There is no harm and there are no side effects in leaving it running, even after your issue is resolved. Note that the script automatically prunes (deletes) old files. Also, ensure that no other custom scripts are running and collecting the same set of information.
Document Location
Worldwide
Document Information
More support for:
IBM PureData System
Software version:
All Version(s)
Document number:
6420483
Modified date:
30 May 2022
UID
ibm16420483