IBM Support

Enable Monitor.sh Script

How To


Summary

Enable the monitor.sh script on IPS/NPS

Objective

Many reported issues require IBM Support to collect runtime diagnostic information periodically. Issues such as system performance problems, host-based "Out of memory" errors, or SPU-based "Out of memory" errors require us to analyze which queries were running, their plan files, and other information to understand the issue.

To make log collection easier, IBM Support bundled the collection process into one script that can be run from crontab (at an interval of 2 to 5 minutes) and, based on the symptom, collects the relevant information. The script also has an option to archive plan files, because there can be situations where plan files must not be lost.

Environment

Netezza Performance Server
NPS on Cloud
PureData System for Analytics

Steps



  1. Check whether it is already enabled by listing the crontab with crontab -l (or by opening it with crontab -e).
  2. If it is enabled, you should see an entry similar to:
    */X * * * * /nz/support/bin/monitor.sh -type <type> -archive_plans yes -pmr TSXXXXXX
  3. If it is not enabled, add the following line with crontab -e, where <type> is one of the options described below:
    */<how often to collect> * * * * /nz/support/bin/monitor.sh -type <type> -archive_plans yes -pmr <pmr_number>
  4. For example, to collect every 5 minutes:
    */5 * * * * /nz/support/bin/monitor.sh -type <type> -archive_plans yes -pmr TS0012345678
  5. The script then collects information at the configured interval (every 5 minutes in this example).
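Steps 1 to 4 can be scripted as a small set of shell helpers. This is a minimal sketch, not part of the IBM support tools: the function names monitor_cron_line, has_monitor_entry, and enable_monitor are hypothetical, and the sketch assumes monitor.sh is installed at /nz/support/bin/monitor.sh and that it is run as the user whose crontab should hold the entry.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the IBM support tools) that automate
# steps 1-4: detect an existing monitor.sh entry and append one if missing.
# Assumes monitor.sh lives at /nz/support/bin/monitor.sh.

MONITOR=/nz/support/bin/monitor.sh

# Build a crontab line: $1 = interval in minutes, $2 = -type value,
# $3 = yes/no for -archive_plans, $4 = case (PMR/TS) number.
monitor_cron_line() {
    printf '*/%s * * * * %s -type %s -archive_plans %s -pmr %s\n' \
        "$1" "$MONITOR" "$2" "$3" "$4"
}

# Succeeds if the crontab text on stdin has an active (uncommented) monitor.sh entry.
has_monitor_entry() {
    grep -v '^[[:space:]]*#' | grep -qF "$MONITOR"
}

# Step 1 + step 3: keep the existing crontab and append an entry only if none exists.
enable_monitor() {
    current=$(crontab -l 2>/dev/null)   # empty if the user has no crontab yet
    if printf '%s\n' "$current" | has_monitor_entry; then
        echo "monitor.sh is already scheduled:"
        printf '%s\n' "$current" | grep -v '^[[:space:]]*#' | grep -F "$MONITOR"
        return 0
    fi
    # Re-install the old entries plus the new line; crontab - reads stdin.
    { [ -n "$current" ] && printf '%s\n' "$current"; monitor_cron_line "$@"; } | crontab -
}
```

For example, enable_monitor 5 performance yes TS0012345678 adds the performance entry shown under the -type options, and does nothing except print the existing entry if monitor.sh is already scheduled. Rewriting the whole crontab through crontab - is used so that other existing entries are preserved.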

Detailed description of each -type option:

performance
     Collects performance-related information
     Crontab entry:        */5 * * * * /nz/support/bin/monitor.sh -type performance -archive_plans yes -pmr pmr12345

host_oom
     Collects host shared memory (nzsqa memtbl -sys) and a host snapshot of processes (ps -afx)
     Crontab entry:        */5 * * * * /nz/support/bin/monitor.sh -type host_oom -archive_plans yes -pmr pmr12345

spu_oom
     Collects host shared memory (nzsqa memtbl -all)
     Crontab entry:        */5 * * * * /nz/support/bin/monitor.sh -type spu_oom -archive_plans yes -pmr pmr12345

all
     Collects all three of the above (performance, host OOM, and SPU OOM information)
     Crontab entry:        */5 * * * * /nz/support/bin/monitor.sh -type all -archive_plans yes -pmr pmr12345
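Before writing an entry into crontab, the option values can be checked against the values this document describes. The following is a hypothetical pre-flight check, not part of monitor.sh: the function names are assumptions, the accepted -type values are the four listed above plus hung, and the 2 to 5 minute range reflects the interval suggested in the Objective section rather than a limit enforced by the script.

```shell
#!/bin/sh
# Hypothetical pre-flight checks (not part of monitor.sh) for the
# option values described in this document.

# Returns 0 if $1 is one of the documented -type values.
valid_monitor_type() {
    case "$1" in
        performance|host_oom|spu_oom|all|hung) return 0 ;;
        *) return 1 ;;
    esac
}

# Returns 0 if $1 is a whole number of minutes in the suggested 2-5 range.
valid_monitor_interval() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely digits
    esac
    [ "$1" -ge 2 ] && [ "$1" -le 5 ]
}

# Returns 0 if $1 is yes or no, the -archive_plans values used in this document.
valid_archive_flag() {
    case "$1" in
        yes|no) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, valid_monitor_type host_oom && valid_monitor_interval 5 && valid_archive_flag yes succeeds, while a misspelled type such as oom or an interval such as 10 is rejected before it reaches crontab.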

There is one other option, hung. There are different types of hangs; this option targets one specific type and triggers automatic log collection when "Unable to send to dispatch" messages start appearing in the dbos.log file.

hung 

Customers frequently report that NPS is hung. A hang can have a number of causes, but the most common pattern we see is the following message in dbos.log: "Unable to send to dispatch"

Crontab entry:  */5 * * * * /nz/support/bin/monitor.sh -type hung -archive_plans no -pmr pmr12345

Once these messages start appearing, queries no longer progress or complete. This option checks dbos.log for the above message each time the script runs (how often depends on the crontab entry). When the message is found, the script starts a log collection process that is usually helpful for diagnosing this issue.

Note: once the script has detected a hang, it creates the file /nzscratch/monitor/log/pmrXXXXX/hung_deteched_. The presence of this file prevents further execution of the hang detection, so that it does not run multiple times within a 24-hour period. The script also sends an email notification with some details about the detected hang; edit the config section of the monitor.sh script (the_recipient) to add the appropriate email address.
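The detection flow described above can be sketched as a small shell function. This is a simplified, hypothetical illustration and not the code inside monitor.sh: the function name check_hung and its two arguments are assumptions, the caller must supply the dbos.log path because its location is not given here, and the email notification step is omitted.

```shell
#!/bin/sh
# Simplified, hypothetical illustration of the hung-detection logic;
# it is NOT the code inside monitor.sh.
#   $1 = path to dbos.log (location depends on the installation)
#   $2 = path of the marker file written once a hang has been detected
# Prints a status word and returns 0 only when log collection should start.

PATTERN='Unable to send to dispatch'

check_hung() {
    log=$1
    marker=$2
    # A marker file means a hang was already reported; do nothing further.
    if [ -e "$marker" ]; then
        echo "skipped"
        return 1
    fi
    if [ -r "$log" ] && grep -qF "$PATTERN" "$log"; then
        # Create the marker first so later cron runs do not repeat collection.
        mkdir -p "$(dirname "$marker")" && : > "$marker"
        echo "hung"
        return 0
    fi
    echo "ok"
    return 1
}
```

For example, check_hung <path to dbos.log> <marker file> prints hung and returns 0 the first time the message appears in the log, then prints skipped on every later run for as long as the marker file exists. Writing the marker before collecting logs is what keeps a frequent crontab schedule from triggering collection repeatedly.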


Additional Information

Where can I download the monitor.sh script?

monitor.sh is now part of the latest support tools, which are included in the latest NPS software release. The support tools can also be downloaded separately from Fix Central.

Can I leave the monitor.sh script running?

Yes. There is no harm in leaving it running even after your issue is resolved; leaving the script running has no side effects. Note that the script automatically prunes (deletes) old files. However, ensure that no other custom scripts are running and collecting the same set of information.
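One way to look for other scheduled jobs that collect the same information is to scan the crontab for the commands monitor.sh is described as running. The following is a hypothetical heuristic, not an IBM tool: the function name other_collectors is an assumption, and it only matches nzsqa memtbl and ps calls in active crontab lines, so any matches still need to be reviewed by hand and scripts started outside cron are not covered.

```shell
#!/bin/sh
# Hypothetical heuristic (not an IBM tool): read crontab text on stdin and
# print active entries, other than monitor.sh itself, that call commands
# monitor.sh is described as running (nzsqa memtbl, ps).
other_collectors() {
    grep -v '^[[:space:]]*#' \
        | grep -vF '/nz/support/bin/monitor.sh' \
        | grep -E 'nzsqa[[:space:]]+memtbl|(^|[[:space:]/])ps[[:space:]]'
}
```

For example, crontab -l | other_collectors prints any custom entries that may duplicate what monitor.sh already collects; no output means no such entry was found in that user's crontab.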

Document Location

Worldwide



Document Information

More support for:
IBM PureData System

Software version:
All Version(s)

Document number:
6420483

Modified date:
30 May 2022

UID

ibm16420483