I'm curious how others are stopping their probes. We have two probe servers (Linux), each running an EIF, ping, mttrapd and syslog probe set up in a master/slave configuration. We use a script that runs "pkill" against each probe, but lately this doesn't always seem to work.
I'm wondering how others are stopping their probes and whether pkill alone should be enough.
8 replies Latest Post - 2013-02-22T21:36:47Z by SystemAdmin
Stopping Omnibus Probes
Mahyuddin (40 posts)
Re: Stopping Omnibus Probes (2013-02-15T06:24:54Z, in response to SystemAdmin)
Hi Joe,
Your pkill command should be fine, as by default it sends a SIGTERM signal to the probe process. Can you elaborate on "lately this doesn't always seem to work"?
Re: Stopping Omnibus Probes (2013-02-17T20:41:22Z, in response to SystemAdmin)
Hi Joe,
It is usually good practice to run the probes under PA (Process Agent), so that you can stop and start the processes using nco_pa_start/nco_pa_stop or nco_pa_shutdown.
I'm not sure whether you have already configured them that way. A plain kill should work, but unlike "kill -9" it does not stop the process immediately. If the probe is in the middle of processing something, it finishes that work first and then stops.
Do you see anything in the logs?
Re: Stopping Omnibus Probes (2013-02-19T12:13:56Z, in response to SystemAdmin)
PA sends SIGTERM to a process to tell it to shut down, so it's no different from sending SIGTERM (a standard kill) from the command line. That's the way probes are designed to be shut down.
As for why it doesn't shut down immediately, it's partly because the more modern probes (and more recent libOpls) have more threads doing background tasks, and we have to wait for each one to acknowledge the shutdown request and shut down before the whole probe shuts down.
Most probes should shut down within 5 seconds of the original SIGTERM being sent. Have you got any logfiles that show examples of it taking longer than that?
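For anyone scripting the stop, the SIGTERM-then-wait behaviour described above can be sketched roughly as follows. This is only an illustration, not anything shipped with OMNIbus: the function name `stop_gracefully` and the 30-second default timeout are assumptions made for the example. It deliberately never sends SIGKILL, so a probe that fails to exit is reported rather than hard-killed.

```shell
#!/usr/bin/env bash
# stop_gracefully PID [TIMEOUT_SECONDS]   (hypothetical helper name)
# Sends SIGTERM, then polls once per second until the process exits
# or the timeout expires.
# Returns 0 if the process is gone, 1 if it is still running.
# No SIGKILL is ever sent from here.
stop_gracefully() {
    local pid=$1
    local timeout=${2:-30}   # assumed default, adjust to taste
    local waited=0

    # kill -0 delivers no signal; it only checks that the PID exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid is not running"
        return 0
    fi

    kill -TERM "$pid" 2>/dev/null

    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still running ${timeout}s after SIGTERM"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    echo "PID $pid exited after ${waited}s"
    return 0
}
```

A call might look like `stop_gracefully "$(pgrep -o -f nco_p_syslog)" 30`, assuming the probe shows up under that name in the process list. A non-zero return is the point at which it is worth collecting diagnostics before deciding on anything harsher.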
Re: Stopping Omnibus Probes (2013-02-19T16:28:29Z, in response to SystemAdmin)
Mahyuddin,
What I meant by "lately" is the last few times we actually tried to stop the probes (maybe a few times over the past few months). I just attempted to stop our syslog probe (nco_p_syslog) using pkill <name> and kill <PID>, and the process does not exit. I waited approx. 10 minutes for the process to end before issuing kill -9.
The funny thing is that we do get a disconnect alert, "A PROBE process syslog running on servera has disconnected as username probe", but the process only leaves the system after a hard kill (kill -9).
Another interesting point is that after the kill -9, I can now stop the probe normally with pkill without issue. It appears that pkill stops working after the probes have been running for some time.
We are not running our probes under PA. During the above attempt to shut down the syslog probe, nothing was written to syslog.log, although our logs aren't running in debug mode.
This behavior happens on all of our probes after they've been up for a while (maybe 2-3 months). It doesn't happen every time on every probe, but often enough to make us wonder. We've just been killing with -9 when this happens; we're just not sure how healthy that is for the probe.
Mahyuddin (40 posts)
Re: Stopping Omnibus Probes (2013-02-20T02:57:46Z, in response to SystemAdmin)
Hi Joe,
Killing a probe process with "-9" is not healthy. As Alex stated, there are threads to shut down, and on the probe side (depending on the probe) there could be a connection that requires an unsubscription/disconnection sequence and a recovery file to write.
Since the issue happens after a long run, I think it could be related to machine resources. Could you check the process memory, total RAM usage and disk space at the time the issue happens?
Worst case, you might need to turn on debug logging and capture the time at which the issue occurs.
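One possible way to grab the three figures asked for here (process memory, system memory, disk space) in a single pass is sketched below. It assumes a Linux host with /proc mounted; the function name `resource_snapshot` and the default of checking the root filesystem are illustrative choices, not something from the thread.

```shell
#!/usr/bin/env bash
# resource_snapshot PID [MOUNT_POINT]   (hypothetical helper name)
# Prints a timestamped snapshot of one process's memory and thread
# count, system-wide memory, and free space on one filesystem.
resource_snapshot() {
    local pid=$1
    local mount=${2:-/}   # assumed default; pass the probe's log filesystem if different

    echo "== $(date '+%Y-%m-%dT%H:%M:%S') pid=$pid =="

    # Per-process figures from the kernel: resident size, virtual size, threads.
    if [ -r "/proc/$pid/status" ]; then
        grep -E '^(Name|VmRSS|VmSize|Threads):' "/proc/$pid/status"
    else
        echo "no /proc entry for PID $pid"
    fi

    # System-wide memory. MemAvailable is only present on newer kernels,
    # so it is simply omitted from the output where it does not exist.
    grep -E '^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapFree):' /proc/meminfo

    # Free disk space on the chosen filesystem.
    df -h "$mount"
}
```

Running it at the moment a probe refuses to stop, and again after a restart, gives two snapshots to compare. Note that a low MemFree on its own is not conclusive on Linux, since the kernel uses otherwise idle memory for Buffers and Cached.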
Re: Stopping Omnibus Probes (2013-02-20T16:18:12Z, in response to SystemAdmin)
Mahyuddin,
I had one of our admins check the server, who says memory and disk look OK. I had the issue this morning with the EIF probe.
We have ITM as well, so I'm going to set up some monitoring to watch the process memory. I also turned on debug for the log. I'll give it a couple of days and repeat. I don't think it will take long to see the problem again.
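Where an ITM situation is not available, a very rough stand-in for watching process memory over time could look like the sketch below. It is an assumption-laden illustration for a Linux host: the name `sample_rss`, the CSV layout and the defaults are all made up for the example.

```shell
#!/usr/bin/env bash
# sample_rss PID [INTERVAL_SECONDS] [COUNT]   (hypothetical helper name)
# Prints one CSV line per sample: timestamp,pid,resident_set_size_in_kB
# Stops early if the process disappears.
sample_rss() {
    local pid=$1
    local interval=${2:-60}   # assumed default: one sample per minute
    local count=${3:-10}      # assumed default: ten samples
    local i=0
    local rss

    while [ "$i" -lt "$count" ] && [ -r "/proc/$pid/status" ]; do
        # VmRSS is reported by the kernel in kB.
        rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
        echo "$(date '+%Y-%m-%dT%H:%M:%S'),$pid,${rss:-0}"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

Redirecting the output to a file, for example `sample_rss <PID> 300 288 >> probe_rss.csv` for a day of five-minute samples, gives a trend that shows whether the probe itself is growing or whether the pressure comes from elsewhere on the server.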
GQ4Q_Simon_Knights (7 posts)
Re: Stopping Omnibus Probes (2013-02-22T12:23:42Z, in response to SystemAdmin)
Does your Linux distro have 'gstack' or 'pstack' installed?
If so, it might be interesting to get a stack trace of a probe process, in addition to the debug logs you are planning to obtain. We would want to run this after you have issued a SIGTERM and have waited 10 mins for the probe to shut down. It might give us an idea of what is still running in the probe.
The command format is:
where <PID> is the pid of the probe.
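The exact command line did not survive in this copy of the thread. Both tools are normally invoked with just the process ID as the argument, so a capture step could be sketched as below. The wrapper name `capture_stack` and the default output path are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# capture_stack PID [OUTFILE]   (hypothetical helper name)
# Writes a stack trace of all threads in PID to OUTFILE using whichever
# of gstack or pstack is found first on PATH.
# Returns 1 if neither tool is available.
capture_stack() {
    local pid=$1
    local out=${2:-/tmp/probe_stack_${pid}.txt}   # assumed default location
    local tool

    for tool in gstack pstack; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Both tools take the PID as their only argument.
            "$tool" "$pid" > "$out" 2>&1
            echo "stack written to $out using $tool"
            return 0
        fi
    done

    echo "neither gstack nor pstack found on PATH" >&2
    return 1
}
```

These tools attach to the running process, so the command generally has to be run as the user that owns the probe or as root. Taking two or three traces a few seconds apart makes it easier to tell a thread that is genuinely stuck from one that is merely slow.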
Re: Stopping Omnibus Probes (2013-02-22T21:36:47Z, in response to SystemAdmin)
New discovery... I'm not sure what we were looking at before, but apparently memory on these servers was very depleted.
We were down to 200 MB from 6 GB. We also have ITM agents running (LZ, UA and Watchdog), and upon cycling them our available memory dropped even further, down to 30 MB, and didn't come back. Looks like a possible memory leak? The servers have been up for close to a year :-)
I have put in a request to add memory and reboot these servers next week. I'm starting to think Mahyuddin may be on the right track with a machine resource issue, but of course more memory won't solve a leak.
I'll post my results next week.
GQ4Q, I'll keep this pstack in my back pocket just in case.