8 replies Latest Post - ‏2013-02-22T21:36:47Z by SystemAdmin
SystemAdmin
SystemAdmin
1283 Posts
ACCEPTED ANSWER

Pinned topic Stopping Omnibus Probes

‏2013-02-13T13:46:34Z |
I'm curious how others are stopping their probes. We have two probe servers (Linux), each running an EIF, ping, mttrapd and syslog probe set up in a master/slave configuration. We use a script that runs "pkill" against each probe, but lately this doesn't always seem to work.

commands used:
pkill nco_p_syslog
pkill nco_p_mttrapd
pkill nco_p_nonnative
pkill nco_p_ping

Wondering how others are stopping and whether pkill alone should be enough.
Thanks
Joe
Updated on 2013-02-22T21:36:47Z at 2013-02-22T21:36:47Z by SystemAdmin
  • Mahyuddin
    Mahyuddin
    40 Posts
    ACCEPTED ANSWER

    Re: Stopping Omnibus Probes

    ‏2013-02-15T06:24:54Z  in response to SystemAdmin
    Hi Joe,

    Your pkill command should be fine, as by default it sends a SIGTERM signal to the probe process. Can you elaborate on what you mean by it not always working lately?

    Thanks,
    Mahyuddin
  • SystemAdmin
    SystemAdmin
    1283 Posts
    ACCEPTED ANSWER

    Re: Stopping Omnibus Probes

    ‏2013-02-17T20:41:22Z  in response to SystemAdmin
    Hi Joe,

    Usually, it is good practice to configure the probes under PA (Process Agent), so you can stop and start the processes using nco_pa_start/nco_pa_stop or nco_pa_shutdown.
    Not sure if you have already configured them that way. The kill command should work, but a plain kill doesn't stop the process immediately the way "kill -9" does. If the probe is in the middle of processing something, it waits until that processing is finished and then stops.

    Do you see anything in the logs?

    Thanks,
    Raji.
    • SystemAdmin
      SystemAdmin
      1283 Posts
      ACCEPTED ANSWER

      Re: Stopping Omnibus Probes

      ‏2013-02-19T12:13:56Z  in response to SystemAdmin
      PA sends SIGTERM to a process to tell it to shut down, so it's no different from sending SIGTERM (a standard kill) from the command line. That's the way probes are designed to be shut down.

      As for why it doesn't shut down immediately, it's partly because the more modern probes (and more recent libOpls) have more threads doing background tasks and we have to wait for each one to acknowledge the shutdown request and to shut down before the whole probe shuts down.

      Most probes should shut down within 5 seconds of the original SIGTERM being sent. Have you got any logfiles that show examples of it taking longer than that?
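To tell a slow shutdown apart from a hung one, the SIGTERM-and-wait sequence described above can be scripted. What follows is only a rough sketch, not an IBM-provided tool: the function name `stop_gracefully` and the 10-second default timeout are assumptions made for illustration. It sends SIGTERM, polls once a second, and reports whether the process is still alive when the timeout expires. It deliberately does not escalate to kill -9 on its own.

```shell
# stop_gracefully PID [TIMEOUT_SECONDS]   (hypothetical helper, not part of OMNIbus)
# Sends SIGTERM, then polls until the process exits or the timeout passes.
# Returns 0 if the process is gone, 1 if it is still running after the timeout.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}   # assumed default, chosen for illustration

    # Nothing to wait for if the process does not exist
    # (kill also fails here if this user may not signal it).
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    # kill -0 sends no signal; it only checks that the PID can still be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still running ${timeout}s after SIGTERM" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A stop script could call this for each PID returned by, for example, `pgrep -x nco_p_syslog`, and log the failures instead of hard-killing straight away. That gives a record of which probes did not respond to SIGTERM and when.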
  • SystemAdmin
    SystemAdmin
    1283 Posts
    ACCEPTED ANSWER

    Re: Stopping Omnibus Probes

    ‏2013-02-19T16:28:29Z  in response to SystemAdmin
    Mahyuddin,

    What I meant by lately is the last few times we actually tried to stop the probes (maybe a few times over the past few months). I just attempted to stop our syslog probe (nco_p_syslog) using pkill <name> and kill <PID>, and the process does not exit. I waited approx. 10 minutes for the process to end before issuing kill -9.

    The funny thing is we do get a disconnect alert "A PROBE process syslog running on servera has disconnected as username probe", but the process only exits with a hard kill (kill -9).

    Another interesting point is that after the kill -9, I can now stop normally with pkill without issue. It appears that after the probes have been running for some time, pkill doesn't work.

    Raji, Alex

    We are not running our probes under PA. During the above attempt to shut down the syslog probe, nothing was written to syslog.log, although our logs aren't running in debug mode.

    This behavior happens on all of our probes after they've been up for a while (maybe 2-3 months). It doesn't happen every time on every probe, but often enough to make us wonder. We've just been killing with -9 when this happens; we're just not sure how healthy that is for the probe.
    Joe
    • Mahyuddin
      Mahyuddin
      40 Posts
      ACCEPTED ANSWER

      Re: Stopping Omnibus Probes

      ‏2013-02-20T02:57:46Z  in response to SystemAdmin
      Hi Joe,

      Killing a probe process with "-9" is not healthy. As Alex stated, there are threads to shut down, and on the probe side (depending on the probe) there could be a connection that requires an unsubscription/disconnection sequence and a recovery file to write.

      Since the issue happens after a long run, I think it could be related to machine resources. Could you check the process memory, total RAM usage and disk space at the time the issue happens?

      Worst case, you might need to turn on debug logging and capture the moment the issue occurs.

      Thanks,
      Mahyuddin
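One way to collect the numbers asked for above is a small snapshot function that can be run by hand or on a schedule. This is a sketch under assumptions: it is Linux-only because it reads /proc, and the name `probe_snapshot` is made up for illustration. It prints the process's virtual and resident memory and thread count, then system-wide memory and disk usage.

```shell
# probe_snapshot PID   (hypothetical helper; Linux-only, reads /proc)
# Prints memory and thread counters for one process,
# followed by system memory and filesystem usage.
probe_snapshot() {
    pid=$1
    if [ ! -r "/proc/$pid/status" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    echo "== $(date) pid $pid =="
    # Per-process counters: virtual size, resident set size, thread count.
    grep -E '^(Name|State|VmSize|VmRSS|Threads):' "/proc/$pid/status"
    # System-wide memory and swap.
    grep -E '^(MemTotal|MemFree|SwapTotal|SwapFree):' /proc/meminfo
    # Disk space on all mounted filesystems, in portable output format.
    df -P
}
```

Appending the output to a file every few minutes, with the PID taken from something like `pgrep -x nco_p_syslog`, would show whether the probe's VmRSS or the system's MemFree drifts over the weeks before a stop fails.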
  • SystemAdmin
    SystemAdmin
    1283 Posts
    ACCEPTED ANSWER

    Re: Stopping Omnibus Probes

    ‏2013-02-20T16:18:12Z  in response to SystemAdmin
    Mahyuddin,

    I had one of our admins check the server, who says memory and disk look OK. I had the issue this morning with the EIF probe.
    We have ITM as well, so I'm going to set up some monitoring to watch the process memory. I also turned on debug for the log. I'll give it a couple of days and repeat. I don't think it will take long to see the problem again.

    Thanks
    • GQ4Q_Simon_Knights
      GQ4Q_Simon_Knights
      7 Posts
      ACCEPTED ANSWER

      Re: Stopping Omnibus Probes

      ‏2013-02-22T12:23:42Z  in response to SystemAdmin
      Does your Linux distro have 'gstack' or 'pstack' installed?

      If so, it might be interesting to get a stack trace of a probe process, in addition to the debug logs you are planning to obtain. We would want to run this after you have issued a SIGTERM and have waited 10 mins for the probe to shut down. It might give us an idea of what is still running in the probe.

      The command format is:
      pstack <PID>
      where <PID> is the pid of the probe.
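If it helps, the capture can be wrapped so that it uses whichever of the two tools is installed and saves the trace to a file. This is a minimal sketch; the function name `capture_stack` and its return codes are assumptions for illustration, and it relies only on the `pstack <PID>` form given above (gstack is assumed to take the same single argument).

```shell
# capture_stack PID OUTFILE   (hypothetical helper)
# Writes a stack trace of PID to OUTFILE using pstack or gstack,
# whichever is found first. Returns 2 if neither tool is available,
# otherwise the exit status of the tool.
capture_stack() {
    pid=$1
    out=$2
    for tool in pstack gstack; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" "$pid" > "$out" 2>&1
            return $?
        fi
    done
    echo "neither pstack nor gstack found" >&2
    return 2
}
```

Run it only after the SIGTERM has been sent and the wait has elapsed, so the trace shows what the probe is stuck on rather than its normal running state.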
  • SystemAdmin
    SystemAdmin
    1283 Posts
    ACCEPTED ANSWER

    Re: Stopping Omnibus Probes

    ‏2013-02-22T21:36:47Z  in response to SystemAdmin
    New discovery... not sure what we were looking at before, but apparently memory on these servers was very depleted.
    Down to 200 MB from 6 GB. We also have ITM agents running (LZ, UA and Watchdog), and upon cycling them our available memory dropped even further, down to 30 MB, and didn't come back. Looks like a possible memory leak? The servers have been up for close to a year :-)

    I have put in a request to add memory and reboot these servers next week. I'm starting to think Mahyuddin may be on the right track with a machine resource issue, but of course more memory won't solve a leak.

    I'll post my results next week.

    GQ4Q, I'll keep this pstack in my back pocket just in case.

    Thanks