Topic
  • 4 replies
  • Latest Post - 2014-05-21T00:53:42Z by Aidan Wen
Aidan Wen
3 Posts

Pinned topic How to fast detect system offline?

2014-05-13T09:00:06Z

On Director 5.2, a system going offline was noticed in less than 1 minute (with Verify Connection Interval set to 1 minute). In my tests on Director 6.3.3, however, detection takes more than 4 to 5 minutes. I think the cause is that "Verify Connection Interval" is set to 5 minutes.

Is there any way to change it to 1 minute?

With the interval set to 5 minutes on Director 6.3.3, a server can finish rebooting without any alert being raised. Or users call us, but we never received a system-down message. That does not make sense for systems management.

I have also created a job that repeats every 1 minute with the command below.

smcli.exe pingsys -v -r -N "All Windows Systems" -t OperatingSystem

But sometimes it still takes more than 2 minutes to detect that a system is offline.

Is my command syntax right? How can I detect an offline system as quickly as in Director 5.2? Thanks!

Updated on 2014-05-13T09:01:47Z by Aidan Wen
  • RS_UK_HAL
    190 Posts

    Re: How to fast detect system offline?

    2014-05-19T14:29:02Z

    What was the result of your SMCLI command?

     

    You could also try writing a small script that will literally PING the server or servers and you could check the results, by searching for a string in the results, such as "unreachable" or "no reply".

     

    If you find any such string, then you generate an Off Line message, using SMCLI GENEVENT.
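    The string check suggested above could be sketched roughly as follows. This is a minimal sketch, not a tested Director integration: the function names `looks_offline` and `ping_once` are hypothetical, and every failure marker other than "unreachable" and "no reply" is an assumption about typical Windows and Linux ping output. Generating the event is left out because the GENEVENT arguments are not given in this thread.

```python
import platform
import subprocess

# "unreachable" and "no reply" come from the post above. The remaining
# markers are assumptions about common Windows/Linux ping output.
FAILURE_MARKERS = (
    "unreachable",
    "no reply",
    "timed out",
    "100% packet loss",
    "100% loss",
)


def looks_offline(ping_output: str, return_code: int) -> bool:
    """Return True if the ping output or exit code suggests the host is down."""
    text = ping_output.lower()
    if any(marker in text for marker in FAILURE_MARKERS):
        return True
    # No marker found: fall back to the exit code of the ping command.
    return return_code != 0


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ping to `host`; return True if it appears to be offline."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return looks_offline(result.stdout + result.stderr, result.returncode)
```

    Checking the text as well as the exit code is deliberate: on Windows, a "Destination host unreachable" reply can still produce an exit code of 0, so the exit code alone may miss it.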

  • Aidan Wen
    3 Posts

    Re: How to fast detect system offline?

    2014-05-20T01:44:08Z

    1. Test VM online result:

    C:\>smcli.exe pingsys -v -r -i ibmdirector -t OperatingSystem
    ibmdirector.xxx.com.tw: Communication OK
     
    DNZCLI0744I : Ping operation not applicable on system ibmdirector.xxx.com.tw.

    2. Group test online result:

    C:\>smcli.exe pingsys -v -r -N "All Windows Systems" -t OperatingSystm
    ibmdirector.xxx.com.tw: Communication OK
     
    DNZCLI0745I : Ping operation succeeded on system ibmdirector.xxx.com.tw.
    test24.xxx.com.tw: Communication OK
     
    DNZCLI0742I : Ping operation in progress on system test24.xxx.com.tw.

    3. Test VM offline:

    I repeated the pingsys command every 3 to 5 seconds. After about 2 minutes, the output was:

    C:\>smcli.exe pingsys -v -r -i ibmdirector -t OperatingSystem
    ibmdirector.xxx.com.tw: Not Available
     
    DNZCLI0744I : Ping operation not applicable on system ibmdirector.xxx.com.tw.

    And Group test result:

    C:\>smcli.exe pingsys -v -r -N "All Windows Systems" -t OperatingSystem
    ibmdirector.xxx.com.tw: Not Available
     
    DNZCLI0744I : Ping operation not applicable on system ibmdirector.xxx.com.tw.
    test24.xxx.com.tw: Communication OK
     
    DNZCLI0742I : Ping operation in progress on system test.xxx.com.tw.
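    For a scheduled job, output like the above could be reduced to a per-system status with a small parser. This is only a sketch based on the few lines shown in this thread: the names `parse_pingsys` and `offline_systems` are hypothetical, and treating every status other than "Communication OK" as offline is an assumption, since only "Communication OK" and "Not Available" appear here.

```python
import re
from typing import Dict, List

# Matches lines such as "test24.xxx.com.tw: Communication OK".
# Message lines such as "DNZCLI0744I : Ping ..." have a space before the
# colon and therefore do not match.
STATUS_LINE = re.compile(r"^(?P<system>\S+):\s+(?P<status>.+)$")


def parse_pingsys(output: str) -> Dict[str, str]:
    """Map each system name in pingsys output to its reported status."""
    statuses: Dict[str, str] = {}
    for raw_line in output.splitlines():
        match = STATUS_LINE.match(raw_line.strip())
        if match:
            statuses[match.group("system")] = match.group("status")
    return statuses


def offline_systems(statuses: Dict[str, str]) -> List[str]:
    """Return systems whose status is anything other than 'Communication OK'."""
    return sorted(
        name for name, status in statuses.items() if status != "Communication OK"
    )
```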

     

    Two questions:

    1. Even when I use the smcli pingsys command, it takes about 1 to 2 minutes to detect lost communication.

    2. The output messages differ between testing a single machine and testing a group.

    I understand your PING suggestion. I also have a WhatsUp system that can monitor all of the systems. I just wonder why 5.2 and 6.3 behave differently.

    This function works very well on 5.2 and is helpful. I can easily monitor over 100 servers.

    I have also found another problem in recent days.

    In Event Automation Plans, the system starts "Log All Events" by default. But it stops logging after 1 to 3 days, even though all services are running normally. After I restart the Dirserver service, logging works again. This also works well on 5.2.

    Any ideas? My database is on a remote SQL 2008 R2 server. Please don't make me give up on this IBM system.

    Thanks!

    Updated on 2014-05-20T02:50:46Z by Aidan Wen
  • RS_UK_HAL
    190 Posts

    Re: How to fast detect system offline?

    2014-05-20T13:56:16Z

    Are you certain you are targeting the correct resource with your PINGSYS command?

     

    Anyway, whether you use this PINGSYS or a normal network PING, you can write a script that checks the result and, on failure, generates a user-defined SMCLI GENEVENT (maybe with a text description saying "System is off-line"), which can be detected by an Event Automation Plan.

     

    With PING you can choose how long to wait between the PINGs, and therefore control the time you want to set before you check your resources again.
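    The point about controlling the interval could be sketched as a small polling loop that reports only state changes, so that an event is raised once when a system goes offline and once when it comes back. This is a sketch under stated assumptions: `poll`, `is_offline`, and `on_change` are hypothetical names, the 10-second default is arbitrary, and the callback is where a script might invoke SMCLI GENEVENT with arguments taken from the product documentation (not shown here, since the thread does not give them).

```python
import time
from typing import Callable, Dict, Iterable, Optional


def poll(
    hosts: Iterable[str],
    is_offline: Callable[[str], bool],
    on_change: Callable[[str, bool], None],
    interval_s: float = 10.0,
    rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Check every host each `interval_s` seconds and report state changes.

    `on_change(host, offline)` is called when a host is first seen offline,
    and whenever its state flips afterwards. `rounds=None` runs forever.
    """
    host_list = list(hosts)
    last_state: Dict[str, bool] = {}
    completed = 0
    while rounds is None or completed < rounds:
        for host in host_list:
            offline = is_offline(host)
            changed = last_state.get(host) != offline
            # Stay quiet about hosts that are simply online on the first check.
            if changed and (offline or host in last_state):
                on_change(host, offline)
            last_state[host] = offline
        completed += 1
        if rounds is None or completed < rounds:
            sleep(interval_s)
```

    Reporting transitions instead of every failed check keeps a one-minute or ten-second interval from flooding the event log while a server stays down.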

     

    ---

     

    On the problem with the Event Log.  What evidence do you have that it's stopping logging all events?

     

    In any case, try setting the logging to different settings (how long it should log for) and see if that makes any difference to the logging period.

     

    If not, then log a call with IBM HelpCentre and escalate your problem to them for immediate investigation.

     

  • Aidan Wen
    3 Posts

    Re: How to fast detect system offline?

    2014-05-21T00:53:42Z

    Thanks for your advice. I will look for the best solution. Maybe I will keep using 5.2 to monitor the machines that are not Windows Server 2012.

    If you can, please pass this 5.2 vs. 6.3 customer experience on to the design team. Thanks again.