5 replies Latest Post - 2013-04-19T09:12:45Z by nukite8d
nukite8d

Pinned topic Stop Resources @ System Shutdown?

2013-02-25T13:55:59Z
Hi group,
today we had a discussion about how to shut down a TSAMP node.

If you just type "halt", the system shuts down without stopping the automated resources.

How do you ensure that the resources are stopped normally instead of being killed at runlevel 0?
Cheers,
Manfred
Updated on 2013-02-26T11:53:14Z by nukite8d
  • sedgewick_de

    Re: Stop Resources @ System Shutdown?

    2013-02-25T17:52:19Z  in response to nukite8d
    Hi Manfred,

    on AIX and Linux, TSA monitors the runlevel of each node but does not interfere with the operating system's shutdown procedure in any way, so resources are indeed stopped, or killed, at runlevel 0.
    If a resource is actually stopped or killed by this procedure, it will not be restarted on that node until the shutdown (and subsequent restart) is over. If the resource can fail over, it is started on another host instead.

    /etc/rc.shutdown on AIX and /etc/rc.d/rc.local on Linux would be the places to go for stopping TSA-managed workload in an orderly fashion.

    Regards,
    Markus
    • nukite8d

      Re: Stop Resources @ System Shutdown?

      2013-02-26T11:53:14Z  in response to sedgewick_de
      Hi Markus,
      thank you for describing the TSAMP behavior at shutdown time.

      But what do you suggest for an orderly stop?

      I see several alternatives (Linux):

      1) Include out-of-automation stop commands in /etc/rc.d for every process.
      Nearly all default software installations do this. But this way we get the start commands at boot time, too, which would be dangerous for critical resources (floating resources).
      Furthermore, including only stop commands (/etc/rc*/K*) looks half-baked.

      2) Create our own script /etc/init.d/sam that excludes the node during shutdown and includes it during boot.
      We did this, but it collided with nodes that had been excluded manually and should stay down for maintenance.

      3) Create our own script /etc/init.d/sam that issues stop requests against the local resources.
      This seems really complicated to implement. Fixed IBM.Application resources could be handled, but floating resources and equivalencies make it difficult for me.
      Also, we would have to ensure that the requests are cancelled afterwards.

      4) Create our own script /etc/init.d/sam that executes the stoprsrc command for the local resources.
      This seems really complicated to implement, too. As in 3), fixed IBM.Application resources could be handled, but floating resources and equivalencies make it difficult for me.
      But the fixed resources would be restarted after system boot.

      Is there another way to stop all local resources and let them start after the node is back again?
      Regards,
      Manfred
    • nukite8d

      Re: Stop Resources @ System Shutdown?

      2013-04-18T12:46:06Z  in response to sedgewick_de

      Hi,
      to test the runlevel monitoring, I created a script that includes the local node before cthats, cthags and ctrmc are stopped.

      With this I ran the following scenario:

      1. Exclude localhost
      2. Wait until everything is offline
      3. Shut down the system with the 'reboot' command
      4. /etc/init.d/sam performs an include

      In /var/log/messages we log the execution of the scripts involved. And we see that the resources are started after the include, during shutdown.

      Apr 18 13:59:40 vltihamwa05 shutdown[10605]: shutting down for system reboot
      Apr 18 13:59:40 vltihamwa05 init: Switching to runlevel: 6
      Apr 18 13:59:41 vltihamwa05 su: (to root) root on /dev/console
      Apr 18 13:59:44 vltihamwa05 su: (to root) root on /dev/console
      Apr 18 13:59:53 vltihamwa05 rhnsd[3123]: Exiting
      Apr 18 13:59:54 vltihamwa05 logd: [11825]: debug: Stopping ha_logd with pid 2557
      Apr 18 13:59:54 vltihamwa05 logd: [2557]: debug: logd_term_action: received SIGTERM
      Apr 18 13:59:54 vltihamwa05 logd: [2557]: debug: logd_term_action: waiting for 0 messages to be read by write process
      Apr 18 13:59:54 vltihamwa05 logd: [2557]: debug: logd_term_action: sending SIGTERM to write process
      Apr 18 13:59:54 vltihamwa05 logd: [2576]: info: logd_term_write_action: received SIGTERM
      Apr 18 13:59:54 vltihamwa05 logd: [2576]: debug: Writing out 0 messages then quitting
      Apr 18 13:59:54 vltihamwa05 logd: [2576]: info: Exiting write process
      Apr 18 13:59:54 vltihamwa05 logd: [11825]: info: Waiting for pid=2557 to exit
      Apr 18 13:59:55 vltihamwa05 logd: [11825]: info: Pid 2557 exited
      Apr 18 13:59:55 vltihamwa05 logger: /etc/init.d/rc3.d/K03sam stop
      Apr 18 13:59:56 vltihamwa05 logger: sam-stop_1
      Apr 18 13:59:58 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_sbm:Start called.
      Apr 18 13:59:58 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_sbm:Monitor set to RC=5 Time=0 Flags=
      Apr 18 13:59:59 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_sbm:Monitor set to RC=1 Time=0 Flags=
      Apr 18 13:59:59 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_sbm:Start returns with RC=0 after 1 seconds.
      Apr 18 14:00:01 vltihamwa05 logger: sam-stop_2
      Apr 18 14:00:02 vltihamwa05 logger: /etc/init.d/rc3.d/K04cthags stop
      Apr 18 14:00:02 vltihamwa05 cthags[2469]: (Recorded using libct_ffdc.a cv 2):::Error ID: 825....0/yPF/eJU/822Mn1...................:::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,SRCSocket.C,1.91,515                     :::GS_STOP_ST Group Services daemon stopped DIAGNOSTIC EXPLANATION Received signal[SIGTERM]. Converted to normal stop
      Apr 18 14:00:02 vltihamwa05 logger: /etc/init.d/rc3.d/K04cthats stop
      Apr 18 14:00:02 vltihamwa05 RMCdaemon[2192]: (Recorded using libct_ffdc.a cv 2):::Error ID: 822....0/yPF/jUc/822Mn1...................:::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,rmcd_gsi.c,1.53,1061                     :::RMCD_2610_101_ER Internal error. Error data 1 00000001 Error data 2 00000000 Error data 3 dispatch_gs
      Apr 18 14:00:02 vltihamwa05 StorageRM[8607]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,StorageRMDaemon.C,1.60,332               :::STORAGERM_STOPPED_ST IBM.StorageRM daemon has been stopped.
      Apr 18 14:00:02 vltihamwa05 ConfigRM[2226]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,PeerDomain.C,1.99.28.1,22665             :::CONFIGRM_EXIT_GS_ER The peer domain configuration manager daemon (IBM.ConfigRMd) is exiting due to  the Group Services subsystem terminating.  The configuration manager daemon will  restart automatically, synchronize the nodes configuration with the domain and  rejoin the domain if possible.
      Apr 18 14:00:02 vltihamwa05 RecoveryRM[8598]: (Recorded using libct_ffdc.a cv 2):::Error ID: 822....0/yPF/rTg/822Mn1...................:::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,RecoveryRMDaemon.C,1.15.5.5,413          :::RECOVERYRM_2621_402_ER 2621-402 IBM.RecoveryRM daemon stopped by SRC command or exiting due to an error condition . Error id  0
      Apr 18 14:00:02 vltihamwa05 srcmstr: src_error=-9035, errno=0, module='srchevn.c'@line:'252', 0513-035 The ctrmc Subsystem ended abnormally. SRC will try and restart it.
      Apr 18 14:00:02 vltihamwa05 srcmstr: src_error=-9035, errno=0, module='srchevn.c'@line:'252', 0513-035 The IBM.ConfigRM Subsystem ended abnormally. SRC will try and restart it.
      Apr 18 14:00:02 vltihamwa05 RMCdaemon[12104]: (Recorded using libct_ffdc.a cv 2):::Error ID: 824....0/yPF/NDP0822Mn1...................:::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,rmcd.c,1.87,234                          :::RMCD_INFO_0_ST The daemon is started.
      Apr 18 14:00:02 vltihamwa05 ConfigRM[12105]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,IBM.ConfigRMd.C,1.57,347                 :::CONFIGRM_STARTED_ST IBM.ConfigRM daemon has started.
      Apr 18 14:00:03 vltihamwa05 cthats[2411]: (Recorded using libct_ffdc.a cv 2):::Error ID: 824....1/yPF/JIB0822Mn1...................:::Reference ID: 824....XqxPF/Hv.1822Mn1...................:::Template ID: 0:::Details File:  :::Location: rsct,comm.C,1.156,690                         :::TS_STOP_ST Topology Services daemon stopped Topology Services daemon stopped by: SRC
      Apr 18 14:00:04 vltihamwa05 logger: /etc/init.d/rc3.d/K04ctrmc stop
      Apr 18 14:00:04 vltihamwa05 RMCdaemon[12104]: (Recorded using libct_ffdc.a cv 2):::Error ID: 824....2/yPF/UhY0822Mn1...................:::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,rmcd.c,1.87,1026                         :::RMCD_INFO_1_ST The daemon is stopped. Number of command that stopped the daemon 3
      Apr 18 14:00:04 vltihamwa05 sshd[2882]: Received signal 15; terminating.
      Apr 18 14:00:05 vltihamwa05 auditd[2170]: The audit daemon is exiting.
      Apr 18 14:00:05 vltihamwa05 haveged: haveged stopping due to signal 15
      Apr 18 14:00:05 vltihamwa05 hatsd: hadms: hadms_init selected module softdog as default.
      Apr 18 14:00:05 vltihamwa05 cthats[12462]: (Recorded using libct_ffdc.a cv 2):::Error ID: 824....3/yPF/jzn1822Mn1...................:::Reference ID:  :::Template ID: 0:::Details File:  :::Location: rsct,bootstrp.C,1.215.1.13,4956               :::TS_START_ST Topology Services daemon started Topology Services daemon started by: SRC Topology Services daemon log file location /var/ct/TSAMP_MWA_Test_Linux_4/log/cthats/cthats.18.140005.C Topology Services daemon run directory /var/ct/TSAMP_MWA_Test_Linux_4/run/cthats/
      Apr 18 14:00:06 vltihamwa05 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
      Apr 18 14:00:06 vltihamwa05 kernel: Kernel logging (proc) stopped.
      Apr 18 14:00:06 vltihamwa05 kernel: Kernel log daemon terminating.
      Apr 18 14:00:06 vltihamwa05 syslog-ng[2155]: Termination requested via signal, terminating;
      Apr 18 14:00:06 vltihamwa05 syslog-ng[2155]: syslog-ng shutting down; version='2.0.9'

       

      I am still looking for a way to stop all resources of a node during shutdown without changing the current exclude or automation=manual setting.
      My last idea was to check whether the node is excluded. If not, I would exclude it and immediately include it again to set all resources to Pending Offline.

      Do you have any other suggestions?
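
The check described above (is the node already excluded?) could be combined with the exclude/include script from alternative 2, so that a manual exclusion for maintenance is never undone at boot. What follows is only a minimal sketch under stated assumptions, not a tested init script: it assumes that `lssamctrl` prints a line of the form `ExcludedNodes = {node1,node2}` with node names spelled exactly as passed in, and the function names and the marker file path are made up for illustration. `samctrl -u a` and `samctrl -u d` add a node to, and delete it from, the excluded list. Waiting until the resources are really offline is not covered here.

```shell
#!/bin/sh
# Sketch only. Assumption: lssamctrl prints a line such as
#   ExcludedNodes          = {node1,node2}

# Reads lssamctrl output on stdin; succeeds if node $1 is in the list.
node_is_excluded() {
    awk -v node="$1" '
        /ExcludedNodes/ {
            list = $0
            sub(/^[^{]*[{]/, "", list)   # drop everything up to the opening brace
            sub(/[}].*$/, "", list)      # drop the closing brace and the rest
            n = split(list, names, /[ ,]+/)
            for (i = 1; i <= n; i++)
                if (names[i] == node) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Hypothetical marker file; it has to live on a filesystem that survives a reboot.
MARKER=${MARKER:-/var/lib/misc/sam-excluded-by-init}

# For the "stop" action: exclude the node only if nobody excluded it before,
# and remember that this script did it.
exclude_unless_excluded() {
    node=$1
    if lssamctrl | node_is_excluded "$node"; then
        return 0    # excluded manually: leave the setting alone
    fi
    samctrl -u a "$node" && : > "$MARKER"
}

# For the "start" action: include the node only if this script excluded it.
include_if_marked() {
    node=$1
    [ -f "$MARKER" ] || return 0
    samctrl -u d "$node" && rm -f "$MARKER"
}
```

In /etc/init.d/sam the stop branch would call `exclude_unless_excluded "$(hostname)"` and the start branch `include_if_marked "$(hostname)"`. Note that if `lssamctrl` itself fails, the check reports "not excluded", so a real script should test for that case separately.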

      Updated on 2013-04-18T12:49:16Z by nukite8d
      • nukite8d

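        Re: Stop Resources @ System Shutdown?

        2013-04-18T13:42:47Z  in response to nukite8d

        Runlevel 0 stops too fast: the RSCT daemons are already stopped at the moment init switches the runlevel, before K03sam is executed.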

        Apr 18 15:38:28 vltihamwa05 init: Switching to runlevel: 0
        Apr 18 15:38:28 vltihamwa05 RMCdaemon[2213]: (Recorded using libct_ffdc.a cv 2):::Error ID: 824....IRzPF/KyZ0822Mn1...................:::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,rmcd.c,1.87,1026                         :::RMCD_INFO_1_ST The daemon is stopped. Number of command that stopped the daemon 3
        Apr 18 15:38:28 vltihamwa05 ConfigRM[2260]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,ConfigRMDaemon.C,1.19,249                :::CONFIGRM_STOPPED_ST IBM.ConfigRM daemon has been stopped.
        Apr 18 15:38:28 vltihamwa05 cthags[2552]: (Recorded using libct_ffdc.a cv 2):::Error ID: 825....IRzPF/fAa0822Mn1...................:::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,SRCSocket.C,1.91,515                     :::GS_STOP_ST Group Services daemon stopped DIAGNOSTIC EXPLANATION Received signal[SIGTERM]. Converted to normal stop
        Apr 18 15:38:28 vltihamwa05 ctcasd[2284]: (Recorded using libct_ffdc.a cv 2):::Error ID: 824....IRzPF/2Ka0822Mn1...................:::Reference ID:  :::Template ID: cffb2385:::Details File:  :::Location: rsct.core.sec,ctcas_main.c,1.30,399           :::ctcasd Daemon Stopped
        Apr 18 15:38:28 vltihamwa05 StorageRM[3848]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,StorageRMDaemon.C,1.60,370               :::STORAGERM_STOPPED_ST IBM.StorageRM daemon has been stopped.
        Apr 18 15:38:34 vltihamwa05 su: (to root) root on /dev/console
        Apr 18 15:38:37 vltihamwa05 su: (to root) root on /dev/console
        Apr 18 15:38:47 vltihamwa05 rhnsd[3142]: Exiting
        Apr 18 15:38:48 vltihamwa05 logd: [10328]: debug: Stopping ha_logd with pid 2541
        Apr 18 15:38:48 vltihamwa05 logd: [2541]: debug: logd_term_action: received SIGTERM
        Apr 18 15:38:48 vltihamwa05 logd: [2541]: debug: logd_term_action: waiting for 0 messages to be read by write process
        Apr 18 15:38:48 vltihamwa05 logd: [2541]: debug: logd_term_action: sending SIGTERM to write process
        Apr 18 15:38:48 vltihamwa05 logd: [10328]: info: Waiting for pid=2541 to exit
        Apr 18 15:38:48 vltihamwa05 logd: [2553]: info: logd_term_write_action: received SIGTERM
        Apr 18 15:38:48 vltihamwa05 logd: [2553]: debug: Writing out 0 messages then quitting
        Apr 18 15:38:48 vltihamwa05 logd: [2553]: info: Exiting write process
        Apr 18 15:38:49 vltihamwa05 logd: [10328]: info: Pid 2541 exited
        Apr 18 15:38:49 vltihamwa05 logger: /etc/init.d/rc3.d/K03sam stop
        Apr 18 15:38:49 vltihamwa05 logger: /etc/init.d/rc3.d/K04cthags stop
        Apr 18 15:38:49 vltihamwa05 logger: /etc/init.d/rc3.d/K04cthats stop
        Apr 18 15:38:50 vltihamwa05 logger: /etc/init.d/rc3.d/K04ctrmc stop

         

      • nukite8d

        Re: Stop Resources @ System Shutdown?

        2013-04-19T09:12:45Z  in response to nukite8d

        Hi,
        I logged the HealthState of the local resources during shutdown. It stays at 0 instead of switching to the "SystemDown" value.

        /bin/logger $0 HealthState1=$(lsrsrc -t -Ab -s "NodeNameList='vltihamwa05'" IBM.Application Name OpState HealthState HealthMessage)
        

        Apr 19 11:06:41 vltihamwa05 logger: /etc/init.d/rc3.d/K03sam stop
        Apr 19 11:06:41 vltihamwa05 logger: /etc/init.d/rc3.d/K03sam status self
        Apr 19 11:06:41 vltihamwa05 logger: /etc/init.d/rc3.d/K03sam Include and wait
        Apr 19 11:06:42 vltihamwa05 logger: /etc/init.d/rc3.d/K03sam HealthState1=Resource Persistent and Dynamic Attributes for IBM.Application Name OpState HealthState HealthMessage "mg2_02_timertrigger" 2 0 "" "mg2_12_timertrigger" 2 0 "" "samadapter" 2 0 "" "mg2_12_dmgr" 2 0 "" "sampmonagent-rs" 2 0 "" "mg2_02_dmgr" 2 0 "" "vltihamwa05_c02adv21w1" 2 0 "" "vltihamwa05_02_nodeagent" 2 0 "" "vltihamwa05_c12qcd21w1" 2 0 "" "shadow_c12_11w1" 2 0 "" "shadow_c02_11w1" 2 0 "" "vltihamwa05_c12pce21w1" 2 0 "" "vltihamwa05_c02qcd21w1" 2 0 "" "vltihamwa05_sbm" 2 0 "" "vltihamwa05_12_nodeagent" 2 0 "" "vltihamwa05_c12adv21w1" 2 0 "" "vltihamwa05_c02pce21w1" 2 0 ""
        Apr 19 11:06:47 vltihamwa05 logger: /etc/init.d/rc3.d/K03sam HealthState2=Resource Persistent and Dynamic Attributes for IBM.Application Name OpState HealthState HealthMessage "mg2_02_timertrigger" 2 0 "" "mg2_12_timertrigger" 2 0 "" "samadapter" 2 0 "" "mg2_12_dmgr" 2 0 "" "sampmonagent-rs" 2 0 "" "mg2_02_dmgr" 2 0 "" "vltihamwa05_c02adv21w1" 2 0 "" "vltihamwa05_02_nodeagent" 2 0 "" "vltihamwa05_c12qcd21w1" 2 0 "" "shadow_c12_11w1" 2 0 "" "shadow_c02_11w1" 2 0 "" "vltihamwa05_c12pce21w1" 2 0 "" "vltihamwa05_c02qcd21w1" 2 0 "" "vltihamwa05_sbm" 2 0 "" "vltihamwa05_12_nodeagent" 2 0 "" "vltihamwa05_c12adv21w1" 2 0 "" "vltihamwa05_c02pce21w1" 2 0 ""
        Apr 19 11:06:47 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_02_nodeagent:Start called.
        Apr 19 11:06:47 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_12_nodeagent:Start called.
        Apr 19 11:06:47 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_02_nodeagent:Monitor set to RC=5 Time=0 Flags=
        Apr 19 11:06:47 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_12_nodeagent:Monitor set to RC=5 Time=0 Flags=
        Apr 19 11:06:48 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_02_nodeagent:Monitor set to RC=1 Time=0 Flags=
        Apr 19 11:06:48 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_02_nodeagent:Start returns with RC=0 after 1 seconds.
        Apr 19 11:06:48 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_12_nodeagent:Monitor set to RC=1 Time=0 Flags=
        Apr 19 11:06:48 vltihamwa05 appcmd: /storage/opt/sam/edit/appcmd: vltihamwa05_12_nodeagent:Start returns with RC=0 after 1 seconds.
        [snip]
        Apr 19 11:06:52 vltihamwa05 logger: /etc/init.d/rc3.d/K03sam HealthState3=Resource Persistent and Dynamic Attributes for IBM.Application Name OpState HealthState HealthMessage "mg2_02_timertrigger" 2 0 "" "mg2_12_timertrigger" 2 0 "" "samadapter" 2 0 "" "mg2_12_dmgr" 2 0 "" "sampmonagent-rs" 2 0 "" "mg2_02_dmgr" 2 0 "" "vltihamwa05_c02adv21w1" 1 0 "" "vltihamwa05_02_nodeagent" 1 0 "" "vltihamwa05_c12qcd21w1" 1 0 "" "shadow_c12_11w1" 2 0 "" "shadow_c02_11w1" 2 0 "" "vltihamwa05_c12pce21w1" 1 0 "" "vltihamwa05_c02qcd21w1" 1 0 "" "vltihamwa05_sbm" 1 0 "" "vltihamwa05_12_nodeagent" 1 0 "" "vltihamwa05_c12adv21w1" 1 0 "" "vltihamwa05_c02pce21w1" 1 0 ""
        Apr 19 11:06:52 vltihamwa05 logger: /etc/init.d/rc3.d/K04cthags stop
        Apr 19 11:06:53 vltihamwa05 cthags[2515]: (Recorded using libct_ffdc.a cv 2):::Error ID: 825....hYEQF/jg0.822Mn1...................:::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,SRCSocket.C,1.91,515                     :::GS_STOP_ST Group Services daemon stopped DIAGNOSTIC EXPLANATION Received signal[SIGTERM]. Converted to normal stop
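
Because the `lsrsrc -t` output is passed to logger unquoted, each HealthState snapshot ends up as one long syslog line, which is hard to compare by eye. A small filter can split such a line back into one resource per row. This is a minimal sketch: the function name `split_health_log` is made up, and it assumes every record has the layout seen in the log above, `"name" OpState HealthState "message"`, with no embedded double quotes in the message.

```shell
#!/bin/sh
# Reads logged HealthState lines on stdin and prints one
# "name OpState HealthState" row per resource.
# Assumption: every record looks like  "name" OpState HealthState "message"
split_health_log() {
    awk '
    {
        s = $0
        while (match(s, /"[^"]*" [0-9]+ [0-9]+ "[^"]*"/)) {
            rec = substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH)
            split(rec, q, "\"")     # q[2] is the name, q[3] holds both numbers
            split(q[3], v, " ")     # v[1] is OpState, v[2] is HealthState
            print q[2], v[1], v[2]
        }
    }'
}
```

For example, `grep 'HealthState3=' /var/log/messages | split_health_log | awk '$2 == 1'` would list the resources whose OpState is 1 (Online) in the third snapshot, i.e. the ones that were started again after the include during shutdown.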