7 replies Latest Post - ‏2012-08-08T07:07:59Z by ostost
as755

cluster unstable

‏2009-10-30T00:48:22Z |
The cluster became unstable while I was moving a resource group (RG) from one node to the other.
Because the cluster was unstable, I rebooted the system.
The filesystems that were supposed to be mounted did not get mounted,
so I mounted them myself.
I ran /usr/es/sbin/cluster/utilities/clRGinfo, but there was no output.
When I ran clstat, the output was as shown below:
root@tiefaphap601/root>/usr/sbin/cluster/clstat
Failed retrieving cluster information.

There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.

Refer to the HACMP Administration Guide for more information.
Additional information for verifying the SNMP configuration on AIX 6
can be found in /usr/es/sbin/cluster/README5.5.0.UPDATE
HACMP Resource Group and Application Management

When I ran smit cl_admin and went into HACMP Resource Group and Application Management ->
Show Resource Group and Application Management,
there was no output.

Thanks and regards
as55
  • Casey_B

    Re: cluster unstable

    ‏2009-10-30T14:15:35Z  in response to as755
    Hello As755

    You seem to be describing two different problems.

    The statement "cluster got unstable" is vague. Do you mean that
    the output of clstat showed the cluster as unstable?

    Or do you mean that something happened to the cluster and its resources?

    You should start by checking hacmp.out to see if there are any errors.

    Second problem, you didn't mention whether you restarted the cluster after rebooting.
    Unless it is configured to do so, the cluster services will not start upon boot.

    Check the output of lssrc -ls clstrmgrES to see if the "Current State" is ST_STABLE.
    If it is ST_INIT, then you most likely did not start the cluster after rebooting.
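
The state check described above can be scripted. The following is a minimal sketch, not an official tool: the function names `cluster_state` and `state_hint` are made up for illustration, and the hints only restate what is said in this thread about ST_STABLE, ST_INIT and ST_RP_FAILED. The parser reads the lssrc output from standard input, so it can be tried on saved output as well as on a live node.

```shell
# Print the value of the "Current state:" line found in
# `lssrc -ls clstrmgrES` output supplied on standard input.
# (Hypothetical helper name; only the first matching line is used.)
cluster_state() {
    sed -n 's/^[[:space:]]*Current [Ss]tate:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -n 1
}

# Map a state name to a short hint. The hints only restate advice
# given in this thread; any other state falls through to the default.
state_hint() {
    case "$1" in
        ST_STABLE)    echo "cluster manager reports a stable state" ;;
        ST_INIT)      echo "cluster services were most likely not started after the reboot" ;;
        ST_RP_FAILED) echo "something failed while starting; check hacmp.out" ;;
        "")           echo "no state found; is clstrmgrES running?" ;;
        *)            echo "state not covered in this thread; check hacmp.out" ;;
    esac
}

# Usage on a cluster node (not executed here):
#   state=$(lssrc -ls clstrmgrES | cluster_state)
#   echo "$state: $(state_hint "$state")"
```

Reading from standard input keeps the parsing separate from the lssrc call, which makes it easy to check the logic against output pasted into a file.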

    This is something that would be hard to debug over a forum, but your local support
    group can take a look at your snap -e, and probably help you pretty quickly.

    Hope this helps,
    Casey
  • as755

    Re: cluster unstable

    ‏2009-10-30T19:45:47Z  in response to as755
    Thanks for your quick response.
    I restarted the cluster services after I rebooted the system.
    The output of lssrc -ls clstrmgrES:

    root@tiefaphap601/root>lssrc -ls clstrmgrES
    Current state: ST_RP_FAILED
    sccsid = "@(#)36 1.135.5.1 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 53haes_r550, 0921D_hacmp550 7/21/09 13:20:11"
    i_local_nodeid 0, i_local_siteid -1, my_handle 1
    ml_idx[1]=0 ml_idx[2]=1
    tp is 204fb3b8
    Events on event queue:
    te_type 4, te_nodeid 1, te_network -1
    There are 0 events on the Ibcast queue
    There are 0 events on the RM Ibcast queue
    CLversion: 10
    local node vrmf is 5503
    cluster fix level is "0"
    The following timer(s) are currently active:
    Event error node list: tiefaphap601
    Current DNP values
    DNP Values for NodeId - 1 NodeName - tiefaphap601
    PgSpFree = 128756 PvPctBusy = 0 PctTotalTimeIdle = 92.756648
    DNP Values for NodeId - 0 NodeName - tiefaphap602
    PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000
    • Casey_B

      Re: cluster unstable

      ‏2009-10-31T03:04:46Z  in response to as755
      Hello As755.

      Just so you know, we can still see the node name beneath the strikeouts.

      tiefaphap601

      Now, RP_FAILED means that something failed while starting the cluster.

      You may be able to find out what is going wrong by looking at hacmp.out.
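
To decide which node's hacmp.out to read first, the "Event error node list" line in the lssrc output above looks like a useful pointer. The sketch below pulls the node names out of that line. The function name `error_nodes` is hypothetical, and how several nodes would be separated on that line is an assumption (spaces or commas), since the outputs in this thread only ever show a single node.

```shell
# Print each node named on the "Event error node list:" line of
# `lssrc -ls clstrmgrES` output (read from standard input), one per line.
# Assumption: multiple nodes, if any, are separated by spaces or commas.
error_nodes() {
    sed -n 's/^[[:space:]]*Event error node list:[[:space:]]*//p' \
        | tr -s ' ,' '\n\n' \
        | sed '/^$/d'
}

# Usage on a cluster node (not executed here):
#   lssrc -ls clstrmgrES | error_nodes
```

If the line is missing or empty, the function prints nothing, which is a reasonable signal to fall back to checking hacmp.out on every node.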

      You should gather a snap, and call IBM support.

      Casey
  • as755

    Re: cluster unstable

    ‏2009-11-18T17:13:32Z  in response to as755
    I stopped cluster services on both nodes using smit cl_stop (graceful stop).
    After that I ran
    smit hacmp -> Problem Determination Tools ->
    Recover From HACMP Script Failure.
    After it completes successfully, just reboot both nodes and
    your cluster should be back in a normal state.

    This procedure helped me.
    • SystemAdmin

      Re: cluster unstable

      ‏2012-07-05T21:34:18Z  in response to as755
      Hi,

      You have already provided the solution, but here is what I think went wrong.

      When you tried to move the RG from one node to the other, the move failed (for whatever reason) and the SCD (Staging Configuration Directory) was left on the system. The SCD is supposed to be deleted automatically after any dynamic reconfiguration. If the dynamic reconfiguration fails, however, the SCD remains and prevents any further changes to the system. To make further changes, the SCD has to be removed manually with the "Release Locks Set By Dynamic Reconfiguration" option under problem determination.

      If an SCD exists and the node is rebooted, the SCD configuration is applied to the ACD (Active Configuration Directory) while the system boots.

      In that case the two nodes might have different, misconfigured SCDs, and the cluster will not come up.
      Correct me if I am wrong.

      Regards
      Manoj Suyal
    • smile.yrp

      Re: cluster unstable

      ‏2012-08-06T09:37:11Z  in response to as755
      The cluster is unstable and one of the RGs has failed.
      Why does this happen, and what should I do in this case?
      When I queried clstrmgrES, the output was:
      1. lssrc -ls clstrmgrES
      Current state: ST_RP_FAILED
      sccsid = "@(#)36 1.135.1.104 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 53haes_r610, 1129A_hacmp610 3/24/11 20:16:02"
      i_local_nodeid 0, i_local_siteid -1, my_handle 1
      ml_idx[1]=0 ml_idx[2]=1
      tp is 206dd338
      Events on event queue:
      te_type 34, te_nodeid 1, te_network 32
      There are 0 events on the Ibcast queue
      There are 0 events on the RM Ibcast queue
      CLversion: 11
      local node vrmf is 6106
      cluster fix level is "6"
      The following timer(s) are currently active:
      Event error node list: sbbexpdbp1
      Current DNP values
      DNP Values for NodeId - 1 NodeName - sbbexpdbp1
      PgSpFree = 8361023 PvPctBusy = 0 PctTotalTimeIdle = 92.608400
      DNP Values for NodeId - 2 NodeName - sbbexpdbp2
      PgSpFree = 8366749 PvPctBusy = 0 PctTotalTimeIdle = 71.078008
      Which services do I have to stop in this case?
  • ostost

    Re: cluster unstable

    ‏2012-08-08T07:07:59Z  in response to as755
    The resource group is in error because the cleanup script /etc/rc.d/init.d/listener_ctl_stop is exiting with a non-zero return code. Just search for "Failure" in the hacmp.out file and you will find the error messages.
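
As a rough sketch of both checks mentioned above: `find_failures` searches a log for the string "Failure", and `report_rc` runs a command and prints its exit status so a stop script can be tested by hand. Both function names are made up for illustration. The log path in the usage comment is an assumption; the location of hacmp.out depends on the release and on local configuration.

```shell
# Print every line of the given log file that contains "Failure",
# prefixed with its line number. Returns grep's status:
# 0 if at least one match was found, 1 if none.
find_failures() {
    grep -n 'Failure' "$1"
}

# Run a command, print its exit status, and return that status.
# Handy for confirming whether a stop script exits non-zero.
report_rc() {
    if "$@"; then
        rc=0
    else
        rc=$?
    fi
    echo "exit status: $rc"
    return "$rc"
}

# Usage on a cluster node (not executed here; the log path is an assumption):
#   find_failures /var/hacmp/log/hacmp.out
#   report_rc /etc/rc.d/init.d/listener_ctl_stop
```

Note that `report_rc` really runs the script, so on a live node it should only be used when stopping the application is acceptable.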