edgvlad

Pinned topic HA/XD 6.1

‏2011-07-18T21:04:59Z |
I've configured an XD cluster with two nodes and GLVM. The cluster is UP but UNSTABLE.
The second node is stuck trying to acquire the resource group.

Cluster: clusterprog (1641668885)
Mon Jul 18 15:48:40 CDT 2011
State: UP Nodes: 2
SubState: UNSTABLE
Node: gsimx2 State: UP
Interface: gsimx2 (3) Address: 108.100.100.2
State: UP
Interface: gsi2en2 (0) Address: 102.100.100.2
State: UP
Interface: gsi2en6 (1) Address: 106.100.100.2
State: UP
Interface: gsi2rpv (2) Address: 109.100.100.2
State: UP
Resource Group: rgprogress State: Acquiring (Secondary)

hacmp.log shows:

WARNING: Cluster clusterprog has been running recovery program 'TE_RG_MOVE_ACQUIRE_SECONDARY' for 4020 seconds. Please check cluster status.
WARNING: Cluster clusterprog has been running recovery program 'TE_RG_MOVE_ACQUIRE_SECONDARY' for 4500 seconds. Please check cluster status.
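
A minimal Python sketch for pulling the recovery program name and elapsed time out of these warnings, assuming the log text is hard-wrapped at 80 columns as in the excerpt above. The function name `parse_recovery_warnings` and the `SAMPLE` string are illustrative only; they are not part of HACMP.

```python
import re

# Pattern for the WARNING text quoted in this post, after wrapped lines are rejoined.
WARNING_RE = re.compile(
    r"WARNING: Cluster (?P<cluster>\S+) has been running recovery program "
    r"'(?P<program>[^']+)' for (?P<seconds>\d+) seconds"
)


def parse_recovery_warnings(text):
    """Return a list of (cluster, program, seconds) tuples found in the text."""
    # Assumption: lines were hard-wrapped mid-word, so dropping newlines
    # restores the original message text.
    joined = text.replace("\n", "")
    return [
        (m.group("cluster"), m.group("program"), int(m.group("seconds")))
        for m in WARNING_RE.finditer(joined)
    ]


# Sample copied from the hacmp.log excerpt in this post.
SAMPLE = (
    "WARNING: Cluster clusterprog has been running recovery program 'TE_RG_MOVE_ACQUI\n"
    "RE_SECONDARY' for 4020 seconds. Please check cluster status.\n"
    "WARNING: Cluster clusterprog has been running recovery program 'TE_RG_MOVE_ACQUI\n"
    "RE_SECONDARY' for 4500 seconds. Please check cluster status.\n"
)

if __name__ == "__main__":
    for cluster, program, seconds in parse_recovery_warnings(SAMPLE):
        print(cluster, program, seconds, "s =", seconds / 60, "min")
```

For the two warnings quoted here, the elapsed times are 4020 and 4500 seconds, that is, 67 and 75 minutes in the same recovery program.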
clstrmgr.debug shows:

Mon Jul 18 15:59:34 PollAliasEvents: State not STABLE/RP_RUNNING or ibcasts, return
Mon Jul 18 16:00:04 PollAliasEvents: State not STABLE/RP_RUNNING or ibcasts, return
Mon Jul 18 16:00:34 PollAliasEvents: State not STABLE/RP_RUNNING or ibcasts, return

The AIX version is 5300-12-02-1036.
GLVM is OK; it has no STALE partitions.
I've rebooted my servers and the cluster always shows UNSTABLE.
Could you help me?
Updated on 2011-07-18T22:50:54Z at 2011-07-18T22:50:54Z by edgvlad
  • edgvlad

    Re: HA/XD 6.1

    ‏2011-07-18T22:50:54Z  
    I have more information.

    This is the output from node2 (secondary):
    ================================================================
    gsimx2:/var/hacmp/log >lssrc -ls clstrmgrES

    Current state: ST_CBARRIER

    sccsid = "@(#)36 1.135.1.97 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 53haes_r610, 0933A_hacmp610 8/8/09 14:44:29"

    i_local_nodeid 1, i_local_siteid 2, my_handle 2

    ml_idx[1]=0 ml_idx[2]=1

    tp is 201a3378

    Events on event queue:

    te_type 1, te_nodeid 2, te_network -1

    te_type 36, te_nodeid 2, te_network 1

    te_type 10, te_nodeid 2, te_network -1

    There are 0 events on the Ibcast queue

    There are 0 events on the RM Ibcast queue

    CLversion: 11

    local node vrmf is 6100

    cluster fix level is "0"

    The following timer(s) are currently active:

    Event error node list: gsimx1

    Current DNP values

    DNP Values for NodeId - 0 NodeName - gsimx1

    PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000

    DNP Values for NodeId - 0 NodeName - gsimx2

    PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000

    gsimx2:/var/hacmp/log >

    =====================================================================
    This is the output from node1

    ========================================================================

    gsimx1:/usr/es/sbin/cluster>lssrc -ls clstrmgrES

    Current state: ST_RP_FAILED

    sccsid = "@(#)36 1.135.1.97 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 53haes_r610, 0933A_hacmp610 8/8/09 14:44:29"

    i_local_nodeid 0, i_local_siteid 1, my_handle 1

    ml_idx[1]=0 ml_idx[2]=1

    tp is 204c1418

    Events on event queue:

    te_type 1, te_nodeid 2, te_network -1

    te_type 36, te_nodeid 1, te_network 1

    te_type 36, te_nodeid 2, te_network 1

    te_type 10, te_nodeid 2, te_network -1

    There are 0 events on the Ibcast queue

    There are 0 events on the RM Ibcast queue

    CLversion: 11

    local node vrmf is 6100

    cluster fix level is "0"

    The following timer(s) are currently active:

    Event error node list: gsimx1

    Current DNP values

    DNP Values for NodeId - 0 NodeName - gsimx1

    PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000

    DNP Values for NodeId - 0 NodeName - gsimx2

    PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000

    gsimx1:/usr/es/sbin/cluster>

    =====================================================================
    When I tried to stop node2, it returned the following message:

    ===========================================

    Command: failed stdout: yes stderr: no

    Before command completion, additional instructions may appear below.

    cl_clstop: ERROR: Node gsimx2 has 3 event(s) outstanding as reported by command
    'lssrc -ls clstrmgrES' and cannot be stopped until all outstanding events have
    completed. The stop request has been aborted for all nodes. Please wait for all
    nodes to stabalize before attempting to stop cluster services again.
    =============================================================================
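
The `lssrc -ls clstrmgrES` listings above can be condensed for a side-by-side comparison of the two nodes. The following is a rough Python sketch, not an IBM tool: the function name `summarize_clstrmgr` and the `NODE2` sample are hypothetical, the field labels are taken only from the output pasted in this post, and the numeric `te_type` values are reported as-is because their meaning is not given here.

```python
import re


def summarize_clstrmgr(output):
    """Extract the state, queued events and error-node list from lssrc text."""
    # [ \t]* rather than \s* so an empty field does not swallow the next line.
    state = re.search(r"Current state:[ \t]*(\S+)", output)
    errors = re.search(r"Event error node list:[ \t]*(.*)", output)
    events = re.findall(
        r"te_type\s+(\d+),\s*te_nodeid\s+(\d+),\s*te_network\s+(-?\d+)", output
    )
    return {
        "state": state.group(1) if state else None,
        # Each event is kept as a (te_type, te_nodeid, te_network) tuple.
        "events": [tuple(int(v) for v in e) for e in events],
        "error_nodes": errors.group(1).split() if errors else [],
    }


# Abbreviated sample built from the node2 output pasted in this post.
NODE2 = """Current state: ST_CBARRIER
Events on event queue:
te_type 1, te_nodeid 2, te_network -1
te_type 36, te_nodeid 2, te_network 1
te_type 10, te_nodeid 2, te_network -1
There are 0 events on the Ibcast queue
Event error node list: gsimx1
"""

if __name__ == "__main__":
    summary = summarize_clstrmgr(NODE2)
    print(summary["state"], len(summary["events"]), summary["error_nodes"])
```

Run against the node2 listing, this counts three queued events, which matches the "3 event(s) outstanding" reported by cl_clstop, and it shows gsimx1 in the event error node list on both nodes.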
    How can I find out what is happening? Is the problem the network?