IBM Support

IV60736: POWERHA: MKCLUSTER MAY FAIL IF HACMPCLUSTER EXISTS ON <NODE2> APPLIES TO AIX 7100-03

A fix is available

APAR status

  • Closed as program error.

Error description

  • During the initial PowerHA setup, the CAA cluster is
    created using verify and sync ("clmgr sync cluster" or
    via "smitty sysmirror").
    
    In some cases, mkcluster fails to create the CAA cluster and
    the following appears in /var/hacmp/log/clutils.log:
    
    <snipped>
    INFO: START '/usr/es/sbin/cluster/sbin/smcaactrl -O
    JOIN_NODE -P POST -T 2 -cERROR: ADD_NODE failed for
    <node2>
    1035-264 mkcluster: Could not add all new entities.
    1035-305 mkcluster: Could not create cluster.
            The device is not ready for operation.
    <snipped>
    INFO: = = END JOIN_NODE Op = = POST Stage = =
    INFO: START
    INFO: FINISH return = 1
    <snipped>
    INFO: nodename = <node2>
    INFO: return = -1, Failed to receive message: sock=8,
    recv rc=0, msgbytes=32, errno=73
    INFO: Failed to receive request message.
    INFO: FINISH return = -1
    
    To confirm that the system is hitting this APAR, enable
    caa.debug. During the mkcluster attempt, the
    /var/adm/ras/syslog.caa file on node2 will then show the
    following messages repeating in a loop:
    
    <snipped>
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_utils.c    cl_canonical_nodename   11060   1
          START
    May 15 14:53:12 node2 caa:info cluster[14876770]:
    cluster_utils.c     am_i_a_powerha  10789   1       START
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    caa_query.c        entlist_start   700     1
     Failed to get kernext topology information,
    cluster_query_ext_v2 failed with 2: A file or directory
    in
     the path name does not exist.
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    caa_query.c        cl_query        2334    1
    Could not get the topology entlist:
    The system call does not exist on this system. (109)
    May 15 14:53:12 node2 caa:info cluster[14876770]:
    caa_query.c cl_query        2370    1       Query failed:
    line 2332: The system call does not exist on this system.
    May 15 14:53:12 node2 caa:info cluster[14876770]:
    clusterconf_lib.c   _find_and_load_repos    1358    1
         got hdisk# from ODM
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_bootutils.c        is_clvdisk      310     1
        START in_type=3
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_bootutils.c        cluster_repos_disk_read 71
         1       START fd=6
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_bootutils.c        cluster_repos_disk_read 112
       1       FINISH
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
     cluster_bootutils.c        is_clvdisk      456     1
        FINISH return = 1
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_utils.c    cluster_repository_read_data    4709
      1       START cr_read_type=17
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
     cluster_utils.c
    cluster_repository_read_data_from_disk
    4752    1       START
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
     cluster_utils.c
    cluster_repository_read_data_from_disk
      4857    1       FINISH return = 0
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
    cluster_utils.c    cluster_node_in_nodelist_by_uuid
      713     1       START
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
     cluster_utils.c    cluster_node_in_nodelist_by_uuid
         750     1       return = NULL
    May 15 14:53:12 node2 caa:debug cluster[14876770]:
     cluster_utils.c    cluster_node_in_nodelist_by_uuid
         760     1       FINISH
    <snipped>
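
The clutils.log signature quoted above can be checked with a short script. This is a minimal sketch, not an IBM-supplied tool: the function name has_mkcluster_failure is hypothetical, and it only looks for the three message strings quoted in this APAR. It does not inspect syslog.caa.

```shell
#!/bin/sh
# Hypothetical helper (not part of PowerHA or CAA).
# Succeeds (exit 0) only if the log contains all three failure
# messages quoted in this APAR. Returns 2 if the log is unreadable.
has_mkcluster_failure() {
    log="${1:-/var/hacmp/log/clutils.log}"
    [ -r "$log" ] || return 2
    grep -q 'ADD_NODE failed' "$log" &&
    grep -q '1035-264 mkcluster: Could not add all new entities' "$log" &&
    grep -q '1035-305 mkcluster: Could not create cluster' "$log"
}
```

Called without an argument, the function reads /var/hacmp/log/clutils.log. A match only suggests this APAR; the syslog.caa loop described above is still the confirming symptom.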
    

Local fix

  • 1) On node1, take a snapshot:
    # clmgr add snapshot <snapshot_name>
    2) On node2, delete the cluster definitions:
    # clmgr delete cluster
    (this empties all the HACMP* ODM classes)
    3) On node1, synchronize the cluster:
    # clmgr sync cluster
    +++++++++++++++++++
    NOTES:
    1) Be very careful when running "clmgr delete cluster".
    It completely removes the cluster definitions. If in any
    doubt about where to run this command, call IBM services first.
    2) This local fix DOES NOT apply if you are doing a migration.
    Apply APAR IV60736 first, or contact IBM services before you
    proceed.
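
Before running step 2, it may help to confirm that node2 really holds a leftover HACMPcluster definition. The sketch below is an illustration under assumptions: the helper name hacmpcluster_defined is hypothetical, and it assumes that "odmget HACMPcluster" prints a stanza beginning with "HACMPcluster:" when the class is populated and prints nothing when it is empty.

```shell
#!/bin/sh
# Hypothetical helper: reads `odmget HACMPcluster` output on stdin and
# succeeds if at least one HACMPcluster stanza header is present.
hacmpcluster_defined() {
    grep -q '^HACMPcluster:'
}

# Example use on node2 (AIX only), shown as a comment so that the
# block has no side effects when sourced:
#   odmget HACMPcluster | hacmpcluster_defined \
#       && echo "HACMPcluster is populated on this node"
```

This check is read-only. It does not replace the snapshot in step 1 or the caution in note 1.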
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:
    * Systems running the 7100-03 Technology Level with the
    * bos.cluster.rte fileset at 7.1.3.2 or 7.1.3.15
    ****************************************************************
    * PROBLEM DESCRIPTION:
    * During the initial PowerHA setup, mkcluster may fail to
    * create the CAA cluster during verify and sync
    * ("clmgr sync cluster" or via "smitty sysmirror").
    * The clutils.log and syslog.caa excerpts that identify
    * the failure are listed under "Error description".
    ****************************************************************
    * RECOMMENDATION:
    * Install APAR IV60736.
    ****************************************************************
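
The affected bos.cluster.rte levels named under USERS AFFECTED can be tested with a small function. This is a sketch only: is_affected_level is a hypothetical name, and the function matches exactly the two levels listed in this APAR, nothing else.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given bos.cluster.rte level is
# one of the two levels this APAR lists as affected.
is_affected_level() {
    case "$1" in
        7.1.3.2|7.1.3.15) return 0 ;;
        *)                return 1 ;;
    esac
}

# The installed level is shown by `lslpp -L bos.cluster.rte` on AIX.
# Pass that level string as the first argument, for example:
#   is_affected_level 7.1.3.15 && echo "affected level"
```

The function deliberately does no version arithmetic, because the APAR names specific levels rather than a range.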
    

Problem conclusion

  • Added a check that the cluster exists when confirming that a
    node is part of PowerHA.
    

Temporary fix

Comments

  • 6100-09 - use AIX APAR IV61060
    7100-03 - use AIX APAR IV60736
    7100-04 - use AIX APAR IV61112
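
The mapping above from technology level to APAR can be written as a lookup. This is a minimal sketch: apar_for_tl is a hypothetical function name, and it knows only the three levels listed in this section.

```shell
#!/bin/sh
# Hypothetical helper: prints the APAR number listed in this document
# for a given technology level (in the format printed by `oslevel -r`).
# Returns 1 for any level that is not listed here.
apar_for_tl() {
    case "$1" in
        6100-09) echo IV61060 ;;
        7100-03) echo IV60736 ;;
        7100-04) echo IV61112 ;;
        *)       return 1 ;;
    esac
}

# Example use on AIX, shown as a comment:
#   apar=$(apar_for_tl "$(oslevel -r)") && instfix -ik "$apar"
```

In the commented example, "instfix -ik" reports whether the filesets for that APAR are already installed.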
    

APAR Information

  • APAR number

    IV60736

  • Reported component name

    AIX V7.1

  • Reported component ID

    5765H4000

  • Reported release

    710

  • Status

    CLOSED PER

  • PE

    YesPE

  • HIPER

    NoHIPER

  • Submitted date

    2014-05-21

  • Closed date

    2014-05-28

  • Last modified date

    2016-05-11

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IV61060 IV61112

Fix information

  • Fixed component name

    AIX V7.1

  • Fixed component ID

    5765H4000

Applicable component levels

  • R710 PSY U862108

       UP14/10/29 I 1000

PTF to Fileset Mapping

Document Information

Modified date:
11 May 2016