Topic
  • 3 replies
  • Latest Post - ‏2012-05-03T14:53:10Z by SystemAdmin
SystemAdmin
2092 Posts

which IP is controlled by CNFS?

‏2012-05-02T20:52:31Z |
I'm using CNFS on a 3-node Linux cluster (CentOS 5.8) running GPFS
3.4.0-10, and I'm seeing confusing messages related to IP address takeover
under CNFS.

Each GPFS server has several interfaces. The eth0 interface for each GPFS
(CNFS) server is on the 192.168.110.0/24 subnet. That is a fixed address,
assigned to each server, and activated at boot time. That interface is
not accessible to NFS clients, and is only used within an administrative
network.

The CNFS interfaces are on the 170.212.169.0/24 network. Clients are
able to NFS mount volumes from the CNFS servers via that network. The
CNFS addresses are not active at boot time, are not assigned to specific
servers, and they all reverse-map to the name "sbia-nfs" to enable
round-robin DNS.

The output from "mmlscluster --cnfs" shows:


[root@sbia-infr2 ras]# mmlscluster --cnfs

GPFS cluster information
========================
  GPFS cluster name:         sbia-gpfs.uphs.upenn.edu
  GPFS cluster id:           13882466807548967463

Cluster NFS global parameters
-----------------------------
  Shared root directory:                /gpfs/cluster_shared/cnfsSharedRoot
  Virtual IP address:                   sbia-nfs
  rpc.mountd port number:               892
  nfsd threads:                         32
  Reboot on failure enabled:            yes
  CNFS monitor enabled:                 yes

 Node  Daemon node name                 IP address       CNFS state  group  CNFS IP address list
-------------------------------------------------------------------------------------------
   1   sbia-infr1-admin.uphs.upenn.edu  192.168.110.2    enabled        0   170.212.169.87
   2   sbia-infr2-admin.uphs.upenn.edu  192.168.110.3    enabled        0   170.212.169.88
   3   sbia-infr3-admin.uphs.upenn.edu  192.168.110.85   enabled        0   170.212.169.89


When a CNFS server is unavailable, IP address takeover seems to work
correctly--NFS clients continue to access the same IP in 170.212.169.0/24.

However, the log file /var/adm/ras/mmfs.log.latest shows confusing
entries about which address is controlled by CNFS. Here's a log file
excerpt from sbia-infr2-admin when cluster node sbia-infr1-admin rebooted:


Tue Apr 10 20:12:20 EDT 2012: mmnfsrecovernode: Initiating IP takeover of 192.168.110.2 due to node failure/recovery
Tue Apr 10 20:13:23 EDT 2012: mmnfsrecovernode: NFS clients of node 192.168.110.2 notified to reclaim NLM locks
Tue Apr 10 20:13:28 EDT 2012: mmnfsrecovernode: NFS clients of node 192.168.110.3 notified to reclaim NLM locks
Tue Apr 10 20:19:21 EDT 2012: mmnfsnodeback: Node 192.168.110.2 has recovered; releasing 170.212.169.87


The first line shown is confusing because the address 192.168.110.2 should not
be subject to IP address takeover--that's the canonical address of the eth0
interface, not a CNFS address.
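
The distinction drawn here can be checked mechanically. The following is a minimal sketch, assuming only the two subnets named in this post; the function name `classify` and the labels it returns are hypothetical and are not part of GPFS.

```python
import ipaddress

# Subnets as described in the post; everything else here is illustrative.
ADMIN_NET = ipaddress.ip_network("192.168.110.0/24")  # fixed eth0 addresses
CNFS_NET = ipaddress.ip_network("170.212.169.0/24")   # floating CNFS addresses


def classify(ip: str) -> str:
    """Return 'admin', 'cnfs', or 'other' for a dotted-quad address."""
    addr = ipaddress.ip_address(ip)
    if addr in ADMIN_NET:
        return "admin"
    if addr in CNFS_NET:
        return "cnfs"
    return "other"


if __name__ == "__main__":
    # The address named in the takeover message and the one named in the
    # release message, respectively.
    for ip in ("192.168.110.2", "170.212.169.87"):
        print(ip, classify(ip))
```

Run against the addresses in the log excerpt, this places the address in the "Initiating IP takeover" line on the administrative network and the address in the "releasing" line on the CNFS network.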

There's no notice about which public address was taken from sbia-infr1-admin when
it crashed.

The last line shown makes sense to me, and correctly shows the public
address (170.212.169.87) that is dynamically controlled by CNFS and
normally assigned to sbia-infr1-admin. That is the address that I
believe was taken over when sbia-infr1-admin crashed and was brought up
on sbia-infr2-admin.

So, does the log file just show confusingly written entries (using the fixed node IP, not the CNFS floating IP), or have I somehow
misconfigured CNFS so that GPFS is attempting to do IP takeover on the
administrative address of each server?
  • truongv
    98 Posts

    Re: which IP is controlled by CNFS?

    ‏2012-05-02T21:29:43Z  
    CNFS only takes over IPs defined in the CNFS IP address list. You'll get more messages in the log if you bump up the cnfsDebug value.
    
    ex: mmchconfig cnfsDebug=2
    
  • tomerperry
    22 Posts

    Re: which IP is controlled by CNFS?

    ‏2012-05-03T06:16:53Z  
    Hi,

    IMHO that's the expected behavior. GPFS reports which GPFS node failed (thus it's represented by the GPFS IP and not the CNFS IP), but takes over the public IP (e.g. one in the 170.212.169.0/24 range).

    Tomer.
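
To see which public address belongs to the node named in a log message, the node table from "mmlscluster --cnfs" can be turned into a lookup. This is a rough sketch, not a GPFS tool: the rows are copied from the output earlier in this thread, `parse_cnfs_table` is a hypothetical helper, and it assumes that multiple CNFS addresses for one node would appear comma-separated in the last column.

```python
# Node rows copied from the "mmlscluster --cnfs" output in this thread.
NODE_TABLE = """\
1   sbia-infr1-admin.uphs.upenn.edu 192.168.110.2    enabled        0   170.212.169.87
2   sbia-infr2-admin.uphs.upenn.edu 192.168.110.3    enabled        0   170.212.169.88
3   sbia-infr3-admin.uphs.upenn.edu 192.168.110.85   enabled        0   170.212.169.89
"""


def parse_cnfs_table(text):
    """Map each node (GPFS) IP to its list of CNFS IP addresses."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric node id and have at least 6 columns:
        # node, daemon name, node IP, CNFS state, group, CNFS IP list.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        node_ip = fields[2]
        # Assumption: several CNFS IPs on one node are comma-separated.
        cnfs_ips = [ip for f in fields[5:] for ip in f.split(",") if ip]
        mapping[node_ip] = cnfs_ips
    return mapping


if __name__ == "__main__":
    for node_ip, cnfs_ips in parse_cnfs_table(NODE_TABLE).items():
        print(node_ip, "->", ", ".join(cnfs_ips))
```

With that lookup, a message naming node 192.168.110.2 can be read as referring to the CNFS address 170.212.169.87.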
  • SystemAdmin
    2092 Posts

    Re: which IP is controlled by CNFS?

    ‏2012-05-03T14:53:10Z  
    truongv wrote:
    > CNFS only takes over IPs defined in the CNFS IP address list.

    That's what I expect, and what I observed. I'm not going to shut down a
    production GPFS server just to check this right now, but the next time
    I do maintenance on one of the nodes I'll confirm that only the CNFS IP
    address is being taken over (as expected), not the node IP.

    However, that is not what GPFS logs. GPFS reports:
    
    Tue Apr 10 20:12:20 EDT 2012: mmnfsrecovernode: Initiating IP takeover of 192.168.110.2
    


    It appears that GPFS is doing the correct actions, but writing an
    incorrect log entry. I believe the log should read something like:
    
    Tue Apr 10 20:12:20 EDT 2012: mmnfsrecovernode: Initiating IP takeover of 170.212.169.87 from node 192.168.110.2
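
Until the message itself changes, existing log lines can be post-processed into that form. The sketch below is only an illustration: the `CNFS_IPS` dictionary is filled in by hand from the "mmlscluster --cnfs" output earlier in the thread, `annotate` is a hypothetical helper, and the pattern matches only the "Initiating IP takeover of" wording seen in this log excerpt.

```python
import re

# Node IP -> CNFS IPs, transcribed from the mmlscluster --cnfs output above.
CNFS_IPS = {
    "192.168.110.2": ["170.212.169.87"],
    "192.168.110.3": ["170.212.169.88"],
    "192.168.110.85": ["170.212.169.89"],
}

# Matches the takeover message as it appears in the log excerpt.
TAKEOVER = re.compile(r"(Initiating IP takeover of )(\d+\.\d+\.\d+\.\d+)")


def annotate(line: str) -> str:
    """Rewrite a takeover line to name the CNFS IP(s) and the failed node."""
    def repl(match):
        node_ip = match.group(2)
        cnfs = CNFS_IPS.get(node_ip)
        if not cnfs:
            # Unknown node: leave the message untouched.
            return match.group(0)
        return f"{match.group(1)}{', '.join(cnfs)} from node {node_ip}"
    return TAKEOVER.sub(repl, line)


if __name__ == "__main__":
    sample = ("Tue Apr 10 20:12:20 EDT 2012: mmnfsrecovernode: "
              "Initiating IP takeover of 192.168.110.2 due to node failure/recovery")
    print(annotate(sample))
```

Lines that do not contain a takeover message, or that name a node missing from the dictionary, pass through unchanged.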
    


    truongv wrote:
    > You'll get more messages in the log if you bump up cnfsDebug value.
    >
    > ex: mmchconfig cnfsDebug=2
    

    Thanks. That's very useful.