  • Latest Post - 2013-05-15T15:45:19Z by GarlandJoseph
SamuraiMark

Strange ping behaviour - A can ping B, B cannot ping A

2013-05-07T19:18:09Z

One of my client LPARs intermittently cannot talk to my NFS server, so my users cannot log in.

Situation: three LPARs, A (client), B (NFS) and C (NIM). Users cannot SSH into A because A cannot mount home directories from B. All three LPARs are in the same frame, and all three have IP addresses on the same VLAN.

Ping tests:

  • A and B can both ping C.
  • B (NFS) can ping A.
  • A (client) cannot ping B.

After "a while" the situation corrects itself and A can talk to B normally.

Ping tests from B to A and C:

# ping nim.empire.ca
PING kgnnim01.empire.ca (10.10.2.24): 56 data bytes
64 bytes from 10.10.2.24: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.10.2.24: icmp_seq=1 ttl=255 time=0 ms
64 bytes from 10.10.2.24: icmp_seq=2 ttl=255 time=0 ms
^C
--- kgnnim01.empire.ca ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
# ping nfs.empire.ca
PING kgnnfs01.empire.ca (10.10.2.25): 56 data bytes
64 bytes from 10.10.2.25: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.10.2.25: icmp_seq=1 ttl=255 time=0 ms
64 bytes from 10.10.2.25: icmp_seq=2 ttl=255 time=0 ms
^C
--- kgnnfs01.empire.ca ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
#

Ping tests from A to B and C:

# ping nim.empire.ca
PING kgnnim01.empire.ca (10.10.2.24): 56 data bytes
64 bytes from 10.10.2.24: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.10.2.24: icmp_seq=1 ttl=255 time=0 ms
64 bytes from 10.10.2.24: icmp_seq=2 ttl=255 time=0 ms
^C
--- kgnnim01.empire.ca ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
# ping nfs.empire.ca
PING kgnnfs01.empire.ca (10.10.2.25): 56 data bytes
^C
--- kgnnfs01.empire.ca ping statistics ---
17 packets transmitted, 0 packets received, 100% packet loss
#

In the time it took me to write this out, the situation corrected itself, and I did nothing to fix it:

# ping nfs.empire.ca
PING kgnnfs01.empire.ca (10.10.2.25): 56 data bytes
64 bytes from 10.10.2.25: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.10.2.25: icmp_seq=1 ttl=255 time=0 ms
^C
--- kgnnfs01.empire.ca ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
#

My only thought is that the NFS server is temporarily blocking the client with a firewall rule, but I have no evidence to back that up. I'm not sure where to go from here. Next time it happens I'll run tcpdump on the NFS box to see whether the incoming pings arrive and whether it is replying. Any pointers are appreciated.

- Mark

  • GarlandJoseph

    Re: Strange ping behaviour - A can ping B, B cannot ping A

    2013-05-15T15:45:19Z

    Make sure you don't have a duplicate IP address (check the AIX error logs). Also, since they are on the same LAN, check your ARP caches and compare the MAC addresses when you can't ping.

    So, say node A can't ping node B: check node A's ARP cache entry (MAC address) for node B (arp -an or arp <node-b-hostname>) and record that value. Then run arp -d <node-b-hostname> to delete the entry, ping node B again, and look at the value again. It may still be the same. In that case, the next time you can ping node B from node A, look in node A's ARP cache again and compare the MAC address seen during the failure with the one seen when the ping succeeds. A different MAC address during the failure would point to a duplicate IP address or a stale entry.
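The ARP comparison above could be scripted roughly as follows. This is only a sketch: arp_mac_for_ip and compare_macs are made-up helper names, not AIX commands, and the parsing assumes arp -an prints lines shaped like "? (10.10.2.25) at <mac> [ethernet] ...", which may differ on your system. The IP address and hostname in the commented example are the NFS server's, taken from the ping output earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of AIX): print the MAC address recorded for
# one IP address, reading `arp -an` style output on stdin.
# Assumes lines shaped like:  ? (10.10.2.25) at <mac> [ethernet] ...
arp_mac_for_ip() {
    awk -v ip="($1)" '
        $2 == ip {
            # The MAC (or "(incomplete)") is the field right after "at".
            for (i = 3; i < NF; i++)
                if ($i == "at") { print $(i + 1); exit }
        }'
}

# Hypothetical helper: report whether two recorded MAC values differ.
compare_macs() {
    if [ "$1" = "$2" ]; then
        echo "same"
    else
        echo "changed: $1 -> $2"
    fi
}

# Example use on node A, as root (commented out so sourcing this file
# does not touch the ARP cache):
#   failing=$(arp -an | arp_mac_for_ip 10.10.2.25)
#   arp -d kgnnfs01.empire.ca
#   ping -c 3 kgnnfs01.empire.ca
#   working=$(arp -an | arp_mac_for_ip 10.10.2.25)
#   compare_macs "$failing" "$working"
```

Saving the value captured during a failure to a file makes it easy to compare against the value seen later, once node A can reach node B again.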

    This is just one thing to do.