Troubleshooting
Problem
Rule: SHC016
Issue Detected: Network communication problem detected on a host - packets dropped or Rx/Tx errors
Severity: High
Components:
rack1.host1.ethif[bond2] (from ibm_host) - Number of errors Rx packets increased by 7.2 percent between 2017-10-19 18:25:07 and 2017-10-19 20:23:23
rack1.host1.ethif[eth10] (from ibm_host) - Number of errors Rx packets increased by 22.4 percent between 2017-10-19 18:25:07 and 2017-10-19 20:23:23
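The health check reports each change as a percentage increase between two timestamps. The following is a minimal sketch of that arithmetic. It assumes, without confirmation from the rule's documentation, that SHC016 compares two samples of the cumulative RX error counter and reports the relative increase. The sysfs path /sys/class/net/<iface>/statistics/rx_errors is the standard Linux location of that counter. The function names read_counter and percent_increase are hypothetical.

```python
import os


def read_counter(iface: str, counter: str = "rx_errors",
                 base: str = "/sys/class/net") -> int:
    """Read one cumulative interface counter from Linux sysfs."""
    path = os.path.join(base, iface, "statistics", counter)
    with open(path) as f:
        return int(f.read().strip())


def percent_increase(old: int, new: int) -> float:
    """Relative increase of a counter between two samples, in percent.

    Assumption: this mirrors how the health check expresses the change;
    the exact SHC016 formula is not documented here.
    """
    if old <= 0:
        raise ValueError("first sample must be positive to compute a percentage")
    return (new - old) / old * 100.0
```

For example, call read_counter("eth10") twice some time apart and pass both values to percent_increase to see how fast the RX error counter is growing.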
Symptom
The system health check reported the SHC016 alert (High severity) for the bond2 and eth10 interfaces on host1.
Cause
RX (receive) errors and packet drops on the host's network interfaces.
Environment
N1001-00x, N2001-00x, N3001-00x
Diagnosing The Problem
To diagnose the issue, identify the affected Ethernet interface on each host that reported the alert, and collect information about it.
1- Run ifconfig -a to collect information about all Ethernet interfaces.
2- Confirm which Ethernet interface or bond shows the errors (check the RX errors, dropped, and frame counters):
[nz@host1 ~]$ ifconfig bond2
bond2 Link encap:Ethernet HWaddr 00:00:C9:EA:26:C0
inet addr:172.31.55.30 Bcast:172.31.55.255 Mask:255.255.255.0
inet6 addr: 2001:21:21:55:200:c9ff:feea:26c0/64 Scope:Global
inet6 addr: fe80::200:c9ff:feea:26c0/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:9000 Metric:1
RX packets:331707326 errors:10398231 dropped:23 overruns:0 frame:5322144
TX packets:181520810 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1047462234644 (975.5 GiB) TX bytes:19200635035 (17.8 GiB)
3- If the errors are reported on a bond (VLAN), identify the Ethernet interfaces that are slaves of that bond and check the Link Failure Count of each one.
For example:
[nz@host1 ~]$ cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth10 (primary_reselect always)
Currently Active Slave: eth10
MII Status: up
MII Polling Interval (ms): 200
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth10
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 16811 ----> high link failure count on the active slave (eth10); this is the issue
Permanent HW addr: 00:00:c9:ea:26:c0
Slave Interface: eth11
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:00:c9:ea:26:c4
[nz@host1 ~]$
4- Collect the error statistics for the affected Ethernet interface:
[root@host1 ~]# ethtool -S eth10 | grep -i error
rx_errors: 10398396
tx_errors: 0
rx_crc_errors: 5170911 ---> large number of CRC (cyclic redundancy check) errors.
rx_alignment_symbol_errors: 151325
rx_in_range_errors: 2
rx_out_range_errors: 0
rx_address_match_errors: 9124924
[root@host1 ~]#
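The bond check in step 3 can be run against every slave at once. The following is a minimal sketch, not an IBM-supplied tool: it parses the text of /proc/net/bonding/<bond> (the file shown in step 3) and returns each slave's Link Failure Count. The function name is hypothetical, and the parsing assumes the "Slave Interface:" / "Link Failure Count:" layout shown in the example output; other bonding driver versions may format the file differently.

```python
from typing import Dict, Optional


def slave_link_failures(bonding_text: str) -> Dict[str, int]:
    """Map each slave interface to its Link Failure Count.

    Expects the text of /proc/net/bonding/<bond>. Lines before the first
    "Slave Interface:" describe the bond itself and are ignored.
    """
    failures: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in bonding_text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank or non "key: value" line
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "Link Failure Count" and current is not None:
            # Keep only the leading number in case the line carries an annotation.
            failures[current] = int(value.split()[0])
    return failures
```

For example, passing the contents of /proc/net/bonding/bond2 from step 3 returns {"eth10": 16811, "eth11": 0}, which points at eth10.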
The large number of CRC errors on eth10 indicates that the NIC for that port needs to be replaced. In this case, the port belongs to a dual-port 10 GbE NIC that was used by the virtual IP.
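The check in step 4 can also be scripted. The following is a minimal sketch that assumes the "name: value" layout of ethtool -S output shown above; counter names vary by NIC driver, so rx_crc_errors may be missing or named differently on other cards. It keeps every counter whose name contains "error" and reports what fraction of rx_errors are CRC errors. The function names are hypothetical.

```python
import subprocess
from typing import Dict, Optional


def error_counters(ethtool_output: str) -> Dict[str, int]:
    """Return every counter whose name contains 'error' from `ethtool -S` text."""
    counters: Dict[str, int] = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if sep and "error" in name.lower() and value.isdigit():
            counters[name] = int(value)
    return counters


def crc_share(counters: Dict[str, int]) -> Optional[float]:
    """Fraction of rx_errors that are CRC errors, or None if it cannot be computed."""
    rx = counters.get("rx_errors", 0)
    crc = counters.get("rx_crc_errors")
    if crc is None or rx == 0:
        return None
    return crc / rx


def collect(iface: str) -> Dict[str, int]:
    """Run `ethtool -S <iface>` (needs the ethtool binary, usually root) and parse it."""
    out = subprocess.run(["ethtool", "-S", iface], capture_output=True,
                         text=True, check=True).stdout
    return error_counters(out)
```

For the eth10 output in step 4, crc_share returns about 0.497, meaning roughly half of the RX errors on that port are CRC errors.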
Resolving The Problem
Replace the hardware: the NIC that contains the failing port.
Document Information
Modified date: 17 October 2019
UID: swg22009909