IBM Support

What is the meaning of "CRC Errors", "DMA Overrun", "Packet Too Short Errors", and "Packet Too Long Errors" in entstat/netstat, and what can be done to resolve them?

Question & Answer


Question

What is the meaning of "CRC Errors", "DMA Overrun", "Packet Too Short Errors", and "Packet Too Long Errors" in entstat/netstat, and what can be done to resolve them?

Answer

The following figure shows the header, data, and trailer parts of an Ethernet packet with and without a VLAN tag.
[Figure: Ethernet packet layout with and without a VLAN tag]
All four counters are part of the receive statistics of the Ethernet adapter. They are maintained by the physical adapter; the entstat and netstat commands read the counters from the physical adapter and display them. The sections below look at each counter in detail.
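The four counters can also be pulled out of saved entstat output with a short script. The following is a minimal sketch, assuming the output has already been captured as text. The function names `read_receive_counters` and `counter_deltas` are hypothetical and are not part of any AIX tool. The sample lines are copied from the ent5 output shown in the next section.

```python
import re

# The four receive-side counters discussed in this document.
RECEIVE_COUNTERS = (
    "CRC Errors",
    "DMA Overrun",
    "Packet Too Short Errors",
    "Packet Too Long Errors",
)


def read_receive_counters(entstat_text):
    """Extract the four receive counters from captured entstat text."""
    counters = {}
    for name in RECEIVE_COUNTERS:
        # entstat prints transmit and receive statistics side by side,
        # so the counter can appear anywhere on a line.
        match = re.search(re.escape(name) + r":\s*(\d+)", entstat_text)
        if match:
            counters[name] = int(match.group(1))
    return counters


def counter_deltas(earlier, later):
    """Counters are cumulative, so compare two snapshots to see what is still growing."""
    return {name: later[name] - earlier[name] for name in later if name in earlier}


sample = """\
No Carrier Sense: 0                        CRC Errors: 1928448
DMA Underrun: 0                            DMA Overrun: 0
Deferred: 0                                Packet Too Short Errors: 0
SQE Test: 0                                Packet Too Long Errors: 0
"""

counters = read_receive_counters(sample)
for name, value in counters.items():
    print(f"{name}: {value}")
```

Because the counters accumulate over the whole "Elapsed Time" period, a large value alone does not show whether the problem is still occurring. Taking two snapshots some time apart and passing them to `counter_deltas` shows which counters are still increasing.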
CRC Errors
CRC stands for "Cyclic Redundancy Check". The following snippet of the entstat output shows "CRC Errors" logged by ent5 on the receiving host.
ETHERNET STATISTICS (ent5):
Device Type: PCIe3 4-Port 10GbE SR Adapter
Hardware Address: XX:XX:XX:XX:XX:XX
Elapsed Time: 63 days 11 hours 34 minutes 25 seconds
Transmit Statistics:                       Receive Statistics:
---------------------                      -------------------
Packets: 732153706441                      Packets: 739827939731
Bytes: 821856337999828                     Bytes: 893935706238484
Interrupts: 11439901055                    Interrupts: 566076060637
Transmit Errors: 0                         Receive Errors: 0
Packets Dropped: 723                       Packets Dropped: 0
                                           Bad Packets: 0
Max Packets on S/W Transmit Queue: 292
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 52
Broadcast Packets: 44555                   Broadcast Packets: 108984
Multicast Packets: 390454                  Multicast Packets: 774441
No Carrier Sense: 0                        CRC Errors: 1928448
DMA Underrun: 0                            DMA Overrun: 0
Lost CTS Errors: 0                         Alignment Errors: 0
Max Collision Errors: 0                    No Resource Errors: 0
Late Collision Errors: 0                   Receive Collision Errors: 0
Deferred: 0                                Packet Too Short Errors: 0
SQE Test: 0                                Packet Too Long Errors: 0

Timeout Errors: 0                          Packets Discarded by Adapter: 0
Single Collision Count: 0                  Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 52
...
...
The FCS (Frame Check Sequence) field contains a 4-byte CRC value used for error checking. When a source host assembles a packet, it performs a CRC calculation, using a predetermined algorithm, on all fields in the packet except the Preamble, SFD (Start Frame Delimiter), and FCS. The source host stores the result in the FCS field and transmits it as part of the packet. When the destination host receives the packet, it repeats the CRC calculation with the same algorithm. If the CRC value calculated at the destination host does not match the value in the FCS field, the destination host counts a CRC error and discards the packet. It is important to note that the adapter discards the packet before it is passed up to the device driver.
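The check described above can be illustrated in a few lines of Python. This is a sketch only: the real check is done in adapter hardware, the frame contents below are made up for illustration, and the helper names `compute_fcs` and `fcs_is_valid` are hypothetical. It assumes the standard CRC-32 that `zlib.crc32` implements and the usual convention that the FCS appears in the frame bytes as that value in little-endian order.

```python
import struct
import zlib


def compute_fcs(frame_without_fcs: bytes) -> bytes:
    """Return a 4-byte FCS for a frame that starts at the destination MAC address.

    The Preamble and SFD are not part of the input, matching the description above.
    """
    crc = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    # Assumption: least significant byte of the CRC-32 value comes first.
    return struct.pack("<I", crc)


def fcs_is_valid(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over everything except the trailing FCS and compare."""
    body, received_fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return compute_fcs(body) == received_fcs


# Hypothetical frame: broadcast destination, made-up source address,
# EtherType 0x0800, and a zero-filled 46-byte payload.
dst = bytes.fromhex("ffffffffffff")
src = bytes.fromhex("020000000001")
ethertype = bytes.fromhex("0800")
payload = bytes(46)

frame = dst + src + ethertype + payload
good_frame = frame + compute_fcs(frame)

# Simulate damage on the wire by flipping one bit in the payload.
corrupted = bytearray(good_frame)
corrupted[20] ^= 0x01
corrupted_frame = bytes(corrupted)

print("intact frame passes:", fcs_is_valid(good_frame))
print("corrupted frame passes:", fcs_is_valid(corrupted_frame))
```

A single flipped bit is enough to make the recomputed CRC differ from the stored FCS, which is why a marginal cable or transceiver shows up in this counter.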
Common Causes and Solutions: CRC errors can have several causes. Typically the cause is a defective cable, transceiver (SFP), switch port, or upstream network device. To address this error, try replacing the cable or transceiver (SFP), and check the switch port and upstream network device. If help is needed in finding the correct cable or transceiver (SFP), contact IBM hardware support. For switch port and network device support, contact the respective vendor.
DMA Overrun
The DMA Overrun counter is incremented when the adapter uses DMA (Direct Memory Access) to place a packet into system memory and the transfer does not complete. System buffers are available to hold the packet, but the DMA operation fails to complete. This happens when the bus is too busy for the adapter to complete DMA transfers for the packets, which normally means the Ethernet adapter cannot handle packets as fast as they are received. The following snippet of the entstat output shows "DMA Overrun" logged by ent10 on the receiving host.
ETHERNET STATISTICS (ent10):
Device Type: 2-Port 10/100/1000 Base-TX PCI-Express Adapter (14104003)
Hardware Address: XX:XX:XX:XX:XX:XX
Elapsed Time: 51 days 20 hours 48 minutes 10 seconds

Transmit Statistics:                      Receive Statistics:
---------------------                     ---------------------
Packets: 229518813757                     Packets: 779179505474
Bytes: 33323471631906                     Bytes: 1145699989615705
Interrupts: 0                             Interrupts: 179619004179
Transmit Errors: 0                        Receive Errors: 11545678
Packets Dropped: 0                        Packets Dropped: 0
                                          Bad Packets: 0
Max Packets on S/W Transmit Queue: 533
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 32549                  Broadcast Packets: 55820
Multicast Packets: 0                      Multicast Packets: 0
No Carrier Sense: 0                       CRC Errors: 0
DMA Underrun: 0                           DMA Overrun: 11545678
Lost CTS Errors: 0                        Alignment Errors: 0
Max Collision Errors: 0                   No Resource Errors: 0
Late Collision Errors: 0                  Receive Collision Errors: 0
Deferred: 0                               Packet Too Short Errors: 0
SQE Test: 0                               Packet Too Long Errors: 0
Timeout Errors: 0                         Packets Discarded by Adapter: 0
Single Collision Count: 0                 Receiver Start Count: 0
Multiple Collision Count: 0               Current HW Transmit Queue Length: 0
...
...
Common Causes and Solutions
(1) The system can be too busy at such times to move packets from the adapter to the upper layers and make room for new packets. Check CPU usage during the periods when DMA Overrun is logged, and add CPU capacity if needed.
(2) Enable flow control on the switch port. Flow control is enabled on the adapter by default, but it works only when it is enabled on both the adapter and the switch port. With flow control enabled at both ends, the adapter sends pause frames to the switch port, asking it to pause transmission when the adapter cannot keep up with incoming packets.
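To judge how severe the overruns are, it can help to relate the counter to the traffic volume and to the elapsed time. The sketch below uses the figures from the ent10 output above. The helper names are hypothetical, and the results are averages over the whole elapsed period, so short bursts of overruns are not visible in them.

```python
def elapsed_seconds(days, hours, minutes, seconds):
    """Convert the entstat "Elapsed Time" fields to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def overrun_rates(dma_overruns, packets_received, elapsed):
    """Average DMA overruns per received packet and per second."""
    return {
        "per_received_packet": dma_overruns / packets_received,
        "per_second": dma_overruns / elapsed,
    }


# Values taken from the ent10 output:
# Elapsed Time: 51 days 20 hours 48 minutes 10 seconds
# Receive Packets: 779179505474, DMA Overrun: 11545678
elapsed = elapsed_seconds(51, 20, 48, 10)
rates = overrun_rates(11545678, 779179505474, elapsed)

print(f"overruns per received packet: {rates['per_received_packet']:.2e}")
print(f"overruns per second (average): {rates['per_second']:.2f}")
```

Comparing these averages with CPU usage during the busiest periods gives a rough indication of whether the overruns coincide with load peaks.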
Packet Too Short Errors and Packet Too Long Errors
An Ethernet packet has minimum and maximum size limits. The following table shows the minimum and maximum sizes, depending on whether the packet is standard or jumbo and whether it has a VLAN tag. The adapter checks the size of each packet it receives. If the packet is shorter or longer than the limit, the adapter increments the "Packet Too Short Errors" or "Packet Too Long Errors" counter respectively and drops the packet.
[Table: minimum and maximum packet sizes for standard and jumbo packets, with and without a VLAN tag]
The following snippet of the entstat output shows "Packet Too Short Errors" and "Packet Too Long Errors" logged by ent6 on the receiving host.
ETHERNET STATISTICS (ent6):
Device Type: PCIe3 4-Port 10GbE SR Adapter
Elapsed Time: 45 days 12 hours 49 minutes 20 seconds
Hardware Address: XX:XX:XX:XX:XX:XX
Transmit Statistics:                       Receive Statistics:
--------------------                       -------------------
Packets: 183771813                         Packets: 3820615599
Bytes: 433300088949                        Bytes: 409820431138
Interrupts: 2870985                        Interrupts: 3247179020
Transmit Errors: 0                         Receive Errors: 0
Packets Dropped: 0                         Packets Dropped: 0
                                           Bad Packets: 0
Max Packets on S/W Transmit Queue: 32
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 37
Broadcast Packets: 2                       Broadcast Packets: 3134579853
Multicast Packets: 485868                  Multicast Packets: 528737049
No Carrier Sense: 0                        CRC Errors: 0
DMA Underrun: 0                            DMA Overrun: 0
Lost CTS Errors: 0                         Alignment Errors: 0
Max Collision Errors: 0                    No Resource Errors: 0
Late Collision Errors: 0                   Receive Collision Errors: 0
Deferred: 0                                Packet Too Short Errors: 166
SQE Test: 0                                Packet Too Long Errors: 53
Timeout Errors: 0                          Packets Discarded by Adapter: 0
Single Collision Count: 0                  Receiver Start Count: 0
Multiple Collision Count: 0                Current HW Transmit Queue Length: 37
...
...
It is important to note that the adapter removes the FCS when it receives a packet from the wire and passes it up to the OS, and adds the FCS when it receives a packet from the OS and sends it to the wire. As a result, when iptrace or tcpdump captures a packet at the OS level, the packet size is 4 bytes smaller than shown in the table above, because the captured packet does not include the FCS.
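The size check and the 4-byte FCS adjustment can be sketched as follows. The limits used here are the general IEEE 802.3 values (64-byte minimum, 1518-byte maximum for a 1500-byte MTU, plus 4 bytes for a VLAN tag) and a commonly used jumbo MTU of 9000. They are assumptions for illustration and are not copied from the table above. The limits enforced by a specific adapter may differ, and the function names are hypothetical.

```python
FCS_LEN = 4        # Frame Check Sequence
VLAN_TAG_LEN = 4   # 802.1Q tag
HEADER_LEN = 14    # destination MAC (6) + source MAC (6) + EtherType (2)
MIN_FRAME = 64     # assumed minimum size on the wire, including the FCS


def max_frame_size(mtu=1500, vlan_tagged=False):
    """Largest frame on the wire (including FCS) for a given MTU."""
    size = HEADER_LEN + mtu + FCS_LEN
    if vlan_tagged:
        size += VLAN_TAG_LEN
    return size


def classify(wire_length, mtu=1500, vlan_tagged=False):
    """Mimic the adapter's size check for a frame as seen on the wire."""
    if wire_length < MIN_FRAME:
        return "too short"
    if wire_length > max_frame_size(mtu, vlan_tagged):
        return "too long"
    return "ok"


def wire_length_from_capture(captured_length):
    """iptrace and tcpdump see the packet without the FCS, so add it back."""
    return captured_length + FCS_LEN


print(max_frame_size())                                  # standard, no VLAN tag
print(max_frame_size(vlan_tagged=True))                  # standard, VLAN tag
print(max_frame_size(mtu=9000))                          # jumbo (assumed MTU 9000)
print(classify(wire_length_from_capture(60)))            # 60 bytes in a capture
```

A 60-byte packet in a capture therefore corresponds to a 64-byte packet on the wire and is not too short, even though 60 is below the minimum.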
Common Causes and Solutions: Packet Too Short Errors and Packet Too Long Errors can be caused by excessive collisions, electrical interference, a duplex mismatch, or a defective cable, transceiver (SFP), switch port, or upstream network device. To address these errors, try replacing the cable or transceiver (SFP), and check the switch port and upstream network device. For switch port and network device support, contact the respective vendor.
Author: Darshan Patel
Platform: AIX and VIOS on Power
Feedback: aix_feedback@wwpdl.vnet.ibm.com

Document Information

Modified date:
19 January 2022

UID

ibm16420155