Troubleshooting
Problem
How to troubleshoot ethernet connection issues on the BladeCenter 8677 chassis.
Resolving The Problem
| Source |
|---|
RETAIN tip: H1979
| Issue |
|---|
How to troubleshoot ethernet connection issues on the BladeCenter 8677 chassis.
| Affected configurations |
|---|
The system may be any of the following IBM servers:
- BladeCenter Chassis, type 7967, any model
- BladeCenter Chassis, type 8677, any model
- BladeCenter H, type 8852, any model
- BladeCenter HS20, type 1883, any model
- BladeCenter HS20, type 1884, any model
- BladeCenter HS20, type 7981, any model
- BladeCenter HS20, type 8678, any model
- BladeCenter HS20, type 8832, any model
- BladeCenter HS20, type 8843, any model
- BladeCenter HS21, type 8853, any model
- BladeCenter HS40, type 8839, any model
- BladeCenter JS20, type 8842, any model
- BladeCenter JS21, type 8844, any model
- BladeCenter LS20, type 8850, any model
- BladeCenter LS21, type 7971, any model
- BladeCenter LS41, type 7972, any model
- BladeCenter QS20, type 0200, any model
- BladeCenter T, type 8720, any model
- BladeCenter T, type 8730, any model
This tip is not option specific.
This tip is not software specific.
| Additional information |
|---|
Networking is a complex subject whose scope far exceeds this document. Because of this complexity, there is a tendency to blame bad networking hardware for many networking problems that are really configuration errors. In general, the quality of modern network hardware is very high, and genuine hardware failures are very rare. Firmware code bugs and network configuration problems account for the vast majority of networking problems. However, it can be quite difficult to find the root cause of networking problems, especially in production networks.
The purpose of this document is to provide the steps necessary to verify whether basic networking is functioning, not to diagnose subtle or complex driver/firmware bugs or network misconfiguration. For the purpose of this document, basic connectivity is defined as the ability to successfully ping another host. Network failures beyond ping (ICMP) connectivity are outside the scope of this document. Note that if ping does work, it is extremely unlikely that bad hardware is the cause of the larger issue. Also, this document does not cover ethernet connections to the Management Module (MM). For problems connecting to the MM over ethernet, see the document "Troubleshooting MM Connectivity issues."
Network connectivity is accomplished by sending traffic through a series of devices. Two devices connect to each other only when they successfully negotiate a link. Generally, the path followed looks like this:
device --> link --> device --> link --> device --> [etc]
It is important to remember that even when all network devices are working correctly, a link failure can occur if the two devices are not appropriately configured. Effective network troubleshooting on the IBM BladeCenter requires knowing how each device works, and using ping or network sniffers to determine how far the ethernet packets are getting.
The logical chain of devices needed to ping a host on the external network looks like this:
Blade --> midplane --> internal switch port --> possible upper layer protocols --> external switch port --> cat 5e cabling --> upstream switch port --> host on the network
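Because each hop depends on the one before it, the chain suggests a simple discipline: check the hops in order and stop at the first one that fails, since nothing past it can be judged yet. The following is a minimal Python sketch of that idea. `CHAIN` and `first_failing_hop` are hypothetical names, and the `check` callback stands in for whatever manual test (link LED, switch port status, ping) the technician performs at each hop.

```python
from typing import Callable, Optional, Sequence

# Hop names mirror the logical chain described above.
CHAIN = (
    "blade",
    "midplane",
    "internal switch port",
    "upper layer protocols",
    "external switch port",
    "cabling",
    "upstream switch port",
    "host on the network",
)


def first_failing_hop(
    hops: Sequence[str], check: Callable[[str], bool]
) -> Optional[str]:
    """Return the first hop whose check fails, or None if every hop passes.

    Hops after the first failure are not checked, because traffic
    cannot be expected to reach them.
    """
    for hop in hops:
        if not check(hop):
            return hop
    return None
```

For example, record the observed result for each hop in a dictionary `results` and call `first_failing_hop(CHAIN, lambda hop: results[hop])`. The returned hop is where to focus the investigation.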
| Blade: |
|---|
There are several components in the blade that must function correctly for the blade to send ethernet packets through the midplane to the Ethernet switch module (ESM).
NIC and NIC driver -- The NIC can be disabled in BIOS. If it is disabled, the driver will not detect the interface and will generate errors at boot time. In Microsoft Windows, this will appear in the properties for Network Neighborhood as a disabled adapter. Linux will generate errors at boot time when the driver tries to load into the kernel. Both operating systems will show the NIC as being disabled. Driver problems usually generate error messages during operating system startup, though the error could indicate that the NIC is disabled (the message will need to be interpreted to determine this). If the interface does not respond to a ping of its statically assigned IP address, the driver and/or the NIC are not working correctly.
One factor that must be remembered when doing ethernet troubleshooting on the BladeCenter is that different operating systems do not necessarily assign the two physical NICs in the same order.
Whereas Microsoft Windows may assign the first IP interface to the NIC connected to the switch in Bay 1, Linux may assign the second IP interface to that same NIC. To be certain which IP interface is assigned to which switch, disable the interface within the operating system and see which switch port goes down, or disable the switch port and see which IP interface goes down in the OS. One can also examine the MAC address table of the switch to see which MAC address is associated with which port.
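On a Linux blade, the MAC address of each interface can be read from `/sys/class/net/<interface>/address` and compared with the switch's MAC address table. Switches and operating systems print MACs in different formats (colon-, dash-, or dot-separated), so the minimal Python sketch below normalizes them before comparison. The helper names are hypothetical, the sketch assumes a Linux system with sysfs mounted, and it does not cover how to display the MAC table on any particular switch.

```python
import os
from typing import Dict


def normalize_mac(mac: str) -> str:
    """Reduce a MAC address to lowercase hex digits with no separators,
    so that 00:1A:64:00:00:01 and 001a.6400.0001 compare equal."""
    return "".join(c for c in mac.lower() if c in "0123456789abcdef")


def interface_macs(sysfs_net: str = "/sys/class/net") -> Dict[str, str]:
    """Map each Linux network interface name to its normalized MAC address."""
    macs: Dict[str, str] = {}
    for name in sorted(os.listdir(sysfs_net)):
        try:
            with open(os.path.join(sysfs_net, name, "address")) as f:
                macs[name] = normalize_mac(f.read())
        except OSError:
            continue  # skip entries that have no readable address file
    return macs
```

Calling `interface_macs()` on the blade and normalizing each entry from the switch's MAC table the same way shows which OS interface is cabled through which bay, without taking any port down.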
NIC Teaming -- All IBM blades provide the ability to team NICs, and the teaming software tools all provide multiple algorithms for teaming the NICs. When troubleshooting ethernet problems, temporarily disabling the team is often desirable. If that is not possible, make sure that you and the user understand which NIC is which in the teaming configuration. This may seem simple, but it is not. Though the first NIC in the blade is wired to the switch in bay 1 and the second NIC is wired to the switch in bay 2, the operating system does NOT always present the NICs in that order. Different versions of Linux and Microsoft Windows present the two NICs in different orders. Do not assume that the first NIC presented by the operating system is the first NIC on the blade. One way to discover which bay a NIC is connected to is to disable the blade's internal port on the switch in that bay and verify that the blade NIC goes down.
TCP/IP configuration of the operating system -- When verifying basic network connectivity on the BladeCenter, make sure you know the (1) IP address, (2) subnet mask, and (3) VLAN ID, if any, of both the blade and the host you are trying to ping. If the user is using 802.1Q VLANs in their environment, the VLANs must be configured on the blade and on all other switches in the network between the BladeCenter and the host.
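The three values above can be cross-checked before any cabling is touched. The minimal Python sketch below uses the standard `ipaddress` module and assumes the simplest basic-connectivity case: the target host sits on the same subnet and VLAN as the blade, with no router in between. `IpConfig` and `config_problems` are hypothetical names, and `vlan_id` means the VLAN the interface belongs to, whether that comes from tagging on the blade or from the switch port's PVID.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IpConfig:
    address: str            # e.g. "192.168.10.20"
    netmask: str            # dotted form, e.g. "255.255.255.0"
    vlan_id: Optional[int]  # VLAN the interface belongs to; None if no VLANs are used


def config_problems(blade: IpConfig, host: IpConfig) -> List[str]:
    """List reasons a direct (unrouted) ping from blade to host cannot work."""
    problems: List[str] = []
    blade_net = ipaddress.IPv4Network(f"{blade.address}/{blade.netmask}", strict=False)
    host_net = ipaddress.IPv4Network(f"{host.address}/{host.netmask}", strict=False)
    if ipaddress.IPv4Address(host.address) not in blade_net:
        problems.append(f"host {host.address} is outside blade subnet {blade_net}")
    if blade_net != host_net:
        problems.append(f"subnet mismatch: blade {blade_net}, host {host_net}")
    if blade.vlan_id != host.vlan_id:
        problems.append(f"VLAN mismatch: blade {blade.vlan_id}, host {host.vlan_id}")
    return problems
```

An empty list means the addressing is consistent, so a failed ping points toward the physical path or switch configuration rather than the blade's TCP/IP settings.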
| Midplane: |
|---|
The midplane's ethernet connection from the blade to the switches is unconfigurable. The blade NIC and internal switch ports must be left in their default configuration of autonegotiation for speed and duplex. Attempts to configure the layer 1 characteristics on the blade or BladeCenter switch will lead to a link failure which appears to be a midplane failure, but is not.
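On a Linux blade, `ethtool <interface>` reports whether auto-negotiation is on, and the negotiated result is also exposed through sysfs as `operstate`, `speed` (in Mb/s), and `duplex` under `/sys/class/net/<interface>/`. The minimal Python sketch below reads those three attributes so that both blade NICs can be compared side by side. The function names are hypothetical, the sketch assumes Linux with sysfs, and it only reports the negotiated state. It does not change any settings, which should stay at their autonegotiation defaults.

```python
import os
from typing import Dict, Optional

LINK_ATTRS = ("operstate", "speed", "duplex")


def read_sysfs_attr(sysfs_net: str, iface: str, attr: str) -> Optional[str]:
    """Read one attribute of an interface, or None if it cannot be read."""
    try:
        with open(os.path.join(sysfs_net, iface, attr)) as f:
            return f.read().strip()
    except OSError:
        # Some drivers refuse to report speed or duplex while there is no link.
        return None


def link_state(iface: str, sysfs_net: str = "/sys/class/net") -> Dict[str, Optional[str]]:
    """Return operstate, speed (Mb/s) and duplex for one interface."""
    return {attr: read_sysfs_attr(sysfs_net, iface, attr) for attr in LINK_ATTRS}
```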
A physical link problem between the blade port and the chassis switch module port could be caused by a midplane connector mating problem. Early in the debug process, always inspect the blade and switch module connectors, then reseat both the blade and the chassis switch module.
If a midplane failure did occur, you would see one blade NIC failing to establish link with the internal switch port while the other NIC on the same blade gets link. Different blades installed in that slot would all fail in the same way, and the failure would also remain the same with different switch modules installed in that I/O bay.
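The midplane symptom pattern is specific enough to write down as a rule: the same one-NIC failure must persist across more than one blade and more than one switch module. The minimal Python sketch below encodes that rule over a hypothetical record of swap tests for one blade slot and one I/O bay. `SwapTest` and `looks_like_midplane_failure` are illustrative names, and the sketch assumes the connectors have already been inspected and reseated and that layer 1 settings are at their defaults.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SwapTest:
    blade_id: str           # which physical blade was installed in the suspect slot
    switch_id: str          # which switch module was installed in the suspect I/O bay
    suspect_nic_link: bool  # link on the NIC wired to the suspect bay
    other_nic_link: bool    # link on the blade's other NIC


def looks_like_midplane_failure(tests: Iterable[SwapTest]) -> bool:
    """True only if every swap shows the same one-NIC failure and both
    more than one blade and more than one switch module were tried."""
    runs = list(tests)
    if not runs:
        return False
    same_symptom = all(not t.suspect_nic_link and t.other_nic_link for t in runs)
    blades = {t.blade_id for t in runs}
    switches = {t.switch_id for t in runs}
    return same_symptom and len(blades) > 1 and len(switches) > 1
```

A single run in which the suspect NIC gets link rules out the midplane, which matches the reasoning above: a midplane fault is tied to the slot and bay, not to the blade or the switch module installed there.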
| Internal Switch port: |
|---|
As mentioned above, the layer 1 properties of the blade to ESM connection should not be changed. If the BladeCenter is using VLANs, the internal switch port must be properly configured to pass the traffic from this port to any other internal or external port.
| Possible upper layer protocols: |
|---|
Though most users do not use the layer 3-7 functionality offered by some of our switches, this functionality can be a source of failure when it is in use.
More often, failures here are due to layer 2 VLAN tagging and/or PVID configuration. In either case, diagnosing problems at this layer usually requires examination of the BladeCenter switch configuration and upstream switch configuration by a network specialist.
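To see why VLAN tagging and PVID settings break connectivity so often, it helps to model what an 802.1Q port does with a frame. On ingress, an untagged frame is assigned to the port's PVID and a tagged frame keeps its VLAN ID. The frame is carried only if the VLAN is configured on the port. On egress, it leaves tagged or untagged depending on the output port's configuration. The Python sketch below is a simplified model under those assumptions only. It ignores options such as disabled ingress filtering, and `Port` and `forward` are hypothetical names, not the configuration syntax of any BladeCenter switch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Port:
    pvid: int                      # VLAN assigned to untagged frames on ingress
    member_vlans: FrozenSet[int]   # VLANs this port is allowed to carry
    untagged_vlans: FrozenSet[int] = frozenset()  # VLANs sent without a tag on egress


def ingress_vlan(port: Port, frame_vid: Optional[int]) -> Optional[int]:
    """VLAN a received frame belongs to, or None if the port drops it."""
    vid = port.pvid if frame_vid is None else frame_vid
    return vid if vid in port.member_vlans else None


def forward(in_port: Port, out_port: Port, frame_vid: Optional[int]) -> Tuple[bool, Optional[int]]:
    """Return (forwarded, tag on the egress wire) for one frame.

    frame_vid is None for an untagged frame; a None egress tag means
    the frame leaves untagged.
    """
    vid = ingress_vlan(in_port, frame_vid)
    if vid is None or vid not in out_port.member_vlans:
        return (False, None)
    return (True, None if vid in out_port.untagged_vlans else vid)
```

For example, an internal port with PVID 10 forwards the blade's untagged frames as VLAN 10 traffic. The frames are then dropped at an uplink that carries only VLANs 1 and 20, even though every link is up. This kind of failure is only visible by comparing the BladeCenter and upstream switch configurations.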
| External Switch port: |
|---|
Configuring external switch ports so they can communicate with an upstream switch is the source of most networking problems on the BladeCenter. Successful connectivity between the BladeCenter ESM and the upstream switch requires that both devices be properly configured and that the cable between the two be fully functional. If link is established but network traffic is not passing over the link, collect the configurations of both the BladeCenter switch and the upstream switch, and engage a networking specialist.
If the BladeCenter is failing to establish link, consider the following:
- The default configuration for the MM is to disable all external ports for I/O modules 1-4. When a chassis has problems linking externally, log into the MM and make sure that "External Ports" is set to "enabled" for the I/O modules having link problems.
- In the past, many users observed link failures with 10/100 switches when speed and duplex were set to autonegotiation. They were often able to get around these problems by hard coding switch ports to 100/Full. The gigabit standard has significantly improved the situation, and hard setting switches to 1000/Full now causes more problems than it resolves. If the BladeCenter switch cannot keep link up with another switch, verify that both the BladeCenter switch and the upstream switch are set to autonegotiate speed and duplex.
- One test to verify whether the external ports on a BladeCenter switch are working is to take a known good networking cable and connect two external ports to one another. Continue testing on all the ports. If the link comes up, that verifies that those ports are not having a physical failure. Another test is to connect a laptop or other host to the switch ports on the BladeCenter that are having the problem. If the laptop brings up the link, the ports are not having a physical failure. If the external links do not come up during these tests, reset the switch to the default configuration and repeat the tests. A failure at this point indicates a bad switch that should be replaced.
If the above tests indicate the port is not having a physical failure, obtain the configuration for the BladeCenter switch and the upstream switch and engage a technician who has switch expertise.
| Cat 5e cabling: |
|---|
Only category 5e ethernet cables (commonly known as "cat 5e") or higher are supported on the BladeCenter. Because cabling does not have any diagnostics, a bad cable can look to the user like a bad ethernet port on either the BladeCenter switch or the upstream switch. Bad cables are not common, but they are more common than bad switch ports. If the configuration and troubleshooting up to this point has not found any problems, have the user verify that this cable is good, or use a cable that is known to be good to connect the BladeCenter switch to the upstream switch.
| Upstream switch port: |
|---|
Though networking hardware can have hardware failures, incorrect configuration is far more common as a cause of connectivity problems. To verify whether the port configuration on the user's switch is appropriate to connect to the BladeCenter chassis, obtain the BladeCenter switch configuration and the configuration of the upstream switch. If the upstream switch configuration cannot be gathered, troubleshooting connection problems becomes guesswork. That is not an effective troubleshooting technique.
Document Location
Worldwide
Document Information
Modified date:
18 April 2023
UID
ibm1MIGR-5069609