Diagnosing network problems

Application failures, slow system response, applications that fail to terminate, and system error warnings might all be caused by a network problem.

About this task

It can be difficult to identify a network problem as the cause of these issues, because the symptoms might appear on one node while the underlying failure or error is on another node. It can also be difficult to isolate the network problem to a specific hardware component or configuration setting.

The potential source of a network problem can be classified into one of three main categories:
  • Hardware failure
  • Configuration error
  • Traffic overload
A general diagnostic method can be used to determine the potential source of a network problem:
  1. Check for connectivity between all IP addresses in the network using the ping command.
  2. If any IP addresses fail the connectivity test, a hardware failure or configuration error might be causing the network problem. Check for hardware errors on the devices assigned to those IP addresses: run the errpt -a command on each node suspected of network issues to view the system error log. Correct the errors and run the connectivity test again. If the test still fails after you correct the hardware errors, a configuration error might be the cause. Check all of the configuration settings on the failed devices and verify that they are set correctly.
  3. If the connectivity test succeeds, the network problem might be caused by traffic overload. To diagnose this, run a network performance test. The test takes at least 30 minutes and must be run during a scheduled outage or when no users are on the system. Use a network load tool that applies a heavy synthetic workload with a known expected throughput (in megabits per second or packets per second) to the network. Compare the measured throughput with the expected value. If the results are close to the expected value, the network itself is delivering its expected capacity, so traffic overload might be causing the network problem.
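The three steps above can be sketched as a small Python helper. This is a minimal sketch, not a supplied tool: the function names, the single-echo `ping -c 1` invocation, and the 10% tolerance used to decide that measured throughput is "close to" the expected value are all assumptions. The sketch does not run the performance test itself; `measured` and `expected` must come from your network load tool and use the same unit.

```python
import subprocess
from typing import Iterable, List


def unreachable_hosts(addresses: Iterable[str], timeout_s: float = 5.0) -> List[str]:
    """Step 1: ping each address once and return the ones that do not answer."""
    failed = []
    for addr in addresses:
        try:
            # "-c 1" sends a single echo request (accepted by AIX and Linux ping).
            result = subprocess.run(
                ["ping", "-c", "1", addr],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
            ok = result.returncode == 0
        except (subprocess.TimeoutExpired, FileNotFoundError):
            ok = False
        if not ok:
            failed.append(addr)
    return failed


def likely_cause(failed: List[str], measured: float, expected: float,
                 tolerance: float = 0.10) -> str:
    """Steps 2 and 3: map the test results to a probable problem category.

    The 10% tolerance is an assumed reading of "close to the expected value".
    """
    if failed:
        # Unreachable addresses: check errpt -a on those nodes, then configuration.
        return "hardware failure or configuration error"
    if measured >= expected * (1 - tolerance):
        # The network delivers roughly its expected throughput under synthetic load.
        return "traffic overload"
    # The procedure does not cover a result well below the expected value.
    return "inconclusive"
```

For example, `likely_cause(unreachable_hosts(["10.0.0.1", "10.0.0.2"]), measured=940, expected=1000)` (hypothetical addresses and figures) reports traffic overload when both addresses answer. Because the procedure does not say what a throughput result far below the expected value means, the sketch reports that case as inconclusive rather than guessing.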

Diagnosing a failed port aggregation requires some additional considerations. If a single link of a port aggregation fails, the logical aggregation interface typically remains usable and the IP address still responds to pings. In this scenario, all of the network traffic is directed over the remaining working links in the aggregation. The symptom of a failed port aggregation is therefore a loss of performance, and in some cases an application might fail to terminate. For a port aggregation of two links, if one link fails, the maximum throughput in a network performance test decreases to half. More generally, for an aggregation of n links, the failure of one link reduces the maximum throughput to (n - 1)/n of its original value.
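The (n - 1)/n rule can be turned into a quick check on a performance-test result. This is a minimal sketch that assumes all links in the aggregation have the same speed and that traffic is spread evenly across them; the function names are hypothetical.

```python
def remaining_fraction(total_links: int, failed_links: int = 1) -> float:
    """Fraction of the original maximum throughput left after link failures."""
    if not 0 <= failed_links < total_links:
        raise ValueError("need at least one link still working")
    return (total_links - failed_links) / total_links


def estimated_working_links(measured: float, expected: float, total_links: int) -> int:
    """Estimate how many links carry traffic from measured vs expected throughput."""
    ratio = measured / expected
    # Round to the nearest whole link and clamp to the valid range.
    return max(0, min(total_links, round(total_links * ratio)))


# Two links: one failure halves throughput. Four links: one failure leaves 3/4.
print(remaining_fraction(2))  # 0.5
print(remaining_fraction(4))  # 0.75
# A 4-link aggregation expected to reach 4000 Mbit/s that measures 2980 Mbit/s.
print(estimated_working_links(2980, 4000, 4))  # 3
```

A measured throughput that lands near one of these fractions, rather than near the full expected value, is consistent with one or more failed links and is a cue to run the checks that follow.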

To diagnose a failed port aggregation, perform the following steps:

  • Examine the system error log with the errpt command for messages related to network failures or EtherChannel adapter failures.
  • Use the netstat -v command to verify that the following statuses are reported:
    Aggregation
      The preferred aggregation status is Aggregated.
    Link
      The preferred link status is Up.
    LACP synchronization
      The preferred Link Aggregation Control Protocol (LACP) synchronization state is IN_SYNC.
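  • Check that the entstat -d en11 | grep "Link Status" command, where en11 is the aggregation interface in this example, returns one Up line for each link in the aggregation. For a two-link aggregation, the output is:
    Link Status : Up
    Link Status : Up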
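The status checks in this list can be partly automated by parsing command output that you have already captured. The sketch below is hypothetical and assumes `label : value` lines such as the `Link Status : Up` lines shown above. Any other field label you pass, for example the LACP synchronization field in netstat -v output, is an assumption that must be copied from your own system's output, because the exact layout can differ between releases.

```python
import re
from typing import List


def field_values(report: str, label: str) -> List[str]:
    """Collect the value of every 'label : value' line in a status report."""
    pattern = re.compile(rf"^\s*{re.escape(label)}\s*:\s*(\S+)", re.MULTILINE)
    return pattern.findall(report)


def links_all_up(entstat_output: str, expected_links: int) -> bool:
    """True when every member link reports Up and no link is missing."""
    statuses = field_values(entstat_output, "Link Status")
    return len(statuses) == expected_links and all(s == "Up" for s in statuses)


def unexpected_values(report: str, label: str, wanted: str) -> List[str]:
    """Return every value of `label` that differs from the preferred value."""
    return [v for v in field_values(report, label) if v != wanted]
```

The text can be captured with, for example, `subprocess.run(["entstat", "-d", "en11"], capture_output=True, text=True).stdout` (Python 3.7 or later). Checking the number of `Link Status` lines as well as their values matters: a link that has dropped out of the aggregation entirely would otherwise go unnoticed.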