Network performance monitoring

Network performance can be monitored with Remote Procedure Call (RPC) statistics.

The GPFS daemon caches statistics relating to RPCs. Most statistics are related to RPCs sent to other nodes. This includes a set of up to seven statistics cached per node and one statistic that is cached per size range of the RPC message. For RPCs received from other nodes, one statistic is cached for each type of RPC message. The counters are measured in seconds and milliseconds.

The statistics cached per node are the following:
Channel wait time
The amount of time the RPC must wait for access to a communication channel to the target node.
Send time TCP
The amount of time to transfer an RPC message to an Ethernet interface.
Send time verbs
The amount of time to transfer an RPC message to an InfiniBand interface.
Receive time TCP
The amount of time to transfer an RPC message from an Ethernet interface into the daemon.
Latency TCP
The latency of the RPC when sent and received over an Ethernet interface.
Latency verbs
The latency of the RPC when sent and received over an InfiniBand interface.
Latency mixed
The latency of the RPC when sent over one type of interface (Ethernet or InfiniBand) and received over the other (InfiniBand or Ethernet).

If an InfiniBand network is not configured, no statistics are cached for send time verbs, latency verbs, and latency mixed.
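The choice among the three latency statistics can be sketched as follows. This is a minimal illustration only: the function name and the two boolean arguments are hypothetical and are not part of any GPFS interface.

```python
def latency_kind(sent_over_ib: bool, received_over_ib: bool) -> str:
    """Pick the per-node latency statistic for one RPC.

    Hypothetical helper: the arguments stand in for the interface type
    used in each direction (True = InfiniBand, False = Ethernet).
    """
    if sent_over_ib and received_over_ib:
        return "latency verbs"
    if not sent_over_ib and not received_over_ib:
        return "latency TCP"
    # One direction used Ethernet and the other InfiniBand.
    return "latency mixed"


kind = latency_kind(sent_over_ib=True, received_over_ib=False)
```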

The latency of an RPC is defined as the round-trip time minus the execution time on the target node. The round-trip time is measured from the start of writing the RPC message to the interface until the RPC reply is completely received. The execution time is measured on the target node from the time the message is completely received until the time the reply is sent. The latency, therefore, is the amount of time the RPC is being transmitted and received over the network and is a relative measure of the network performance as seen by the GPFS daemon.
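The definition above can be worked through with a small example. The timestamps and the function name are hypothetical; the sketch assumes that each pair of timestamps is taken on a single node, so that only differences are used and the clocks of the two nodes do not need to agree.

```python
def rpc_latency(send_start: float, reply_received: float,
                exec_start: float, exec_end: float) -> float:
    """Latency = round-trip time minus execution time (all in seconds).

    send_start / reply_received are read on the sending node;
    exec_start / exec_end are read on the target node.
    """
    round_trip = reply_received - send_start
    execution = exec_end - exec_start
    return round_trip - execution


# Hypothetical values: 4.25 ms round trip, 3.10 ms spent executing.
latency = rpc_latency(send_start=10.000000, reply_received=10.004250,
                      exec_start=55.000500, exec_end=55.003600)
```

With these values the round-trip time is 4.25 ms and the execution time is 3.10 ms, which leaves about 1.15 ms spent on the network.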

There is a statistic associated with each of a set of size ranges, each with an upper bound that is a power of 2. The first range is 0 through 64, then 65 through 128, then 129 through 256, and so on until the last range, which has an upper bound of twice the maxBlockSize. For example, if the maxBlockSize is 1 MB, the upper bound of the last range is 2,097,152 bytes (2 MB). For each of these ranges, the associated statistic is the latency of the RPCs whose size falls within that range. The size of an RPC is the amount of data sent plus the amount of data received. However, if one amount is more than 16 times greater than the other, only the larger amount is used as the size of the RPC.
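The size rule and the range lookup can be sketched as follows. The function names are hypothetical. The sketch assumes that maxBlockSize is a power of 2, and it places any size beyond the last upper bound in the last range, a case the description above does not cover.

```python
def rpc_size(sent: int, received: int) -> int:
    """Size of an RPC: sent plus received, unless one side dominates."""
    larger, smaller = max(sent, received), min(sent, received)
    if larger > 16 * smaller:
        # One amount is more than 16 times the other: use only the larger.
        return larger
    return sent + received


def range_upper_bound(size: int, max_block_size: int) -> int:
    """Upper bound of the size range that contains `size`."""
    upper = 64
    last = 2 * max_block_size
    while upper < size and upper < last:
        upper *= 2
    return upper  # assumption: oversized values fall into the last range


MAX_BLOCK_SIZE = 1024 * 1024  # 1 MB, as in the example above
bound = range_upper_bound(rpc_size(4096, 100), MAX_BLOCK_SIZE)
```

Here 4096 is more than 16 times 100, so the RPC size is 4096 and it falls in the range whose upper bound is 4096.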

The final statistic is the execution time of the RPC. It is cached for each type of RPC message on the node where the RPC is received.

Each of the statistics described so far is actually an aggregation of values. By default, an aggregation consists of 60 one-second intervals, 60 one-minute intervals, 24 one-hour intervals, and 30 one-day intervals. Each interval consists of a sum of values accumulated during the interval, a count of values added into the sum, the minimum value added into the sum, and the maximum value added into the sum. Sixty seconds after the daemon starts, each of the one-second intervals contains data and every second thereafter the oldest interval is discarded and a new one entered. An analogous pattern holds for the minute, hour, and day periods.

As each RPC reply is received, the following information is saved in a raw statistics buffer:
  • channel wait time
  • send time
  • receive time
  • latency
  • length of data sent
  • length of data received
  • flags indicating if the RPC was sent or received over InfiniBand
  • target node identifier
As each RPC completes execution, the execution time for the RPC and the message type of the RPC are saved in a raw execution buffer. Once per second these raw buffers are processed and the values are added to the appropriate aggregated statistic. For each value, the value is added to the statistic's sum, the count is incremented, and the value is compared to the minimum and maximum, which are adjusted as appropriate. Upon completion of this processing, for each statistic the sum, count, minimum, and maximum values are entered into the next one-second interval.
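The once-per-second processing of a raw buffer for a single statistic can be sketched as follows. The class and function names are hypothetical, and the bounded deque only models the default of 60 one-second intervals; it is not a description of the daemon's internal data structures.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Interval:
    total: float = 0.0              # sum of the values
    count: int = 0                  # number of values added into the sum
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.total += value
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


def process_raw_buffer(raw_values, second_intervals):
    """Fold one second's raw values into a new one-second interval."""
    interval = Interval()
    for value in raw_values:
        interval.add(value)
    # With maxlen set, appending discards the oldest interval when full.
    second_intervals.append(interval)
    return interval


seconds = deque(maxlen=60)
first = process_raw_buffer([0.002, 0.001, 0.004], seconds)
```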

Every 60 seconds, the sums and counts in the 60 one-second intervals are added into a one-minute sum and count. The smallest of the 60 minimum values is determined, and the largest of the 60 maximum values is determined. This one-minute sum, count, minimum, and maximum are then entered into the next one-minute interval.
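The roll-up from one-second intervals to a one-minute interval can be sketched as follows. The names are hypothetical, and the handling of intervals with a count of zero (they are skipped when the minimum and maximum are determined) is an assumption rather than documented behavior.

```python
from typing import Iterable, NamedTuple


class IntervalSummary(NamedTuple):
    total: float
    count: int
    minimum: float
    maximum: float


def roll_up(intervals: Iterable[IntervalSummary]) -> IntervalSummary:
    """Combine finer-grained intervals into one coarser interval."""
    # Assumption: intervals that saw no values do not affect min or max.
    populated = [i for i in intervals if i.count > 0]
    if not populated:
        return IntervalSummary(0.0, 0, 0.0, 0.0)
    return IntervalSummary(
        total=sum(i.total for i in populated),
        count=sum(i.count for i in populated),
        minimum=min(i.minimum for i in populated),
        maximum=max(i.maximum for i in populated),
    )


# Two busy seconds followed by 58 idle ones (hypothetical values).
one_second = [
    IntervalSummary(0.006, 3, 0.001, 0.003),
    IntervalSummary(0.010, 2, 0.004, 0.006),
] + [IntervalSummary(0.0, 0, 0.0, 0.0)] * 58
one_minute = roll_up(one_second)
```

The same function could be applied to 60 one-minute intervals to produce a one-hour interval, because the sum, count, minimum, and maximum combine in the same way at every level.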

An analogous pattern holds for the hour and day periods. For any one particular interval, the sum is the sum of all raw values processed during that interval, the count is the count of all values during that interval, the minimum is the minimum of all values during that interval, and the maximum is the maximum of all values during that interval.

When statistics are displayed for any particular interval, an average is calculated from the sum and count, then the average, minimum, maximum, and count are displayed. The average, minimum, and maximum are displayed in units of milliseconds, to three decimal places (one-microsecond granularity).
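The calculation of the displayed values can be sketched as follows. The function name and the output layout are hypothetical and do not reproduce the actual command output. The sketch assumes that the stored values are in seconds.

```python
def format_interval(total_s: float, count: int,
                    minimum_s: float, maximum_s: float) -> str:
    """Render one interval as average, minimum, maximum, and count."""
    if count == 0:
        return "no data"  # assumption: nothing to average
    average_ms = total_s / count * 1000.0
    return (f"avg {average_ms:.3f} ms, "
            f"min {minimum_s * 1000.0:.3f} ms, "
            f"max {maximum_s * 1000.0:.3f} ms, "
            f"count {count}")


line = format_interval(total_s=0.016, count=5,
                       minimum_s=0.001, maximum_s=0.006)
```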

The following mmchconfig attributes are available to control the RPC buffers and intervals:
  • rpcPerfRawStatBufferSize
  • rpcPerfRawExecBufferSize
  • rpcPerfNumberSecondIntervals
  • rpcPerfNumberMinuteIntervals
  • rpcPerfNumberHourIntervals
  • rpcPerfNumberDayIntervals

The mmdiag command with the --rpc parameter can be used to query RPC statistics.

For more information, see mmchconfig command, mmnetverify command, and mmdiag command.