Tutorial: Identifying the source of slow drain problems caused by depletion of buffer credits

This tutorial shows how to use IBM Storage Insights Pro to identify a host whose depleted buffer credits are causing a slow drain condition.

About this task

Fibre Channel (FC) networks use buffer credits to control the flow of data frames from port to port. The number of buffer credits for a port is the number of data frames that the port can receive before it must free a buffer. When a sending port has used all of the credits that the receiving port granted, it cannot send further data frames until the receiving port indicates that it is ready. If all of the buffer credits of a port are being used, then the port cannot receive more data.
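The credit mechanism described above can be sketched as a small simulation. This is a toy model for illustration only, not how any switch or adapter is implemented: the class names, the `simulate` function, and the tick-based timing are all assumptions made for this example.

```python
from collections import deque


class ReceiverPort:
    """Toy receiving port with a fixed number of frame buffers (hypothetical model)."""

    def __init__(self, credits):
        self.credits = credits      # buffer credits granted to the sender
        self.buffers = deque()      # frames waiting to be processed

    def receive(self, frame):
        self.buffers.append(frame)

    def process_one(self):
        """Free one buffer. Returns True if a credit can be handed back."""
        if self.buffers:
            self.buffers.popleft()
            return True
        return False


class SenderPort:
    """Toy sending port that tracks how many credits it still holds."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.available = receiver.credits
        self.stalled = 0            # send attempts made with zero credits

    def send(self, frame):
        if self.available == 0:
            self.stalled += 1       # no credit: the frame must wait
            return False
        self.available -= 1
        self.receiver.receive(frame)
        return True

    def credit_returned(self):
        self.available += 1


def simulate(credits, frames, drain_every):
    """Send `frames` frames; the receiver frees one buffer every `drain_every` ticks.

    Returns (ticks needed, number of stalled send attempts).
    """
    receiver = ReceiverPort(credits)
    sender = SenderPort(receiver)
    sent = 0
    tick = 0
    while sent < frames:
        tick += 1
        if sender.send(sent):
            sent += 1
        if tick % drain_every == 0 and receiver.process_one():
            sender.credit_returned()
    return tick, sender.stalled


healthy = simulate(credits=4, frames=20, drain_every=1)
slow = simulate(credits=4, frames=20, drain_every=4)
print("healthy receiver (ticks, stalls):", healthy)
print("slow receiver    (ticks, stalls):", slow)
```

In this sketch, a receiver that frees buffers as fast as frames arrive never stalls the sender, while a receiver that frees buffers more slowly forces the sender to wait with zero credits for most ticks.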

For example, if a host has a performance problem, then its ports might not be able to clear their buffer credits to receive more data. If the host ports cannot receive data, then switch ports cannot send data to the host ports, so the buffer credits of the switch port become depleted too. Ports on other switches in the fabric that try to send data through the switch port are also affected, and their buffer credits become depleted in turn. The buffer credit problem therefore builds throughout the storage environment: depletion on the host ports impacts the switches that communicate with the host, the switches that communicate with those switches cannot use their buffer credits, and eventually storage systems cannot communicate with the switches.

As a result, a single host with a performance problem can impact all the hosts that use the same switches and inter-switch links. This condition is called slow drain. Slow drain in your storage environment can manifest as a problem with storage systems rather than with a host.
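To illustrate how one slow host can affect a healthy host that shares the same inter-switch link, here is a minimal sketch. It assumes a single first-in, first-out queue at the edge switch, a storage system that alternates frames between two hosts, and one event per tick. The function `run`, the host labels, and the credit counts are hypothetical and greatly simplified compared with a real fabric.

```python
from collections import deque


def run(ticks, slow_drain_every, isl_credits=4, host_credits=2):
    """Toy slow drain model: two hosts behind one shared inter-switch link (ISL).

    Returns the number of frames that each host processed.
    """
    isl = deque()                        # frames buffered at the edge switch
    host_buf = {"slow": 0, "fast": 0}    # buffers in use on each host port
    delivered = {"slow": 0, "fast": 0}
    drain_every = {"slow": slow_drain_every, "fast": 1}
    dests = ["slow", "fast"]
    next_dest = 0

    for tick in range(1, ticks + 1):
        # Each host frees one buffer when its drain interval comes around.
        for host in dests:
            if host_buf[host] > 0 and tick % drain_every[host] == 0:
                host_buf[host] -= 1
                delivered[host] += 1

        # The switch forwards only the frame at the head of the queue,
        # and only if the destination host has a free buffer (a credit).
        if isl and host_buf[isl[0]] < host_credits:
            host_buf[isl.popleft()] += 1

        # The storage system sends a frame only if the ISL has a free credit.
        if len(isl) < isl_credits:
            isl.append(dests[next_dest])
            next_dest = 1 - next_dest

    return delivered


healthy = run(ticks=200, slow_drain_every=1)
degraded = run(ticks=200, slow_drain_every=10)
print("both hosts healthy:", healthy)
print("one host slow     :", degraded)
```

In the degraded run, frames for the slow host sit at the head of the shared queue, so frames for the healthy host wait behind them and its throughput drops as well, even though the healthy host itself has free buffers.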

Procedure

  1. Configure a performance alert for Port Send Delay Time, Port Send Delay I/O Percentage, or Zero Buffer Credit Timer, depending on the storage system.
    For example, configure a Port Send Delay I/O Percentage alert for IBM Storage FlashSystem 9100 that is triggered when the delay is greater than 20%.
    Click Configuration and then click Alert Policies to create a new policy or edit an existing one.
    Figure: Port Send Delay Time alert for FlashSystem 9100
  2. To view the alerts, click Dashboards and then click Alerts. If the Port Send Delay Time alert was triggered, note the time of the alert.
  3. To view information about the affected storage system, click the link in the Resource column.
  4. In the Internal Resources section of the storage system details page, click Volumes.
  5. Click the Performance tab.
  6. Set a time period for the performance chart. Set the start time to before the alert occurred and the end time to after the alert occurred.
  7. Set the chart to display the following metrics:
    • Read Data Rate
    • Overall Response Time
  8. Sort the performance table by the Max Total I/O Rate column.
  9. Click the volume with the highest Total I/O Rate to show the volume in the chart. Verify that the Read Data Rate spiked at the time that the alert occurred.
    Figure: Volumes performance chart showing volume with highest data rate
  10. Right-click the volume in the performance table, then click Host Connection Performance.
  11. Set the chart to display the following metrics:
    • Read Data Rate
    • Overall Response Time
  12. Verify that the spikes in the Read Data Rate and the Overall Response Time occurred when the alert was triggered. If the spikes occurred at the same time as the alert, the host that is shown in the chart (P9_lpar5_epictest01 in this example) is the source of the slow drain that is causing problems on the storage system.
    Figure: Host Connections performance chart showing host with spike in data rate
    Tip: If the buffer credit problem occurs in a cluster, there might be multiple hosts that are mapped to the volume. In this case, you must investigate the hosts individually to determine which ones cause the delay in sending from the port.
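If you export the host connection metrics for offline analysis, the comparison in steps 11 and 12 can be approximated with a short script. The following is a minimal sketch under stated assumptions: it does not call any IBM Storage Insights API, the data layout, host names, dates, and values are invented for illustration, and the spike rule (peak near the alert at least three times the median of the remaining samples) is an arbitrary choice rather than a product threshold.

```python
from datetime import datetime, timedelta
from statistics import median


def spiked_near(samples, alert_time, window=timedelta(minutes=10), factor=3.0):
    """Return True if a metric peaks near the alert time.

    samples: list of (timestamp, value) pairs.
    A spike means the maximum inside the window is at least `factor` times
    the median of the samples outside the window (assumed rule).
    """
    near = [v for t, v in samples if abs(t - alert_time) <= window]
    baseline = [v for t, v in samples if abs(t - alert_time) > window]
    if not near or not baseline:
        return False
    base = median(baseline)
    return max(near) > base and max(near) >= factor * base


def suspect_hosts(host_metrics, alert_time):
    """List hosts whose read data rate and response time both spiked near the alert."""
    return [
        host
        for host, metrics in host_metrics.items()
        if spiked_near(metrics["read_data_rate"], alert_time)
        and spiked_near(metrics["overall_response_time"], alert_time)
    ]


def make_series(start, base, spike_at=None, peak=None, count=25):
    """Build hypothetical 5-minute samples, optionally with a spike around `spike_at`."""
    series = []
    for i in range(count):
        t = start + timedelta(minutes=5 * i)
        in_spike = spike_at is not None and abs(t - spike_at) <= timedelta(minutes=5)
        series.append((t, peak if in_spike else base))
    return series


# Hypothetical alert time and sample data, for illustration only.
alert_time = datetime(2024, 1, 1, 12, 0)
start = alert_time - timedelta(hours=1)
host_metrics = {
    "host_a": {
        "read_data_rate": make_series(start, 50, alert_time, 400),
        "overall_response_time": make_series(start, 2, alert_time, 30),
    },
    "host_b": {
        "read_data_rate": make_series(start, 60),
        "overall_response_time": make_series(start, 2),
    },
}
print(suspect_hosts(host_metrics, alert_time))
```

When several hosts are mapped to the same volume, as described in the tip, a check like this can shorten the list of hosts to inspect, but the charts in the product remain the reference for confirming the source.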