IBM Support

HOW TO: Use iostat to diagnose CPU and IO bottlenecks

Troubleshooting


Problem

Overview

This article describes how to identify CPU and IO bottlenecks to assist with diagnosing slow queries or read and write timeouts.

Applies to

Most Linux distributions. These examples were tested with the following versions:

  • DSE 6.7, 6.0, 5.1, 5.0
  • RHEL 7.5
  • Ubuntu 16.04-18.04

Summary

When diagnosing slow queries or read and write timeouts, it helps to observe the state of system resources directly. Typical monitoring systems often aggregate metrics too heavily to explain short-lived events. A common, widely available tool for servers is iostat, which is typically provided by the sysstat package for your Linux distribution.

Collecting metrics

Run iostat during a busy period when the performance problems occur, or during peak load. The following iostat command takes one sample per second for 900 seconds (15 minutes).

  • Run iostat -x -c -d -t 1 900 > /tmp/iostat.txt
  • Run lscpu and read the 'Thread(s) per core' output
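The second step can be scripted rather than read by eye. The following is a minimal sketch, assuming lscpu prints a line of the form 'Thread(s) per core: N'; the function name threads_per_core is hypothetical and not part of any tool.

```shell
# Print the "Thread(s) per core" value found in lscpu-style text on stdin.
threads_per_core() {
    awk -F: '/^Thread\(s\) per core/ {
        gsub(/[ \t]/, "", $2)   # strip the padding around the number
        print $2
        exit
    }'
}
```

Usage: lscpu | threads_per_core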

How to Find High CPU

From the output file (/tmp/iostat.txt) of the iostat command above:

If 'Thread(s) per core: 2'

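With Thread(s) per core: 2, if user+system+nice+steal is over 50% in the output file, then the CPU is saturated and performance will suffer. The command below counts the samples where idle+iowait ($6+$4) is under 50%, which is the same condition.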

With grep and awk:

grep avg-cpu -A1 /tmp/iostat.txt | grep -v "avg-cpu" | grep -v "-" | awk '($6+$4)<50.0{printf("%5.1f\n", $6+$4)}' | wc -l

If 'Thread(s) per core: 1'

With Thread(s) per core: 1, if user+system+nice+steal is over 90% in the output file, then the CPU is saturated and performance will suffer. The command below counts the samples where idle+iowait ($6+$4) is under 10%.

With grep and awk:

grep avg-cpu -A1 /tmp/iostat.txt | grep -v "avg-cpu" | grep -v "-" | awk '($6+$4)<10.0{printf("%5.1f\n", $6+$4)}' | wc -l

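The two CPU checks differ only in the cutoff, so they can be folded into one helper. This is a sketch rather than part of the original procedure: count_cpu_saturated is a hypothetical name, and it assumes each avg-cpu header line is immediately followed by the values line in the order %user %nice %system %iowait %steal %idle.

```shell
# Count samples whose %idle + %iowait is below CUTOFF, that is, whose
# user+nice+system+steal is above (100 - CUTOFF).
# usage: count_cpu_saturated FILE CUTOFF
# Prints "<saturated samples> <total samples>".
count_cpu_saturated() {
    awk -v cutoff="$2" '
        /^avg-cpu:/ { want = 1; next }      # header line; values follow
        want {
            want = 0
            total++
            if ($4 + $6 < cutoff) busy++    # $4 = %iowait, $6 = %idle
        }
        END { printf "%d %d\n", busy, total }
    ' "$1"
}
```

Usage: count_cpu_saturated /tmp/iostat.txt 50 for 2 threads per core, or count_cpu_saturated /tmp/iostat.txt 10 for 1 thread per core. Printing the total alongside the count makes it easier to judge how much of the capture window was saturated.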
How to Determine if the IO is Busy

There are two metrics to evaluate disk bottlenecks: avgqu-sz (average queue size; shown as aqu-sz in newer sysstat releases) and %iowait (percentage of CPU time spent waiting on I/O).

Finding I/O bottlenecks via iowait

Count the number of samples where %iowait is over 5%. If this occurs more than 45 times in the output file (5% of the 900 samples), you have identified an IO bottleneck during that time window.

With grep and awk:

grep avg-cpu -A1 /tmp/iostat.txt | grep -v "avg-cpu" | grep -v "-" | awk '$4>5.0{print $4}' | wc -l
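The count can also be compared against the 45-sample limit automatically. The sketch below uses the thresholds stated in this article (%iowait over 5, more than 45 samples); the function name iowait_check and the wording of its output are hypothetical.

```shell
# Report how many samples have %iowait over 5 and whether that exceeds
# the 45-sample limit used in this article.
# usage: iowait_check FILE
iowait_check() {
    awk '
        /^avg-cpu:/ { want = 1; next }              # header line; values follow
        want        { want = 0; if ($4 > 5.0) n++ } # $4 = %iowait
        END {
            verdict = (n > 45) ? "IO bottleneck" : "no IO bottleneck"
            printf "%d samples over 5%% iowait: %s\n", n, verdict
        }
    ' "$1"
}
```

Usage: iowait_check /tmp/iostat.txt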

Finding I/O Bottlenecks in individual drives

An avgqu-sz over 1 indicates that the device is saturated. While different devices behave differently when saturated, this reading is a good indication that issuing more IO will not improve the performance of the system.

Count the number of samples where the avgqu-sz of a drive is over 1.0. If this occurs more than 18 times in the output file (2% of the 900 samples), it is an indication of saturation for that drive.
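The article gives no command for this per-drive count, so the following is one possible sketch and not the original author's method. It assumes each device table starts with a header row beginning with 'Device' and ends at a blank line, and it looks up the queue-size column by name (avgqu-sz or aqu-sz) because its position varies between sysstat versions. The function name queue_saturation is hypothetical.

```shell
# For each device, count the samples where the average queue size is over 1.0.
# usage: queue_saturation FILE
# Prints "<device> <count>" for every device with at least one such sample.
queue_saturation() {
    awk '
        /^Device/ {                         # header row of a device table
            col = 0
            for (i = 1; i <= NF; i++)
                if ($i == "avgqu-sz" || $i == "aqu-sz") col = i
            in_dev = 1
            next
        }
        NF == 0 { in_dev = 0; next }        # blank line ends the table
        in_dev && col && $col > 1.0 { hits[$1]++ }
        END { for (d in hits) print d, hits[d] }
    ' "$1" | sort
}
```

Usage: queue_saturation /tmp/iostat.txt, then compare each count with the limit of 18.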

Document Location

Worldwide

Historical Number

ka0Ui0000000NsfIAE

Document Information

Modified date:
30 January 2026

UID

ibm17258600