Troubleshooting
Problem
Overview
This article describes how to identify CPU and IO bottlenecks to assist with diagnosing slow queries or read and write timeouts.
Applies to
Most Linux distributions. These examples were tested with the following versions:
- DSE 6.7, 6.0, 5.1, 5.0
- RHEL 7.5
- Ubuntu 16.04-18.04
Summary
It helps to observe the state of system resources when diagnosing slow queries or read and write timeouts. Typical monitoring systems often aggregate metrics too heavily to explain short-lived events. A common, widely available tool for this on servers is iostat, which is typically provided by the sysstat package for your Linux distribution.
Collecting metrics
Run iostat during a busy period when the performance problems occur, or during peak load. The following iostat command collects one sample per second for 900 seconds (15 minutes).
- Run iostat -x -c -d -t 1 900 > /tmp/iostat.txt
- Run lscpu and read the 'Thread(s) per core' output
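The two collection steps above can be wrapped in small helpers. This is a minimal sketch, not part of the original procedure: the function names threads_per_core and cpu_busy_threshold are hypothetical, and the parsing assumes lscpu prints a line of the form "Thread(s) per core: N". The threshold mapping simply encodes the 50% and 90% rules described later in this article.

```shell
# Hypothetical helper: read lscpu output on stdin and print the
# 'Thread(s) per core' value (assumes the usual "Label:   value" layout).
threads_per_core() {
    awk -F: '/^Thread\(s\) per core/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Hypothetical helper: map threads per core to the CPU busy threshold
# used in this article (50% with 2 threads per core, 90% with 1).
cpu_busy_threshold() {
    case "$1" in
        2) echo 50 ;;
        1) echo 90 ;;
        *) echo "no threshold defined in this article for: $1" >&2; return 1 ;;
    esac
}

# Usage (left commented out so that sourcing this file has no side effects):
#   iostat -x -c -d -t 1 900 > /tmp/iostat.txt
#   cpu_busy_threshold "$(lscpu | threads_per_core)"
```

The article only defines thresholds for one and two threads per core, so the sketch returns an error for any other value rather than guessing.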
How to Find High CPU
From the output file (/tmp/iostat.txt) of the iostat command above:
If 'Thread(s) per core: 2'
With Thread(s) per core: 2, if user+system+nice+steal is over 50% in the output file, then the CPU is saturated and performance will be impacted.
With grep and awk:
grep avg-cpu -A1 /tmp/iostat.txt | grep -v "avg-cpu" | grep -v "-" | awk '($6+$4)<50.0{printf("%5.1f\n", $6+$4)}' | wc -l
If 'Thread(s) per core: 1'
With Thread(s) per core: 1, if user+system+nice+steal is over 90% in the output file, then the CPU is saturated and will impact performance.
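Both CPU rules (50% with two threads per core, 90% with one) can also be checked by summing the busy columns directly instead of testing idle plus iowait. The following is a sketch under stated assumptions: the function name count_cpu_saturated is hypothetical, and it assumes the standard iostat CPU report, in which the line after the avg-cpu header holds %user, %nice, %system, %iowait, %steal and %idle in that order, with a period as the decimal separator.

```shell
# Hypothetical helper: count_cpu_saturated FILE THRESHOLD
# Prints how many samples have user+nice+system+steal above THRESHOLD.
count_cpu_saturated() {
    awk -v limit="$2" '
        /avg-cpu/ { grab = 1; next }      # the values are on the next line
        grab {
            grab = 0
            busy = $1 + $2 + $3 + $5      # %user + %nice + %system + %steal
            if (busy > limit) n++
        }
        END { print n + 0 }               # print 0 when nothing matched
    ' "$1"
}

# Usage:
#   count_cpu_saturated /tmp/iostat.txt 50   # Thread(s) per core: 2
#   count_cpu_saturated /tmp/iostat.txt 90   # Thread(s) per core: 1
```

Because the six CPU columns sum to roughly 100%, this should give the same count as testing whether idle plus iowait falls below 50 or 10, apart from rounding at the boundary.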
With grep and awk:
grep avg-cpu -A1 /tmp/iostat.txt | grep -v "avg-cpu" | grep -v "-" | awk '($6+$4)<10.0{printf("%5.1f\n", $6+$4)}' | wc -l
How to Determine if the IO is Busy
Two metrics help evaluate disk bottlenecks: avgqu-sz (average queue size) and %iowait (the percentage of CPU time spent waiting on I/O).
Finding I/O bottlenecks via iowait
Count the number of samples in which %iowait is over 5%. If this occurs more than 45 times in the output file, you have identified an I/O bottleneck during that time window.
With grep and awk:
grep avg-cpu -A1 /tmp/iostat.txt | grep -v "avg-cpu" | grep -v "-" | awk '$4>5.0{print $4}' | wc -l
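The count above is easier to judge next to the total number of samples, since 45 of 900 one-second samples is 5% of the capture window. The sketch below prints both numbers. The function name count_iowait_over is hypothetical, and it assumes %iowait is the fourth value on the line after each avg-cpu header, as in the command above.

```shell
# Hypothetical helper: count_iowait_over FILE [LIMIT]
# Prints "<samples over LIMIT> <total samples>"; LIMIT defaults to 5 (%).
count_iowait_over() {
    awk -v limit="${2:-5}" '
        /avg-cpu/ { grab = 1; next }      # the values are on the next line
        grab {
            grab = 0
            total++
            if ($4 > limit) n++           # $4 is %iowait
        }
        END { printf "%d %d\n", n, total }
    ' "$1"
}

# Usage:
#   count_iowait_over /tmp/iostat.txt
#   For a 900-sample capture, a first number above 45 matches the rule above.
```

Applying the same 5% proportion to a shorter or longer capture is an assumption, not something this article states.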
Finding I/O Bottlenecks in individual drives
An avgqu-sz over 1 indicates that the device is saturated. While different devices behave differently when saturated, this reading is a good indication that issuing more I/O will not improve the performance of the system.
Count the number of times the avgqu-sz of a drive is over 1.0. If this occurs more than 18 times in the output file, that drive is likely saturated.
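The per-drive count can be sketched with awk as well. The function name count_queue_over is hypothetical. Because the position of the queue-size column differs between sysstat versions, and newer versions label it aqu-sz instead of avgqu-sz, this sketch locates the column by its header name rather than by a fixed field number. It assumes each device table starts with a line beginning with "Device" and ends at a blank line.

```shell
# Hypothetical helper: count_queue_over FILE [LIMIT]
# Prints "<device> <count>" for every device whose queue size exceeded
# LIMIT (default 1) in at least one sample.
count_queue_over() {
    awk -v limit="${2:-1}" '
        /^Device/ {                        # header row of a device table
            col = 0
            for (i = 1; i <= NF; i++)
                if ($i == "avgqu-sz" || $i == "aqu-sz") col = i
            in_devices = 1
            next
        }
        NF == 0 { in_devices = 0; next }   # a blank line ends the table
        in_devices && col && $col > limit { hits[$1]++ }
        END { for (d in hits) print d, hits[d] }
    ' "$1" | sort
}

# Usage:
#   count_queue_over /tmp/iostat.txt
#   For a 900-sample capture, a count above 18 matches the rule above.
```

Devices that never exceed the limit are not listed, so an empty result means no drive crossed the threshold in the capture.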
See also
As always, DataStax recommends searching the documentation for useful information, including:
Document Location
Worldwide
Historical Number
ka0Ui0000000NsfIAE
Document Information
Modified date:
30 January 2026
UID
ibm17258600