Performance metrics
IBM Spectrum Control can collect many performance metrics that indicate the performance characteristics of monitored resources.
About this task
Performance metrics can be collected at the following levels:
- Throughput of an entire box (storage system)
- Each cluster
- Each controller (Example: DS8000)
- Each I/O Group (Example: storage systems that run IBM Storage Virtualize)
- Each volume (or LUN)
- At the Fibre Channel interfaces (ports) on some of the storage systems
- On Fibre Channel switches
- At the RAID array after cache hits have been filtered out
For storage systems, the performance statistics are separated into front-end I/O metrics and back-end I/O metrics. Front-end I/O metrics measure the traffic between servers and the storage system; back-end I/O metrics measure the traffic between the storage system cache and the disks in the RAID arrays at the back end of the storage system. Most storage systems provide metrics for both kinds of I/O operations. It is important to know whether a throughput or response time figure is measured at the front end (close to the system-level response time as seen from a server) or at the back end (between the cache and the disks). Key front-end and back-end metrics include:
- Total IO rate (overall)
- Read IO rate (overall)
- Write IO rate (overall)
- Overall response time
- Read response time
- Write response time
- Total back-end IO rate (overall)
- Back-end read IO rate (overall)
- Back-end write IO rate (overall)
- Overall back-end response time
- Back-end read response time
- Back-end write response time
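The front-end and back-end read rates in the list above are linked by the read cache hit percentage, because only read misses reach the RAID arrays. A minimal sketch of that relationship (the function name and figures are illustrative, not part of IBM Spectrum Control):

```python
# Approximate relationship: only cache misses generate back-end reads, so
# back-end read rate ≈ front-end read rate × (1 − read hit ratio).
def backend_read_rate(frontend_read_rate, read_hit_pct):
    """Approximate back-end read I/O rate (ops/s) from the front-end rate."""
    return frontend_read_rate * (1.0 - read_hit_pct / 100.0)

# Hypothetical example: 10 000 front-end reads/s with an 85% read hit ratio
# leaves roughly 1 500 reads/s for the back-end arrays.
print(backend_read_rate(10000, 85))
```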
For planning purposes, it is important to track any growth or change in these rates and response times. I/O rates typically grow over time, and response times increase as the I/O rates increase. Tracking this relationship is the essence of capacity planning: as I/O rates and response times trend upward, you can use these trends to project when additional storage performance (as well as capacity) is required.
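The trend projection described above can be sketched with a simple least-squares fit; the monthly sample rates and the 8000 IO/s planning ceiling below are hypothetical values, not product defaults:

```python
# Hypothetical monthly samples of total I/O rate (ops/s): (month, IO rate).
samples = [(0, 4200), (1, 4450), (2, 4700), (3, 4980), (4, 5240)]

def linear_trend(points):
    """Least-squares slope and intercept for (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

def months_until(points, limit):
    """Project the month in which the fitted I/O rate reaches `limit`."""
    slope, intercept = linear_trend(points)
    if slope <= 0:
        return None  # no growth: the limit is never reached
    return (limit - intercept) / slope

slope, intercept = linear_trend(samples)
print(f"Growth: {slope:.0f} IO/s per month")
print(f"Projected month to reach 8000 IO/s: {months_until(samples, 8000):.1f}")
```

The same projection can be applied to response times to estimate when an upgrade is needed.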
- Total cache hit percentage
- Read cache hit percentage
- Write-cache delay percentage (previously known as NVS full percentage)
- Read transfer size (KB/operation)
- Write transfer size (KB/operation)
Low cache hit percentages can drive up response times, because a cache miss requires access to the back-end storage. Low hit percentages also tend to increase the utilization of the back-end storage, which might adversely affect back-end throughput and response times. A high write-cache delay percentage can drive up write response times. Large transfer sizes typically indicate a batch workload, in which case the overall data rates matter more than the I/O rates and response times.
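Why a low hit percentage drives up response time can be seen from a hit/miss weighted average. In this sketch the 0.5 ms cache-hit latency and 8 ms back-end read latency are assumed figures, not measured values:

```python
# A cache hit is served from memory (~0.5 ms here); a miss pays the
# back-end read response time (~8 ms here). The front-end read response
# time is the hit/miss weighted average of the two.
def read_response_time(hit_pct, hit_ms=0.5, backend_read_ms=8.0):
    """Estimated front-end read response time (ms) for a given cache hit %."""
    hit_ratio = hit_pct / 100.0
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * backend_read_ms

for hit_pct in (95, 80, 50):
    print(f"{hit_pct}% read cache hits -> {read_response_time(hit_pct):.2f} ms")
```

Dropping from a 95% to a 50% hit ratio raises the estimated response time several-fold, even though the back-end latency is unchanged.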
- Total I/O rate and total data rate thresholds
- Total back-end I/O rate and total back-end data rate thresholds
- Read back-end response time and write back-end response time thresholds
- Total port I/O rate (packet rate) and data rate thresholds
- Overall port response time threshold
- Port send utilization percentage and port receive utilization percentage thresholds
- Port send bandwidth percentage and port receive bandwidth percentage thresholds
For Fibre Channel switches, the important metrics are the total port packet rate and total port data rate, which show the traffic pattern through a particular switch port. Port bandwidth percentage metrics are also important because they indicate bandwidth usage relative to the port speed. Monitor the dumped frame rate on a port to detect frames that are lost between the host and the switch port, or between the switch port and a storage device.
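Port bandwidth percentage is the measured data rate relative to the port's usable bandwidth. A minimal sketch, assuming roughly 100 MB/s of usable payload per Gbps of line rate (the 8b/10b encoding used by 8 Gbps and slower Fibre Channel links):

```python
# Port bandwidth percentage: measured data rate as a fraction of the
# port's usable bandwidth. Assumption: ~100 MB/s of payload per Gbps
# of line rate (8b/10b encoding on 8 Gbps and slower FC links).
def port_bandwidth_pct(data_rate_mb_per_s, port_speed_gbps):
    """Percentage of a Fibre Channel port's usable bandwidth in use."""
    usable_mb_per_s = port_speed_gbps * 100.0
    return 100.0 * data_rate_mb_per_s / usable_mb_per_s

# Hypothetical example: 620 MB/s flowing through an 8 Gbps port.
print(f"{port_bandwidth_pct(620.0, 8):.1f}%")
```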
- Monitor the throughput and response time patterns over time for your environment
- Develop an understanding of expected behaviors
- Investigate deviations from normal patterns of behavior to get early warning of problems
- Track the trend of workload changes over time
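The baseline-and-deviation approach above can be sketched as follows; the sample response times and the two-standard-deviation band are illustrative choices, not IBM Spectrum Control thresholds:

```python
import statistics

# Sketch of baseline monitoring: flag intervals whose response time
# deviates more than `n_sigma` standard deviations from the history.
def deviations(history, recent, n_sigma=2.0):
    """Return (index, value) pairs in `recent` outside the expected band."""
    mean = statistics.mean(history)
    sigma = statistics.stdev(history)
    return [(i, v) for i, v in enumerate(recent)
            if abs(v - mean) > n_sigma * sigma]

baseline = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.0, 4.7]  # ms, normal pattern
latest = [5.0, 5.2, 9.8, 5.1]                         # one abnormal interval
print(deviations(baseline, latest))
```

The flagged interval is a cue to investigate, not a verdict; compare it against known workload changes before treating it as a problem.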