Performance metrics for resources that run IBM Storage Virtualize

Monitor the performance metrics that are collected for IBM Storage Virtualize storage systems.

Overview

In this documentation, IBM Storage Virtualize is used to refer collectively to IBM SAN Volume Controller, IBM Storage Virtualize for Public Cloud, IBM Storage Virtualize as Software Only, and IBM Storwize storage systems, and to IBM Storage FlashSystem devices that run IBM Storage Virtualize.

Definitions are provided for the performance metrics that are collected for the following storage systems:
  • IBM Storage FlashSystem 5000
  • IBM Storage FlashSystem 5100
  • IBM Storage FlashSystem 7200
  • IBM Storage FlashSystem 7300
  • IBM Storage FlashSystem V9000
  • IBM Storage FlashSystem 9100
  • IBM Storage FlashSystem 9200
  • IBM Storage FlashSystem 9500
  • SAN Volume Controller
  • IBM Storage Virtualize for Public Cloud
  • Storwize V3500
  • Storwize V3700
  • Storwize V5000
  • Storwize V7000
  • Storwize V7000 Unified (block storage only)
The following terms are used in the performance metrics for these storage systems:
Stage
To write data from a disk to the cache. The data is not prefetched data.
Destage
To write data from the cache to a disk.
Prestage
To write prefetched data from a disk to the cache.

Storage System

Storage system metrics are divided into the following category:
Table 1. Environmental
Metric Definition
Total Power Consumed The total power, in watts, that is consumed by all components of the storage device, including nodes and enclosures.
System Temperature (°C) The average temperature of the storage device in degrees Celsius.
System Temperature (°F) The average temperature of the storage device in degrees Fahrenheit.
Power Efficiency The total power that is consumed by the storage device, in watts, divided by its raw capacity, in bytes. This metric indicates how much power the device consumes and how efficiently it uses that power relative to its capacity.
Note: Sustainability data collection is supported for devices that are connected through Call Home with cloud services. It is not supported for devices that are connected through a data collector.
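The environmental metrics above can be illustrated with a short calculation. The following Python sketch derives Power Efficiency exactly as defined (total watts divided by raw capacity bytes) and converts the system temperature between its Celsius and Fahrenheit forms; the device values are hypothetical.

```python
# Minimal sketch of the environmental metric arithmetic defined above.
# The device readings below are hypothetical illustrations, not real data.

def power_efficiency(total_power_watts: float, raw_capacity_bytes: int) -> float:
    """Watts consumed per byte of raw capacity (lower is better)."""
    return total_power_watts / raw_capacity_bytes

def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert the System Temperature metric from Celsius to Fahrenheit."""
    return temp_c * 9 / 5 + 32

raw_capacity = 500 * 10**12                    # 500 TB raw capacity (hypothetical)
print(power_efficiency(1200.0, raw_capacity))  # 2.4e-12 watts per byte
print(celsius_to_fahrenheit(24.0))             # 75.2
```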

Volume performance metrics

Tip:
Unless otherwise noted, you can view the volume metrics in Table 2, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 11, and Table 12 for the following resources:
  • Host connections
  • I/O groups
  • Nodes
  • Pools
  • Storage systems
  • Volumes
Table 2. Key metrics for volumes
Metric Definition
Data Rate (Read) The average number of MiB per second that are transferred for read operations.
Data Rate (Write) The average number of MiB per second that are transferred for write operations.
Data Rate (Unmap)1 The average number of MiB per second that were unmapped. This metric corresponds to the collected ub statistic.
Data Rate (Total) The average number of MiB per second that are transferred for read, write, and unmap operations.1
Overall Host Attributed Response Time Percentage The percentage of the average response time that can be attributed to delays from host systems. This value includes both read response times and write response times, and can help you diagnose slow hosts and fabrics that are not working efficiently. For read response time, the value is based on the time that it takes for hosts to respond to transfer-ready notifications from the nodes. For write response time, the value is based on the time that it takes for hosts to send the write data after the node responds to a transfer-ready notification.
Overall I/O Rate (Read) The average number of read operations per second. This value includes both sequential and nonsequential read operations.
Overall I/O Rate (Write) The average number of write operations per second. This value includes both sequential and nonsequential write operations.
Overall I/O Rate (Unmap)1 The average number of unmap operations per second. This metric corresponds to the collected uo statistic.
Overall I/O Rate (Total) The average number of I/O operations per second. This value includes read, write, and unmap operations.1
Pool Activity Score2 The activity level of pools, which is calculated as follows: [Read I/O Rate × (1 − Read I/O Cache Hit %)] ÷ Total Pool Capacity. A worked example follows the notes after this table.
Response Time (Read) The average number of milliseconds to complete a read operation.
Response Time (Write) The average number of milliseconds to complete a write operation.
Response Time (Unmap)1 The average number of milliseconds required to complete an unmap operation. This metric corresponds to the collected ul statistic.
Response Time (Overall) The average number of milliseconds to complete an I/O operation. This value includes both read and write operations.
Transfer Size (Read) The average number of KiB that are transferred per read operation.
Transfer Size (Write) The average number of KiB that are transferred per write operation.
Transfer Size (Overall) The average number of KiB that are transferred per I/O operation. This value includes both read and write operations.
Notes:
  1. This metric applies only to storage systems that are running IBM Storage Virtualize 8.1.1 or later. To view details about collected statistics, see Starting statistics collection.
  2. This metric is also available when you view the performance of pools.
  3. This metric is only available when you view the performance of volumes.
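As a worked example of the Pool Activity Score formula, the following Python sketch computes the score from a pool's read I/O rate and read cache hit percentage. The sample numbers, and the use of TiB as the capacity unit, are illustrative assumptions; the documentation does not state the capacity unit here.

```python
# Worked example of the Pool Activity Score formula from Table 2:
#   [Read I/O Rate x (1 - Read I/O Cache Hit %)] / Total Pool Capacity
# Sample values and the TiB capacity unit are hypothetical assumptions.

def pool_activity_score(read_iops: float, read_cache_hit_pct: float,
                        pool_capacity_tib: float) -> float:
    """Cache-miss read workload per unit of pool capacity."""
    cache_miss_fraction = 1 - read_cache_hit_pct / 100
    return read_iops * cache_miss_fraction / pool_capacity_tib

# A pool serving 5,000 read ops/s with a 90% read cache hit ratio on 100 TiB:
print(round(pool_activity_score(5000, 90.0, 100.0), 2))  # 5.0
```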
Table 3. Key metrics for Volume group performance
Metric Definition
Average Recovery Point The average of all recovery points in seconds for the volume group since the last time statistics were collected.
Replicated Blocks The cumulative number of blocks that this node has replicated for this volume group.
Worst Recovery Point The maximum recovery time in seconds for the volume group since the last time statistics were collected.
Table 4. I/O rate metrics for volumes
Metric Definition
Transfer Rate (Cache-to-Disk)* The average number of sectors or tracks per second that are transferred from the cache to the disks.
Transfer Rate (Disk-to-Cache)* The average number of sectors or tracks per second that are transferred from the disks to the cache.
Unaligned Unmap I/O Rate The average number of volume unmap operations per second that are not aligned on an 8K boundary. This metric corresponds to the collected uou statistic. This metric applies only to storage systems that are running IBM Storage Virtualize 8.1.1 or later. To view details about collected statistics, see Starting statistics collection.
Table 5. Cache hit percentages metrics for volumes
Metric Definition
Overall I/O Cache Hits (Read) The percentage of all read operations that find data in the cache. This value includes both sequential and random read operations, and read operations in the volume cache and volume copy cache where applicable. You can use this value to understand throughput or response times. Low cache-hit percentages can increase response times because in the event of a cache miss, the data must be read from the back-end storage resources.
Overall I/O Cache Hits (Write) The percentage of all write operations that are handled in the cache. This value includes both sequential and random write operations, and write operations in the volume cache and volume copy cache where applicable.
Table 6. Response time metrics for volumes
Metric Definition
Peak Response Time (Read) The worst response time measured for a read operation in the sample interval.
Peak Response Time (Write) The worst response time measured for a write operation in the sample interval.
Peak Response Time (Unmap) The worst response time measured for an unmap operation in the sample interval. This metric corresponds to the collected ulw statistic. This metric applies only to storage systems that are running IBM Storage Virtualize 8.1.1 or later. To view details about collected statistics, see Starting statistics collection.
Table 7. Remote mirror metrics for volumes
Metric Definition
Global Mirror (Overlapping Write I/O Rate) The average number of overlapping write operations per second that are issued by the Global Mirror primary site. Some overlapping writes are processed in parallel and are excluded from this value.
Global Mirror (Overlapping Write Percentage) The percentage of overlapping write operations that are issued by the Global Mirror primary site. Some overlapping writes are processed in parallel and are excluded from this value.
Global Mirror (Secondary Write Lag) The average number of additional milliseconds that it takes to service each secondary write operation for Global Mirror. This value does not include the time to service the primary write operations. Monitor the value of Global Mirror Secondary Write Lag to identify delays that occurred during the process of writing data to the secondary site.
Global Mirror (Write I/O Rate) The average number of write operations per second that are issued to the Global Mirror secondary site.
Volume cache (VC) metrics are only available for SAN Volume Controller, Storwize, and FlashSystem block storage systems whose firmware version is 7.3 or later.
Tip:

The volume cache is sometimes referred to as upper cache.

Table 8. Volume cache (VC) metrics for volumes
Metric Definition
Cache Hits (Dirty Writes) The percentage of all cache write hits that occur on data that is marked as modified in the volume cache. This value represents how effectively write operations are coalesced before the data is written to disk.
Cache Hits (Read) The percentage of read operations that find data in the volume cache.
Cache Hits (Write) The percentage of cache hits for write operations that are handled in the volume cache.
Fast-Write Data Rate The average number of MiB per second that were written to disk in fast-write mode in the upper cache. Use this information to help identify the source of back-end overloading, measure the workloads that exit the upper cache, and detect I/O amplification in general.
I/O Rate (Destage) The average number of cache-to-disk transfer operations per second that are processed in the volume cache.
I/O Rate (Read) The average number of read operations per second that are processed in the volume cache. This value includes operations that are started by hosts or by remote replication sources.
I/O Rate (Write) The average number of write operations per second that are processed by the volume cache. This value includes operations that are started by hosts or by remote replication sources.
Response Time (Destage) The average number of milliseconds that it took to complete each destage operation in the volume cache. That is, the time that it took to do write operations from the volume cache to the disk.
Response Time (Stage) The average number of milliseconds that it took to complete each stage operation in the volume cache. That is, the time that it took to do read operations from the disk to the volume cache.
Write Delay Percentage (Flush-through) The percentage of write operations that are written to disk in flush-through mode in the volume cache.
Write Delay Percentage (Total Delay) The percentage of I/O operations that are delayed because of space constraints in the write cache, or because of other conditions in the volume cache. The value is a percentage of all operations.
Write Delay Percentage (Write-through) The percentage of write operations that are written to disk in write-through mode in the volume cache.
Write Delay Rate (Flush-through) The average number of tracks per second that are written to disk in flush-through mode in the volume cache.
Write Delay Rate (Total Delay) The average number of I/O operations per second that are delayed. The delay might occur because of space constraints in the write cache, or because of other conditions in the volume cache. The value is an average of all operations.
Write Delay Rate (Write-through) The average number of sectors per second that are written to disk in write-through mode in the volume cache.
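The percentage and rate forms of the Total Delay metrics above describe the same delayed operations, so the percentage can be sketched as the delayed-operation rate taken as a share of all operations. The following Python sketch illustrates that relationship; the sample rates are hypothetical.

```python
# A small sketch of how the Total Delay percentage and rate metrics relate
# under the definitions above. Sample rates are hypothetical.

def write_delay_percentage(delayed_ops_per_sec: float,
                           total_ops_per_sec: float) -> float:
    """Share of all I/O operations that were delayed in the volume cache."""
    if total_ops_per_sec == 0:
        return 0.0
    return delayed_ops_per_sec / total_ops_per_sec * 100

print(write_delay_percentage(120.0, 8000.0))  # 1.5 (% of operations delayed)
```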
Volume copy cache (VCC) metrics are only available for SAN Volume Controller, Storwize, and FlashSystem block storage systems whose firmware version is 7.3 or later.
Tip: The volume copy cache is sometimes referred to as lower cache.
Table 9. Volume copy cache (VCC) metrics for volumes
Metric Definition
Cache Hits (Dirty Writes) The percentage of all cache write hits that occur on data that is marked as modified in the volume copy cache. This value represents how effectively write operations are coalesced before the data is written to disk.
Cache Hits (Read-ahead)* The percentage of all read cache hits that occur on pre-staged data.
Cache Hits (Read) The percentage of read operations that find data in the volume copy cache.
Cache Hits (Write) The percentage of cache hits for write operations that are handled in the volume copy cache.
Fast Write Data Rate The average number of MiB per second that were written to disk in fast-write mode in the lower cache. Use this information to help identify the source of back-end overloading, measure the workloads that exit the lower cache, and detect I/O amplification in general.
I/O Rate (Destage) The average number of cache-to-disk transfer operations per second that are processed in the volume copy cache.
I/O Rate (Prestage) The average number of prefetch disk-to-cache transfer operations per second that are processed in the volume copy cache.
I/O Rate (Read) The average number of read operations per second that are processed in the volume copy cache. This value includes read operations that are associated with FlashCopy services, volume mirroring, and other internal processes. This value might also include some operations that are passed from the volume cache.
I/O Rate (Write) The average number of write operations per second that are processed by the volume copy cache. This value includes write operations that are associated with FlashCopy services, volume mirroring, and other internal processes. This value might also include some operations that are passed from the volume cache.
Response Time (Destage) The average number of milliseconds that it took to complete each destage operation in the volume copy cache. That is, the time that it took to do write operations from the volume copy cache to the disk.
Response Time (Prestage) The average number of milliseconds that it took to complete each prestage operation in the volume copy cache. That is, the time that it took to prefetch data from the disk into the volume copy cache.
Response Time (Stage) The average number of milliseconds that it took to complete each stage operation in the volume copy cache. That is, the time that it took to do read operations from the disk to the volume copy cache.
Transfer Rates (Cache-to-Disk) The average number of sectors that are transferred per second from the volume copy cache to the disks.
Transfer Rates (Disk-to-Cache) The average number of sectors that are transferred per second from the disks to the volume copy cache.
Write Delay Percentage (Flush-through) The percentage of write operations that are written to disk in flush-through mode in the volume copy cache.
Write Delay Percentage (Total Delay) The percentage of I/O operations that are delayed because of space constraints in the write cache, or because of other conditions in the volume copy cache. The value is a percentage of all operations.
Write Delay Percentage (Write-through) The percentage of write operations that are written to disk in write-through mode in the volume copy cache.
Write Delay Rate (Flush-through) The average number of sectors per second that are written to disk in flush-through mode in the volume copy cache.
Write Delay Rate (Total Delay) The average number of I/O operations per second that are delayed. The delay might occur because of space constraints in the write cache, or because of other conditions in the volume copy cache. The value is an average of all operations.
Write Delay Rate (Write-through) The average number of sectors per second that are written to disk in write-through mode in the volume copy cache.
Note: *This metric is only available for SAN Volume Controller, Storwize, and FlashSystem block storage systems whose firmware version is 7.4 or later.
Note:
Unless otherwise noted, you can view the volume metrics in Table 10 for the following resources:
  • Nodes
  • I/O groups
  • Storage systems
Table 10. Compression metrics for volumes
Metric Definition
Compressed Volumes I/O Rate The average number of all read and write operations per second for compressed volumes.
Compressed Volumes Data Rate The average number of MiB per second that were read from or written to compressed volumes.
Compressed Volumes Response Time The average number of milliseconds to complete an I/O operation for compressed volumes. This value includes both read and write operations.
Uncompressed Volumes I/O Rate The average number of all read and write operations per second for uncompressed volumes.
Uncompressed Volumes Data Rate The average number of MiB per second that were read from or written to uncompressed volumes.
Uncompressed Volumes Response Time The average number of milliseconds to complete an I/O operation for uncompressed volumes. This value includes both read and write operations.
Tip:
Unless otherwise noted, you can view the volume metrics in Table 11 for the following resources:
  • Nodes
  • I/O groups
  • Host connections
  • Storage systems
Table 11. Miscellaneous metrics for volumes
Metric Definition
Cache to Host Transfer Response Time1 The average number of milliseconds that it takes to transfer a track from the cache to the host, including any queuing time that occurs because of throttling.
Non-Preferred Node Usage Percentage2 The overall percentage of I/O operations that are not directed against the preferred node for each volume in an I/O Group. There is a small performance penalty when I/O does not go to the preferred node for each volume.
Notes:
  1. The metric is only available for SAN Volume Controller, Storwize, and FlashSystem block storage systems whose firmware version is 7.3 or later.
  2. This metric is only available when you view the performance of volumes, I/O groups, and host connections.

Legacy cache metrics are only available for SAN Volume Controller and Storwize block storage systems whose firmware version is earlier than 7.3.

Table 12. Legacy cache metrics for volumes
Metric Definition
Dirty Write Percentage of Cache Hits The percentage of all cache write hits that occur on data in the cache that is marked as modified. This value represents how effectively write operations are coalesced before the data is written to disk. This value applies only to resources that are running a version of IBM Storage Virtualize earlier than 7.3.

Volume group performance metrics

The following performance metrics are available for volume groups:

Table 13. Replication metrics for volume groups
Metric Definition
Average Recovery Point The average of all recovery points in seconds for the volume group since the last time statistics were collected.
Replicated Blocks The cumulative number of blocks that this node has replicated for this volume group.
Worst Recovery Point The maximum recovery time in seconds for the volume group since the last time statistics were collected.
Replicated Writes The cumulative number of replication writes that this node sent to the target for this volume group.
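The two recovery point metrics summarize the same per-interval samples, one as a mean and one as a maximum. A minimal Python sketch, using hypothetical recovery point samples in seconds:

```python
# How Average and Worst Recovery Point summarize the recovery points
# observed since the last statistics collection. Samples are hypothetical.

recovery_points_s = [12.0, 15.5, 9.8, 30.2, 11.1]

average_recovery_point = sum(recovery_points_s) / len(recovery_points_s)
worst_recovery_point = max(recovery_points_s)

print(f"Average Recovery Point: {average_recovery_point:.1f} s")  # 15.7 s
print(f"Worst Recovery Point: {worst_recovery_point:.1f} s")      # 30.2 s
```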

Disk performance metrics

Disk performance metrics are divided into the following categories: key metrics (Table 14), response time metrics (Table 15), and miscellaneous metrics (Table 16).
Unless otherwise noted, you can view disk metrics for the following resources:
  • Managed disks
  • Pools
  • Nodes
  • I/O Groups
  • Storage systems
Restriction: Performance metadata for managed disks in IBM Storage Virtualize for Public Cloud is not yet available.
Table 14. Key metrics for disks
Metric Definition
Data Rate (Read) The average number of MiB per second that are read from the back-end storage resources.
Data Rate (Write) The average number of MiB per second that are written to the back-end storage resources.
Data Rate (Total) The average rate at which data is transmitted between the back-end storage resources and the component. The rate is measured in MiB per second and includes both read and write operations.
I/O Rate (Read)1 The average number of read operations per second that are issued to the back-end storage resources.
I/O Rate (Write)2 The average number of write operations per second that are issued to the back-end storage resources.
I/O Rate (Total)3 The average number of I/O operations per second that are transmitted between the back-end storage resources and the component. This value includes both read and write operations.
Response Time (Read) The average number of milliseconds for the back-end storage resources to respond to a read operation.
Response Time (Write) The average number of milliseconds for the back-end storage resources to respond to a write operation.
Response Time (Overall) The average number of milliseconds for the back-end storage resources to respond to a read or a write operation.
Notes:
  1. The performance metrics for I/O Rate (Read) are available for pools, nodes, I/O groups, and storage systems.
  2. The performance metrics for I/O Rate (Write) are available for pools, nodes, I/O groups, and storage systems.
  3. The performance metrics for I/O Rate (Total) are available for pools, nodes, I/O groups, and storage systems.
Table 15. Response time metrics for disks
Metric Definition
Queue Time (Read) The average number of milliseconds that a read operation spends in the queue before the operation is sent to the back-end storage resources.
Queue Time (Write) The average number of milliseconds that a write operation spends in the queue before the operation is sent to the back-end storage resources.
Queue Time (Overall) The average number of milliseconds that a read or a write operation spends in the queue before the operation is sent to the back-end storage resources.
Peak Back-end Queue Time (Read) The longest time that a read operation spends in the queue before the operation is sent to the back-end storage resources.
Peak Back-end Queue Time (Write) The longest time that a write operation spends in the queue before the operation is sent to the back-end storage resources.
Peak Back-end Response Time (Read) The longest time for a back-end storage resource to respond to a read operation.
Peak Back-end Response Time (Write) The longest time for a back-end storage resource to respond to a write operation by a node.
Table 16. Miscellaneous metrics for disks
Metric Definition
Transfer Size (Read) The average number of KiB that are transferred per read operation from the back-end storage resources.
Transfer Size (Write) The average number of KiB that are transferred per write operation to the back-end storage resources.
Transfer Size (Overall) The average number of KiB that are transferred per I/O operation. This value includes both read and write operations.
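Consistent with these definitions, an average transfer size can be derived from a data rate and the matching I/O rate: KiB per operation equals MiB per second times 1024, divided by operations per second. The following Python sketch shows the arithmetic; the sample values are hypothetical.

```python
# Deriving an average transfer size from a data rate and an I/O rate,
# consistent with the Table 14 and Table 16 definitions. Values are
# hypothetical.

def transfer_size_kib(data_rate_mib_s: float, io_rate_ops_s: float) -> float:
    """Average KiB transferred per back-end I/O operation."""
    if io_rate_ops_s == 0:
        return 0.0
    return data_rate_mib_s * 1024 / io_rate_ops_s

# 200 MiB/s of reads at 3,200 read ops/s -> 64 KiB per read operation:
print(transfer_size_kib(200.0, 3200.0))  # 64.0
```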

Pool performance metrics

Key performance metrics are available for pools.

Unless otherwise noted, you can view pool metrics for the following resources:
  • Pools
Table 17. Key metrics for pools
Metric Definition
Max Write Cache Fullness* The maximum amount of the lower cache that the write cache partitions on the nodes that manage the pool are using for write operations. If the value is 100%, one or more cache partitions in one or more pools are full. Operations that pass through pools with full cache partitions are queued, and I/O response times increase for the volumes in the affected pools.
Write Cache Fullness* The average amount of the lower cache that the write cache partitions of the pools on the nodes are using for write operations. Monitor average cache fullness to identify the pools that are experiencing heavy cache usage.
Note: *This cache fullness metric applies to systems that are running IBM Storage Virtualize 7.3 or later.
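Because a Max Write Cache Fullness value of 100% means that writes to the affected pools are queued, this metric is a natural alerting threshold. The following Python sketch flags such pools; the pool names and values are hypothetical sample data.

```python
# Monitoring sketch based on the Table 17 definitions: flag pools whose
# Max Write Cache Fullness reached 100%. Pool data is hypothetical.

pool_samples = {
    "Pool-A": {"write_cache_fullness_avg": 62.0, "write_cache_fullness_max": 100.0},
    "Pool-B": {"write_cache_fullness_avg": 35.5, "write_cache_fullness_max": 71.0},
}

for pool, sample in pool_samples.items():
    if sample["write_cache_fullness_max"] >= 100.0:
        print(f"WARNING: {pool} has a full write cache partition; "
              f"expect queued writes and higher volume response times")
```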
Important: When IBM Storage Insights monitors an IBM Storage Virtualize storage system that uses volume mirroring, volume-aggregated performance data is displayed for the storage pools that contain the primary volumes, but not for the storage pools that contain the volume copies. See the following figure. This discrepancy occurs because IBM Storage Insights reflects only the I/O between the host and the Copy-1 volume, which is mapped to the host. To avoid presenting redundant and misleading performance data, IBM Storage Insights does not reflect the I/O work that is done internally by the mirror function to update the mirror Copy-2 volume, which is not mapped to the host.
Figure 1. Storage system with volume mirroring

FC Port performance metrics

Unless otherwise noted, you can view port metrics for the following resources:
  • Ports
  • Nodes
  • I/O Groups
  • Storage systems
Table 18. Key metrics for FC Ports
Metric Definition
I/O Rate (Receive) The average number of I/O operations per second for operations in which the port receives data.
I/O Rate (Send) The average number of I/O operations per second for operations in which data is sent from a port.
Data Rate (Receive) The average rate at which data is received by the port. The rate is measured in MiB per second.
Data Rate (Send) The average rate at which data is sent from the port. The rate is measured in MiB per second.
Data Rate (Total) The average rate at which data is transferred through the port. The rate is measured in MiB per second and includes both send and receive operations.
Bandwidth (Receive) The percentage of the port bandwidth that is used for receive operations. This value is an indicator of port bandwidth usage that is based on the speed of the port.
Bandwidth (Send) The percentage of the port bandwidth that is used for send operations. This value is an indicator of port bandwidth usage that is based on the speed of the port.
Bandwidth (Overall) The percentage of the port bandwidth that is used for send and receive operations. This value is an indicator of port bandwidth usage that is based on the speed of the port.
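The bandwidth metrics express measured throughput relative to the speed of the port. The following Python sketch shows one plausible form of that calculation; the conversion treats 1 Gbps as 10^9 bits per second and ignores encoding overhead, which the product may handle differently.

```python
# A minimal sketch of the Bandwidth (Overall) idea: measured throughput as a
# percentage of the port's nominal line rate. The 1 Gbps ~ 119.2 MiB/s
# conversion (10^9 / 8 / 2^20) is an assumption that ignores encoding
# overhead; sample values are hypothetical.

def port_bandwidth_pct(data_rate_mib_s: float, port_speed_gbps: float) -> float:
    """Throughput as a percentage of the port's nominal line rate."""
    line_rate_mib_s = port_speed_gbps * 1e9 / 8 / 2**20
    return data_rate_mib_s / line_rate_mib_s * 100

# A 16 Gbps FC port moving 800 MiB/s (send + receive):
print(round(port_bandwidth_pct(800.0, 16.0), 1))  # 41.9
```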
Table 19. I/O rate metrics for FC Ports
Metric Definition
Port-to-Disk I/O Rate (Receive) The average number of exchanges per second that are received from back-end storage resources.
Port-to-Disk I/O Rate (Send) The average number of I/O operations per second that are sent from the storage system to the back-end storage that it is virtualizing. Use this metric to help measure the rate of data that is sent to back-end storage.
Port-to-Host I/O Rate (Receive) The average number of I/O operations per second that are received by the storage system from the hosts that access its storage. Use this metric to help measure host workload against the storage system.
Port-to-Host I/O Rate (Send) The average number of I/O operations per second that are sent by the storage system to the hosts that access its storage. Use this metric to help measure host workload against the storage system.
Port-to-Local Node I/O Rate (Receive) The average number of I/O operations per second that are received from other nodes within the local cluster. Use this metric to understand the rate of node-to-node communication within the cluster.
Port-to-Local Node I/O Rate (Send) The average number of I/O operations per second that are sent to other nodes within the local cluster. Use this metric to understand the rate of node-to-node communication within the cluster.
Port-to-Local Node I/O Rate (Total) The average number of I/O operations per second that are transmitted between the resource and other nodes within the local cluster. Use this metric to understand the rate of node-to-node communication within the cluster.
Port-to-Remote Node I/O Rate (Receive) The average number of I/O operations per second that are received from nodes in a remote cluster. Use this metric to understand the amount of remote replication workload.
Port-to-Remote Node I/O Rate (Send) The average number of I/O operations per second that are sent to nodes in a remote cluster. Use this metric to understand the amount of remote replication workload.
Port-to-Remote Node I/O Rate (Total) The average number of I/O operations per second that are transmitted between the resource and nodes in a remote cluster. Use this metric to understand the amount of remote replication workload.
Table 20. Data rate metrics for FC Ports
Metric Definition
Port-to-Disk Data Rate (Receive) The average rate at which data is received from back-end storage resources. The rate is measured in MiB per second.
Port-to-Disk Data Rate (Send) The average rate at which data is sent to back-end storage resources. The rate is measured in MiB per second.
Port-to-Disk Data Rate (Total) The average rate at which data is transmitted between back-end storage resources and the component. The rate is measured in MiB per second and includes both send and receive operations.
Port-to-Host Data Rate (Receive) The average rate at which data is received from host computers. The rate is measured in MiB per second.
Port-to-Host Data Rate (Send) The average rate at which data is sent to host computers. The rate is measured in MiB per second.
Port-to-Host Data Rate (Total) The average rate at which data is transmitted between host computers and the component. The rate is measured in MiB per second and includes both send and receive operations.
Port-to-Local Node Data Rate (Receive) The average rate at which data is received from other nodes that are in the local cluster. The rate is measured in MiB per second.
Port-to-Local Node Data Rate (Send) The average rate at which data is sent to other nodes that are in the local cluster. The rate is measured in MiB per second.
Port-to-Local Node Data Rate (Total) The average rate at which data is transmitted between the component and other nodes that are in the local cluster. The rate is measured in MiB per second.
Port-to-Remote Node Data Rate (Receive) The average rate at which data is received from nodes that are in the remote cluster. The rate is measured in MiB per second.
Port-to-Remote Node Data Rate (Send) The average rate at which data is sent to nodes that are in the remote cluster. The rate is measured in MiB per second.
Port-to-Remote Node Data Rate (Total) The average rate at which data is transmitted between the component and nodes that are in the remote cluster. The rate is measured in MiB per second.
Table 21. Response time metrics for FC Ports
Metric Definition
Port-to-Local Node Response Time (Receive) The average number of milliseconds to complete a receive operation from another node that is in the local cluster. This value represents the external response time of the transfers.
Port-to-Local Node Response Time (Send) The average number of milliseconds to complete a send operation to another node that is in the local cluster. This value represents the external response time of the transfers.
Port-to-Local Node Response Time (Overall) The average number of milliseconds to complete a send or receive operation with another node that is in the local cluster. This value represents the external response time of the transfers.
Port-to-Remote Node Response Time (Receive) The average number of milliseconds to complete a receive operation from a node that is in the remote cluster. This value represents the external response time of the transfers.
Port-to-Remote Node Response Time (Send) The average number of milliseconds to complete a send operation to a node that is in the remote cluster. This value represents the external response time of the transfers.
Port-to-Remote Node Response Time (Overall) The average number of milliseconds to complete a send operation to, or a receive operation from a node in the remote cluster. This value represents the external response time of the transfers.
Metrics availability restrictions: The response time metrics are available for nodes, I/O groups, and storage systems.
Table 22. Error rate metrics for FC Ports
Metric Definition
CRC Error Rate The average number of frames per second that are received in which a cyclic redundancy check (CRC) error is detected. A CRC error is detected when the CRC in the transmitted frame does not match the CRC computed by the receiver. For Brocade switches, this metric includes only the CRC Errors with a good end-of-frame (EOF) indicator.
Link Errors (Invalid Link Transmission Rate) The average number of bit errors per second that are detected.
Link Errors (Invalid Transmission Word Rate) The average number of invalid transmission words per second that are detected.
Link Errors (Link Failures) The average number of miscellaneous Fibre Channel link errors per second for ports. Link errors might occur when an unexpected Not Operational Sequence (NOS) is received or a link state machine failure is detected.
Link Errors (Primitive Sequence Protocol Error Rate) The average number of primitive sequence protocol errors per second that are detected. This error occurs when there is a link failure for a port.
Link Errors (Signal Loss) The average number of times per second at which the port lost communication with its partner port. These types of errors usually indicate physical link problems, caused by faulty SFP modules or cables, or caused by faulty connections at the switch or patch panel. However, in some cases, this error can also occur when the maximum link distance between ports is exceeded, for the type of connecting cable and light source.
Link Errors (Sync Loss) The average number of times per second that the port lost synchronization with its partner port. These types of errors usually indicate physical link problems, caused by faulty SFP modules or cables, or by faulty connections at the switch or patch panel. However, in some cases, this error can also occur because of mismatched port speeds between the partner ports when auto-negotiation of link speed is disabled.
Port Congestion Index* The estimated degree to which frame transmission was delayed due to a lack of buffer credits. This value is generally 0 - 100. The value 0 means there was no congestion. The value can exceed 100 if the buffer credit exhaustion persisted for an extended amount of time. When you troubleshoot a SAN, use this metric to help identify port conditions that might slow the performance of the resources to which those ports are connected.
Port Protocol Errors (Port Send Delay Time) The average number of milliseconds of delay that occur on the port for each send operation. The reason for these delays might be a lack of buffer credits. You cannot view zero buffer credit performance metrics for 16 Gbps Fibre Channel ports on resources that run IBM Storage Virtualize. Use the Port Send Delay Time metric if the Zero Buffer Credit Timer metric is not available.
Port Protocol Errors (Port Send Delay I/O Percentage) The percentage of send operations where a delay occurred, relative to the total number of send operations that were measured for the port. Use this metric with the Port Send Delay Time metric to distinguish a few long delays from many short delays.
Port Protocol Errors (Zero Buffer Credit Percentage) The amount of time, as a percentage, that the port was not able to send frames between ports because of insufficient buffer-to-buffer credit. The time is measured from the last time that metadata was collected. In Fibre Channel technology, buffer-to-buffer credit is used to control the flow of frames between ports.
Port Protocol Errors (Zero Buffer Credit Timer) The number of microseconds that the port is not able to send frames between ports because there is insufficient buffer-to-buffer credit. In Fibre Channel technology, buffer-to-buffer credit is used to control the flow of frames between ports. Buffer-to-buffer credit is measured from the last time that metadata was collected. If this metric is not available, use the Port Send Delay Time metric instead.
Total Physical Port Error Rate (cnt/s) The sum of all the physical error rates such as Error Frames, CRC Errors, Short Frames, and Link Failures that are detected on the storage system port.
Total Physical Port Error Rate is the sum of the following physical error rates:
  • Error Frame Rate
  • CRC Error Rate
  • Short Frame Rate
  • Long Frame Rate
  • Bad EOF CRC Error Rate
  • Link Failure Rate
  • Loss of Sync Rate
  • Loss of Signal Rate
  • Primitive Sequence Protocol Error Rate
  • Invalid Word Transmission Rate
Total Logical Port Error Rate (cnt/s) The sum of all the logical error rates such as F-BSY Frames, F-RJT Frames, Discarded Frames, and Encoding Disparity that are detected on the storage system port.
Total Logical Port Error Rate is the sum of the following logical error rates:
  • F-BSY Frame Rate
  • F-RJT Frame Rate
  • Discarded Class 3 Frame Rate
  • Discarded Frame Rate
  • Link Reset Transmitted Rate
  • Link Reset Received Rate
  • Class 3 Send Timeout Frame Rate
  • Class 3 Receive Timeout Frame Rate
  • Encoding Disparity
Note: *The performance metric for Port Congestion Index is only available for ports.
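The two totals above are plain sums of the listed component rates. The following Python sketch transcribes them directly; the sample per-second rates are hypothetical.

```python
# Direct transcription of the Total Physical and Total Logical Port Error
# Rate sums enumerated above. Sample rates (cnt/s) are hypothetical.

physical_error_rates = {
    "error_frame": 0.0, "crc_error": 0.25, "short_frame": 0.0,
    "long_frame": 0.0, "bad_eof_crc_error": 0.0, "link_failure": 0.0,
    "loss_of_sync": 0.5, "loss_of_signal": 0.0,
    "primitive_sequence_protocol_error": 0.0, "invalid_word_transmission": 0.25,
}

logical_error_rates = {
    "f_bsy_frame": 0.0, "f_rjt_frame": 0.0, "discarded_class_3_frame": 1.5,
    "discarded_frame": 0.5, "link_reset_transmitted": 0.0,
    "link_reset_received": 0.0, "class_3_send_timeout_frame": 0.0,
    "class_3_receive_timeout_frame": 0.0, "encoding_disparity": 0.0,
}

print(f"Total Physical Port Error Rate: {sum(physical_error_rates.values())} cnt/s")  # 1.0
print(f"Total Logical Port Error Rate: {sum(logical_error_rates.values())} cnt/s")    # 2.0
```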
Table 23. Power metrics for FC Ports
Metric Definition
SFP Temperature The temperature, in degrees Celsius (°C), of the small form-factor pluggable (SFP) transceiver that is plugged into a physical port. Use this metric to watch for fluctuating and high SFP temperatures to monitor the environmental health of the SFP.
Tx Power The power, in microwatts (µW), at which the SFP transmits its signal. Use this metric to monitor the transmit power of an SFP to ensure that it is within the normal operating range and is not causing link instability and degraded performance.
Rx Power The power, in microwatts (µW), at which the SFP receives a signal. Use this metric to monitor the receive power of an SFP to ensure that it is within the normal operating range and is not causing link instability and degraded performance.
Tip: These power metrics for ports are available for storage systems that run IBM Storage Virtualize 8.4.0 or later.
Table 24. Miscellaneous metrics for FC Ports
Metric Definition
Port-to-Local Node Queue Time (Receive) The average time in milliseconds that a receive operation spends in the queue before the operation is processed. This value represents the queue time for receive operations that are issued from other nodes that are in the local cluster.
Port-to-Local Node Queue Time (Send) The average time in milliseconds that a send operation spends in the queue before the operation is processed. This value represents the queue time for send operations that are issued to other nodes that are in the local cluster.
Port-to-Local Node Queue Time (Overall) The average number of milliseconds that a send or receive operation spends in the queue before the operation is processed. This value is for send and receive operations that are issued between the component and other nodes that are in the local cluster.
Port-to-Remote Node Queue Time (Receive) The average time in milliseconds that a receive operation spends in the queue before the operation is processed. This value represents the queue time for receive operations that are issued from a node that is in the remote cluster.
Port-to-Remote Node Queue Time (Send) The average time in milliseconds that a send operation spends in the queue before the operation is processed. This value represents the queue time for send operations that are issued to a node that is in the remote cluster.
Port-to-Remote Node Queue Time (Overall) The average number of milliseconds that a send or receive operation spends in the queue before the operation is processed. This value is for send and receive operations that are issued between the component and a node that is in the remote cluster.

IP Workload performance metrics

You can view the following metrics for IP Workload.

Table 25. IP Workload metrics for ports
Metric Definition
IP Replication Compressed Data Rate (Send) The average number of mebibytes per second that are transmitted after any compression (if active).
IP Replication Compressed Data Rate (Receive) The average number of mebibytes per second that are received before any decompression.
IP Replication-to-Remote Node Data Rate (Send) The average number of mebibytes per second that are transferred to nodes in other clusters by the IP partnership driver. The rate is measured in MiB per second. Use this metric to measure the rate at which data is sent to remote nodes over IP replication.
IP Replication-to-Remote Node Data Rate (Receive) The average number of mebibytes per second that are received from nodes in other clusters by the IP partnership driver. The rate is measured in MiB per second. Use this metric to measure the rate at which data is received from remote nodes over IP replication.
IP Replication-to-Remote Node Data Rate (Total) The average number of mebibytes per second that are transferred to and received from nodes in other clusters by the IP partnership driver. The rate is measured in MiB per second. Use this metric to measure the total IP replication data rate between local and remote nodes.
IP Replication Latency The average round-trip time for the IP partnership link since the last statistics collection period. The time is measured in milliseconds.
IP Replication Transfer Size (Send) The average number of mebibytes that are transferred by the IP partnership driver since the last statistics collection period. The size is measured in MiB. Use this metric to measure the size of the data that is sent by the IP partnership driver.
IP Replication Transfer Size (Receive) The average number of mebibytes that are received by the IP partnership driver since the last statistics collection period. The size is measured in MiB. Use this metric to measure the size of the data that is received by the IP partnership driver.
IP Replication Transfer Size (Total) The total number of mebibytes that are transferred by the IP partnership driver since the last statistics collection period. The size is measured in MiB. Use this metric to measure the total size of the data that is transferred by the IP partnership driver.
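If the IP Replication-to-Remote Node Data Rate (Send) is read as the replication stream before compression and the Compressed Data Rate (Send) as what actually leaves the node, their ratio gives a rough estimate of the bandwidth saved by compression. Whether the product derives savings this way is an assumption; the sample values are hypothetical.

```python
# Hedged sketch: estimate IP replication compression savings by comparing
# an assumed pre-compression send rate with the compressed send rate.
# This derivation and the sample values are assumptions, not product logic.

def compression_savings_pct(precompressed_mib_s: float,
                            compressed_mib_s: float) -> float:
    """Percentage of replication link bandwidth saved by compression."""
    if precompressed_mib_s == 0:
        return 0.0
    return (1 - compressed_mib_s / precompressed_mib_s) * 100

print(compression_savings_pct(300.0, 180.0))  # 40.0 (% of bandwidth saved)
```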

Node performance metrics

Unless otherwise noted, you can view node metrics for the following resources:
  • Nodes
  • I/O Groups
  • Storage systems
Table 26. Metrics for nodes
Metric Definition
Compression CPU Utilization (Core 1 to Core 28) The approximate percentage of time that a processor core was busy with data compression tasks. The performance of each core is shown with a separate metric. Note that the value for this metric will be zero or close to zero if compression accelerator hardware is installed in the nodes.
CPU Utilization (Compression CPU) The average percentage of time that the processors used for data compression I/O tasks are busy.
CPU Utilization (System CPU) The average percentage of time that the processors on nodes are busy doing system I/O tasks.
Data Movement Rate (MiBs)1 The capacity, in MiB per second, of the valid data in a reclaimed volume extent that garbage collection has moved to a new extent in the data reduction pool on the node. The valid data must be moved so that the whole extent can be freed up or reused to write new data. This metric corresponds to the collected mm statistic.
Data Rewrite Rate (MiBs)1 The rate, in MiB per second, at which data is rewritten when a host overwrites data in data reduction pools on the node. The new version of the host data is written to a different location so that the capacity that was used by the previous version of the host data can be freed up and reclaimed. This metric corresponds to the collected cm statistic. You can view this metric for nodes only.
Extent Collection Rate (cnt/s)1 The number of volume extents per second that were processed for garbage collection. The reclaimable capacity in the volume extents is collected so that it can be reused in the data reduction pools on the node. This metric corresponds to the collected ext col statistic. You can view this metric for nodes only.
Logical Data Rate (Sent) The average number of logical mebibytes per second (both local node and remote node) that are sent to other nodes.
Logical Data Rate (Receive) The average number of logical mebibytes per second (both local node and remote node) that are received from other nodes.
Logical Data Rate (Total) The average number of logical mebibytes per second (both local node and remote node) that are transmitted between nodes.
Max Read Cache Fullness (%)2 The maximum amount of the lower cache which the cache partitions of the pools that are managed by the node are using for read operations. If the maximum value for the cache is 100%, the read cache partition for one or more of the pools is full. The read operations that pass through the node to the affected pools will be queued and the I/O response times will increase for the volumes in the affected pools. This metric corresponds to the collected lower cache rfmx statistic.
Max Write Cache Fullness (%)2 The maximum amount of the lower cache which the cache partitions of the pools that are managed by the node are using for write operations. If the maximum value for the cache is 100%, the write cache partition for one or more of the pools is full. The write operations that pass through the node to the affected pools will be queued and the I/O response times will increase for the volumes in the affected pools. This metric corresponds to the collected lower cache wfmx statistic.
New Address Write Rate (MiBs)1 The capacity, in MiB per second, that was used to write the host's data to unallocated addresses in the data reduction pool on the node. Review this metric to determine which hosts are increasing the amount of capacity that is being written to data reduction pools on a node. This metric corresponds to the collected nm statistic. You can view this metric for nodes only.
Node Utilization by Node The average of the bandwidth percentages of those ports in the node that are actively used for host and MDisk send and receive operations. The average is weighted by port speed and adjusted according to the technology limitations of the node hardware. A weighted-average sketch follows the notes after this table.
Node-to-Local Node Physical Data Rate (Send) The average number of mebibytes per second that are sent to other nodes in the local cluster.
Node-to-Local Node Physical Data Rate (Receive) The average number of mebibytes per second that are received from other nodes in the local cluster.
Node-to-Local Node Physical Data Rate (Total) The average number of mebibytes per second that are transmitted between nodes in the local cluster.
Node-to-Local Node Logical Data Rate (Send) The average number of logical mebibytes per second that are sent to the other nodes in the local cluster.
Node-to-Local Node Logical Data Rate (Receive) The average number of logical mebibytes per second that are received from the other nodes in the local cluster.
Node-to-Local Node Logical Data Rate (Total) The average number of logical mebibytes per second that are transmitted between nodes in the local cluster.
Node-to-Remote Node Logical Data Rate (Send) The average number of logical mebibytes per second that are sent to nodes in the remote cluster
Node-to-Remote Node Logical Data Rate (Receive) The average number of logical mebibytes per second that are received from nodes in the remote cluster.
Node-to-Remote Node Logical Data Rate (Total) The average number of logical mebibytes per second that are transmitted between nodes in the remote cluster.
Node-to-Remote Node Physical Data Rate (Send) The average number of mebibytes per second that are sent to nodes in the remote cluster.
Node-to-Remote Node Physical Data Rate (Receive) The average number of mebibytes per second that are received from nodes in the remote cluster.
Node-to-Remote Node Physical Data Rate (Total) The average number of mebibytes per second that are transmitted between nodes in the remote cluster.
Physical Data Rate (Send) The average number of physical mebibytes per second that are sent to other nodes.
Physical Data Rate (Receive) The average number of physical mebibytes per second that are received from other nodes.
Physical Data Rate (Total) The average number of physical mebibytes per second that are transmitted between nodes.
Read Cache Fullness (%)2 The average amount of the lower cache which the cache partitions of the pools that are managed by the node are using for read operations. Monitor the average cache fullness for read operations to identify the nodes that are experiencing heavy cache usage. This metric corresponds to the collected lower cache rfav statistic.
Reclaimable Capacity (MiB)1 The capacity that can be reclaimed in the data reduction pools on the node. This metric corresponds to the collected rec statistic. You can view this metric for nodes only.
Recovered Capacity Rate (MiBs)1 The capacity, in MiB per second, that was recovered by garbage collection for reuse in the data reduction pools on the node. This metric corresponds to the collected rm statistic. You can view this metric for nodes only.
System CPU Utilization (Core 1 to Core 48) The approximate percentage of time that a processor core was busy with system I/O tasks. The performance of each core is shown with a separate metric.
Write Cache Fullness (%)2 The average amount of the lower cache which the cache partitions of the pools that are managed by the node are using for write operations. Monitor the average cache fullness for write operations to identify the nodes that are experiencing heavy cache usage. This metric corresponds to the collected lower cache wfav statistic.
Journal Pages Resource Used The average percentage of journal resources across all threads that is used for replicating data.
Journal Pages Resources Used Worst The highest percentage of journal resources that is used across all threads.
Notes:
  1. This garbage collection metric applies to systems that are running IBM Storage Virtualize 8.1.2 or later. To view details about collected statistics, see Starting statistics collection.
  2. This cache fullness metric applies to systems that are running IBM Storage Virtualize 7.3 or later. To view details about collected statistics, see Starting statistics collection.
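Node Utilization by Node is described in Table 26 as a port-speed-weighted average of per-port bandwidth percentages. The following Python sketch shows that weighting; it does not model the adjustment for node hardware technology limitations, and the port data is hypothetical.

```python
# Sketch of the speed-weighted average described for Node Utilization by
# Node: each active port contributes its bandwidth percentage weighted by
# its speed. The hardware-limitation adjustment is not modeled; port data
# is hypothetical.

ports = [
    {"speed_gbps": 16, "bandwidth_pct": 40.0},
    {"speed_gbps": 16, "bandwidth_pct": 55.0},
    {"speed_gbps": 8,  "bandwidth_pct": 10.0},
]

total_weight = sum(p["speed_gbps"] for p in ports)
node_utilization = sum(p["speed_gbps"] * p["bandwidth_pct"] for p in ports) / total_weight
print(f"Node utilization: {node_utilization:.1f}%")  # 40.0%
```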