List of performance metrics

The performance monitoring tool can report the following metrics:

Network and general

All network and general metrics are native. There are no computed metrics in this section.

CPU

This section lists CPU metrics for the system.

cpu_contexts: Number of context switches across all CPU cores.
cpu_guest: Percentage of total CPU time spent running a guest OS. Included in cpu_user.
cpu_guest_nice: Percentage of total CPU time spent running a niced guest OS. Included in cpu_nice.
cpu_hiq: Percentage of total CPU time spent serving hardware interrupts.
cpu_idle: Percentage of total CPU time spent idling.
cpu_interrupts: Number of interrupts serviced.
cpu_iowait: Percentage of total CPU time spent waiting for I/O to complete.
cpu_nice: Percentage of total CPU time spent in low-priority (niced) user processes.
cpu_siq: Percentage of total CPU time spent serving software interrupts.
cpu_steal: Percentage of total CPU time spent waiting for other operating systems when running in a virtualized environment.
cpu_system: Percentage of total CPU time spent in kernel mode.
cpu_user: Percentage of total CPU time spent in normal-priority user processes.
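
Because cpu_guest is already part of cpu_user and cpu_guest_nice is part of cpu_nice, adding every percentage in this list would count guest time twice. A minimal Python sketch, assuming one sample is available as a plain dict keyed by metric name (a hypothetical input format, not an API of the tool), that derives the share of CPU time spent neither idling nor waiting for I/O:

```python
# Assumption: these eight categories are disjoint and together cover all
# CPU time, as in the Linux /proc/stat convention. cpu_guest and
# cpu_guest_nice are left out because they are already contained in
# cpu_user and cpu_nice.
NON_OVERLAPPING = (
    "cpu_user", "cpu_nice", "cpu_system", "cpu_idle",
    "cpu_iowait", "cpu_hiq", "cpu_siq", "cpu_steal",
)

def cpu_busy_percent(sample: dict[str, float]) -> float:
    """Percentage of CPU time spent neither idling nor waiting for I/O."""
    total = sum(sample.get(name, 0.0) for name in NON_OVERLAPPING)
    if total == 0:
        return 0.0
    not_busy = sample.get("cpu_idle", 0.0) + sample.get("cpu_iowait", 0.0)
    # Normalize by the observed total in case rounding keeps it off 100.
    return 100.0 * (total - not_busy) / total

sample = {
    "cpu_user": 30.0, "cpu_guest": 10.0,  # guest time is part of cpu_user
    "cpu_nice": 5.0, "cpu_system": 10.0, "cpu_idle": 50.0,
    "cpu_iowait": 5.0, "cpu_hiq": 0.0, "cpu_siq": 0.0, "cpu_steal": 0.0,
}
print(cpu_busy_percent(sample))  # 45.0
```

The 10% of guest time in the example changes nothing: it is already inside the 30% of cpu_user.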

DiskFree

This section contains information about free disk space. Each mounted directory has a separate section, for example DiskFree|/boot|df_free.

df_free: Amount of free disk space on the file system.
df_total: Total amount of disk space on the file system.
df_used: Amount of used disk space on the file system.
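
A usage percentage per mount point follows directly from df_used and df_total. A minimal Python sketch; the function name and the sample values are hypothetical:

```python
def disk_used_percent(df_used: float, df_total: float) -> float:
    """Percentage of the file system's capacity that is in use."""
    if df_total <= 0:
        raise ValueError("df_total must be positive")
    # Units cancel out, so any unit works as long as both values share it.
    return 100.0 * df_used / df_total

# Hypothetical (df_used, df_total) values from two DiskFree sections.
samples = {"/boot": (250.0, 1000.0), "/": (9000.0, 10000.0)}
for mount, (used, total) in samples.items():
    print(f"{mount}: {disk_used_percent(used, total):.1f}% used")
# /boot: 25.0% used
# /: 90.0% used
```

The ratio uses df_used and df_total rather than df_free, because on some file systems space reserved for the superuser may make df_used + df_free smaller than df_total.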

Diskstat

This section contains disk status information for each disk. For example, Diskstat|sda|disk_active_ios.

disk_active_ios: Number of I/O operations currently in progress.
disk_aveq: Weighted number of milliseconds spent doing I/Os.
disk_io_time: Number of milliseconds the system spent doing I/O operations.
disk_read_ios: Total number of read operations completed successfully.
disk_read_merged: Number of (small) read operations that have been merged into a larger read.
disk_read_sect: Number of sectors read.
disk_read_time: Amount of time in milliseconds spent reading.
disk_write_ios: Number of write operations completed successfully.
disk_write_merged: Number of (small) write operations that have been merged into a larger write.
disk_write_sect: Number of sectors written.
disk_write_time: Amount of time in milliseconds spent writing.
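
The time and sector counters become more informative when related to the operation counts over an interval. A minimal Python sketch, assuming the tool reports these values as cumulative counters like Linux /proc/diskstats and that a sector is 512 bytes (the /proc/diskstats convention); DiskSample and read_stats are hypothetical helpers:

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # assumption: /proc/diskstats sector size

@dataclass
class DiskSample:
    disk_read_ios: int
    disk_read_sect: int
    disk_read_time: int  # milliseconds

def read_stats(prev: DiskSample, curr: DiskSample,
               interval_s: float) -> tuple[float, float]:
    """Return (average ms per read, read throughput in bytes/s)."""
    ios = curr.disk_read_ios - prev.disk_read_ios
    ms = curr.disk_read_time - prev.disk_read_time
    sectors = curr.disk_read_sect - prev.disk_read_sect
    avg_ms = ms / ios if ios > 0 else 0.0
    throughput = sectors * SECTOR_BYTES / interval_s
    return avg_ms, throughput

prev = DiskSample(disk_read_ios=1000, disk_read_sect=80000, disk_read_time=4000)
curr = DiskSample(disk_read_ios=1500, disk_read_sect=120000, disk_read_time=5000)
print(read_stats(prev, curr, interval_s=10.0))  # (2.0, 2048000.0)
```

The write side works the same way with disk_write_ios, disk_write_sect, and disk_write_time.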

Load

jobs: The total number of jobs that currently exist in the system.
load1: The average load (number of jobs in the run queue) over the last minute.
load15: The average load (number of jobs in the run queue) over the last 15 minutes.
load5: The average load (number of jobs in the run queue) over the last five minutes.

Memory

mem_active: Active memory that was recently accessed.
mem_active_anon: Active memory with no file association, that is, heap and stack memory.
mem_active_file: Active memory that is associated with a file, for example, page cache memory.
mem_buffers: Temporary storage used for raw disk blocks.
mem_cached: In-memory cache for files read from disk (the page cache). Does not include mem_swapcached.
mem_dirty: Memory which is waiting to get written back to the disk.
mem_inactive: Inactive memory that hasn't been accessed recently.
mem_inactive_anon: Inactive memory with no file association, that is, inactive heap and stack memory.
mem_inactive_file: Inactive memory that is associated with a file, for example, page cache memory.
mem_memfree: Total free RAM.
mem_memtotal: Total usable RAM.
mem_mlocked: Memory that is locked.
mem_swapcached: In-memory cache for pages that are swapped back in.
mem_swapfree: Amount of swap space that is currently unused.
mem_swaptotal: Total amount of swap space available.
mem_unevictable: Memory that cannot be paged out.
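
A common estimate of memory held by applications subtracts free memory, buffers, and page cache from the total, similar to the "used" figure that older versions of the `free` command reported. A minimal Python sketch, assuming all values in one sample share the same unit; the function name and input format are illustrative only:

```python
def memory_summary(m: dict[str, int]) -> dict[str, int]:
    """Derive used memory and used swap from a single Memory sample."""
    # Buffers and page cache can be reclaimed quickly, so they are
    # excluded from "used".
    used = (m["mem_memtotal"] - m["mem_memfree"]
            - m["mem_buffers"] - m["mem_cached"])
    swap_used = m["mem_swaptotal"] - m["mem_swapfree"]
    return {"used": used, "swap_used": swap_used}

sample = {
    "mem_memtotal": 16_000_000, "mem_memfree": 2_000_000,
    "mem_buffers": 500_000, "mem_cached": 6_000_000,
    "mem_swaptotal": 4_000_000, "mem_swapfree": 3_500_000,
}
print(memory_summary(sample))  # {'used': 7500000, 'swap_used': 500000}
```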

Netstat

ns_closewait: Number of connections in state TCP_CLOSE_WAIT.
ns_established: Number of connections in state TCP_ESTABLISHED.
ns_listen: Number of connections in state TCP_LISTEN.
ns_local_bytes_r: Number of bytes received (local -> local).
ns_local_bytes_s: Number of bytes sent (local -> local).
ns_localconn: Number of local connections (local -> local).
ns_remote_bytes_r: Number of bytes received (remote -> local).
ns_remote_bytes_s: Number of bytes sent (local -> remote).
ns_remoteconn: Number of remote connections (local -> remote).
ns_timewait: Number of connections in state TCP_TIME_WAIT.

Network

netdev_bytes_r: Number of bytes received.
netdev_bytes_s: Number of bytes sent.
netdev_carrier: Number of carrier loss events.
netdev_collisions: Number of collisions.
netdev_compressed_r: Number of compressed frames received.
netdev_compressed_s: Number of compressed packets sent.
netdev_drops_r: Number of packets dropped while receiving.
netdev_drops_s: Number of packets dropped while sending.
netdev_errors_r: Number of errors while receiving.
netdev_errors_s: Number of errors while sending.
netdev_fifo_r: Number of FIFO buffer errors while receiving.
netdev_fifo_s: Number of FIFO buffer errors while sending.
netdev_frames_r: Number of frame errors while receiving.
netdev_multicast_r: Number of multicast packets received.
netdev_packets_r: Number of packets received.
netdev_packets_s: Number of packets sent.
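
Most entries in this list are counts. Assuming the tool exposes them as cumulative counters, as Linux /proc/net/dev does, a per-second rate is the difference between two readings divided by the time between them. A minimal Python sketch; counter_rate is a hypothetical helper:

```python
def counter_rate(prev: int, curr: int, interval_s: float) -> float:
    """Per-second rate of a cumulative counter such as netdev_bytes_r."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev
    # A negative delta usually means the counter was reset (for example,
    # when the interface or host restarted); report no rate for that step.
    if delta < 0:
        return 0.0
    return delta / interval_s

# Two hypothetical readings of netdev_bytes_r taken 5 seconds apart.
bytes_per_s = counter_rate(1_000_000, 6_000_000, 5.0)
print(f"{bytes_per_s * 8 / 1e6:.1f} Mbit/s received")  # 8.0 Mbit/s received
```

Treating a negative delta as zero is a deliberate simplification; a monitoring pipeline might instead skip that interval or handle counter wraparound explicitly.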

GPFS

GPFSDisk

For each NSD in the system, for example GPFSDisk|myMachine|myFilesystem|myNSD|gpfs_ds_bytes_read

gpfs_ds_bytes_read: Number of bytes read.
gpfs_ds_bytes_written: Number of bytes written.
gpfs_ds_max_disk_wait_rd: The longest time spent waiting for a disk read operation.
gpfs_ds_max_disk_wait_wr: The longest time spent waiting for a disk write operation.
gpfs_ds_max_queue_wait_rd: The longest time between being enqueued for a disk read operation and the completion of that operation.
gpfs_ds_max_queue_wait_wr: The longest time between being enqueued for a disk write operation and the completion of that operation.
gpfs_ds_min_disk_wait_rd: The shortest time spent waiting for a disk read operation.
gpfs_ds_min_disk_wait_wr: The shortest time spent waiting for a disk write operation.
gpfs_ds_min_queue_wait_rd: The shortest time between being enqueued for a disk read operation and the completion of that operation.
gpfs_ds_min_queue_wait_wr: The shortest time between being enqueued for a disk write operation and the completion of that operation.
gpfs_ds_read_ops: Number of read operations.
gpfs_ds_tot_disk_wait_rd: The total time in seconds spent waiting for disk read operations.
gpfs_ds_tot_disk_wait_wr: The total time in seconds spent waiting for disk write operations.
gpfs_ds_tot_queue_wait_rd: The total time spent between being enqueued for a read operation and the completion of that operation.
gpfs_ds_tot_queue_wait_wr: The total time spent between being enqueued for a write operation and the completion of that operation.
gpfs_ds_write_ops: Number of write operations.
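
The tot_* wait metrics become per-operation averages when divided by the matching operation count. A minimal Python sketch, assuming the queue-wait totals are in seconds like the disk-wait totals (the list above states the unit only for the latter) and using hypothetical sample values:

```python
def avg_wait_ms(total_wait_s: float, ops: int) -> float:
    """Average wait per operation in milliseconds; 0 when there were no ops."""
    return 1000.0 * total_wait_s / ops if ops > 0 else 0.0

# One hypothetical GPFSDisk sample for a single NSD.
nsd = {
    "gpfs_ds_read_ops": 2000,
    "gpfs_ds_tot_disk_wait_rd": 4.0,   # seconds
    "gpfs_ds_tot_queue_wait_rd": 5.0,  # assumed seconds
}
disk_ms = avg_wait_ms(nsd["gpfs_ds_tot_disk_wait_rd"], nsd["gpfs_ds_read_ops"])
queue_ms = avg_wait_ms(nsd["gpfs_ds_tot_queue_wait_rd"], nsd["gpfs_ds_read_ops"])
print(f"disk wait {disk_ms:.2f} ms/read, queue wait {queue_ms:.2f} ms/read")
# disk wait 2.00 ms/read, queue wait 2.50 ms/read
```

The same arithmetic applies to the gpfs_fs_, gpfs_nsdds_, and gpfs_ns_ wait metrics in the sections that follow.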

GPFSFileset

For each independent fileset in the file system: Cluster name - GPFSFileset - filesystem name - fileset name.
For example: myCluster|GPFSFileset|myFilesystem|gpfs_fset_maxInodes.

gpfs_fset_maxInodes: Maximum number of inodes for this independent fileset.
gpfs_fset_freeInodes: Number of free inodes available for this independent fileset.
gpfs_fset_allocInodes: Number of inodes allocated for this independent fileset.
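
To see how close an independent fileset is to its inode limit, compare the allocated inodes with the maximum. A minimal Python sketch with hypothetical values; allocated inodes are not necessarily all in use, so this measures allocation headroom rather than the number of files:

```python
def inode_allocation_percent(alloc_inodes: int, max_inodes: int) -> float:
    """Share of the fileset's maximum inode count that is already allocated."""
    if max_inodes <= 0:
        raise ValueError("max_inodes must be positive")
    return 100.0 * alloc_inodes / max_inodes

# Hypothetical values for one myCluster|GPFSFileset|myFilesystem|... section.
print(inode_allocation_percent(alloc_inodes=750_000, max_inodes=1_000_000))  # 75.0
```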

GPFSFileSystem

For each file system, for example GPFSFileSystem

gpfs_fs_bytes_read: Number of bytes read.
gpfs_fs_bytes_written: Number of bytes written.
gpfs_fs_disks: Number of disks in the file system.
gpfs_fs_max_disk_wait_rd: The longest time spent waiting for a disk read operation.
gpfs_fs_max_disk_wait_wr: The longest time spent waiting for a disk write operation.
gpfs_fs_max_queue_wait_rd: The longest time between being enqueued for a disk read operation and the completion of that operation.
gpfs_fs_max_queue_wait_wr: The longest time between being enqueued for a disk write operation and the completion of that operation.
gpfs_fs_min_disk_wait_rd: The shortest time spent waiting for a disk read operation.
gpfs_fs_min_disk_wait_wr: The shortest time spent waiting for a disk write operation.
gpfs_fs_min_queue_wait_rd: The shortest time between being enqueued for a disk read operation and the completion of that operation.
gpfs_fs_min_queue_wait_wr: The shortest time between being enqueued for a disk write operation and the completion of that operation.
gpfs_fs_read_ops: Number of read operations.
gpfs_fs_tot_disk_wait_rd: The total time in seconds spent waiting for disk read operations.
gpfs_fs_tot_disk_wait_wr: The total time in seconds spent waiting for disk write operations.
gpfs_fs_tot_queue_wait_rd: The total time spent between being enqueued for a read operation and the completion of that operation.
gpfs_fs_tot_queue_wait_wr: The total time spent between being enqueued for a write operation and the completion of that operation.
gpfs_fs_write_ops: Number of write operations.

GPFSFileSystemAPI

gpfs_fis_bytes_read: Number of bytes read.
gpfs_fis_bytes_written: Number of bytes written.
gpfs_fis_close_calls: Number of close calls.
gpfs_fis_disks: Number of disks in the file system.
gpfs_fis_inodes_written: Number of inode updates to disk.
gpfs_fis_open_calls: Number of open calls.
gpfs_fis_read_calls: Number of read calls.
gpfs_fis_readdir_calls: Number of readdir calls.
gpfs_fis_write_calls: Number of write calls.

GPFSNSDDisk

gpfs_nsdds_bytes_read: Number of bytes read.
gpfs_nsdds_bytes_written: Number of bytes written.
gpfs_nsdds_max_disk_wait_rd: The longest time spent waiting for a disk read operation.
gpfs_nsdds_max_disk_wait_wr: The longest time spent waiting for a disk write operation.
gpfs_nsdds_max_queue_wait_rd: The longest time between being enqueued for a disk read operation and the completion of that operation.
gpfs_nsdds_max_queue_wait_wr: The longest time between being enqueued for a disk write operation and the completion of that operation.
gpfs_nsdds_min_disk_wait_rd: The shortest time spent waiting for a disk read operation.
gpfs_nsdds_min_disk_wait_wr: The shortest time spent waiting for a disk write operation.
gpfs_nsdds_min_queue_wait_rd: The shortest time between being enqueued for a disk read operation and the completion of that operation.
gpfs_nsdds_min_queue_wait_wr: The shortest time between being enqueued for a disk write operation and the completion of that operation.
gpfs_nsdds_read_ops: Number of read operations.
gpfs_nsdds_tot_disk_wait_rd: The total time in seconds spent waiting for disk read operations.
gpfs_nsdds_tot_disk_wait_wr: The total time in seconds spent waiting for disk write operations.
gpfs_nsdds_tot_queue_wait_rd: The total time spent between being enqueued for a read operation and the completion of that operation.
gpfs_nsdds_tot_queue_wait_wr: The total time spent between being enqueued for a write operation and the completion of that operation.
gpfs_nsdds_write_ops: Number of write operations.

GPFSNSDFS

gpfs_nsdfs_bytes_read: Number of NSD bytes read, aggregated to the file system.
gpfs_nsdfs_bytes_written: Number of NSD bytes written, aggregated to the file system.
gpfs_nsdfs_read_ops: Number of NSD read operations, aggregated to the file system.
gpfs_nsdfs_write_ops: Number of NSD write operations, aggregated to the file system.

GPFSNSDPool

gpfs_nsdpool_bytes_read: Number of NSD bytes read, aggregated to the storage pool.
gpfs_nsdpool_bytes_written: Number of NSD bytes written, aggregated to the storage pool.
gpfs_nsdpool_read_ops: Number of NSD read operations, aggregated to the storage pool.
gpfs_nsdpool_write_ops: Number of NSD write operations, aggregated to the storage pool.

GPFSNode

gpfs_ns_bytes_read: Number of bytes read.
gpfs_ns_bytes_written: Number of bytes written.
gpfs_ns_clusters: Number of clusters participating.
gpfs_ns_disks: Number of disks in all mounted file systems.
gpfs_ns_filesys: Number of mounted file systems.
gpfs_ns_max_disk_wait_rd: The longest time spent waiting for a disk read operation.
gpfs_ns_max_disk_wait_wr: The longest time spent waiting for a disk write operation.
gpfs_ns_max_queue_wait_rd: The longest time between being enqueued for a disk read operation and the completion of that operation.
gpfs_ns_max_queue_wait_wr: The longest time between being enqueued for a disk write operation and the completion of that operation.
gpfs_ns_min_disk_wait_rd: The shortest time spent waiting for a disk read operation.
gpfs_ns_min_disk_wait_wr: The shortest time spent waiting for a disk write operation.
gpfs_ns_min_queue_wait_rd: The shortest time between being enqueued for a disk read operation and the completion of that operation.
gpfs_ns_min_queue_wait_wr: The shortest time between being enqueued for a disk write operation and the completion of that operation.
gpfs_ns_read_ops: Number of read operations.
gpfs_ns_tot_disk_wait_rd: The total time in seconds spent waiting for disk read operations.
gpfs_ns_tot_disk_wait_wr: The total time in seconds spent waiting for disk write operations.
gpfs_ns_tot_queue_wait_rd: The total time spent between being enqueued for a read operation and the completion of that operation.
gpfs_ns_tot_queue_wait_wr: The total time spent between being enqueued for a write operation and the completion of that operation.
gpfs_ns_write_ops: Number of write operations.

GPFSNodeAPI

gpfs_is_bytes_read: Number of bytes read.
gpfs_is_bytes_written: Number of bytes written.
gpfs_is_close_calls: Number of close calls.
gpfs_is_inodes_written: Number of inode updates to disk.
gpfs_is_open_calls: Number of open calls.
gpfs_is_readDir_calls: Number of readdir calls.
gpfs_is_read_calls: Number of read calls.
gpfs_is_write_calls: Number of write calls.

GPFSPool

For each pool in each file system: Cluster name - GPFSPool - filesystem name - pool name.
For example: myCluster|GPFSPool|myFilesystem|myPool|gpfs_pool_free_dataKB.

gpfs_pool_total_dataKB: Total capacity for data (in KB) in this pool.
gpfs_pool_free_dataKB: Free capacity for data (in KB) in this pool.
gpfs_pool_total_metaKB: Total capacity for metadata (in KB) in this pool.
gpfs_pool_free_metaKB: Free capacity for metadata (in KB) in this pool.
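
Both capacities are reported in KB, so free space as a percentage is a simple ratio. A minimal Python sketch with hypothetical values; returning None for a pool that reports no metadata capacity (for example, a pool that holds only data) avoids reporting it as 0% free:

```python
from typing import Optional

def pool_free_percent(free_kb: int, total_kb: int) -> Optional[float]:
    """Free capacity as a percentage of total capacity, or None if the
    pool has no capacity of this kind."""
    if total_kb <= 0:
        return None
    return 100.0 * free_kb / total_kb

pool = {
    "gpfs_pool_total_dataKB": 10_000_000, "gpfs_pool_free_dataKB": 2_500_000,
    "gpfs_pool_total_metaKB": 0, "gpfs_pool_free_metaKB": 0,
}
data_free = pool_free_percent(pool["gpfs_pool_free_dataKB"], pool["gpfs_pool_total_dataKB"])
meta_free = pool_free_percent(pool["gpfs_pool_free_metaKB"], pool["gpfs_pool_total_metaKB"])
print(data_free, meta_free)  # 25.0 None
```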

GPFSPoolIO

For the system within each GPFS™ device:

gpfs_pool_bytes_rd: Total size of all disks for this usage type.
gpfs_pool_bytes_wr: Total available disk space in full blocks for this usage type.
gpfs_pool_free_fragkb: Total available space in fragments for this usage type.

GPFSVFS

gpfs_vfs_accesses: Number of accesses operations.
gpfs_vfs_accesses_t: Amount of time in seconds spent in accesses operations.
gpfs_vfs_aioread: Number of aioread operations.
gpfs_vfs_aioread_t: Amount of time in seconds spent in aioread operations.
gpfs_vfs_aiowrite: Number of aiowrite operations.
gpfs_vfs_aiowrite_t: Amount of time in seconds spent in aiowrite operations.
gpfs_vfs_clear: Number of clear operations.
gpfs_vfs_clear_t: Amount of time in seconds spent in clear operations.
gpfs_vfs_close: Number of close operations.
gpfs_vfs_close_t: Amount of time in seconds spent in close operations.
gpfs_vfs_create: Number of create operations.
gpfs_vfs_create_t: Amount of time in seconds spent in create operations.
gpfs_vfs_decodeFh: Number of decodeFh operations.
gpfs_vfs_decodeFh_t: Amount of time in seconds spent in decodeFh operations.
gpfs_vfs_getDentry: Number of getDentry operations.
gpfs_vfs_encodeFh: Number of encodeFh operations.
gpfs_vfs_encodeFh_t: Amount of time in seconds spent in encodeFh operations.
gpfs_vfs_flock: Number of flock operations.
gpfs_vfs_flock_t: Amount of time in seconds spent in flock operations.
gpfs_vfs_fsync: Number of fsync operations.
gpfs_vfs_fsyncRange: Number of fsyncRange operations.
gpfs_vfs_fsyncRange_t: Amount of time in seconds spent in fsyncRange operations.
gpfs_vfs_fsync_t: Amount of time in seconds spent in fsync operations.
gpfs_vfs_ftrunc: Number of ftrunc operations.
gpfs_vfs_ftrunc_t: Amount of time in seconds spent in ftrunc operations.
gpfs_vfs_getDentry_t: Amount of time in seconds spent in getDentry operations.
gpfs_vfs_getParent: Number of getParent operations.
gpfs_vfs_getParent_t: Amount of time in seconds spent in getParent operations.
gpfs_vfs_getattr: Number of getattr operations.
gpfs_vfs_getattr_t: Amount of time in seconds spent in getattr operations.
gpfs_vfs_getxattr: Number of getxattr operations.
gpfs_vfs_getxattr_t: Amount of time in seconds spent in getxattr operations.
gpfs_vfs_link: Number of link operations.
gpfs_vfs_link_t: Amount of time in seconds spent in link operations.
gpfs_vfs_listxattr: Number of listxattr operations.
gpfs_vfs_listxattr_t: Amount of time in seconds spent in listxattr operations.
gpfs_vfs_lockctl: Number of lockctl operations.
gpfs_vfs_lockctl_t: Amount of time in seconds spent in lockctl operations.
gpfs_vfs_lookup: Number of lookup operations.
gpfs_vfs_lookup_t: Amount of time in seconds spent in lookup operations.
gpfs_vfs_mapLloff: Number of mapLloff operations.
gpfs_vfs_mapLloff_t: Amount of time in seconds spent in mapLloff operations.
gpfs_vfs_mkdir: Number of mkdir operations.
gpfs_vfs_mkdir_t: Amount of time in seconds spent in mkdir operations.
gpfs_vfs_mknod: Number of mknod operations.
gpfs_vfs_mknod_t: Amount of time in seconds spent in mknod operations.
gpfs_vfs_mmapread: Number of mmapread operations.
gpfs_vfs_mmapread_t: Amount of time in seconds spent in mmapread operations.
gpfs_vfs_mmapwrite: Number of mmapwrite operations.
gpfs_vfs_mmapwrite_t: Amount of time in seconds spent in mmapwrite operations.
gpfs_vfs_mount: Number of mount operations.
gpfs_vfs_mount_t: Amount of time in seconds spent in mount operations.
gpfs_vfs_open: Number of open operations.
gpfs_vfs_open_t: Amount of time in seconds spent in open operations.
gpfs_vfs_read: Number of read operations.
gpfs_vfs_read_t: Amount of time in seconds spent in read operations.
gpfs_vfs_readdir: Number of readdir operations.
gpfs_vfs_readdir_t: Amount of time in seconds spent in readdir operations.
gpfs_vfs_readlink: Number of readlink operations.
gpfs_vfs_readlink_t: Amount of time in seconds spent in readlink operations.
gpfs_vfs_readpage: Number of readpage operations.
gpfs_vfs_readpage_t: Amount of time in seconds spent in readpage operations.
gpfs_vfs_remove: Number of remove operations.
gpfs_vfs_remove_t: Amount of time in seconds spent in remove operations.
gpfs_vfs_removexattr: Number of removexattr operations.
gpfs_vfs_removexattr_t: Amount of time in seconds spent in removexattr operations.
gpfs_vfs_rename: Number of rename operations.
gpfs_vfs_rename_t: Amount of time in seconds spent in rename operations.
gpfs_vfs_rmdir: Number of rmdir operations.
gpfs_vfs_rmdir_t: Amount of time in seconds spent in rmdir operations.
gpfs_vfs_setacl: Number of setacl operations.
gpfs_vfs_setacl_t: Amount of time in seconds spent in setacl operations.
gpfs_vfs_setattr: Number of setattr operations.
gpfs_vfs_setattr_t: Amount of time in seconds spent in setattr operations.
gpfs_vfs_setxattr: Number of setxattr operations.
gpfs_vfs_setxattr_t: Amount of time in seconds spent in setxattr operations.
gpfs_vfs_statfs: Number of statfs operations.
gpfs_vfs_statfs_t: Amount of time in seconds spent in statfs operations.
gpfs_vfs_symlink: Number of symlink operations.
gpfs_vfs_symlink_t: Amount of time in seconds spent in symlink operations.
gpfs_vfs_sync: Number of sync operations.
gpfs_vfs_sync_t: Amount of time in seconds spent in sync operations.
gpfs_vfs_tsfattr: Number of tsfattr operations.
gpfs_vfs_tsfattr_t: Amount of time in seconds spent in tsfattr operations.
gpfs_vfs_tsfsattr: Number of tsfsattr operations.
gpfs_vfs_tsfsattr_t: Amount of time in seconds spent in tsfsattr operations.
gpfs_vfs_unmap: Number of unmap operations.
gpfs_vfs_unmap_t: Amount of time in seconds spent in unmap operations.
gpfs_vfs_vget: Number of vget operations.
gpfs_vfs_vget_t: Amount of time in seconds spent in vget operations.
gpfs_vfs_write: Number of write operations.
gpfs_vfs_write_t: Amount of time in seconds spent in write operations.
gpfs_vfs_writepage: Number of writepage operations.
gpfs_vfs_writepage_t: Amount of time in seconds spent in writepage operations.

GPFSWaiters

For each node: Node name - GPFSWaiters - waiters_time_threshold (all, 0.1s, 0.2s, 0.5s, 1.0s, 30.0s, 60.0s).

Note: Here 'all' implies a waiting time greater than or equal to 0 seconds.

For example: myNode|GPFSWaiters|all|gpfs_wt_count_all.

gpfs_wt_count_all: Count of all threads with waiting time greater than or equal to waiters_time_threshold seconds.
gpfs_wt_count_local_io: Count of threads waiting for local I/O with waiting time greater than or equal to waiters_time_threshold seconds.
gpfs_wt_count_network_io: Count of threads waiting for network I/O with waiting time greater than or equal to waiters_time_threshold seconds.
gpfs_wt_count_thcond: Count of threads waiting for a GPFS condition variable to be signaled with waiting time greater than or equal to waiters_time_threshold seconds.
gpfs_wt_count_thmutex: Count of threads waiting to lock a GPFS mutex with waiting time greater than or equal to waiters_time_threshold seconds.
gpfs_wt_count_delay: Count of threads waiting for delay interval expiration with waiting time greater than or equal to waiters_time_threshold seconds.
gpfs_wt_count_syscall: Count of threads waiting for system call completion with waiting time greater than or equal to waiters_time_threshold seconds.
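
Because each threshold counts waiters at or above that time, the sections are cumulative: the count for 0.5s includes every waiter that is also counted for 1.0s. A minimal Python sketch, assuming gpfs_wt_count_all values for all thresholds were sampled at the same moment (the helper name and numbers are hypothetical), that converts them into non-overlapping bands:

```python
# Threshold keys in ascending order; "all" means a waiting time >= 0 seconds.
THRESHOLDS = ["all", "0.1s", "0.2s", "0.5s", "1.0s", "30.0s", "60.0s"]

def waiter_bands(counts: dict[str, int]) -> dict[str, int]:
    """Convert cumulative '>= threshold' counts into per-band counts."""
    bands = {}
    for low, high in zip(THRESHOLDS, THRESHOLDS[1:]):
        label = "0s" if low == "all" else low
        # Waiters at or above `low` but below `high`.
        bands[f"{label}-{high}"] = counts[low] - counts[high]
    # The last band is open-ended.
    bands[f">={THRESHOLDS[-1]}"] = counts[THRESHOLDS[-1]]
    return bands

counts = {"all": 40, "0.1s": 12, "0.2s": 8, "0.5s": 5,
          "1.0s": 3, "30.0s": 1, "60.0s": 0}
print(waiter_bands(counts))
```

If the thresholds were sampled at slightly different times, a band can come out negative; such intervals are better discarded than interpreted.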

Computed Metrics

The following metrics are computed for GPFS:

gpfs_write_avg_lat (latency): gpfs_vfs_write_t / gpfs_vfs_write
gpfs_read_avg_lat (latency): gpfs_vfs_read_t / gpfs_vfs_read
gpfs_create_avg_lat (latency): gpfs_vfs_create_t / gpfs_vfs_create
gpfs_remove_avg_lat (latency): gpfs_vfs_remove_t / gpfs_vfs_remove
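
The tool computes these metrics itself; the sketch below only illustrates the arithmetic. It assumes a sample dict keyed by metric name (a hypothetical format) and returns None instead of dividing by zero when no operations were recorded:

```python
from typing import Optional

# Each computed metric divides a *_t time metric by the matching
# operation count, mirroring the formulas listed above.
COMPUTED = {
    "gpfs_write_avg_lat": ("gpfs_vfs_write_t", "gpfs_vfs_write"),
    "gpfs_read_avg_lat": ("gpfs_vfs_read_t", "gpfs_vfs_read"),
    "gpfs_create_avg_lat": ("gpfs_vfs_create_t", "gpfs_vfs_create"),
    "gpfs_remove_avg_lat": ("gpfs_vfs_remove_t", "gpfs_vfs_remove"),
}

def compute_latencies(sample: dict[str, float]) -> dict[str, Optional[float]]:
    """Average seconds per operation; None when no operations were recorded."""
    result = {}
    for name, (time_key, ops_key) in COMPUTED.items():
        ops = sample.get(ops_key, 0)
        result[name] = sample.get(time_key, 0.0) / ops if ops else None
    return result

sample = {"gpfs_vfs_write_t": 0.5, "gpfs_vfs_write": 250,
          "gpfs_vfs_read_t": 0.2, "gpfs_vfs_read": 400}
print(compute_latencies(sample))
```

When aggregating over several samples or nodes, sum the *_t values and the operation counts separately and then divide; averaging per-sample averages would give intervals with few operations the same weight as busy ones.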

NFS

Native Metrics

nfs_read_req: Number of bytes requested for reading.
nfs_write_req: Number of bytes requested for writing.
nfs_read: Number of bytes transferred for reading.
nfs_write: Number of bytes transferred for writing.
nfs_read_ops: Number of total read operations.
nfs_write_ops: Number of total write operations.
nfs_read_err: Number of erroneous read operations.
nfs_write_err: Number of erroneous write operations.
nfs_read_lat: Time consumed by read operations (in ns).
nfs_write_lat: Time consumed by write operations (in ns).
nfs_read_queue: Time that read operations spent in the RPC wait queue.
nfs_write_queue: Time that write operations spent in the RPC wait queue.

Computed Metrics

The following metrics are computed for NFS:

nfs_total_ops: nfs_read_ops + nfs_write_ops
nfsIOlatencyRead: (nfs_read_lat + nfs_read_queue) / nfs_read_ops
nfsIOlatencyWrite: (nfs_write_lat + nfs_write_queue) / nfs_write_ops
nfsReadOpThroughput: nfs_read/nfs_read_ops
nfsWriteOpThroughput: nfs_write/nfs_write_ops
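
The NFS computed metrics can be reproduced from one sample of the native metrics. A minimal Python sketch, assuming nfs_read_queue and nfs_write_queue use the same unit (ns) as the latency metrics they are added to, since the list above does not state it; ratio and nfs_computed are hypothetical helpers:

```python
from typing import Optional

def ratio(num: float, den: float) -> Optional[float]:
    """Divide, returning None when the denominator is zero."""
    return num / den if den else None

def nfs_computed(s: dict[str, float]) -> dict[str, Optional[float]]:
    """Derive the NFS computed metrics from one sample of native metrics."""
    return {
        "nfs_total_ops": s["nfs_read_ops"] + s["nfs_write_ops"],
        # Average time per operation: latency plus RPC queue time.
        "nfsIOlatencyRead": ratio(s["nfs_read_lat"] + s["nfs_read_queue"], s["nfs_read_ops"]),
        "nfsIOlatencyWrite": ratio(s["nfs_write_lat"] + s["nfs_write_queue"], s["nfs_write_ops"]),
        # Average bytes transferred per operation.
        "nfsReadOpThroughput": ratio(s["nfs_read"], s["nfs_read_ops"]),
        "nfsWriteOpThroughput": ratio(s["nfs_write"], s["nfs_write_ops"]),
    }

sample = {
    "nfs_read_ops": 100, "nfs_read_lat": 50_000_000, "nfs_read_queue": 10_000_000,
    "nfs_read": 409_600, "nfs_write_ops": 0, "nfs_write_lat": 0,
    "nfs_write_queue": 0, "nfs_write": 0,
}
print(nfs_computed(sample))
```

As defined above, nfsReadOpThroughput and nfsWriteOpThroughput are average bytes per operation, not bytes per second.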

Object

Native Metrics

ObjectAccount

account_auditor_time: Timing data for individual account database audits.
account_reaper_time: Timing data for each reap_account() call.
account_replicator_time: Timing data for each database replication attempt not resulting in a failure.
account_DEL_time: Timing data for each DELETE request not resulting in an error.
account_DEL_err_time: Timing data for each DELETE request resulting in an error: bad request, not mounted, missing timestamp.
account_GET_time: Timing data for each GET request not resulting in an error.
account_GET_err_time: Timing data for each GET request resulting in an error: bad request, not mounted, bad delimiter, account listing limit too high, bad accept header.
account_HEAD_time: Timing data for each HEAD request not resulting in an error.
account_HEAD_err_time: Timing data for each HEAD request resulting in an error: bad request, not mounted.
account_POST_time: Timing data for each POST request not resulting in an error.
account_POST_err_time: Timing data for each POST request resulting in an error: bad request, bad or missing timestamp, not mounted.
account_PUT_time: Timing data for each PUT request not resulting in an error.
account_PUT_err_time: Timing data for each PUT request resulting in an error: bad request, not mounted, conflict, recently-deleted.
account_REPLICATE_time: Timing data for each REPLICATE request not resulting in an error.
account_REPLICATE_err_time: Timing data for each REPLICATE request resulting in an error: bad request, not mounted.

ObjectContainer

container_auditor_time: Timing data for each container audit.
container_replicator_time: Timing data for each database replication attempt not resulting in a failure.
container_DEL_time: Timing data for each DELETE request not resulting in an error.
container_DEL_err_time: Timing data for DELETE request errors: bad request, not mounted, missing timestamp, conflict.
container_GET_time: Timing data for each GET request not resulting in an error.
container_GET_err_time: Timing data for GET request errors: bad request, not mounted, parameters not utf8, bad accept header.
container_HEAD_time: Timing data for each HEAD request not resulting in an error.
container_HEAD_err_time: Timing data for HEAD request errors: bad request, not mounted.
container_POST_time: Timing data for each POST request not resulting in an error.
container_POST_err_time: Timing data for POST request errors: bad request, bad x-container-sync-to, not mounted.
container_PUT_time: Timing data for each PUT request not resulting in an error.
container_PUT_err_time: Timing data for PUT request errors: bad request, missing timestamp, not mounted, conflict.
container_REPLICATE_time: Timing data for each REPLICATE request not resulting in an error.
container_REPLICATE_err_time: Timing data for REPLICATE request errors: bad request, not mounted.
container_sync_deletes_time: Timing data for each container database row synchronization via deletion.
container_sync_puts_time: Timing data for each container database row synchronization via PUTing.
container_updater_time: Timing data for processing a container; only includes timing for containers which needed to update their accounts.

ObjectObject

object_auditor_time: Timing data for each object audit (does not include any rate-limiting sleep time for max_files_per_second, but does include rate-limiting sleep time for max_bytes_per_second).
object_expirer_time: Timing data for each object expiration attempt, including ones resulting in an error.
object_replicator_partition_delete_time: Timing data for partitions replicated to another node because they didn’t belong on this node. This metric is not tracked per device.
object_replicator_partition_update_time: Timing data for partitions replicated which also belong on this node. This metric is not tracked per-device.
object_DEL_time: Timing data for each DELETE request not resulting in an error.
object_DEL_err_time: Timing data for DELETE request errors: bad request, missing timestamp, not mounted, precondition failed. Includes requests which couldn’t find or match the object.
object_GET_time: Timing data for each GET request not resulting in an error. Includes requests which couldn’t find the object (including disk errors resulting in file quarantine).
object_GET_err_time: Timing data for GET request errors: bad request, not mounted, header timestamps before the epoch, precondition failed. File errors resulting in a quarantine are not counted here.
object_HEAD_time: Timing data for each HEAD request not resulting in an error. Includes requests which couldn’t find the object (including disk errors resulting in file quarantine).
object_HEAD_err_time: Timing data for HEAD request errors: bad request, not mounted.
object_POST_time: Timing data for each POST request not resulting in an error.
object_POST_err_time: Timing data for POST request errors: bad request, missing timestamp, delete-at in past, not mounted.
object_PUT_time: Timing data for each PUT request not resulting in an error.
object_PUT_err_time: Timing data for PUT request errors: bad request, not mounted, missing timestamp, object creation constraint violation, delete-at in past.
object_REPLICATE_time: Timing data for each REPLICATE request not resulting in an error.
object_REPLICATE_err_time: Timing data for REPLICATE request errors: bad request, not mounted.
object_updater_time: Timing data for object sweeps to flush async_pending container updates. Does not include object sweeps which did not find an existing async_pending storage directory.

ObjectProxy

proxy_account_latency: Timing data up to completion of sending the response headers, 200: standard response for successful HTTP requests.
proxy_container_latency: Timing data up to completion of sending the response headers, 200: standard response for successful HTTP requests.
proxy_object_latency: Timing data up to completion of sending the response headers, 200: standard response for successful HTTP requests.
proxy_account_GET_time: Timing data for GET request, start to finish, 200: standard response for successful HTTP requests.
proxy_account_GET_bytes: The sum of bytes transferred in (from clients) and out (to clients) for requests, 200: standard response for successful HTTP requests.
proxy_account_HEAD_time: Timing data for HEAD request, start to finish, 204: request processed, no content returned.
proxy_account_HEAD_bytes: The sum of bytes transferred in (from clients) and out (to clients) for requests, 204: request processed, no content returned.
proxy_container_DEL_time: Timing data for DELETE request, start to finish, 204: request processed, no content returned.
proxy_container_DEL_bytes: The sum of bytes transferred in (from clients) and out (to clients) for requests, 204: request processed, no content returned.
proxy_container_GET_time: Timing data for GET request, start to finish, 200: standard response for successful HTTP requests.
proxy_container_GET_bytes: The sum of bytes transferred in (from clients) and out (to clients) for requests, 200: standard response for successful HTTP requests.
proxy_container_HEAD_time: Timing data for HEAD request, start to finish, 204: request processed, no content returned.
proxy_container_HEAD_bytes: The sum of bytes transferred in (from clients) and out (to clients) for requests, 204: request processed, no content returned.
proxy_container_PUT_time: Timing data for each PUT request not resulting in an error, 201: request has been fulfilled; new resource created.
proxy_container_PUT_bytes: The sum of bytes transferred in (from clients) and out (to clients) for requests, 201: request has been fulfilled; new resource created.
proxy_object_DEL_time: Timing data for DELETE request, start to finish, 204: request processed, no content returned.
proxy_object_DEL_bytes: The sum of bytes transferred in (from clients) and out (to clients) for requests, 204: request processed, no content returned.
proxy_object_GET_time: Timing data for GET request, start to finish, 200: standard response for successful HTTP requests.
proxy_object_GET_bytes: The sum of bytes transferred in (from clients) and out (to clients) for requests, 200: standard response for successful HTTP requests.
proxy_object_HEAD_time: Timing data for HEAD request, start to finish, 200: request processed, no content returned.
proxy_object_HEAD_bytes: The sum of bytes transferred in (from clients) and out (to clients) for requests, 200: request processed, no content returned.
proxy_object_PUT_time: Timing data for each PUT request not resulting in an error, 201: request has been fulfilled; new resource created.
proxy_object_PUT_bytes: The sum of bytes transferred in (from clients) and out (to clients) for requests, 201: request has been fulfilled; new resource created.

Note: For information about computed metrics for object, see Performance monitoring for object metrics.

SMB

SMBGlobalStats

connect count: Number of connections since startup of parent smbd process
disconnect count: Number of connections closed since startup
idle: Describes idling behavior of smbds
- count: Number of times the smbd processes are waiting for events in epoll
- time: Times the smbd process spend in epoll waiting for events
cpu_user time: The user time determined by the get_rusage system call in seconds
cpu_system time: The system time determined by the get_rusage system call in seconds
request count: Number of SMB requests since startup
push_sec_ctx: Smbds switch between the user and the root security context; push puts the current context onto a stack
- count: Number of times the current security context is pushed onto the stack
- time: The time it takes to push the current security context onto the stack; this includes all syscalls required to save the current context on the stack
pop_sec_ctx: Gets the last security context from the stack and restores it
- count: Number of times the current security context is restored from the stack
- time: The time it takes to restore the security context from the stack; this includes all syscalls required to restore the security context from the stack
set_sec_ctx:
- count: Number of times the security context is set for user
- time: The time it takes to set the security context for user
set_root_sec_ctx:
- count: Number of times the security context is set for root
- time: The time it takes to set the security context for root
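Several SMBGlobalStats entries come as a count/time pair (idle, push_sec_ctx, pop_sec_ctx, set_sec_ctx, set_root_sec_ctx). A per-operation average can be derived by dividing time by count. The following is a minimal sketch, assuming the two values have already been sampled into Python numbers; the function name and the sample values are hypothetical, and the result carries whatever time unit the tool reports.

```python
from typing import Optional


def average_per_call(count: int, time: float) -> Optional[float]:
    """Return the mean time per operation for a count/time metric pair.

    Returns None when no operations were recorded, instead of dividing
    by zero. The result has the same unit as `time`.
    """
    if count == 0:
        return None
    return time / count


# Hypothetical sample values for two count/time pairs.
sample = {
    "push_sec_ctx": {"count": 4000, "time": 2.0},
    "pop_sec_ctx": {"count": 0, "time": 0.0},
}

for name, pair in sample.items():
    print(name, average_per_call(pair["count"], pair["time"]))
```

Returning None rather than 0 for an idle interval keeps "no operations" distinguishable from "operations that took no measurable time".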

SMB2 metrics

The following metrics are available for each SMB2 request type (for example, read, write, and close):

op_count: Number of times the corresponding SMB request has been called.
op_idle:
- for notify: Time between notification request and a corresponding notification being sent
- for oplock breaks: Time waiting until an oplock is broken
- for all others the value is always zero
op_inbytes: Number of bytes received for the corresponding request including protocol headers.
op_outbytes: Number of bytes sent for the corresponding request including protocol headers.
op_time: The total amount of time spent for all corresponding SMB2 requests.
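Because op_time is a total over all requests of one type, an average latency per request is op_time divided by op_count for the same request type; the Cross Protocol formula smb_latency_read (read|op_time/read|op_count) follows this pattern. The sketch below assumes the sampled values are stored in a dictionary keyed by "<request type>|<metric>" names, mirroring the notation used in the Cross Protocol section; the function name and sample values are hypothetical.

```python
from typing import Dict, Optional


def smb2_average_latency(metrics: Dict[str, float], area: str) -> Optional[float]:
    """Average time per SMB2 request of one type: <area>|op_time / <area>|op_count.

    Returns None if no requests of that type were counted (or the
    request type is missing from the sample).
    """
    count = metrics.get(f"{area}|op_count", 0)
    if count == 0:
        return None
    return metrics[f"{area}|op_time"] / count


# Hypothetical sample; the time unit is whatever the tool reports for op_time.
sample = {
    "read|op_count": 200,
    "read|op_time": 50.0,
    "write|op_count": 0,
    "write|op_time": 0.0,
}

print(smb2_average_latency(sample, "read"))
print(smb2_average_latency(sample, "write"))
```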

CTDB

CTDB version: Version of the CTDB protocol used by the node.
Current time of statistics: Time when the statistics are generated. This is useful when collecting statistics output periodically for post-processing.
Statistics collected since: Time when CTDB was started or the last time the statistics were reset. The output shows the duration and the timestamp.
num_clients: Number of processes currently connected to CTDB's UNIX socket. This includes recovery daemon, CTDB tool and SMB processes (smbd, winbindd).
frozen: 1 if the databases are currently frozen, 0 if otherwise.
recovering: 1 if recovery is active, 0 if otherwise.
num_recoveries: Number of recoveries since the start of CTDB or since the last statistics reset.
client_packets_sent: Number of packets sent to client processes via UNIX domain socket.
client_packets_recv: Number of packets received from client processes via UNIX domain socket.
node_packets_sent: Number of packets sent to the other nodes in the cluster via TCP.
node_packets_recv: Number of packets received from the other nodes in the cluster via TCP.
keepalive_packets_sent: Number of keepalive messages sent to other nodes. CTDB periodically sends keepalive messages to other nodes. For more information, see the KeepAliveInterval tunable in CTDB-tunables(7) on the CTDB documentation website.
keepalive_packets_recv: Number of keepalive messages received from other nodes.
node: This section lists various types of messages processed which originated from other nodes via TCP.
- req_call: Number of REQ_CALL messages from the other nodes.
- reply_call: Number of REPLY_CALL messages from the other nodes.
- req_dmaster: Number of REQ_DMASTER messages from the other nodes.
- reply_dmaster: Number of REPLY_DMASTER messages from the other nodes.
- reply_error: Number of REPLY_ERROR messages from the other nodes.
- req_message: Number of REQ_MESSAGE messages from the other nodes.
- req_control: Number of REQ_CONTROL messages from the other nodes.
- reply_control: Number of REPLY_CONTROL messages from the other nodes.
client: This section lists various types of messages processed which originated from clients via UNIX domain socket.
- req_call: Number of REQ_CALL messages from the clients.
- req_message: Number of REQ_MESSAGE messages from the clients.
- req_control: Number of REQ_CONTROL messages from the clients.
timeouts: This section lists timeouts that occurred when sending various messages.
- call: Number of timeouts for REQ_CALL messages.
- control: Number of timeouts for REQ_CONTROL messages.
- traverse: Number of timeouts for database traverse operations.
locks: This section lists locking statistics.
- num_calls: Number of completed lock calls. This includes database locks and record locks.
- num_current: Number of scheduled lock calls. This includes database locks and record locks.
- num_pending: Number of queued lock calls. This includes database locks and record locks.
- num_failed: Number of failed lock calls. This includes database locks and record locks.
total_calls: Number of req_call messages processed from clients. This number should be the same as client --> req_call.
pending_calls: Number of req_call messages which are currently being processed. This number indicates the number of record migrations in flight.
childwrite_calls: Number of record update calls. Record update calls are used to update a record under a transaction.
pending_childwrite_calls: Number of record update calls currently active.
memory_used: The amount of memory in bytes currently used by CTDB using talloc. This includes all the memory used for CTDB's internal data structures. This does not include the memory-mapped TDB databases.
max_hop_count: The maximum number of hops required for a record migration request to obtain the record. High numbers indicate record contention.
total_ro_delegations: Number of read-only delegations created.
total_ro_revokes: Number of read-only delegations that were revoked. Subtracting total_ro_revokes from total_ro_delegations gives the number of currently active read-only delegations.
hop_count_buckets: Distribution of migration requests based on hop count values.
lock_buckets: Distribution of record lock requests based on time required to obtain locks. Buckets are < 1ms, < 10ms, < 100ms, < 1s, < 2s, < 4s, < 8s, < 16s, < 32s, < 64s, > 64s.
locks_latency: The minimum, the average and the maximum time (in seconds) required to obtain record locks.
reclock_ctdbd: The minimum, the average and the maximum time (in seconds) required to check if the recovery lock is still held by the recovery daemon when the recovery mode is changed. This check is done in the CTDB daemon.
reclock_recd: The minimum, the average and the maximum time (in seconds) required to check if the recovery lock is still held by the recovery daemon during recovery. This check is done in the recovery daemon.
call_latency: The minimum, the average and the maximum time (in seconds) required to process a REQ_CALL message from a client. This includes the time required to migrate a record from a remote node, if the record is not available on the local node.
childwrite_latency: The minimum, the average and the maximum time (in seconds) required to update records under a transaction.
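Two of the CTDB entries above lend themselves to small calculations: the number of active read-only delegations (total_ro_delegations minus total_ro_revokes) and the lock_buckets distribution. The sketch below maps individual lock wait times onto the eleven bucket boundaries listed for lock_buckets. It is an illustration of how to read the buckets, not CTDB's own implementation; the function names and sample values are hypothetical, and it assumes a wait of exactly 64 s falls into the last ("> 64s") bucket, which the bucket list does not specify.

```python
import bisect
from collections import Counter
from typing import Iterable, List

# Strict upper bounds, in seconds, of the first ten lock_buckets.
LOCK_BUCKET_BOUNDS = [0.001, 0.01, 0.1, 1, 2, 4, 8, 16, 32, 64]
LOCK_BUCKET_LABELS = ["< 1ms", "< 10ms", "< 100ms", "< 1s", "< 2s", "< 4s",
                      "< 8s", "< 16s", "< 32s", "< 64s", "> 64s"]


def lock_bucket(seconds: float) -> str:
    """Return the bucket label for a single lock wait time in seconds."""
    # bisect_right counts the bounds <= seconds, which is the index of the
    # first bucket whose strict upper bound is greater than `seconds`.
    return LOCK_BUCKET_LABELS[bisect.bisect_right(LOCK_BUCKET_BOUNDS, seconds)]


def bucket_histogram(samples: Iterable[float]) -> List[int]:
    """Count wait times per bucket, in the same order as LOCK_BUCKET_LABELS."""
    counts = Counter(lock_bucket(s) for s in samples)
    return [counts.get(label, 0) for label in LOCK_BUCKET_LABELS]


def active_ro_delegations(total_ro_delegations: int, total_ro_revokes: int) -> int:
    """Currently active read-only delegations."""
    return total_ro_delegations - total_ro_revokes


print(bucket_histogram([0.0005, 0.002, 0.003, 70]))
print(active_ro_delegations(120, 95))
```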

Cross Protocol

nfs_iorate_read_perc: nfs_read_ops/(op_count+nfs_read_ops)
nfs_iorate_read_perc_exports: 1.0*nfs_read_ops/(op_count+nfs_read_ops)
nfs_iorate_write_perc: nfs_write_ops/(write|op_count+nfs_write_ops)
nfs_iorate_write_perc_exports: 1.0*nfs_write_ops/(op_count+nfs_write_ops)
nfs_read_throughput_perc: nfs_read/(read|op_outbytes+nfs_read)
nfs_write_throughput_perc: nfs_write/(write|op_outbytes+nfs_write)
smb_iorate_read_perc: op_count/(op_count+nfs_read_ops)
smb_iorate_write_perc: op_count/(op_count+nfs_write_ops)
smb_latency_read: read|op_time/read|op_count
smb_latency_write: write|op_time/write|op_count
smb_read_throughput_perc: read|op_outbytes/(read|op_outbytes+nfs_read)
smb_total_cnt: write|op_count+close|op_count
smb_tp: op_inbytes+op_outbytes
smb_write_throughput_perc: write|op_outbytes/(write|op_outbytes+nfs_write)
total_read_throughput: nfs_read+read|op_outbytes
total_write_throughput: nfs_write+write|op_inbytes
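As written, the _perc formulas above produce a fraction between 0 and 1 (multiply by 100 for a percentage), and each one divides by a sum that can be zero when neither protocol saw traffic in the interval. The sketch below evaluates a subset of the formulas exactly as listed, including the use of write|op_outbytes in the write-throughput ratios and write|op_inbytes in total_write_throughput. It assumes the sampled values are stored in a dictionary keyed by the metric names used in the formulas; the helper names and sample values are hypothetical.

```python
from typing import Dict, Optional


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Divide, returning None instead of raising when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator


def cross_protocol(m: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Evaluate a subset of the cross-protocol formulas listed above.

    `m` maps metric names such as "nfs_read" or "read|op_outbytes" to
    values sampled over the same interval.
    """
    smb_read = m["read|op_outbytes"]
    smb_write = m["write|op_outbytes"]
    return {
        "nfs_read_throughput_perc": ratio(m["nfs_read"], smb_read + m["nfs_read"]),
        "smb_read_throughput_perc": ratio(smb_read, smb_read + m["nfs_read"]),
        "nfs_write_throughput_perc": ratio(m["nfs_write"], smb_write + m["nfs_write"]),
        "smb_write_throughput_perc": ratio(smb_write, smb_write + m["nfs_write"]),
        "smb_latency_read": ratio(m["read|op_time"], m["read|op_count"]),
        "smb_latency_write": ratio(m["write|op_time"], m["write|op_count"]),
        "total_read_throughput": m["nfs_read"] + smb_read,
        "total_write_throughput": m["nfs_write"] + m["write|op_inbytes"],
    }


# Hypothetical sample for one interval.
sample = {
    "nfs_read": 300, "read|op_outbytes": 100,
    "read|op_time": 10.0, "read|op_count": 4,
    "nfs_write": 0, "write|op_outbytes": 0, "write|op_inbytes": 50,
    "write|op_time": 0.0, "write|op_count": 0,
}

for name, value in cross_protocol(sample).items():
    print(name, value)
```

The 1.0* factor in the _exports variants presumably forces floating-point division in the tool's formula engine; Python's / operator already divides in floating point, so the sketch does not need it.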

Cloud services

mcs_total_bytes: Total number of bytes uploaded to or downloaded from the cloud storage tier
mcs_total_requests: Total number of migration, recall, or remove requests
mcs_total_request_time: Time (in seconds) taken for all migration, recall, or remove requests
mcs_total_failed_requests: Total number of failed migration, recall, or remove requests
mcs_total_failed_requests_time: The total time (msec) spent in failed migration, recall, or remove requests
mcs_total_persisted_bytes: The total number of transferred bytes that are successfully persisted on the cloud provider. This is used for both migrate and recall operations.
mcs_total_retried_operations: The total number of retried PUT operations. This is used for both migrate and recall operations.
mcs_total_operation_errors: The total number of failed PUT or GET operations, depending on the operation specified in the mcs_operation key
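The cloud services metrics report request time in seconds but failed-request time in milliseconds, so derived averages need a unit conversion before they can be compared. The sketch below computes an average request time, an average failed-request time (both in seconds), and a failure rate. It assumes that failed requests are also counted in mcs_total_requests, which the descriptions above do not state explicitly; the function name and sample values are hypothetical.

```python
from typing import Dict, Optional


def mcs_summary(total_requests: int, total_request_time_s: float,
                failed_requests: int, failed_request_time_ms: float) -> Dict[str, Optional[float]]:
    """Derive averages from the mcs_total_* counters.

    Assumption: failed requests are a subset of total_requests.
    """
    def ratio(n: float, d: float) -> Optional[float]:
        # Avoid division by zero when no requests were recorded.
        return None if d == 0 else n / d

    return {
        "avg_request_time_s": ratio(total_request_time_s, total_requests),
        "failure_rate": ratio(failed_requests, total_requests),
        # Convert milliseconds to seconds so both averages share one unit.
        "avg_failed_request_time_s": ratio(failed_request_time_ms / 1000.0, failed_requests),
    }


# Hypothetical values for mcs_total_requests, mcs_total_request_time,
# mcs_total_failed_requests and mcs_total_failed_requests_time.
print(mcs_summary(50, 125.0, 5, 4000.0))
```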