Monitoring IBM Storage Scale service using Cloudera Manager

This section lists the steps to monitor the IBM Storage Scale service using Cloudera Manager.

Prerequisites:

Ensure that the transparency.namenode.http.port and transparency.datanode.http.port parameters are correctly set within the IBM Spectrum® Scale service as described in Installing Cloudera Data Platform Private Cloud Base with IBM Storage Scale.

Steps

  1. Go to the Cloudera Manager GUI > Clusters > Your cluster view.
  2. Click the drop-down on the right side of the page and select Add from Chart Builder.
  3. To list all the graphs for DataNode, in the query box enter the following:
    select * where roleType=TRANSPARENCY_DATANODE
  4. Click Build Chart.
  5. In Facets, select All Separate to see all the attributes in individual graphs.
  6. To list all the graphs for NameNode, enter the equivalent query:
    select * where roleType=TRANSPARENCY_NAMENODE

For more information on the TSQuery format, see tsquery Syntax.
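
The same queries can also be issued outside the Chart Builder. The following is a minimal sketch, assuming that your Cloudera Manager instance exposes the REST timeseries endpoint (/api/<version>/timeseries) and accepts the tsquery in its query parameter. The host name, port, and API version shown are placeholders, and the helper names build_tsquery and build_timeseries_url are hypothetical, introduced only for illustration. The sketch only builds the request URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Role types taken from the Chart Builder queries in the steps above.
DATANODE_ROLE = "TRANSPARENCY_DATANODE"
NAMENODE_ROLE = "TRANSPARENCY_NAMENODE"


def build_tsquery(role_type, metric="*"):
    """Return a tsquery that selects one metric (or all metrics) for a role type."""
    return f"select {metric} where roleType={role_type}"


def build_timeseries_url(base_url, query, api_version="v41"):
    """Return a timeseries URL for the given tsquery.

    base_url and api_version are assumptions; adjust them for your deployment.
    """
    params = urlencode({"query": query})  # percent-encodes '*', '=' and spaces
    return f"{base_url.rstrip('/')}/api/{api_version}/timeseries?{params}"


if __name__ == "__main__":
    # cm.example.com:7183 is a placeholder, not a real Cloudera Manager host.
    query = build_tsquery(DATANODE_ROLE, "spectrumscale_hdfs_bytes_read")
    print(build_timeseries_url("https://cm.example.com:7183", query))
```

Passing a single attribute name from the table that follows, instead of *, narrows the result to one graph. Sending the request requires Cloudera Manager credentials, which are left out of this sketch.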

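The following table lists the NameNode and DataNode graphs with their meanings. Each row gives the attribute name, its meaning, and the regular expression that matches the JMX bean, in that order: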
Attribute Name Meaning Regular expression matching to the JMX bean
spectrumscale_hdfs_block_checksum_op_avg_time Block Checksum Average Time Hadoop:service=DataNode,name=DataNodeActivity-*::BlockChecksumOpAvgTime
spectrumscale_hdfs_block_checksum_op_num_ops Block Checksum Operations Hadoop:service=DataNode,name=DataNodeActivity-*::BlockChecksumOpNumOps
spectrumscale_hdfs_block_reports_avg_time Block Reports Average Time Hadoop:service=DataNode,name=DataNodeActivity-*::BlockReportsAvgTime
spectrumscale_hdfs_block_reports_num_ops Block Reports Operations Hadoop:service=DataNode,name=DataNodeActivity-*::BlockReportsNumOps
spectrumscale_hdfs_block_verification_failures Block Verification Failures Hadoop:service=DataNode,name=DataNodeActivity-*::BlockVerificationFailures
spectrumscale_hdfs_blocks_cached The total number of HDFS blocks cached over the lifetime of the process. Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksCached
spectrumscale_hdfs_blocks_get_local_path_info Blocks Get Local Path Info Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksGetLocalPathInfo
spectrumscale_hdfs_blocks_read Blocks Read Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksRead
spectrumscale_hdfs_blocks_removed Blocks Removed Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksRemoved
spectrumscale_hdfs_blocks_replicated Blocks Replicated Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksReplicated
spectrumscale_hdfs_blocks_uncached The total number of HDFS blocks uncached over the lifetime of the process. Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksUncached
spectrumscale_hdfs_blocks_verified Blocks Verified Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksVerified
spectrumscale_hdfs_blocks_written Blocks Written Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksWritten
spectrumscale_hdfs_bytes_read Number of bytes read Hadoop:service=DataNode,name=DataNodeActivity-*::BytesRead
spectrumscale_hdfs_bytes_written Bytes Written Hadoop:service=DataNode,name=DataNodeActivity-*::BytesWritten
spectrumscale_hdfs_cache_reports_avg_time The average time to generate cache reports on the DataNode. Hadoop:service=DataNode,name=DataNodeActivity-*::CacheReportsAvgTime
spectrumscale_hdfs_cache_reports_num_ops The total number of generate cache reports operations on the DataNode. Hadoop:service=DataNode,name=DataNodeActivity-*::CacheReportsNumOps
spectrumscale_hdfs_copy_block_op_avg_time Copy Block Average Time Hadoop:service=DataNode,name=DataNodeActivity-*::CopyBlockOpAvgTime
spectrumscale_hdfs_copy_block_op_num_ops Copy Block Operations Hadoop:service=DataNode,name=DataNodeActivity-*::CopyBlockOpNumOps
spectrumscale_hdfs_flush_nanos_avg_time Average Disk Flush Time Hadoop:service=DataNode,name=DataNodeActivity-*::FlushNanosAvgTime
spectrumscale_hdfs_flush_nanos_num_ops Disk Flushes Hadoop:service=DataNode,name=DataNodeActivity-*::FlushNanosNumOps
spectrumscale_hdfs_fsync_nanos_avg_time Average Disk Fsync Time Hadoop:service=DataNode,name=DataNodeActivity-*::FsyncNanosAvgTime
spectrumscale_hdfs_fsync_nanos_num_ops Disk Fsyncs Hadoop:service=DataNode,name=DataNodeActivity-*::FsyncNanosNumOps
spectrumscale_hdfs_fsync_num_ops Fsync Operations Hadoop:service=DataNode,name=DataNodeActivity-*::FsyncCount
spectrumscale_hdfs_heartbeats_avg_time Heartbeat Average Time Hadoop:service=DataNode,name=DataNodeActivity-*::HeartbeatsAvgTime
spectrumscale_hdfs_heartbeats_num_ops Heartbeats Hadoop:service=DataNode,name=DataNodeActivity-*::HeartbeatsNumOps
spectrumscale_hdfs_send_data_packet_blocked_on_network_nanos_avg_time Send Data Packet Blocked On Network Average Time Hadoop:service=DataNode,name=DataNodeActivity-*::SendDataPacketBlockedOnNetworkNanosAvgTime
spectrumscale_hdfs_send_data_packet_blocked_on_network_nanos_num_ops Send Data Packet Blocked On Network Operations Hadoop:service=DataNode,name=DataNodeActivity-*::SendDataPacketBlockedOnNetworkNanosNumOps
spectrumscale_hdfs_send_data_packet_transfer_nanos_avg_time Send Data Packet Transfer Average Time Hadoop:service=DataNode,name=DataNodeActivity-*::SendDataPacketTransferNanosAvgTime
spectrumscale_hdfs_send_data_packet_transfer_nanos_num_ops Send Data Packet Transfer Operations Hadoop:service=DataNode,name=DataNodeActivity-*::SendDataPacketTransferNanosNumOps
spectrumscale_hdfs_write_block_op_avg_time Write Block Average Time Hadoop:service=DataNode,name=DataNodeActivity-*::WriteBlockOpAvgTime
spectrumscale_hdfs_write_block_op_num_ops Write Block Operations Hadoop:service=DataNode,name=DataNodeActivity-*::WriteBlockOpNumOps
spectrumscale_hdfs_writes_from_local_client Writes From Local Clients Hadoop:service=DataNode,name=DataNodeActivity-*::WritesFromLocalClient
spectrumscale_hdfs_writes_from_remote_client Writes From Remote Clients Hadoop:service=DataNode,name=DataNodeActivity-*::WritesFromRemoteClient
spectrumscale_hdfs_packet_ack_round_trip_time_nanos_avg_time Packet Ack Round Trip Average Time Hadoop:service=DataNode,name=DataNodeActivity-*::PacketAckRoundTripTimeNanosAvgTime
spectrumscale_hdfs_packet_ack_round_trip_time_nanos_num_ops Packet Ack Round Trip Operations Hadoop:service=DataNode,name=DataNodeActivity-*::PacketAckRoundTripTimeNanosNumOps
spectrumscale_hdfs_read_block_op_avg_time Read Block Average Time Hadoop:service=DataNode,name=DataNodeActivity-*::ReadBlockOpAvgTime
spectrumscale_hdfs_read_block_op_num_ops Read Block Operations Hadoop:service=DataNode,name=DataNodeActivity-*::ReadBlockOpNumOps
spectrumscale_hdfs_reads_from_local_client Reads From Local Clients Hadoop:service=DataNode,name=DataNodeActivity-*::ReadsFromLocalClient
spectrumscale_hdfs_reads_from_remote_client Reads From Remote Clients Hadoop:service=DataNode,name=DataNodeActivity-*::ReadsFromRemoteClient
spectrumscale_hdfs_replace_block_op_avg_time Replace Block Operation Average Time Hadoop:service=DataNode,name=DataNodeActivity-*::ReplaceBlockOpAvgTime
spectrumscale_hdfs_replace_block_op_num_ops Replace Block Operations Hadoop:service=DataNode,name=DataNodeActivity-*::ReplaceBlockOpNumOps
spectrumscale_hdfs_jvm_blocked_threads Blocked threads Hadoop:service=DataNode,name=JvmMetrics::ThreadsBlocked
spectrumscale_hdfs_jvm_gc_count Number of garbage collections Hadoop:service=DataNode,name=JvmMetrics::GcCount
spectrumscale_hdfs_jvm_gc_time_ms Total time spent garbage collecting. Hadoop:service=DataNode,name=JvmMetrics::GcTimeMillis
spectrumscale_hdfs_jvm_heap_committed_mb Total amount of committed heap memory. Hadoop:service=DataNode,name=JvmMetrics::MemHeapCommittedM
spectrumscale_hdfs_jvm_heap_used_mb Total amount of used heap memory. Hadoop:service=DataNode,name=JvmMetrics::MemHeapUsedM
spectrumscale_hdfs_jvm_max_memory_mb Maximum allowed memory. Hadoop:service=DataNode,name=JvmMetrics::MemMaxM
spectrumscale_hdfs_jvm_new_threads New threads Hadoop:service=DataNode,name=JvmMetrics::ThreadsNew
spectrumscale_hdfs_jvm_non_heap_committed_mb Total amount of committed non-heap memory. Hadoop:service=DataNode,name=JvmMetrics::MemNonHeapCommittedM
spectrumscale_hdfs_jvm_non_heap_used_mb Total amount of used non-heap memory. Hadoop:service=DataNode,name=JvmMetrics::MemNonHeapUsedM
spectrumscale_hdfs_jvm_pause_time The amount of extra time the JVM was paused beyond the requested sleep time. The JVM pause monitor sleeps for 500 milliseconds, and any additional time it waits is counted as pause time. Hadoop:service=DataNode,name=JvmMetrics::GcTotalExtraSleepTime
spectrumscale_hdfs_jvm_pauses_info_threshold_count Number of JVM pauses longer than the info threshold but shorter than the warning threshold. By default, the info threshold is 1 second. To change it, use the configuration key JvmPauseMonitorService.info-threshold.ms. Hadoop:service=DataNode,name=JvmMetrics::GcNumInfoThresholdExceeded
spectrumscale_hdfs_jvm_pauses_warn_threshold_count Number of JVM pauses longer than the warning threshold. By default, the warning threshold is 10 seconds. To change it, use the configuration key JvmPauseMonitorService.warn-threshold.ms. Hadoop:service=DataNode,name=JvmMetrics::GcNumWarnThresholdExceeded
spectrumscale_hdfs_jvm_runnable_threads Runnable threads Hadoop:service=DataNode,name=JvmMetrics::ThreadsRunnable
spectrumscale_hdfs_jvm_terminated_threads Terminated threads Hadoop:service=DataNode,name=JvmMetrics::ThreadsTerminated
spectrumscale_hdfs_jvm_timed_waiting_threads Timed waiting threads Hadoop:service=DataNode,name=JvmMetrics::ThreadsTimedWaiting
spectrumscale_hdfs_jvm_waiting_threads Waiting threads Hadoop:service=DataNode,name=JvmMetrics::ThreadsWaiting
spectrumscale_hdfs_log_error Logged Errors Hadoop:service=DataNode,name=JvmMetrics::LogError
spectrumscale_hdfs_log_fatal Logged Fatals Hadoop:service=DataNode,name=JvmMetrics::LogFatal
spectrumscale_hdfs_log_info Logged Infos Hadoop:service=DataNode,name=JvmMetrics::LogInfo
spectrumscale_hdfs_log_warn Logged Warnings Hadoop:service=DataNode,name=JvmMetrics::LogWarn
spectrumscale_hdfs_login_failure_avg_time Average Failed Login Time Hadoop:service=DataNode,name=UgiMetrics::LoginFailureAvgTime
spectrumscale_hdfs_login_failure_num_ops Login Failures Hadoop:service=DataNode,name=UgiMetrics::LoginFailureNumOps
spectrumscale_hdfs_login_success_avg_time Average Successful Login Time Hadoop:service=DataNode,name=UgiMetrics::LoginSuccessAvgTime
spectrumscale_hdfs_login_success_num_ops Login Successes Hadoop:service=DataNode,name=UgiMetrics::LoginSuccessNumOps
spectrumscale_hdfs_metrics_dropped_pub_all Dropped Metrics Updates By All Sinks Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::DroppedPubAll
spectrumscale_hdfs_metrics_num_active_sinks Active Metrics Sinks Count Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::NumActiveSinks
spectrumscale_hdfs_metrics_num_active_sources Active Metrics Sources Count Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::NumActiveSources
spectrumscale_hdfs_metrics_num_all_sinks All Metrics Sinks Count Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::NumAllSinks
spectrumscale_hdfs_metrics_num_all_sources All Metrics Sources Count Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::NumAllSources
spectrumscale_hdfs_metrics_publish_avg_time Metrics Publish Average Time Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::PublishAvgTime
spectrumscale_hdfs_metrics_publish_num_ops Metrics Publish Operations Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::PublishNumOps
spectrumscale_hdfs_metrics_snapshot_avg_time Metrics Snapshot Average Time Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::SnapshotAvgTime
spectrumscale_hdfs_metrics_snapshot_num_ops Metrics Snapshot Operations Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::SnapshotNumOps
spectrumscale_hdfs_rpc_authentication_failures RPC Authentication Failures Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcAuthenticationFailures
spectrumscale_hdfs_rpc_authentication_successes RPC Authentication Successes Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcAuthenticationSuccesses
spectrumscale_hdfs_rpc_authorization_failures RPC Authorization Failures Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcAuthorizationFailures
spectrumscale_hdfs_rpc_authorization_successes RPC Authorization Successes Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcAuthorizationSuccesses
spectrumscale_hdfs_rpc_call_queue_length RPC Call Queue Length Hadoop:service=DataNode,name=RpcActivityForPort\\d+::CallQueueLength
spectrumscale_hdfs_rpc_num_open_connections Open RPC Connections Hadoop:service=DataNode,name=RpcActivityForPort\\d+::NumOpenConnections
spectrumscale_hdfs_rpc_processing_time_avg_time Average RPC Processing Time Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcProcessingTimeAvgTime
spectrumscale_hdfs_rpc_processing_time_num_ops RPCs Processed Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcProcessingTimeNumOps
spectrumscale_hdfs_rpc_queue_time_avg_time Average RPC Queue Time Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcQueueTimeAvgTime
spectrumscale_hdfs_rpc_queue_time_num_ops RPCs Queued Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcQueueTimeNumOps
spectrumscale_hdfs_rpc_received_bytes RPC Received Bytes Hadoop:service=DataNode,name=RpcActivityForPort\\d+::ReceivedBytes
spectrumscale_hdfs_rpc_sent_bytes RPC Sent Bytes Hadoop:service=DataNode,name=RpcActivityForPort\\d+::SentBytes
spectrumscale_hdfs_xceivers Transceivers Hadoop:service=DataNode,name=DataNodeInfo::XceiverCount
spectrumscale_hdfs_connections Current number of connections to NameNode Hadoop:service=NameNode,name=FSNamesystem::TotalLoad
spectrumscale_hdfs_fsnamesystem_lockqueuelength Number of threads waiting to acquire FSNameSystem lock Hadoop:service=NameNode,name=FSNamesystem::LockQueueLength
spectrumscale_hdfs_active_connection_holdinglease Number of active clients holding lease Hadoop:service=NameNode,name=FSNamesystem::NumActiveClients
spectrumscale_hdfs_state Current state of the file system: Safemode or Operational Hadoop:service=NameNode,name=FSNamesystem::FSState
spectrumscale_hdfs_ha_state Current state of the NameNode: initializing, active, standby, or stopping Hadoop:service=NameNode,name=FSNamesystem::tag.HAState
spectrumscale_hdfs_rpc_queue_time_num_ops RPCs Queued Hadoop:service=NameNode,name=RpcActivityForPort\\d+::RpcQueueTimeNumOps
spectrumscale_hdfs_rpc_queue_time_avg_time Average RPC Queue Time Hadoop:service=NameNode,name=RpcActivityForPort\\d+::RpcQueueTimeAvgTime
spectrumscale_hdfs_rpc_processing_time_num_ops RPCs Processed Hadoop:service=NameNode,name=RpcActivityForPort\\d+::RpcProcessingTimeNumOps
spectrumscale_hdfs_rpc_processing_time_avg_time Average RPC Processing Time Hadoop:service=NameNode,name=RpcActivityForPort\\d+::RpcProcessingTimeAvgTime
spectrumscale_hdfs_rpc_call_queue_length RPC Call Queue Length Hadoop:service=NameNode,name=RpcActivityForPort\\d+::CallQueueLength
spectrumscale_hdfs_rpc_num_open_connections Open RPC Connections Hadoop:service=NameNode,name=RpcActivityForPort\\d+::NumOpenConnections
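
Each entry in the last column of the table has the form <bean name pattern>::<attribute name>. The following is a minimal sketch of how such an entry could be applied to a list of JMX beans, for example when checking by hand which bean feeds a graph. It rests on several assumptions: that the doubled backslash in the table stands for a single regular-expression backslash, that the pattern is matched from the start of the bean name, and that the beans are available as dictionaries shaped like the "beans" array that the Hadoop /jmx servlet returns. The function names and the sample beans are hypothetical and are not taken from a real cluster.

```python
import re


def split_table_entry(entry):
    """Split 'bean-name-pattern::AttributeName' into its two parts."""
    bean_pattern, attribute = entry.split("::", 1)
    # Assumption: '\\d+' in the table is an escaped form of the regex '\d+'.
    return bean_pattern.replace("\\\\", "\\"), attribute


def find_attribute(beans, entry):
    """Return the attribute value from the first matching bean, or None."""
    bean_pattern, attribute = split_table_entry(entry)
    regex = re.compile(bean_pattern)
    for bean in beans:
        # re.match anchors at the start of the name only (an assumption).
        if regex.match(bean.get("name", "")) and attribute in bean:
            return bean[attribute]
    return None


# Hypothetical beans for illustration; the values are made up.
sample_beans = [
    {"name": "Hadoop:service=NameNode,name=RpcActivityForPort8020",
     "CallQueueLength": 0},
    {"name": "Hadoop:service=NameNode,name=FSNamesystem",
     "TotalLoad": 3},
]

if __name__ == "__main__":
    entry = r"Hadoop:service=NameNode,name=RpcActivityForPort\\d+::CallQueueLength"
    print(find_attribute(sample_beans, entry))
```

Matching from the start of the name, rather than requiring a full match, lets patterns such as DataNodeActivity-* accept bean names that carry a host and port suffix. If your environment applies these patterns differently, adjust the matching call accordingly.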