Monitoring IBM Storage Scale service using Cloudera Manager
This section lists the steps to monitor the IBM Storage Scale service using Cloudera Manager.
Prerequisites:
Ensure that the transparency.namenode.http.port and transparency.datanode.http.port parameters are correctly set within the IBM Spectrum® Scale service as described in Installing Cloudera Data Platform Private Cloud Base with IBM Storage Scale.
Steps
- Go to the .
- Click the drop-down on the right side of the page and select Add from Chart Builder.
- To list all the graphs for DataNode, enter the following in the query box:
select * where roleType=TRANSPARENCY_DATANODE
- Click Build Chart.
- In Facets, select All Separate to see all the attributes in individual graphs.
- You can write the same query for NameNode as follows:
select * where roleType=TRANSPARENCY_NAMENODE
For more information on the TSQuery format, see tsquery Syntax.
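The two queries above differ only in the role type, so they can also be generated programmatically. The following is a minimal Python sketch, not an official client: it builds the tsquery string and a URL for the Cloudera Manager time-series API endpoint. The host name, port, and API version (`v41`) are placeholder assumptions and must be replaced with the values of your deployment; the helper names `build_tsquery` and `timeseries_url` are hypothetical.

```python
from urllib.parse import urlencode

# Role types taken from the queries shown in the steps above.
ROLE_TYPES = ("TRANSPARENCY_DATANODE", "TRANSPARENCY_NAMENODE")


def build_tsquery(role_type, metric="*"):
    """Return a tsquery string that selects `metric` for one role type."""
    if role_type not in ROLE_TYPES:
        raise ValueError(f"unexpected role type: {role_type}")
    return f"select {metric} where roleType={role_type}"


def timeseries_url(base_url, api_version, query):
    """Build a time-series API URL for a tsquery.

    base_url and api_version are assumptions; check them against
    your own Cloudera Manager installation.
    """
    return f"{base_url}/api/{api_version}/timeseries?{urlencode({'query': query})}"


if __name__ == "__main__":
    q = build_tsquery("TRANSPARENCY_DATANODE", "spectrumscale_hdfs_bytes_read")
    print(q)
    # Placeholder host, port, and API version.
    print(timeseries_url("https://cm-host.example.com:7183", "v41", q))
```

Selecting a single attribute name from the table instead of `*` keeps the result small when only one graph is of interest.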
The following table lists the NameNode and DataNode graphs with their meanings:
Attribute Name | Meaning | Regular expression matching the JMX bean |
---|---|---|
spectrumscale_hdfs_block_checksum_op_avg_time | Block Checksum Average Time | Hadoop:service=DataNode,name=DataNodeActivity-*::BlockChecksumOpAvgTime |
spectrumscale_hdfs_block_checksum_op_num_ops | Block Checksum Operations | Hadoop:service=DataNode,name=DataNodeActivity-*::BlockChecksumOpNumOps |
spectrumscale_hdfs_block_reports_avg_time | Block Reports Average Time | Hadoop:service=DataNode,name=DataNodeActivity-*::BlockReportsAvgTime |
spectrumscale_hdfs_block_reports_num_ops | Block Reports Operations | Hadoop:service=DataNode,name=DataNodeActivity-*::BlockReportsNumOps |
spectrumscale_hdfs_block_verification_failures | Block Verification Failures | Hadoop:service=DataNode,name=DataNodeActivity-*::BlockVerificationFailures |
spectrumscale_hdfs_blocks_cached | The total number of HDFS blocks cached over the lifetime of the process. | Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksCached |
spectrumscale_hdfs_blocks_get_local_path_info | Blocks Get Local Path Info | Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksGetLocalPathInfo |
spectrumscale_hdfs_blocks_read | Blocks Read | Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksRead |
spectrumscale_hdfs_blocks_removed | Blocks Removed | Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksRemoved |
spectrumscale_hdfs_blocks_replicated | Blocks Replicated | Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksReplicated |
spectrumscale_hdfs_blocks_uncached | The total number of HDFS blocks uncached over the lifetime of the process. | Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksUncached |
spectrumscale_hdfs_blocks_verified | Blocks Verified | Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksVerified |
spectrumscale_hdfs_blocks_written | Blocks Written | Hadoop:service=DataNode,name=DataNodeActivity-*::BlocksWritten |
spectrumscale_hdfs_bytes_read | Number of bytes read | Hadoop:service=DataNode,name=DataNodeActivity-*::BytesRead |
spectrumscale_hdfs_bytes_written | Bytes Written | Hadoop:service=DataNode,name=DataNodeActivity-*::BytesWritten |
spectrumscale_hdfs_cache_reports_avg_time | The average time to generate cache reports on the DataNode. | Hadoop:service=DataNode,name=DataNodeActivity-*::CacheReportsAvgTime |
spectrumscale_hdfs_cache_reports_num_ops | The total number of generate cache reports operations on the DataNode. | Hadoop:service=DataNode,name=DataNodeActivity-*::CacheReportsNumOps |
spectrumscale_hdfs_copy_block_op_avg_time | Copy Block Average Time | Hadoop:service=DataNode,name=DataNodeActivity-*::CopyBlockOpAvgTime |
spectrumscale_hdfs_copy_block_op_num_ops | Copy Block Operations | Hadoop:service=DataNode,name=DataNodeActivity-*::CopyBlockOpNumOps |
spectrumscale_hdfs_flush_nanos_avg_time | Average Disk Flush Time | Hadoop:service=DataNode,name=DataNodeActivity-*::FlushNanosAvgTime |
spectrumscale_hdfs_flush_nanos_num_ops | Disk Flushes | Hadoop:service=DataNode,name=DataNodeActivity-*::FlushNanosNumOps |
spectrumscale_hdfs_fsync_nanos_avg_time | Average Disk Fsync Time | Hadoop:service=DataNode,name=DataNodeActivity-*::FsyncNanosAvgTime |
spectrumscale_hdfs_fsync_nanos_num_ops | Disk Fsyncs | Hadoop:service=DataNode,name=DataNodeActivity-*::FsyncNanosNumOps |
spectrumscale_hdfs_fsync_num_ops | Fsync Operations | Hadoop:service=DataNode,name=DataNodeActivity-*::FsyncCount |
spectrumscale_hdfs_heartbeats_avg_time | Heartbeat Average Time | Hadoop:service=DataNode,name=DataNodeActivity-*::HeartbeatsAvgTime |
spectrumscale_hdfs_heartbeats_num_ops | Heartbeats | Hadoop:service=DataNode,name=DataNodeActivity-*::HeartbeatsNumOps |
spectrumscale_hdfs_send_data_packet_blocked_on_network_nanos_avg_time | Send Data Packet Blocked On Network Average Time | Hadoop:service=DataNode,name=DataNodeActivity-*::SendDataPacketBlockedOnNetworkNanosAvgTime |
spectrumscale_hdfs_send_data_packet_blocked_on_network_nanos_num_ops | Send Data Packet Blocked On Network Operations | Hadoop:service=DataNode,name=DataNodeActivity-*::SendDataPacketBlockedOnNetworkNanosNumOps |
spectrumscale_hdfs_send_data_packet_transfer_nanos_avg_time | Send Data Packet Transfer Average Time | Hadoop:service=DataNode,name=DataNodeActivity-*::SendDataPacketTransferNanosAvgTime |
spectrumscale_hdfs_send_data_packet_transfer_nanos_num_ops | Send Data Packet Transfer Operations | Hadoop:service=DataNode,name=DataNodeActivity-*::SendDataPacketTransferNanosNumOps |
spectrumscale_hdfs_write_block_op_avg_time | Write Block Average Time | Hadoop:service=DataNode,name=DataNodeActivity-*::WriteBlockOpAvgTime |
spectrumscale_hdfs_write_block_op_num_ops | Write Block Operations | Hadoop:service=DataNode,name=DataNodeActivity-*::WriteBlockOpNumOps |
spectrumscale_hdfs_writes_from_local_client | Writes From Local Clients | Hadoop:service=DataNode,name=DataNodeActivity-*::WritesFromLocalClient |
spectrumscale_hdfs_writes_from_remote_client | Writes From Remote Clients | Hadoop:service=DataNode,name=DataNodeActivity-*::WritesFromRemoteClient |
spectrumscale_hdfs_packet_ack_round_trip_time_nanos_avg_time | Packet Ack Round Trip Average Time | Hadoop:service=DataNode,name=DataNodeActivity-*::PacketAckRoundTripTimeNanosAvgTime |
spectrumscale_hdfs_packet_ack_round_trip_time_nanos_num_ops | Packet Ack Round Trip Operations | Hadoop:service=DataNode,name=DataNodeActivity-*::PacketAckRoundTripTimeNanosNumOps |
spectrumscale_hdfs_read_block_op_avg_time | Read Block Average Time | Hadoop:service=DataNode,name=DataNodeActivity-*::ReadBlockOpAvgTime |
spectrumscale_hdfs_read_block_op_num_ops | Read Block Operations | Hadoop:service=DataNode,name=DataNodeActivity-*::ReadBlockOpNumOps |
spectrumscale_hdfs_reads_from_local_client | Reads From Local Clients | Hadoop:service=DataNode,name=DataNodeActivity-*::ReadsFromLocalClient |
spectrumscale_hdfs_reads_from_remote_client | Reads From Remote Clients | Hadoop:service=DataNode,name=DataNodeActivity-*::ReadsFromRemoteClient |
spectrumscale_hdfs_replace_block_op_avg_time | Replace Block Operation Average Time | Hadoop:service=DataNode,name=DataNodeActivity-*::ReplaceBlockOpAvgTime |
spectrumscale_hdfs_replace_block_op_num_ops | Replace Block Operations | Hadoop:service=DataNode,name=DataNodeActivity-*::ReplaceBlockOpNumOps |
spectrumscale_hdfs_jvm_blocked_threads | Blocked threads | Hadoop:service=DataNode,name=JvmMetrics::ThreadsBlocked |
spectrumscale_hdfs_jvm_gc_count | Number of garbage collections | Hadoop:service=DataNode,name=JvmMetrics::GcCount |
spectrumscale_hdfs_jvm_gc_time_ms | Total time spent garbage collecting. | Hadoop:service=DataNode,name=JvmMetrics::GcTimeMillis |
spectrumscale_hdfs_jvm_heap_committed_mb | Total amount of committed heap memory. | Hadoop:service=DataNode,name=JvmMetrics::MemHeapCommittedM |
spectrumscale_hdfs_jvm_heap_used_mb | Total amount of used heap memory. | Hadoop:service=DataNode,name=JvmMetrics::MemHeapUsedM |
spectrumscale_hdfs_jvm_max_memory_mb | Maximum allowed memory. | Hadoop:service=DataNode,name=JvmMetrics::MemMaxM |
spectrumscale_hdfs_jvm_new_threads | New threads | Hadoop:service=DataNode,name=JvmMetrics::ThreadsNew |
spectrumscale_hdfs_jvm_non_heap_committed_mb | Total amount of committed non-heap memory. | Hadoop:service=DataNode,name=JvmMetrics::MemNonHeapCommittedM |
spectrumscale_hdfs_jvm_non_heap_used_mb | Total amount of used non-heap memory. | Hadoop:service=DataNode,name=JvmMetrics::MemNonHeapUsedM |
spectrumscale_hdfs_jvm_pause_time | The amount of extra time the JVM was paused above the requested sleep time. The JVM pause monitor sleeps for 500 milliseconds, and any extra time it waits beyond that is counted in the pause time. | Hadoop:service=DataNode,name=JvmMetrics::GcTotalExtraSleepTime |
spectrumscale_hdfs_jvm_pauses_info_threshold_count | Number of JVM pauses longer than the info threshold but shorter than the warning threshold. By default, the info threshold is set to 1 second. To change it, use the configuration key JvmPauseMonitorService.info-threshold.ms. | Hadoop:service=DataNode,name=JvmMetrics::GcNumInfoThresholdExceeded |
spectrumscale_hdfs_jvm_pauses_warn_threshold_count | Number of JVM pauses longer than the warning threshold. By default, the warning threshold is set to 10 seconds. To change it, use the configuration key JvmPauseMonitorService.warn-threshold.ms. | Hadoop:service=DataNode,name=JvmMetrics::GcNumWarnThresholdExceeded |
spectrumscale_hdfs_jvm_runnable_threads | Runnable threads | Hadoop:service=DataNode,name=JvmMetrics::ThreadsRunnable |
spectrumscale_hdfs_jvm_terminated_threads | Terminated threads | Hadoop:service=DataNode,name=JvmMetrics::ThreadsTerminated |
spectrumscale_hdfs_jvm_timed_waiting_threads | Timed waiting threads | Hadoop:service=DataNode,name=JvmMetrics::ThreadsTimedWaiting |
spectrumscale_hdfs_jvm_waiting_threads | Waiting threads | Hadoop:service=DataNode,name=JvmMetrics::ThreadsWaiting |
spectrumscale_hdfs_log_error | Logged Errors | Hadoop:service=DataNode,name=JvmMetrics::LogError |
spectrumscale_hdfs_log_fatal | Logged Fatals | Hadoop:service=DataNode,name=JvmMetrics::LogFatal |
spectrumscale_hdfs_log_info | Logged Infos | Hadoop:service=DataNode,name=JvmMetrics::LogInfo |
spectrumscale_hdfs_log_warn | Logged Warnings | Hadoop:service=DataNode,name=JvmMetrics::LogWarn |
spectrumscale_hdfs_login_failure_avg_time | Average Failed Login Time | Hadoop:service=DataNode,name=UgiMetrics::LoginFailureAvgTime |
spectrumscale_hdfs_login_failure_num_ops | Login Failures | Hadoop:service=DataNode,name=UgiMetrics::LoginFailureNumOps |
spectrumscale_hdfs_login_success_avg_time | Average Successful Login Time | Hadoop:service=DataNode,name=UgiMetrics::LoginSuccessAvgTime |
spectrumscale_hdfs_login_success_num_ops | Login Successes | Hadoop:service=DataNode,name=UgiMetrics::LoginSuccessNumOps |
spectrumscale_hdfs_metrics_dropped_pub_all | Dropped Metrics Updates By All Sinks | Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::DroppedPubAll |
spectrumscale_hdfs_metrics_num_active_sinks | Active Metrics Sinks Count | Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::NumActiveSinks |
spectrumscale_hdfs_metrics_num_active_sources | Active Metrics Sources Count | Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::NumActiveSources |
spectrumscale_hdfs_metrics_num_all_sinks | All Metrics Sinks Count | Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::NumAllSinks |
spectrumscale_hdfs_metrics_num_all_sources | All Metrics Sources Count | Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::NumAllSources |
spectrumscale_hdfs_metrics_publish_avg_time | Metrics Publish Average Time | Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::PublishAvgTime |
spectrumscale_hdfs_metrics_publish_num_ops | Metrics Publish Operations | Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::PublishNumOps |
spectrumscale_hdfs_metrics_snapshot_avg_time | Metrics Snapshot Average Time | Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::SnapshotAvgTime |
spectrumscale_hdfs_metrics_snapshot_num_ops | Metrics Snapshot Operations | Hadoop:service=DataNode,name=MetricsSystem,sub=Stats::SnapshotNumOps |
spectrumscale_hdfs_rpc_authentication_failures | RPC Authentication Failures | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcAuthenticationFailures |
spectrumscale_hdfs_rpc_authentication_successes | RPC Authentication Successes | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcAuthenticationSuccesses |
spectrumscale_hdfs_rpc_authorization_failures | RPC Authorization Failures | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcAuthorizationFailures |
spectrumscale_hdfs_rpc_authorization_successes | RPC Authorization Successes | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcAuthorizationSuccesses |
spectrumscale_hdfs_rpc_call_queue_length | RPC Call Queue Length | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::CallQueueLength |
spectrumscale_hdfs_rpc_num_open_connections | Open RPC Connections | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::NumOpenConnections |
spectrumscale_hdfs_rpc_processing_time_avg_time | Average RPC Processing Time | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcProcessingTimeAvgTime |
spectrumscale_hdfs_rpc_processing_time_num_ops | RPCs Processed | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcProcessingTimeNumOps |
spectrumscale_hdfs_rpc_queue_time_avg_time | Average RPC Queue Time | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcQueueTimeAvgTime |
spectrumscale_hdfs_rpc_queue_time_num_ops | RPCs Queued | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::RpcQueueTimeNumOps |
spectrumscale_hdfs_rpc_received_bytes | RPC Received Bytes | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::ReceivedBytes |
spectrumscale_hdfs_rpc_sent_bytes | RPC Sent Bytes | Hadoop:service=DataNode,name=RpcActivityForPort\\d+::SentBytes |
spectrumscale_hdfs_xceivers | Transceivers | Hadoop:service=DataNode,name=DataNodeInfo::XceiverCount |
spectrumscale_hdfs_connections | Current number of connections to NameNode | Hadoop:service=NameNode,name=FSNamesystem::TotalLoad |
spectrumscale_hdfs_fsnamesystem_lockqueuelength | Number of threads waiting to acquire FSNameSystem lock | Hadoop:service=NameNode,name=FSNamesystem::LockQueueLength |
spectrumscale_hdfs_active_connection_holdinglease | Number of active clients holding lease | Hadoop:service=NameNode,name=FSNamesystem::NumActiveClients |
spectrumscale_hdfs_state | Current state of the file system: Safemode or Operational | Hadoop:service=NameNode,name=FSNamesystem::FSState |
spectrumscale_hdfs_ha_state | Current state of the NameNode: initializing, active, standby, or stopping | Hadoop:service=NameNode,name=FSNamesystem::tag.HAState |
spectrumscale_hdfs_rpc_queue_time_num_ops | RPCs Queued | Hadoop:service=NameNode,name=RpcActivityForPort\\d+::RpcQueueTimeNumOps |
spectrumscale_hdfs_rpc_queue_time_avg_time | Average RPC Queue Time | Hadoop:service=NameNode,name=RpcActivityForPort\\d+::RpcQueueTimeAvgTime |
spectrumscale_hdfs_rpc_processing_time_num_ops | RPCs Processed | Hadoop:service=NameNode,name=RpcActivityForPort\\d+::RpcProcessingTimeNumOps |
spectrumscale_hdfs_rpc_processing_time_avg_time | Average RPC Processing Time | Hadoop:service=NameNode,name=RpcActivityForPort\\d+::RpcProcessingTimeAvgTime |
spectrumscale_hdfs_rpc_call_queue_length | RPC Call Queue Length | Hadoop:service=NameNode,name=RpcActivityForPort\\d+::CallQueueLength |
spectrumscale_hdfs_rpc_num_open_connections | Open RPC Connections | Hadoop:service=NameNode,name=RpcActivityForPort\\d+::NumOpenConnections |
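Each entry in the third column has the form `bean-pattern::attribute`. To illustrate how such an entry could be resolved against JMX data, the following Python sketch splits an entry and looks up the attribute in a list of beans. It is an illustration under stated assumptions, not the matching logic that Cloudera Manager itself uses: the sample data is hypothetical and only shaped like the `beans` array that a Hadoop `/jmx` endpoint returns, the doubled backslash in the table is assumed to be string escaping for `\d+`, and a prefix match on the bean name is assumed to be sufficient. The helper names are hypothetical.

```python
import re


def split_pattern(entry):
    """Split a 'bean-pattern::attribute' table entry into its two parts."""
    bean_pattern, attribute = entry.split("::", 1)
    # Assumption: the doubled backslash in the table is an escaped single one.
    return bean_pattern.replace("\\\\", "\\"), attribute


def find_metric(beans, entry):
    """Return the attribute value from the first bean whose name matches."""
    bean_pattern, attribute = split_pattern(entry)
    regex = re.compile(bean_pattern)
    for bean in beans:
        # Assumption: matching the start of the bean name is enough.
        if regex.match(bean.get("name", "")) and attribute in bean:
            return bean[attribute]
    return None


# Hypothetical sample data; real bean names and values will differ.
sample_beans = [
    {"name": "Hadoop:service=DataNode,name=RpcActivityForPort9867", "CallQueueLength": 0},
    {"name": "Hadoop:service=DataNode,name=DataNodeActivity-node1-9866", "BytesRead": 4096},
]

print(find_metric(sample_beans, r"Hadoop:service=DataNode,name=RpcActivityForPort\\d+::CallQueueLength"))
print(find_metric(sample_beans, "Hadoop:service=DataNode,name=DataNodeActivity-*::BytesRead"))
```

A lookup that finds no matching bean or attribute returns `None`, which can help to spot a graph that stays empty because the expected JMX bean is not exposed.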