Cluster state events

The following table lists the events that are created for the cluster state component.

Table 1. Events for the cluster state component
Event    Event Type    Severity    Call Home    Details
cluster_state_manager_resend INFO INFO no Message: The CSM requests resending all information.
Description: The CSM requests resending all information.
Cause: The CSM is missing information about this node.
User Action: N/A
cluster_state_manager_reset INFO INFO no Message: Clear memory of cluster state manager for this node.
Description: A reset request for the monitor state manager was received.
Cause: A reset request for the monitor state manager was received.
User Action: N/A
component_state_change INFO INFO no Message: The state of component {0} changed to {1}.
Description: The state of a component changed.
Cause: An event was detected by the system health framework that triggered a state change for a component.
User Action: N/A
entity_state_change INFO INFO no Message: The state of {0} {1} of the component {2} changed to {3}.
Description: The state of an entity changed.
Cause: An event was detected by the system health framework that triggered a state change for an entity.
User Action: N/A
eventlog_cleared INFO INFO no Message: On the node {0}, the eventlog was cleared.
Description: The user cleared the eventlog with the mmhealth node eventlog --clearDB command. This command also clears the events that are shown by the mmces events list command.
Cause: The user cleared the eventlog.
User Action: N/A
heartbeat STATE_CHANGE INFO no Message: Node {0} sent a heartbeat.
Description: The node is alive.
Cause: The cluster node sent a heartbeat to the CSM.
User Action: N/A
heartbeat_missing STATE_CHANGE ERROR no Message: CSM is missing a heartbeat from the node {0}.
Description: The Cluster State Manager (CSM) is missing a heartbeat from the specified node.
Cause: The specified cluster node did not send a heartbeat to the Cluster State Manager (CSM).
User Action: Check network connectivity of the node. Check whether the monitor is running on that node by using the mmsysmoncontrol status command.
heartbeat_missing_server_unreachable STATE_CHANGE ERROR no Message: CSM is missing a heartbeat from node {0}, which might be due to the node, the network, and the processes being down.
Description: The Cluster State Manager (CSM) is missing a heartbeat from the specified node, which might be because the server or network is down.
Cause: The specified cluster node cannot be contacted to have it send a heartbeat to the Cluster State Manager (CSM).
User Action: Check network connectivity to the node. Check the operational state of the node. Check that the IBM Storage Scale processes are running and communicating with the cluster by using the mmgetstate -Lv and mmsysmoncontrol status commands.
node_resumed STATE_CHANGE INFO no Message: Node {0} is not suspended anymore.
Description: The node is resumed after it was suspended.
Cause: The cluster node was resumed after being suspended.
User Action: N/A
node_state_change INFO INFO no Message: The state of this node is changed to {0}.
Description: The state of this node changed.
Cause: An event was detected by the system health framework that triggered a state change for this node.
User Action: N/A
node_suspended STATE_CHANGE INFO no Message: Node {0} is suspended.
Description: The node is suspended.
Cause: The cluster node is now suspended.
User Action: Run the mmces node resume command to stop the node from being suspended.
service_added INFO INFO no Message: On the node {0}, the {1} monitor was started.
Description: A new monitor was started by the Sysmonitor daemon.
Cause: A new monitor was started.
User Action: N/A
service_disabled STATE_CHANGE INFO no Message: The service {0} is disabled.
Description: The service is disabled.
Cause: The service is disabled.
User Action: Run the mmces service enable <service> command to enable the CES service or resume monitoring by using the mmhealth config monitor resume command.
service_no_pod_data STATE_CHANGE WARNING no Message: A request to {id} did not yield expected health data.
Description: A check on the service did not work.
Cause: The service is running in a different pod and does not respond to requests about its health state.
User Action: Check that all pods are running in the container environment. The event can be manually cleared by using the mmhealth event resolve service_no_pod_data <id> command.
service_pod_data STATE_CHANGE INFO no Message: The request to {id} did return health data as expected.
Description: A check on the service did work as expected.
Cause: The service is running in a different pod and responds to requests about its health state.
User Action: N/A
service_removed INFO INFO no Message: On the node {0} the {1} monitor was removed.
Description: A monitor was removed by Sysmonitor.
Cause: A monitor was removed.
User Action: N/A
service_reset STATE_CHANGE INFO no Message: The service {0} on node {1} was reconfigured, and its events were cleared.
Description: All current service events were cleared.
Cause: The service was reconfigured.
User Action: N/A
service_running STATE_CHANGE INFO no Message: The service {0} is running on node {1}.
Description: The service is not stopped or disabled anymore.
Cause: The service is not stopped or disabled anymore.
User Action: N/A
service_stopped STATE_CHANGE INFO no Message: The service {0} is stopped on node {1}.
Description: The service is stopped.
Cause: The service was stopped.
User Action: Run the mmces service start <service> command to start the service.
singleton_sensor_off INFO INFO no Message: The singleton sensors of pmsensors are turned off.
Description: The pmsensors' configuration is reloaded. This node is not configured to start the singleton sensors anymore.
Cause: This node was previously assigned as the singleton sensor node. However, it no longer satisfies the requirements for a singleton sensor (perfmon designation, PERFMON component HEALTHY, GPFS component HEALTHY).
User Action: N/A
singleton_sensor_on INFO INFO no Message: The singleton sensors of pmsensors are turned on.
Description: The pmsensors' configuration is reloaded. This node is now configured to start the singleton sensors.
Cause: Another node was previously assigned as the singleton sensor node. However, it no longer satisfies the requirements for a singleton sensor (perfmon designation, PERFMON component HEALTHY, GPFS component HEALTHY). This node was assigned as the new singleton sensor node.
User Action: N/A
webhook_url_abort INFO WARNING no Message: Webhook URL {0} was disabled because a fatal runtime error was encountered. For more information, see the monitoring logs in /var/adm/ras/mmsysmonitor.log.
Description: The system health framework encountered a fatal runtime error that forced it to stop activity to this webhook URL.
Cause: The system health framework encountered a fatal runtime error when it was sending events to a webhook URL.
User Action: Check that the webhook URL is reachable and re-enable the URL by using the mmhealth config webhook add command.
webhook_url_communication INFO INFO no Message: Webhook URL {0} was not able to receive event information.
Description: The system health framework was not able to send event information to a configured webhook URL.
Cause: The system health framework was not able to send event information.
User Action: N/A
webhook_url_disabled INFO WARNING no Message: Webhook URL {0} was disabled as too many failures occurred.
Description: The system health framework repeatedly failed to contact a webhook URL.
Cause: The system health framework repeatedly failed to contact a webhook URL.
User Action: Check that the webhook URL is reachable and re-enable the URL by using the mmhealth config webhook add command.
webhook_url_reset INFO INFO no Message: Webhook URL {0} communication was set back to a HEALTHY state.
Description: The system health framework set this webhook URL status back to a HEALTHY state after the URL was disabled because of repeated failures.
Cause: The system health framework set this webhook URL status back to a HEALTHY state.
User Action: N/A
webhook_url_restored INFO INFO no Message: Webhook URL {0} communication was restored and event information was successfully sent.
Description: The system health framework was able to send event information to the webhook URL after a previous failure.
Cause: The system health framework was able to send event information to the webhook URL.
User Action: N/A
webhook_url_ssl_validation INFO WARNING no Message: Communication to webhook URL {} was established, but Server-Side certificate validation failed and was disabled. Check the HTTPS server configuration to ensure that this disabling is the intended behavior.
Description: The system health framework failed to validate the server-side certificate.
Cause: The system health framework failed to validate the server-side certificate.
User Action: Check that the webhook URL has a valid SSL certificate and re-enable the URL by using the mmhealth config webhook add command.
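
The Message entries in the table contain placeholders such as {0}, {1}, and {id} that are replaced with event-specific values when an event is raised. The following minimal sketch illustrates that substitution. It uses Python's str.format as a stand-in, which is an assumption for illustration only and not a description of how the system health framework builds its messages. The templates are copied from the table. The render helper and all argument values (GPFS, DEGRADED, fs1, example-pod) are hypothetical.

```python
# Message templates copied from Table 1.
TEMPLATES = {
    "component_state_change": "The state of component {0} changed to {1}.",
    "entity_state_change": "The state of {0} {1} of the component {2} changed to {3}.",
    "service_no_pod_data": "A request to {id} did not yield expected health data.",
}


def render(event, *args, **kwargs):
    """Fill the message template of an event with positional or named values.

    Hypothetical helper: str.format is only a stand-in for whatever
    substitution the system health framework performs internally.
    """
    return TEMPLATES[event].format(*args, **kwargs)


if __name__ == "__main__":
    # All argument values below are made up for illustration.
    print(render("component_state_change", "GPFS", "DEGRADED"))
    print(render("entity_state_change", "FILESYSTEM", "fs1", "FILESYSTEM", "HEALTHY"))
    print(render("service_no_pod_data", id="example-pod"))
```

Numbered placeholders are filled in the order in which the values are supplied. A named placeholder such as {id} takes the identifier of the affected entity.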