Cluster state events
The following table lists the events that are created for the Cluster state component.
Event | Event Type | Severity | Call Home | Details |
---|---|---|---|---|
cluster_state_manager_resend | INFO | INFO | no | Message: The CSM requests resending all information. |
Description: The CSM requests resending all information. | ||||
Cause: The CSM is missing information about this node. | ||||
User Action: N/A | ||||
cluster_state_manager_reset | INFO | INFO | no | Message: Clear memory of cluster state manager for this node. |
Description: A reset request for the monitor state manager was received. | ||||
Cause: A reset request for the monitor state manager was received. | ||||
User Action: N/A | ||||
component_state_change | INFO | INFO | no | Message: The state of component {0} changed to {1}. |
Description: The state of a component changed. | ||||
Cause: An event was detected by the system health framework that triggered a state change for a component. | ||||
User Action: N/A | ||||
entity_state_change | INFO | INFO | no | Message: The state of {0} {1} of the component {2} changed to {3}. |
Description: The state of an entity changed. | ||||
Cause: An event was detected by the system health framework that triggered a state change for an entity. | ||||
User Action: N/A | ||||
eventlog_cleared | INFO | INFO | no | Message: On the node {0}, the eventlog was cleared. |
Description: The user cleared the eventlog with the mmhealth node eventlog --clearDB command. This command also clears the events that are shown by the mmces events list command. | ||||
Cause: The user cleared the eventlog. | ||||
User Action: N/A | ||||
heartbeat | STATE_CHANGE | INFO | no | Message: Node {0} sent a heartbeat. |
Description: The node is alive. | ||||
Cause: The cluster node sent a heartbeat to the CSM. | ||||
User Action: N/A | ||||
heartbeat_missing | STATE_CHANGE | ERROR | no | Message: CSM is missing a heartbeat from the node {0}. |
Description: The Cluster State Manager (CSM) is missing a heartbeat from the specified node. | ||||
Cause: The specified cluster node did not send a heartbeat to the Cluster State Manager (CSM). | ||||
User Action: Check network connectivity of the node. Check whether the monitor is running there (by using the mmsysmoncontrol status command). | ||||
heartbeat_missing_server_unreachable | STATE_CHANGE | ERROR | no | Message: CSM is missing a heartbeat from node {0}, which might be due to the node, the network, and the processes being down. |
Description: The Cluster State Manager (CSM) is missing a heartbeat from the specified node, which might be because the server or network is down. | ||||
Cause: The specified cluster node cannot be contacted to have it send a heartbeat to the Cluster State Manager (CSM). | ||||
User Action: Check network connectivity to the node. Check the operational state of the node. Check that the IBM Storage Scale processes are running and communicating with the cluster (mmgetstate -Lv, mmsysmoncontrol status). | ||||
node_resumed | STATE_CHANGE | INFO | no | Message: Node {0} is not suspended anymore. |
Description: The node is resumed after it was suspended. | ||||
Cause: The cluster node was resumed after being suspended. | ||||
User Action: N/A | ||||
node_state_change | INFO | INFO | no | Message: The state of this node is changed to {0}. |
Description: The state of this node changed. | ||||
Cause: An event was detected by the system health framework that triggered a state change for this node. | ||||
User Action: N/A | ||||
node_suspended | STATE_CHANGE | INFO | no | Message: Node {0} is suspended. |
Description: The node is suspended. | ||||
Cause: The cluster node is now suspended. | ||||
User Action: Run the mmces node resume command to resume the node. | ||||
service_added | INFO | INFO | no | Message: On the node {0}, the {1} monitor was started. |
Description: A new monitor was started by the Sysmonitor daemon. | ||||
Cause: A new monitor was started. | ||||
User Action: N/A | ||||
service_disabled | STATE_CHANGE | INFO | no | Message: The service {0} is disabled. |
Description: The service is disabled. | ||||
Cause: The service is disabled. | ||||
User Action: Run the mmces service enable <service> command to enable the CES service or resume monitoring by using the mmhealth config monitor resume command. | ||||
service_no_pod_data | STATE_CHANGE | WARNING | no | Message: A request to {id} did not yield expected health data. |
Description: A health check on the service failed. | ||||
Cause: The service is running in a different pod and does not respond to requests about its health state. | ||||
User Action: Check that all pods are running in the container environment. The event can be manually cleared by using the mmhealth event resolve service_no_pod_data <id> command. | ||||
service_pod_data | STATE_CHANGE | INFO | no | Message: The request to {id} did return health data as expected. |
Description: A health check on the service worked as expected. | ||||
Cause: The service is running in a different pod and responds to requests about its health state. | ||||
User Action: N/A | ||||
service_removed | INFO | INFO | no | Message: On the node {0} the {1} monitor was removed. |
Description: A monitor was removed by Sysmonitor. | ||||
Cause: A monitor was removed. | ||||
User Action: N/A | ||||
service_reset | STATE_CHANGE | INFO | no | Message: The service {0} on node {1} was reconfigured, and its events were cleared. |
Description: All current service events were cleared. | ||||
Cause: The service was reconfigured. | ||||
User Action: N/A | ||||
service_running | STATE_CHANGE | INFO | no | Message: The service {0} is running on node {1}. |
Description: The service is not stopped or disabled anymore. | ||||
Cause: The service is not stopped or disabled anymore. | ||||
User Action: N/A | ||||
service_stopped | STATE_CHANGE | INFO | no | Message: The service {0} is stopped on node {1}. |
Description: The service is stopped. | ||||
Cause: The service was stopped. | ||||
User Action: Run the mmces service start <service> command to start the service. | ||||
singleton_sensor_off | INFO | INFO | no | Message: The singleton sensors of pmsensors are turned off. |
Description: The pmsensors' configuration is reloaded. This node is not configured to start the singleton sensors anymore. | ||||
Cause: This node was previously assigned as the singleton sensor node. However, it no longer satisfies the requirements for a singleton sensor (perfmon designation, PERFMON component HEALTHY, GPFS component HEALTHY). | ||||
User Action: N/A | ||||
singleton_sensor_on | INFO | INFO | no | Message: The singleton sensors of pmsensors are turned on. |
Description: The pmsensors' configuration is reloaded. This node is now configured to start the singleton sensors. | ||||
Cause: Another node was previously assigned as the singleton sensor node. However, it no longer satisfies the requirements for a singleton sensor (perfmon designation, PERFMON component HEALTHY, GPFS component HEALTHY). This node was assigned as the new singleton sensor node. | ||||
User Action: N/A | ||||
webhook_url_abort | INFO | WARNING | no | Message: Webhook URL {0} was disabled because a fatal runtime error was encountered. For more information, see the monitoring logs in /var/adm/ras/mmsysmonitor.log. |
Description: The system health framework encountered a fatal runtime error that forced it to stop activity to this webhook URL. | ||||
Cause: The system health framework encountered a fatal runtime error when it was sending events to a webhook URL. | ||||
User Action: Check that the webhook URL is reachable and re-enable the URL by using the mmhealth config webhook add command. | ||||
webhook_url_communication | INFO | INFO | no | Message: Webhook URL {0} was not able to receive event information. |
Description: The system health framework was not able to send event information to a configured webhook URL. | ||||
Cause: The system health framework was not able to send event information. | ||||
User Action: N/A | ||||
webhook_url_disabled | INFO | WARNING | no | Message: Webhook URL {0} was disabled as too many failures occurred. |
Description: The system health framework repeatedly failed to contact a webhook URL. | ||||
Cause: The system health framework repeatedly failed to contact a webhook URL. | ||||
User Action: Check that the webhook URL is reachable and re-enable the URL by using the mmhealth config webhook add command. | ||||
webhook_url_reset | INFO | INFO | no | Message: Webhook URL {0} communication was set back to a HEALTHY state. |
Description: The system health framework set this webhook URL status back to a HEALTHY state after being disabled because of repeated failures. | ||||
Cause: The system health framework set this webhook URL status back to a HEALTHY state. | ||||
User Action: N/A | ||||
webhook_url_restored | INFO | INFO | no | Message: Webhook URL {0} communication was restored and event information was successfully sent. |
Description: The system health framework was able to send event information to the webhook URL after a previous failure. | ||||
Cause: The system health framework was able to send event information to the webhook URL. | ||||
User Action: N/A | ||||
webhook_url_ssl_validation | INFO | WARNING | no | Message: Communication to webhook URL {} was established, but Server-Side certificate validation failed and was disabled. Check the HTTPS server configuration to ensure that this disabling is the intended behavior. |
Description: The system health framework failed to validate the server-side certificate. | ||||
Cause: The system health framework failed to validate the server-side certificate. | ||||
User Action: Check that the webhook URL has a valid SSL certificate and re-enable the URL by using the mmhealth config webhook add command. |
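
The message templates in the table use positional placeholders such as {0} and {1}. The following minimal sketch shows one way to fill those placeholders and to separate events that need a user action from those marked N/A. It is an illustration only, not part of the product: the `EventDef` type, the `CATALOG` dictionary, the `render` and `needs_action` helpers, and the sample values `NFS` and `node-1` are all hypothetical. Only the event names, types, severities, message texts, and user actions are copied from the table.

```python
from typing import NamedTuple


class EventDef(NamedTuple):
    """One table row (hypothetical layout, not a product data structure)."""
    event_type: str
    severity: str
    message: str
    user_action: str


# A few rows copied from the table above; the dictionary itself is an assumption.
CATALOG = {
    "heartbeat_missing": EventDef(
        "STATE_CHANGE",
        "ERROR",
        "CSM is missing a heartbeat from the node {0}.",
        "Check network connectivity of the node. Check whether the monitor "
        "is running there (by using the mmsysmoncontrol status command).",
    ),
    "service_stopped": EventDef(
        "STATE_CHANGE",
        "INFO",
        "The service {0} is stopped on node {1}.",
        "Run the mmces service start <service> command to start the service.",
    ),
    "component_state_change": EventDef(
        "INFO",
        "INFO",
        "The state of component {0} changed to {1}.",
        "N/A",
    ),
}


def render(event: str, *args: str) -> str:
    """Fill the positional placeholders ({0}, {1}, ...) of an event message."""
    return CATALOG[event].message.format(*args)


def needs_action(event: str) -> bool:
    """Return True when the table lists a user action other than N/A."""
    return CATALOG[event].user_action != "N/A"


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(render("service_stopped", "NFS", "node-1"))
    for name in CATALOG:
        if needs_action(name):
            print(f"{name}: {CATALOG[name].user_action}")
```

Events whose message uses a named placeholder, such as {id} in service_no_pod_data, would need keyword arguments instead of positional ones in this sketch.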