Event type and monitoring status for system health
An event might trigger a change in the state of a system.
- State-changing events: These events change the state of a component or entity from good to bad or from bad to good, depending on the corresponding state of the event.
  Note: An event is raised when the health status of the component goes from good to bad. For example, an event is raised that changes the status of a component from HEALTHY to DEGRADED. However, if the state was already DEGRADED based on another active event, there is no change in the status of the component. Also, when the state of the entity was FAILED, a DEGRADED event would not change the component's state because a FAILED status is more dominant than the DEGRADED status.
- Tip: Tips are similar to state-changing events, but can be hidden by the user. Like state-changing events, a tip is removed automatically when the problem is resolved. A tip event always changes the state of a component from HEALTHY to TIPS if the event is not hidden.
  Note: If the state of a component changes to TIPS, the tip can be hidden. However, you can still view the active hidden events by using the mmhealth node show ComponentName --verbose command, if the cause for the event still exists.
- Information events: Information events are notices that are shown in the event log or in brackets in the output of the mmhealth node show command. They do not change the state of the component. They disappear after 24 hours or when they are resolved by using the mmhealth event resolve command.
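The rules for the three event types can be summarized in a short Python sketch. It is only an illustration of the behavior described in the list, not the way mmhealth is implemented; the Event class, the kind labels, and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the text: information events disappear after 24 hours.
INFO_LIFETIME = timedelta(hours=24)


@dataclass
class Event:
    name: str                # hypothetical event name
    kind: str                # "state_change", "tip", or "info" (labels are assumptions)
    raised_at: datetime
    hidden: bool = False     # only meaningful for tips
    resolved: bool = False


def affects_state(event: Event) -> bool:
    """Return True if the event can change the state of its component."""
    if event.resolved:
        return False
    if event.kind == "state_change":
        return True
    if event.kind == "tip":
        # A tip changes the state to TIPS only while it is not hidden.
        return not event.hidden
    # Information events never change the state.
    return False


def is_active(event: Event, now: datetime) -> bool:
    """Return True if the event is still active (hidden tips stay active)."""
    if event.resolved:
        return False
    if event.kind == "info":
        return now - event.raised_at < INFO_LIFETIME
    return True
```

In this model a hidden tip stays active, which corresponds to the hidden events that remain visible with the --verbose option, but it no longer contributes to the component state.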
The monitoring interval is 15 - 30 seconds, depending on the component. However, some services are monitored less often (for example, once every 30 minutes) to save system resources. You can find more information about the events from the IBM Storage Scale GUI or by issuing the mmhealth event show command.
A node or service can have one of the following statuses:
UNKNOWN
- Status of the node or the service that is hosted on the node is not known because of a problem with monitoring. In most cases, this is accompanied by an exception in the /var/adm/ras/mmsysmonitor.log file where the root cause of the problem is described.
HEALTHY
- The node or the service that is hosted on the node is working as expected. There are no active error events.
CHECKING
- The monitoring of a service or a component that is hosted on the node is starting at the moment. This state is a transient state, which changes to another state when the mmsysmon daemon initialization is completed.
TIPS
- An issue might be reported with the configuration and tuning of the components. This status is only assigned to a tip event.
DEGRADED
- The node or the service that is hosted on the node is not working as expected. This means that a problem with the component did not cause a complete component failure.
FAILED
- The node or the service that is hosted on the node failed due to errors or cannot be reached anymore.
DEPEND
- The node or the services that are hosted on the node failed due to the failure of some components. For example, an NFS or SMB service shows this status if authentication failed.
Figure 1. IBM Storage Scale components dependency
The status is graded as follows: HEALTHY < TIPS < DEGRADED < FAILED. For example, the status of the service that is hosted on a node becomes FAILED if there is at least one active event in the FAILED status for that corresponding service. When the status of the service is set, the FAILED status takes priority over DEGRADED, which is followed by TIPS and then HEALTHY. That is, if a service has an active event with a HEALTHY status and another active event with a FAILED status, then the system sets the status of the service as FAILED.
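The precedence rule amounts to taking the maximum over an ordered set of statuses. The following Python sketch illustrates that idea under stated assumptions: the Status enumeration and the service_status function are hypothetical names, only the four graded statuses are modeled, and a service with no active events is assumed to be HEALTHY.

```python
from enum import IntEnum


class Status(IntEnum):
    """Graded statuses from the text: HEALTHY < TIPS < DEGRADED < FAILED."""
    HEALTHY = 0
    TIPS = 1
    DEGRADED = 2
    FAILED = 3


def service_status(active_event_statuses):
    """Return the most dominant status among the active events of a service.

    Assumption: a service without active events is reported as HEALTHY.
    """
    return max(active_event_statuses, default=Status.HEALTHY)


# One HEALTHY event and one FAILED event yield FAILED for the service.
print(service_status([Status.HEALTHY, Status.FAILED]).name)
```

The same rule explains why a DEGRADED event does not change the state of a component that is already FAILED: the maximum stays FAILED.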
Some directed maintenance procedures (DMPs) are available to solve issues caused by tip events. For more information, see Directed maintenance procedures for tip events.
New encryption events are identified by their unique ID. An event can be raised multiple times, but it is listed only once for each unique ID. Therefore, multiple events can be displayed at the same time, but only one for each unique ID, regardless of how many times they are raised.
These events are cleared by using the mmhealth event resolve <event name> <event id> command.
For more information, see Encryption events.
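The "listed only once for each unique ID" behavior of encryption events can be pictured as keeping one entry per ID. The following Python sketch is a hypothetical illustration only; the function name, the event name, and the IDs are placeholders and do not correspond to real mmhealth events or output.

```python
def list_encryption_events(raised):
    """Return one (event_name, event_id) entry per unique event ID.

    raised: iterable of (event_name, event_id) tuples in the order they
    were raised. Names and IDs here are placeholders, not real events.
    """
    listed = {}
    for name, event_id in raised:
        # Keep the first occurrence; repeats with the same ID add no new entry.
        listed.setdefault(event_id, name)
    return [(name, event_id) for event_id, name in listed.items()]


raised = [
    ("example_event", "id-1"),
    ("example_event", "id-2"),
    ("example_event", "id-1"),  # raised again, still listed once
]
print(list_encryption_events(raised))
```

In this sketch, resolving an event would correspond to removing the entry for its ID from the listing.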