Network Manager failover and failback
Failover can be initiated by either the primary or backup domain, and is triggered when a health check problem event is generated for the primary domain. Failback is triggered by a subsequent health check resolution event for the primary domain.
An ItnmFailover event is generated by ncp_virtualdomain when a Network Manager domain fails over or fails back.
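The trigger logic described above can be sketched as a small state transition function. This is an illustrative model only, not Network Manager code: the `HealthCheckEvent` class, the `handle_health_check` function, and the domain names `PRIMARY` and `BACKUP` are hypothetical names chosen for the example. The only name taken from the product is the ItnmFailover event.

```python
from dataclasses import dataclass

PROBLEM = "problem"        # health check problem event
RESOLUTION = "resolution"  # health check resolution event


@dataclass
class HealthCheckEvent:
    domain: str  # domain that the health check refers to
    kind: str    # PROBLEM or RESOLUTION


def handle_health_check(active, event, primary="PRIMARY", backup="BACKUP"):
    """Return (new_active_domain, emitted_event_names).

    Hypothetical model: only health check events for the primary
    domain change which domain is active.
    """
    if event.domain != primary:
        return active, []
    if event.kind == PROBLEM and active == primary:
        # Failover: the backup domain becomes active.
        return backup, ["ItnmFailover"]
    if event.kind == RESOLUTION and active == backup:
        # Failback: the primary domain becomes active again.
        return primary, ["ItnmFailover"]
    return active, []
```

In this sketch a repeated problem event while the backup is already active changes nothing, which reflects the idea that failback depends on a subsequent resolution event rather than on further problem events.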
Failing over
When failover occurs, the primary Network Manager domain goes into standby mode (if it is still running), and the backup domain becomes active.
The following changes occur when the backup domain becomes active:
- The Event Gateway synchronizes the events with the ObjectServer.
- The ncp_poller process resumes polling.
- The Event Gateway switches from the standby filter (StandbyEventFilter) to the incoming event filter (EventFilter).
- Network Manager continues to monitor the network and perform root cause analysis (RCA). However, network discovery is not performed, and the network topology remains static.
When a primary Network Manager server goes into standby mode, the following changes occur:
- The Event Gateway switches from the incoming event filter (EventFilter) to the standby filter (StandbyEventFilter).
- The ncp_poller process suspends all polls.
- If it is, then the first Apache Storm server goes into a standby state.
- If not, then Storm assumes responsibility for processing the poll data.
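The role changes listed in this section can be summarized in a short sketch. It is a simplified model under stated assumptions: `DomainState`, `to_active`, `to_standby`, and `fail_over` are hypothetical names, and only the filter names (EventFilter, StandbyEventFilter) and the polling behavior come from the text. Apache Storm handling and event synchronization are left out of the model.

```python
from dataclasses import dataclass


@dataclass
class DomainState:
    name: str
    active: bool
    event_filter: str  # filter that the Event Gateway currently applies
    polling: bool      # whether ncp_poller is running polls


def to_active(domain):
    """Active role: incoming event filter, polling resumed."""
    return DomainState(domain.name, True, "EventFilter", True)


def to_standby(domain):
    """Standby role: standby filter, all polls suspended."""
    return DomainState(domain.name, False, "StandbyEventFilter", False)


def fail_over(primary, backup):
    """Return the (primary, backup) states after a failover."""
    return to_standby(primary), to_active(backup)


primary = DomainState("PRIMARY", True, "EventFilter", True)
backup = DomainState("BACKUP", False, "StandbyEventFilter", False)
primary, backup = fail_over(primary, backup)
```

After the call, the two domains have simply exchanged filter and polling settings, so at any time only one domain applies EventFilter and runs polls.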
Failing back
When a primary Network Manager server in standby mode resumes normal operation, it generates a health check resolution event.
The health check resolution event passes through the system, and the recovered Network Manager server becomes active again.
When the Virtual Domain process on the backup Network Manager server receives the health check resolution event, Virtual Domain switches the backup server back to standby mode.
The GenericClear automation in the ObjectServer is triggered by the health check resolution event, and clears the existing health check problem event.
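The clearing step can be illustrated with a minimal sketch. It does not reproduce the GenericClear automation: the `apply_resolution` function, the dictionary keys, and the rule of matching on the domain name alone are assumptions made for the example, as is the convention that a cleared event has severity 0.

```python
def apply_resolution(events, resolution):
    """Return a new list in which problem events that match the
    resolution event are marked as cleared (severity 0).

    Hypothetical matching rule: same domain, type "problem",
    and not already cleared.
    """
    result = []
    for ev in events:
        matches = (
            ev["type"] == "problem"
            and ev["domain"] == resolution["domain"]
            and ev["severity"] > 0
        )
        # Copy each event so that the input list is left unchanged.
        result.append({**ev, "severity": 0} if matches else dict(ev))
    return result


events = [
    {"type": "problem", "domain": "PRIMARY", "severity": 5},
    {"type": "problem", "domain": "OTHER", "severity": 4},
]
resolution = {"type": "resolution", "domain": "PRIMARY", "severity": 1}
cleared = apply_resolution(events, resolution)
```

Only the health check problem event for the recovered domain is cleared; events for other domains keep their severity.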