SevOne Best Practices Guide - Cluster, Peer, and HSA
About
This document provides conceptual information about your SevOne cluster implementation, peers, and Hot Standby Appliance (HSA) technologies. The intent is to help you prepare and troubleshoot a multi-appliance SevOne NMS cluster. Please refer to SevOne NMS Installation Guide on how to rack up SevOne appliances and SevOne NMS Implementation Guide on how to begin using SevOne Network Management System (NMS).
In this guide, any reference to master — including when a CLI command contains master or its output contains master — means leader. Similarly, any reference to slave means follower.
SevOne Scalability
SevOne NMS is a scalable system that is able to suit any network's needs. SevOne appliances work together in an environment that uses a transparent peer-to-peer technology. You can peer multiple SevOne appliances together to create a SevOne cluster that can monitor an unlimited number of network elements and flow data. The number of peers in a cluster must not exceed 200 peers, including the HSAs; this limit is imposed by MySQL replication.
There are several models of the appliances that run the SevOne software.
- PAS - Performance Appliance System (PAS) appliances can be rated to monitor up to 200,000 network elements.
- DNC - Dedicated NetFlow Collector (DNC) appliances monitor flow technologies. A standard DNC supports up to 80K flows per second, whereas a DNC HF supports up to 200K flows per second.
- HSA - Hot Standby Appliances (HSA) enable you to create a pair of appliances that act as one peer to provide redundancy.
SevOne appliances scale linearly to monitor the world’s largest networks. Each SevOne appliance is both a collector and a reporter. In a multi-peer cluster, all appliances are aware of which peer monitors each device and can efficiently route data requests to the user. The system is flexible enough to enable indicators from different devices to appear on the same graph when different SevOne peers monitor the devices.
The peer-to-peer architecture not only scales linearly, without loss of efficiency as you add additional peers, but actually gets faster. When a SevOne peer receives a request from one of its peers, it processes the request locally and sends the requesting peer only the data it needs to complete the report. Thus if a report spans multiple peers, they all work on the report simultaneously to retrieve the results much faster than any single appliance can.
You can create redundancy for each peer with a Hot Standby Appliance that works in tandem with the active appliance to create a peer appliance pair. If the active appliance in the Hot Standby Appliance peer pair fails, the passive appliance in the peer pair takes over. The passive appliance in the Hot Standby Appliance peer pair maintains a redundant copy of all poll data and configuration data for the peer pair. If a failover occurs, the transition from passive appliance to active appliance is transparent to all users and all polling and reporting continues seamlessly.
Each SevOne cluster has a Cluster Leader peer. The cluster leader peer is the SevOne appliance that stores the primary copy of Cluster Manager settings, security settings, and other global settings in the config database. All other active peers in your SevOne cluster pull the configuration data from the leader peer's config database. Some security settings and device edit functions depend on communication between the active peers in the cluster and the cluster leader peer. The cluster leader appliance hardware and software are no different from any other peer appliance, so you can designate any appliance to be the cluster leader. When you choose the cluster leader, consider geographic location and data center stability, as these affect latency and power reliability; the same considerations apply to any computer network implementation.
From the user point of view, there is only one peer because each peer in the cluster presents all data, no matter which peer collects it, with no reference to the peer that collects the data. You can log on to any SevOne peer in your cluster and retrieve data that spans any or all devices in your network.
All SevOne communication protocols assume a single hop adjacency which means that a request or replication must be directly sent to the IP address of the destination SevOne peer and cannot be routed in any way by the members of the cluster.
All peers must be able to communicate with the Cluster Leader. Peers can only report on and move devices to the peers they can communicate with. The SevOne cluster is designed to operate gracefully in a degraded state when one or more of the peers are disconnected from the rest of the cluster.
- A peer that cannot communicate with the cluster leader peer can run reports that return results for all data available on the peers with which it maintains communication.
- A peer that cannot communicate with the cluster leader cannot make any configuration changes for the duration of being disconnected.
- The peers in the cluster that remain in communication with the cluster leader can make configuration changes and can run reports for all data available on peers with which they maintain communication.
There are two types of SevOne cluster deployment architectures:
- Full Mesh - (Recommended) All peers can talk to each other. Recommended for enterprises and service providers to monitor their internal network.
- Hub and Spoke - (Not recommended due to many caveats, but supported) All peers can communicate with the Cluster Leader, but peers can only report on and move devices to the peers they can communicate with. Subsets of the other interconnected peers can be defined. Useful for managed service providers. Hub and spoke implementations provide the following:
- Reports run from the cluster leader peer return fully global result sets.
- Reports run from a peer in a partially connected subset of peers return result sets from itself, the cluster leader peer, and any other peers with which the peer can directly communicate.
- Most configuration changes at both the peer level and the cluster level can be accomplished from all peers.
- Updates and other administrative functions that affect all peers must be run from the cluster leader peer.
Database Replication Explanation
The SevOne NMS application peer-to-peer architecture has two fundamental databases.
- Config Database - The config database stores configuration settings such as cluster settings, security settings, device settings, etc. SevOne NMS saves the configuration settings you define (on any peer in the cluster) in the config database on cluster leader peer. All active appliances in the cluster pull config database changes from the cluster leader peer's config database. Each passive appliance in a Hot Standby Appliance peer pair pulls its active appliance's config database to replicate the config database onto the passive peer.
- Data Database - The data database stores a copy of the config database plus all poll data for the devices/objects that the peer polls. The config database on an active appliance replicates to the data database on the appliance. Each passive appliance in a Hot Standby Appliance peer pair pulls the active appliance's data database to replicate the data database on the passive appliance.
Peer Port Assignments
Please refer to SevOne NMS Port Number Requirements Guide for details on all required port numbers.
REST API Ports
The appropriate port(s) must be open for access across all SevOne NMS peers in the cluster to ensure proper operation. Please refer to SevOne NMS Port Number Requirements Guide for a list of used ports.
To check the REST API version, SSH into your SevOne NMS appliance or cluster and execute the following command.
$ podman exec -it nms-nms-nms /bin/bash
$ SevOne-show-version
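Alternatively, the version check can be run without opening an interactive shell in the container; this one-liner is a sketch that assumes the same container name as above.
$ podman exec nms-nms-nms SevOne-show-version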
Manage Peers
In a single appliance/single peer cluster implementation, all relevant information (cluster leader, discovery, alerts, reports, etc.) is local to the one appliance. For a single peer cluster, the SevOne appliance ships with the following configuration:
- Name: SevOne Appliance
- Leader: Yes
- IP: 127.0.0.1
In a multi-peer cluster implementation, one appliance in the cluster is the cluster leader peer. The cluster leader peer receives configuration data such as cluster settings, security settings, device peer assignments, archived alerts, and flow device lists. The cluster leader peer software is identical to the software on all other peers, which enables you to designate any SevOne NMS appliance to be the cluster leader peer. After you determine which peer is to be the cluster leader, that peer remains the leader of the cluster, and only SevOne Support engineers can change the cluster leader peer designation.
All active peers in the cluster replicate the configuration data from the cluster leader peer. All active SevOne NMS appliances in a cluster are peers of each other. One of the appliances in a Hot Standby Appliance peer pair is a passive appliance. Please refer to section Hot Standby Appliances on how to include Hot Standby Appliance peer pairs in your cluster.
All active peers perform the following actions.
- Provide a full web interface to the entire SevOne NMS cluster.
- Update the cluster leader peer with configuration changes.
- Pull config database changes from the cluster leader peer.
- Communicate with other peers to get non-local data.
- Discover and/or delete the devices you assign to the peer.
- Poll data from the objects and indicators on the devices you assign to the peer.
When you add a network device for SevOne NMS to monitor, you assign the device to a specific SevOne NMS peer. The assigned peer lets the cluster leader know that it (the assigned peer) is responsible for the device. The cluster leader maintains the device assignment list in the config database, which is replicated to the other peers in the cluster. From this point forward, all modifications to the device, including discovery, polling, and deletion, are the responsibility of the SevOne NMS peer to which you assign the device. The peer collects and stores all data for the device. This is in direct contrast to the way that other network management systems collect data. They collect data remotely and then ship the data back to a leader server for storage. SevOne NMS stores all local data on the local peer, which enables greater link efficiency and speed.
Join Cluster
For a new appliance, follow the steps in the SevOne NMS Installation Guide to rack up the appliance and to assign the appliance an IP address. The SevOne NMS Implementation Guide describes the steps to log on to the new appliance. On the Startup Wizard, select Add This Appliance to an Existing SevOne NMS Cluster and click Next to navigate to the Cluster Manager at the cluster level. For details on how to join a peer to the cluster, please refer to section Peers in the SevOne NMS System Administration Guide.
Port Checker
Ports can be checked by using a standalone Command Line Interface utility or from the SOA REST API. Either method allows you to check the status of a predefined list of pings (specifically, ICMP ping), TCP ports, and UDP ports.
Command Line Interface
The following command provides help on how to check the peers in a cluster from SevOne NMS > Command Line Interface utility.
$ podman exec -it nms-nms-nms /bin/bash
Provides a list of options / flags available
$ SevOne-act check peers --help
To test a connection to a specific remote peer which is not part of the cluster, you must be in incorporate mode. Please use --peer-ips option.
$ SevOne-act check peers --enable-logging true --peer-ips <Remote Peer IP address>
Example
$ SevOne-act check peers --enable-logging true --peer-ips 10.168.136.75
A detailed example
$ SevOne-act check peers --enable-logging true --udp-ports 53 \
--tcp-ports 22 --port-timeout 3000 --port-try-count 4 \
--strip-success false
SOA REST API
To perform administrative checks, enter https://<IP address>/api/v3/docs in your browser > select SEVONE.API.V3.ADMINISTRATION. For example, for a cluster-wide mesh port check between all peers, click on PortCheckCluster request. By default, UDP port 18063 is used internally - this port must be open otherwise all UDP ports will appear blocked. Alternatively, specify a communication port, commPort, which is open and unused.
Please refer to SevOne NMS Port Number Requirements Guide for the expected ports to be in open status.
Update Default Port List
To update the default port list to include the new ports or to remove the old ones, modify the following MySQL tables.
- net.port_checker_tcp
- net.port_checker_udp
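Before modifying the tables, it is worth inspecting the current default lists. The commands below are a sketch that assumes the mysql client is available inside the NMS container (add credentials or connection options as required by your installation); verify the actual column layout with DESCRIBE before inserting or deleting rows.
$ podman exec -it nms-nms-nms /bin/bash
$ mysql
mysql> DESCRIBE net.port_checker_tcp;
mysql> SELECT * FROM net.port_checker_tcp;
mysql> SELECT * FROM net.port_checker_udp;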
Troubleshooting
When the addition of a new peer fails, execute the following command to return a list of the ports failing between the peers.
$ podman exec -it nms-nms-nms /bin/bash
$ SevOne-act check peers --enable-logging true --peer-ips <Remote Peer IP address>
Once you have the list of failing ports between the peers, open them in your firewall. Or, although not recommended, you may remove the failing ports from the default port list if necessary.
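For example, to open a failing TCP port in the firewall, the commands below are a sketch that assumes firewalld manages the appliance firewall; adapt them to whatever firewall sits between your peers.
$ firewall-cmd --permanent --add-port=3306/tcp
$ firewall-cmd --reload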
Port checking can be disabled by removing all entries from the MySQL database. For details, please see Update Default Port List. If port checking is disabled when you attempt to join a peer to a cluster, no port checks are performed and the peer may be peered improperly. Also, all future port checks will succeed even if they should not.
Leave Cluster
A peer can leave the cluster. However, caution must be used when considering the use of this feature. For details on how a peer can leave the cluster, please refer to section Peers in SevOne NMS System Administration Guide.
Hot Standby Appliances
SevOne appliances can collect gigabytes of data each day. You can create redundancy for each peer with a Hot Standby Appliance to create a peer that is made up of a pair of appliances. If the active appliance in the peer pair fails, there is no significant loss of poll data; the passive appliance in the peer pair assumes the role of the active appliance. A peer pair in a peer-to-peer cluster implementation has its own terminology.
Terminology
- Primary Appliance - Implemented to be the active, normal, polling appliance. If the primary appliance fails, it is still the primary appliance but it becomes the passive appliance.
- Secondary Appliance - Implemented to be the passive appliance in a Hot Standby Appliance peer pair. If the active appliance fails, it is still the secondary appliance but it assumes the active role.
- Active Appliance - The currently polling appliance. Upon initial set up the primary appliance is the active appliance. If the primary appliance fails, the secondary appliance becomes the active appliance in the peer pair.
- Passive Appliance - The appliance that currently replicates from the active appliance. Upon initial set up the secondary appliance is the passive appliance.
- Neighbor - The other appliance in the peer pair. The primary appliance’s neighbor is the secondary appliance and vice versa.
- Fail Over – From the perspective of the active appliance, fail over occurs when the currently passive appliance assumes the active role.
- Take Over – From the perspective of the passive appliance, take over occurs when the passive appliance assumes the active role.
- Split Brain – When both appliances in a peer pair are active or both appliances in a peer pair are passive this is known as split brain. Split brain can occur when the communication between the appliances is interrupted and the passive appliance becomes active. When an administrative user logs on, an administrative message appears to let you know that the split brain condition exists. Please refer to section Troubleshoot Cluster Management for details.
Both appliances in the peer pair must have the same hardware/software configuration to prevent performance problems in the event of a failover (for example, a 60K HSA for a 60K PAS). Each Hot Standby Appliance peer pair can have only the two appliances in the peer pair. A Hot Standby Appliance cannot have a Hot Standby Appliance attached to it.
In a Hot Standby Appliance peer pair implementation, the role of the active appliance in the peer pair is to:
- Provide a full web interface to the entire SevOne NMS cluster.
- Update the leader peer with configuration changes.
- Pull config database changes from the leader peer.
- Communicate with other peers to get any non-local data.
- Discover and/or delete the devices you assign to the peer.
- Poll data from the assigned devices.
The role of the passive appliance in the peer pair is to:
- Replicate the config database information and the data database information from the active appliance.
- Switch to the active role when the current active appliance is not reachable.
An HSA implementation creates an appliance relationship that is designed to have the passive appliance receive a continuous stream of database replication (commonly known as binary logs of database actions) from its active appliance neighbor. The passive appliance replays this stream and makes precisely the same changes that the active appliance made (i.e., record collected data, add a new device configuration, delete a monitored device and all historic data, etc.). The passive appliance performs a continuous heartbeat check against the active appliance. If the passive appliance determines that the active appliance's heartbeat has gone away, for any reason, for a specified time period (referred to as the minimum dead time), it assumes that the active appliance has suffered a catastrophic failure. The passive appliance then takes over polling for the peer pair's devices and notifies the cluster leader that it has taken over for what was the active appliance.
The passive appliance in the Hot Standby Appliance peer pair can take over for the active appliance at any time. Any changes you make to the active appliance (including settings updates, new polls, or device deletions) are replicated to the passive appliance as frequently as possible.
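To check the current role of each appliance in the peer pair from the command line, you can use the masterslaveconsole utility and the GET commands documented in section Troubleshoot Cluster Management.
$ podman exec -it nms-nms-nms /bin/bash
$ masterslaveconsole
Then, enter GET TYPE to see whether the appliance is the primary or the secondary, and GET STATUS to see whether it is currently active or passive.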
Hot Standby Appliance Peer Pair Ports
The primary appliance and the secondary appliance in a Hot Standby Appliance peer pair need to communicate with each other to maintain a consistent environment. The appliances need to have the following ports open between each other:
- TCP 3306 - MySQL replication
- TCP 3307 - MySQL replication
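A quick way to verify that these ports are reachable from each neighbor is netcat, assuming the nc binary is available on the appliance.
$ nc -zv <Neighbor IP address> 3306
$ nc -zv <Neighbor IP address> 3307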
Add an Appliance to Create a Hot Standby Appliance Peer Pair
There are two ways to implement a Hot Standby Appliance peer pair: VIP and non-VIP. Each has its advantages and disadvantages, and each works best in certain implementations.
Virtual IP Configuration (VIP Configuration)
VIP Configuration requires each appliance to have two Ethernet cards and a dedicated IP address. The two appliances share a virtual IP address, which is the IP address of the appliance that is currently active. This works when the two appliances are on the same subnet.
Advantages
- Transparent to the end user; if a failover occurs, the IP address does not change.
- Appliances are not separated by a firewall.
- Replicated data is restricted to a subnet.
Disadvantages
- Requires three IP addresses.
- Only works if the two appliances are in the same subnet. Barring complicated routing this set up generally means that the two appliances are close to each other so if the building goes down, both appliances go down.
In a VIP configuration, the access lists necessary for SNMP, ICMP, port monitoring, and other polled monitoring are only set for the virtual IP address.
VIP Configuration requires you to set up two interfaces.
- eth0: - This is the virtual interface and should be the same on both appliances. This interface is brought up and down appropriately by the system.
- eth1: - This is the administrative interface for the appliance. The system does not alter this interface.
The Cluster Manager provides a Peer Settings tab to enable you to view the Primary, Secondary, and Virtual IP addresses in a VIP configuration Hot Standby Appliance peer pair.
- Primary Appliance IP Address - The administrative IP address for the primary appliance (eth1).
- Secondary Appliance IP Address - The administrative IP address for the secondary appliance (eth1).
- Virtual IP Address - The virtual IP address for the peer pair (eth0).
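To confirm which appliance currently holds the virtual IP address, you can inspect the interfaces directly; this sketch assumes standard iproute2 tooling on the appliance.
$ ip addr show eth0
$ ip addr show eth1
The active appliance should list the virtual IP address on eth0, while eth1 carries the administrative IP address that is unique to each appliance.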
Non-VIP Configuration
Non-VIP Configuration requires each appliance to have an IP address and to be able to communicate with the other. In the event of a failover or takeover, when the appliances switch roles, the IP address of the peer pair changes accordingly.
Advantages
- Requires two IP addresses.
- Appliances do not need to be on the same subnet.
Disadvantages
- IP address of peer pair changes when the appliances failover.
You need to be aware that you must check two IP addresses or include a DNS load balancer to update the DNS record.
- A firewall separates the appliances.
- Data replicates across a WAN.
In a non-VIP configuration, all access lists need to include the IP addresses of both appliances in the event of a failover.
Non-VIP Configuration requires you to set up only one interface.
- eth0: - This is the administrative interface for the appliance. The system does not alter this interface.
The Cluster Manager provides a Peer Settings tab to enable you to view the Primary and Secondary IP addresses in a non-VIP configuration Hot Standby Appliance peer pair.
- Primary Appliance IP Address - The administrative IP address for the primary appliance (eth0).
- Secondary Appliance IP Address - The administrative IP address for the secondary appliance (eth0).
- Virtual IP Address - (this field should be blank).
Add or Remove a HSA
When performing an add / remove of an HSA, please use the IP address / hostname of your own Cluster Leader / peers.
This section describes how to add a Hot Standby Appliance (HSA) to create a peer pair in your SevOne cluster. The HSA must be a clean box that has no attached devices and contains no historical data in its database.
The HSA becomes a secondary to a peer node in the cluster. The HSA must be prepared with the same model type (PAS or DNC) as the node that it will be peered with. That is, if the primary node is a DNC then the secondary must be prepared as a DNC before joining the cluster. It is best practice to configure the capacity to match that of the primary. For example, if the primary is a PAS5K then the node to be added to the cluster as its secondary should be prepared as a PAS5K.
Follow the steps below to add a HSA to the cluster. The HSA will be added to act as a secondary to the peer queen-01.
- Using a web browser of your choice, log into SevOne NMS with the IP address or hostname of your Cluster Leader. For example, 10.49.11.53.
- From Cluster Manager, click the Integration tab.
- Click Get Token and then click the Copy Token button to copy the token.
- Log out from the Cluster Leader.
- Navigate to Administration > API Docs > Version 3.
- Click ClusterOrchestrator.
- Select POST /api/v3/cluster-orchestrator/become-hsa.
- Under Parameters, all the way to the right, locate the Model Schema field. Click on the field to copy its content to the http body value.
- After "primaryIp":, replace string with the IP address or hostname of the cluster's existing peer that will be the primary. Please make sure to enter it within the quotes.
- After "secondaryIp":, replace string with the IP address or hostname of the new peer that will be the secondary. Please make sure to enter it within the quotes.
- After "token":, replace string with the integration token obtained above.
Please make sure to enter it within the quotes. For example,
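For example, the completed body should look similar to the following, with the placeholder values replaced by your own.
{
  "primaryIp": "<Primary Peer IP address>",
  "secondaryIp": "<Secondary Peer IP address>",
  "token": "<integration token>"
}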
- Click Try it out! button to submit the HSA add request.
- If the action was successful, a success response is returned.
- Now, log out. After a peer joins the cluster, you must log out of the SevOne NMS graphical user interface.
- Log back in to any node in the cluster. From the Cluster Manager, you should see the new secondary node, 10.49.8.77 (passive).
Note: It may take some time for the secondary node to be populated with data from the primary.
Important: The Join Cluster process overwrites all the data on the current peer.
Remove HSA
This section describes how to remove a Hot Standby Appliance (HSA) from a peer pair in your SevOne cluster.
- Using a web browser of your choice, log into SevOne NMS with the IP address or hostname of your Cluster Leader. For example, 10.49.11.53.
- From Cluster Manager, click the Integration tab.
- Click Get Token and then click the Copy Token button to copy the token.
- Log out from the Cluster Leader.
- Navigate to Administration > API Docs > Version 3.
- Click ClusterOrchestrator.
- Select POST /api/v3/cluster-orchestrator/leave-cluster-hsa.
- Under Parameters, all the way to the right, locate the Model Schema field. Click on the field to copy its content to the http body value.
- After "ip":, replace string with the IP address or hostname of the secondary that will be leaving the cluster. Please make sure to enter it within the quotes.
- After "token":, replace string with the integration token obtained above.
Please make sure to enter it within the quotes. For example,
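For example, the completed body should look similar to the following, with the placeholder values replaced by your own.
{
  "ip": "<Secondary Peer IP address>",
  "token": "<integration token>"
}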
- Click Try it out! button to submit the HSA remove request.
- If the action was successful, a success response is returned.
- Now, log out. After a peer leaves the cluster, you must log out of the SevOne NMS graphical user interface.
- Log back in to any node in the cluster. From the Cluster Manager, you will see that the secondary node, 10.49.8.77 (passive), is no longer present.
DNC Hot Standby Appliance Implementation
When your cluster requires a Hot Standby Appliance for a Dedicated NetFlow Collector there are a few considerations. During normal operations, the active appliance collects raw flow data and aggregates flow data for all FlowFalcon views that you define to use aggregated flow data. When a user runs a FlowFalcon report that uses an aggregated view, the report draws from the aggregated flow database on the active appliance. The aggregated flow data from the active appliance is replicated to the passive appliance. Raw flow data is not replicated from the active appliance to the passive appliance.
Like the HSA implementation for a PAS, there are two ways to implement a HSA peer pair for a DNC: Virtual IP Address (VIP) and non-VIP.
Virtual IP Address Configuration (VIP)
To reiterate, VIP configuration requires each appliance to have two Ethernet cards and a dedicated IP address. The two appliances share a virtual IP address, which is the IP address of the appliance that is currently active.
Advantages
- Transparent to the end user; if a failover occurs, the IP address does not change.
- Appliances are not separated by a firewall.
- Replicated data is restricted to a subnet.
Disadvantages
- Requires three IP addresses.
- Only works if the two appliances are in the same subnet.
VIP Configuration requires you to set up two interfaces.
- eth0: - This is the virtual interface and should be the same on both appliances. This interface is brought up and down appropriately by the system.
- eth1: - This is the administrative interface for the appliance. The system does not alter this interface.
The Cluster Manager provides a Peer Settings tab to enable you to view the Primary, Secondary, and Virtual IP addresses in a VIP configuration Hot Standby Appliance peer pair.
- Primary Appliance IP Address - The administrative IP address for the primary appliance (eth1).
- Secondary Appliance IP Address - The administrative IP address for the secondary appliance (eth1).
- Virtual IP Address - The virtual IP address for the peer pair (eth0).
Non-VIP Configuration
Non-VIP Configuration requires each appliance to have an IP address and be able to communicate with each other. You must configure the devices to send all raw flow data to the IP address of both appliances so that raw flows that are not replicated are still available from both appliances. The passive appliance does not store or process raw flow data unless there is a failover.
Advantages
- Requires two IP addresses.
- Appliances do not need to be on the same subnet.
Disadvantages
- IP address of peer pair changes when the appliances failover.
- Devices must be configured to send flow data to both IP addresses.
You need to be aware that you must check two IP addresses or include a DNS load balancer to update the DNS record.
- A firewall separates the appliances.
- Data replicates across a WAN.
In a non-VIP configuration, all access lists need to include the IP addresses of both appliances in the event of a failover.
Non-VIP Configuration requires you to set up only one interface.
- eth0: - This is the administrative interface for the appliance. The system does not alter this interface.
Technical Details
The bandwidth requirements are as follows, using a DNC1000 for the example.
1000 interfaces x 10 aggregated FlowFalcon views x 200 results x 2 directions = 4,000,000 records every minute. At 200 bytes per record, that is 800,000,000 bytes per minute, or 6.4 gigabits per minute, which works out to roughly 107 Mb per second. The connection between the appliances must be able to sustain a TCP connection of at least that rate.
Troubleshoot Cluster Management
The Cluster Manager displays statistics and enables you to define application settings. The Cluster Manager enables you to integrate additional SevOne NMS appliances into your cluster, to resynchronize the databases, and to change the roles of Hot Standby Appliances (fail over, take over, rectify split brain). The following is a subsection of the comprehensive Cluster Manager documentation. See the SevOne NMS System Administration Guide for additional details.
To access the Cluster Manager from the navigation bar, click the Administration menu and select Cluster Manager.
The left side enables you to navigate your SevOne NMS cluster hierarchy. When the Cluster Manager appears, the default display is the Cluster level with the Cluster Overview tab selected.
- Cluster Level - The Cluster level enables you to view cluster wide statistics, to view statistics for all peers in the cluster and to define cluster wide settings.
- Peer Level - The Peer level enables you to view peer specific information and to define peer specific settings. The cluster leader peer name displays at the top of the peer hierarchy in bold font and the other peers display in alphabetical order.
- Appliance Level - Click to display appliance level information including database replication details. Each Hot Standby Appliance peer pair displays the two appliances that act as one peer in the cluster.
Peer Level Cluster Management
Peer Level – At the Peer level, on the Peer Settings tab, the Primary/Secondary subtab enables you to view the IP addresses for the two appliances that act as one SevOne NMS peer in a Hot Standby Appliance (HSA) peer pair implementation. The Primary appliance is initially set up to be the active appliance. If the Primary appliance fails, it is still the Primary appliance but its role changes to the passive appliance. The Secondary appliance is initially set up to be the passive appliance. If the Primary appliance fails, the Secondary appliance is still the Secondary appliance but it becomes the active appliance. You define the appliance IP address upon initial installation and implementation. See the SevOne NMS Installation Guide and SevOne NMS Implementation Guide for details.
- The Primary Appliance IP Address field displays the IP address of the primary appliance.
- The Secondary Appliance IP Address field displays the IP address of the secondary appliance.
- The Virtual IP Address field appears empty unless you implement the primary appliance and the secondary appliance to share a virtual IP address (VIP HSA implementation).
- The Failover Time field enables you to enter the number of seconds for the passive appliance to wait for the active appliance to respond before the passive appliance takes over. SevOne NMS pings every 2 seconds and the timeout for a ping is 5 seconds.
Appliance Level Cluster Management
Appliance Level – At the Appliance level the appliance IP address displays. For a Hot Standby Appliance peer pair implementation two appliances appear.
- The Primary appliance appears first in the peer pair.
- The Secondary appliance appears second in the peer pair.
- The passive appliance in the peer pair displays (passive).
- The active appliance that is actively polling does not display any additional indicators.
Click on an appliance and a menu control appears in the upper-right corner. Click it to display options that are dependent on the appliance you select in the hierarchy on the left side.
- Select Device Summary to access the Device Summary for the appliance. When there are report templates that are applicable for the device, a link appears to the Device Summary along with links to the report templates.
- Select Fail Over to have the active appliance in a Hot Standby Appliance peer pair become the passive appliance in the peer pair. This option appears when you select the active appliance in a Hot Standby Appliance peer pair in the hierarchy.
- Select Take Over to have the passive appliance in a Hot Standby Appliance peer pair become the active appliance in the peer pair. This option appears when you select the passive appliance in a Hot Standby Appliance peer pair in the hierarchy.
- Select Resynchronize Data Database to have an active appliance pull the data from its own config database to its data database or to have the passive appliance in a Hot Standby Appliance peer pair pull the data from the active appliance's data database. This is the only option that appears when you select the cluster leader peer's appliance in the hierarchy.
- Select Resynchronize Config Database to have an active appliance pull the data from the cluster leader peer's config database to the active peer's config database or to have the passive appliance in a Hot Standby Appliance peer pair pull the data from the active appliance's config database.
- Select Rectify Split Brain to rectify situations when both appliances in a Hot Standby Appliance peer pair think they are active or both appliances think they are passive. For details, please refer to section Split Brain Hot Standby Appliances.
Appliance Overview
In the cluster hierarchy on the left side, expand a peer, click on the IP address of an appliance, and then select the Appliance Overview tab on the right side to display appliance level information.
Data Database Information
- Source Host - Displays the IP address of the source from where the appliance replicates the data database. In a single appliance implementation and on an active appliance, this is the IP address of the appliance itself. HSA passive appliance data database replicates from the active appliance data database.
- I/O Thread - Displays Running when an active appliance is querying its config database for updates for the data database. Displays Not Running when the appliance is not querying the config database. HSA passive appliance data database queries the active appliance data database.
- Update Thread - Displays Running when the appliance is in the process of replicating the config database to the data database. Displays Not Running when the appliance is not currently replicating to the data database.
- Source Log File - Displays the name of the log file the appliance reads to determine if it needs to replicate the config database to the data database.
- Seconds Behind - Displays 0 (zero) when the data database is in sync with the config database or displays the number of seconds that the synchronization is behind.
Config Database Information
- Source Host - Displays the IP address of the source from where the appliance replicates the config database. In a single appliance implementation and on the cluster leader active appliance, this is the IP address of the appliance itself. HSA passive appliance config database replicates from the active appliance config database.
- I/O Thread - Displays Running when an active appliance is querying the cluster leader peer config database for updates. Displays Not Running when the appliance is not querying the cluster leader peer config database. HSA passive appliance config database queries the active appliance config database.
- Update Thread - Displays Running when the appliance is in the process of replicating the config database. Displays Not Running when the appliance is not replicating the config database.
- Source Log File - Displays the name of the log file the appliance reads to determine if it needs to replicate the config database.
- Seconds Behind - Displays 0 (zero) when the config database is in sync with the cluster leader peer config database or displays the number of seconds that the synchronization is behind.
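These fields mirror the underlying MySQL replication status, which can also be inspected directly. The command below is a sketch that assumes shell access to the NMS container (add connection options, such as the port of the database instance, as required); per the note at the beginning of this guide, the master/slave terms in the command output correspond to leader/follower.
$ podman exec -it nms-nms-nms /bin/bash
$ mysql -e "SHOW SLAVE STATUS\G"
The Master_Host, Slave_IO_Running, Slave_SQL_Running, Master_Log_File, and Seconds_Behind_Master fields correspond to the Source Host, I/O Thread, Update Thread, Source Log File, and Seconds Behind entries described above.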
Split Brain Hot Standby Appliances
The typical split brain is due to a fail over and then a fail back, where the appliance that was active goes down and then comes back up again as the active appliance. The lack of communication from the active appliance causes the passive appliance to become the active appliance, which makes both appliances in the Hot Standby Appliance pair active. A split brain can also occur when both appliances are passive.
When a user with an administrative role logs on to SevOne NMS and there is a Hot Standby Appliance peer pair that is in a split brain state, an administrative message appears.
- Neither appliance in your Hot Standby Appliance peer pair with IP addresses <n> and <n> is in an active state.
- Both appliances in your Hot Standby Appliance peer pair with IP addresses <n> and <n> are either active or both appliances are passive.
There are two methods to resolve a split brain.
- SevOne NMS User Interface - Cluster Manager
- Command Line Interface
SevOne NMS User Interface
The Cluster Manager displays the cluster hierarchy on the left and enables you to rectify split brain occurrences. From the navigation bar click the Administration menu and select Cluster Manager to display the Cluster Manager.
In the cluster hierarchy, click on the name of the peer pair that is the Hot Standby Appliance peer pair to display the two IP addresses of the two appliances that make up the peer pair. The Primary appliance displays first.
- In a split brain situation where both appliances are active, neither appliance displays Passive.
- In a split brain situation where both appliances are passive, both appliances display Passive.
Select one of the affected appliances in the hierarchy on the left side. Click the menu control to display a Rectify Split Brain option.
- When both appliances think they are passive and you select this option, the appliance for which you select this option becomes the active appliance in the Hot Standby Appliance peer pair.
- When both appliances think they are active and you select this option, the appliance for which you select this option becomes the passive appliance in the Hot Standby Appliance peer pair.
Example: Select 192.129.14.168 and click Rectify Split Brain to make 192.129.14.168 the passive appliance when both appliances are active.
Command Line Interface Method
Perform the following steps to fix a split brain situation from the command line interface.
- Log on to the peer that is the Secondary appliance that is in active/leader mode.
- Enter masterslaveconsole
- When in masterslaveconsole you can enter Help for a list of commands.
- To check the appliance type, enter GET TYPE.
- To check the appliance status, enter GET STATUS.
- To make the appliance passive, enter the following command BECOME SLAVE (i.e., to become the follower).
- After you run the command, enter GET JOB STATUS to check the status. This tells you if the process is still running and for how long.
After the process completes, check SevOne-masterslave-status to confirm that the changes were made and that the appliances are in their original configuration.
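Putting the steps together, a full session looks like the following (output omitted). This sketch assumes masterslaveconsole is run from the NMS container shell, as with the other utilities in this guide.
$ podman exec -it nms-nms-nms /bin/bash
$ masterslaveconsole
GET TYPE
GET STATUS
BECOME SLAVE
GET JOB STATUS
$ SevOne-masterslave-status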
Scale Specifications
The scale specifications below are based on what SevOne has tested, approved, and recommends.
Users on NMS - Cluster
| Category | Max Tested |
|---|---|
| Number of Users (User Manager) | 10,000 |
| Number of Users Logged In (Session Manager) | 10,000 (estimated max) |
| Number of User Roles (User Role Manager) | 1,000 |
| Number of Active Users | 1,000 |
| Number of Concurrent Users | 400 |
Alerts
| Category | Max Tested |
|---|---|
| Total Active Alerts | 50,000 |
| Total Archived Alerts Allowed on System | 2 million |
| Total Archived Alerts Max Display (All Time/Display All) | 50,000 |
| Maximum Number of Policies | 1,000 |
| Maximum Number of Thresholds Processed in 3 Min - Peer | 38,000 |
| Maximum Number of Status Maps | 1,000 |
Traps - Peer
| SevOne-trapd Thread Count | Maximum Processed | Maximum Received |
|---|---|---|
| Default = 10 | 1,000 | 1,000 |
| Maximum setting = 99 | 1,500 | 4,000 |