Implementing a GPFS HA SNMP configuration using Callbacks

Updated 12/9/13 12:27 PM by ScottGPFS



The GPFS SNMP implementation is not designed to be highly available. There is only one SNMP monitor node at a time, and if that node fails, the SNMP task does not fail over to another node. If you would like the SNMP monitor function to fail over, you can implement this using the GPFS callback mechanism. This works on GPFS 3.3 or later.

To do this you need to:

  1. Install Net-SNMP on two nodes
  2. Write a script that changes the SNMP monitor node
  3. Create a callback that triggers on a "nodeLeave" event
 

Install Net-SNMP on two nodes

 

This topic is covered in the GPFS documentation and in the article "SNMP-Based monitoring for GPFS clusters".

 

Write a script that changes the SNMP monitor node

 

The script should check whether the event involved the other SNMP monitor node. The following script is an example of one way to fail over: simply check whether the other SNMP manager is "active" and, if it is not, move the snmp-agent role to this server.

Disclaimer: This script is provided as an example; it has not been tested.
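
A minimal sketch of such a takeover script might look like the following. It is an illustration written for this description, not a tested implementation. The node names snmp1 and snmp2, the helper function names, and the assumption that GPFS node names match the short hostnames are all hypothetical. It reads the peer's state from the mmgetstate table output and uses mmchnode --nosnmp-agent and mmchnode --snmp-agent to move the collector role.

```shell
#!/bin/sh
# Sketch of an SNMP collector takeover script (not tested on a live cluster).
# NODE_A and NODE_B are hypothetical names for the two SNMP nodes.
PATH="$PATH:/usr/lpp/mmfs/bin"   # default location of the GPFS commands
NODE_A="${NODE_A:-snmp1}"
NODE_B="${NODE_B:-snmp2}"

# Print the peer of node $1, i.e. the other member of the NODE_A/NODE_B pair.
peer_of() {
    if [ "$1" = "$NODE_A" ]; then echo "$NODE_B"; else echo "$NODE_A"; fi
}

# Read mmgetstate table output on stdin and print the state of node $1.
# Assumes the columns are: node number, node name, GPFS state.
node_state() {
    awk -v node="$1" '$2 == node { print $3 }'
}

# Move the collector role to this node unless the peer is still active.
failover_if_needed() {
    this_node=$(hostname -s)     # assumes GPFS node names are short hostnames
    other_node=$(peer_of "$this_node")
    state=$(mmgetstate -N "$other_node" 2>/dev/null | node_state "$other_node")
    if [ "$state" != "active" ]; then
        mmchnode --nosnmp-agent -N "$other_node"
        mmchnode --snmp-agent -N "$this_node"
    fi
}

# Only act where the GPFS commands are actually available.
if command -v mmgetstate >/dev/null 2>&1; then
    failover_if_needed
fi
```

Note that this sketch treats any state other than "active", including an empty result from a failed mmgetstate call, as a reason to take over. A production script would probably want to distinguish those cases.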

 
 

This is a very simple script to demonstrate the concept. It assumes that it is installed only on the two SNMP nodes. It also assumes that if both nodes are up, one of them is running the SNMP monitoring, and that if one of them is not up, the other should make sure it is monitoring. The advantage of this approach is that, with only two nodes running the script, you do not have to work out which callback should make the snmp-agent change. The script could be made much more intelligent, though. The callback could be installed to receive the %myNode parameter and, if %myNode is equal to the current collector node (verify snmp-agent in mmlscluster), switch the role to one of the other active (according to mmgetstate) potential collectors. That way the callback could be installed clusterwide or incorporated into an existing clusterwide nodeLeave callback.
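
The more intelligent variant described above could be sketched roughly as follows. This is an illustration under several assumptions, not a tested implementation. The candidate list snmp1,snmp2,snmp3 and the function names are hypothetical. The script takes the node name from its first argument, which the callback would supply through a parameter (the text suggests %myNode; which parameter best identifies the node for a nodeLeave event should be checked against the mmaddcallback documentation for your GPFS level). It also assumes that mmlscluster marks the collector with "snmp_collector" in its last column, and that node names appear in the same form in mmlscluster, mmgetstate, and the callback parameter.

```shell
#!/bin/sh
# Sketch of the "smarter" takeover decision (hypothetical helper names).
PATH="$PATH:/usr/lpp/mmfs/bin"   # default location of the GPFS commands
CANDIDATES="${CANDIDATES:-snmp1,snmp2,snmp3}"   # hypothetical potential collectors

# Print the current collector's node name from mmlscluster output on stdin.
# Assumes the node name is column 2 and the last column (designation)
# contains "snmp_collector" for the collector node.
current_collector() {
    awk '$NF ~ /snmp_collector/ { print $2; exit }'
}

# Print the first node reported "active" in mmgetstate output on stdin,
# skipping node $1.
first_active_except() {
    awk -v skip="$1" '$3 == "active" && $2 != skip { print $2; exit }'
}

# $1 is the node name handed over by the callback.
handle_node_leave() {
    node=$1
    collector=$(mmlscluster | current_collector)
    [ "$node" = "$collector" ] || return 0   # event does not involve the collector
    target=$(mmgetstate -N "$CANDIDATES" | first_active_except "$node")
    [ -n "$target" ] || return 1             # no active candidate found
    mmchnode --nosnmp-agent -N "$node"
    mmchnode --snmp-agent -N "$target"
}

# Only act when GPFS is installed and a node name was passed in.
if command -v mmlscluster >/dev/null 2>&1 && [ $# -ge 1 ]; then
    handle_node_leave "$1"
fi
```

Because the decision depends only on the node name passed in and on the cluster's own view of the collector, the same script can be registered on every node. If several nodes run it at once they may all attempt the same change, so a real deployment would probably restrict where it runs or add locking.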

 

Create a callback that triggers on a "nodeLeave" event

 

The callback should be defined to run on the two SNMP nodes and to invoke the SNMP takeover script. That way, when a nodeLeave event occurs, both nodes receive the message. For information on creating a callback, see Exercise 6: Creating a Callback.
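
As a rough sketch, registering the callback with mmaddcallback might look like the following. The callback identifier snmpFailover, the script path /usr/local/bin/snmp_failover.sh, and the node names snmp1 and snmp2 are hypothetical placeholders. The small run wrapper is also an addition for this example: it prints the command by default so it can be reviewed, and only executes it when APPLY=1 is set.

```shell
#!/bin/sh
# Sketch: register the takeover script as a nodeLeave callback.
# CALLBACK_ID, SCRIPT_PATH and SNMP_NODES are hypothetical; adjust to your cluster.
PATH="$PATH:/usr/lpp/mmfs/bin"   # default location of the GPFS commands
CALLBACK_ID="${CALLBACK_ID:-snmpFailover}"
SCRIPT_PATH="${SCRIPT_PATH:-/usr/local/bin/snmp_failover.sh}"
SNMP_NODES="${SNMP_NODES:-snmp1,snmp2}"

# Dry run by default: print the command. Set APPLY=1 to execute it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# -N limits the callback to the two SNMP nodes, so both see the nodeLeave event.
run mmaddcallback "$CALLBACK_ID" \
    --command "$SCRIPT_PATH" \
    --event nodeLeave \
    -N "$SNMP_NODES"
```

After the callback has been added, mmlscallback can be used to confirm that it is registered. A takeover script that needs a node name as an argument would additionally be given one through the --parms option of mmaddcallback.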