Configuring performance monitoring tool sensors

The IBM Storage Scale System fabric hospital uses custom performance monitoring tool sensors to relay path error information back to the performance monitoring tool collector node. By default, this node is the EMS node.

About this task

Because of current scaling concerns, it is recommended to install and configure the performance monitoring tool sensors (pmsensors) on only 5-10 IBM Storage Scale System building blocks, where each building block is an IBM Storage Scale System server pair.
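The recommended upper bound can be checked before any sensor is installed. The following is a minimal sketch, not part of the product: it assumes that the building blocks are supplied as a comma-separated list of node class names (one per building block), and the function names and the MAX_BUILDING_BLOCKS variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count the building blocks in a comma-separated node class list and
# compare the count against the recommended upper bound of 10.
# All names here are illustrative; none of them is an IBM-provided interface.
MAX_BUILDING_BLOCKS=10   # upper bound recommended in this topic

# Print the number of non-empty entries in a comma-separated list.
count_building_blocks() {
    local list="$1" count=0 entry
    local -a entries
    IFS=',' read -r -a entries <<< "$list"
    for entry in "${entries[@]}"; do
        if [ -n "$entry" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Return 0 if the list names between 1 and MAX_BUILDING_BLOCKS building
# blocks, and 1 otherwise.
within_recommended_limit() {
    local n
    n=$(count_building_blocks "$1")
    [ "$n" -ge 1 ] && [ "$n" -le "$MAX_BUILDING_BLOCKS" ]
}
```

For example, `within_recommended_limit "bb1_class,bb2_class" || echo "Too many building blocks" >&2` prints a warning only when the list is empty or names more than 10 building blocks. The node class names in this example are placeholders.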

Procedure

  1. Identify the building blocks on which to install the sensors. The building blocks are identified by passing a list of IBM Storage Scale System nodes or a node class that contains the nodes to be monitored.
  2. Run the following commands from the EMS or an I/O node, covering each recovery group of every building block that is being monitored:

    This step is necessary to prevent spurious RAS events and to discard data that existed before the performance monitoring tool sensor was installed.

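    # Recovery group names (example values). The quotes are required so that
    # the whole space-separated list is assigned to RGNAMES.
    RGNAMES="rg1 rg2 rg3"
    # $RGNAMES is left unquoted on purpose so that it splits into one name per iteration.
    for rg in $RGNAMES
    do
      tsgnrgethospdata "$rg" --reap --all-node-path
    done
    sleep 151
    mmperfmon config add --sensors /opt/IBM/zimon/defaults/ZIMonSensors_GPFSFabricHospital.cfg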
    
  3. Instead of running the commands in step 2 manually, it is recommended to use the sample script that is provided in /usr/lpp/mmfs/samples/vdisk/install_essfabrichospital_sensor.
    install_essfabrichospital_sensor [-h] [-f FILE] [--override-sleep OVERRIDE_SLEEP] node_class_list
    where:
    node_class_list
        Specifies a comma-separated list of node classes, one per building block.
        The node_class_list argument is not the same as the restrict setting in the /opt/IBM/zimon/defaults/ZIMonSensors_GPFSFabricHospital.cfg file; it is used to determine the sleep period that synchronizes the sensors.
    -h, --help
        Shows the help message and exits.
    -f FILE, --file FILE
        Specifies the sensor configuration file (default /opt/IBM/zimon/defaults/ZIMonSensors_GPFSFabricHospital.cfg).
    --override-sleep OVERRIDE_SLEEP
        Overrides the sleep delay when you are installing a sensor (debug).
    The provided sensor file, /opt/IBM/zimon/defaults/ZIMonSensors_GPFSFabricHospital.cfg, has the following contents:
    sensors = {
            name = "GPFSFabricHospital"
            # This sensor should be activated only when ECE/GNR is configured
            # Only supported value is 900 seconds (15 minutes)
            period = 900
            restrict = "nsdNodes"
            type = "Generic"
    }
    
    Important: The only supported value for period is 900 seconds (15 minutes).
  4. Verify the new sensor.
    # mmperfmon config show | grep GPFSFabricHospital -A3
    A sample output is as follows:
    name = "GPFSFabricHospital"
            period = 900
            restrict = "nsdNodes"
            type = "Generic"
    Note: In its current form, the sensor configuration should be limited to 10 IBM Storage Scale System building block node pairs. Here, the system-provided node class is "nsdNodes", which contains all IBM Storage Scale System server nodes by default. If your cluster has more than 10 IBM Storage Scale System building block node pairs, define a custom node class that contains up to 10 node pairs.
    For example, assume that the "FabricHospital10BB" node class is created to contain the nodes that you want to monitor. The restrict value in the performance monitoring tool sensor file would be changed to that node class, which can be confirmed as follows:
    # mmperfmon config show | grep GPFSFabricHospital -A3
    A sample output is as follows:
    name = "GPFSFabricHospital"
            period = 900
            restrict = "FabricHospital10BB"
            type = "Generic"
  5. Ensure that the performance monitoring component is healthy.
    # mmhealth node show perfmon
    A sample output is as follows:
    Node name:      c145f11san06a-ib0.gpfs.net
    Component     Status        Status Change     Reasons & Notices
    -------------------------------------------------------------------------------
    PERFMON       HEALTHY       1 day ago         -
    Note: Sensors can be configured only once. To reinstall the sensor from scratch, for example to change its behavior, run the following command to remove the sensor so that it can be added again:
    # mmperfmon config delete {--all | --sensors Sensor[,Sensor...]}
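
The check in step 4 can also be scripted rather than read by eye. The following is a minimal sketch under stated assumptions: it expects the sensor stanza on standard input in the one-setting-per-line `key = value` layout that the sample output shows, and the function names and the SUPPORTED_PERIOD variable are hypothetical. The actual layout of the `mmperfmon config show` output might differ on your system.

```shell
#!/usr/bin/env bash
# Sketch: validate the GPFSFabricHospital stanza that is read from standard
# input. Assumes one "key = value" setting per line, as in the sample output.
SUPPORTED_PERIOD=900   # the only supported period, in seconds

# Print the value of a setting (for example "period" or "restrict") from the
# GPFSFabricHospital stanza. Surrounding double quotes are stripped.
# The "name" setting itself cannot be queried with this function.
fabric_sensor_field() {
    local key="$1"
    awk -v key="$key" '
        # Every "name = ..." line starts a new stanza.
        $1 == "name" && $2 == "=" {
            in_stanza = ($3 == "\"GPFSFabricHospital\"")
            next
        }
        in_stanza && $1 == key && $2 == "=" {
            value = $3
            gsub(/"/, "", value)
            print value
            exit
        }
    '
}

# Return 0 only if the sensor is present and uses the supported period.
check_fabric_sensor() {
    local config period
    config=$(cat)
    period=$(printf '%s\n' "$config" | fabric_sensor_field period)
    [ "$period" = "$SUPPORTED_PERIOD" ]
}
```

A possible use is `mmperfmon config show | check_fabric_sensor && echo "GPFSFabricHospital period is supported"`. To see which node class the sensor is restricted to, pipe the same output into `fabric_sensor_field restrict`.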