Retrieving platform state information with SNMP

This task describes how to retrieve Cloud Pak for Data System hardware and software registries, open and closed issues, and events by using SNMP. The Net-SNMP tools snmpget, snmptable, and snmpwalk are used to retrieve the Cloud Pak for Data System state.

Before you begin

Before you use this feature, make sure that the following requirements are met:
  • snmpd.service is up and running on the system.
  • You have the credentials that allow you to communicate with snmpd.service.
  • The SNMP sub-agent, which is responsible for responding to requests, is started.

Procedure

  1. Verify that the snmpd.service is up and running on Cloud Pak for Data System:
    [apuser@e1n1 root]$ service snmpd status
    Redirecting to /bin/systemctl status snmpd.service
    ● snmpd.service - Simple Network Management Protocol (SNMP) Daemon.
       Loaded: loaded (/usr/lib/systemd/system/snmpd.service; enabled; vendor preset: disabled)
       Active: active (running) since Fri 2019-07-12 11:29:22 UTC; 3 weeks 4 days ago
     Main PID: 172815 (snmpd)
        Tasks: 1
       Memory: 11.3M
       CGroup: /system.slice/snmpd.service
               └─172815 /usr/sbin/snmpd -LS0-6d -f
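The same check can be scripted. The following is a minimal sketch, assuming a systemd-based node; require_active is a hypothetical helper name and is not part of Cloud Pak for Data System:

```shell
# Report whether a systemd unit is currently active.
# require_active is a hypothetical helper name used for illustration.
require_active() {
    # $1: systemd unit name, for example snmpd.service
    if systemctl is-active --quiet "$1"; then
        echo "$1 is running"
    else
        echo "$1 is not running" >&2
        return 1
    fi
}
```

For example, require_active snmpd.service returns a non-zero exit status when the daemon is stopped, so it can guard the later steps in a script.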
  2. Get the credentials that allow you to communicate with snmpd.service:
    Note: Superuser access is required for this step.
    1. The output of the service snmpd status command in the previous step shows that snmpd was started with the default configuration file, /etc/snmp/snmpd.conf, because its command line (172815 /usr/sbin/snmpd -LS0-6d -f) contains no -C or -c options.

      For details, see the CONFIGURATION FILES section of the snmpd(8) man page. If a custom configuration file was used for the service, the command line /usr/sbin/snmpd -LS0-6d -f would also contain the options -C -c <some-path>. The -C option means: do not read any configuration files except the ones optionally specified by the -c option.

      For example, -C -c /root/.snmp/snmpd.conf would mean that the service reads its configuration from the file located in /root/.snmp/ and ignores the one located in /etc/snmp/.

    2. Find the configuration file that is used on your system, and search for the lines that begin with rocommunity. Each such line defines a community string that allows you to send requests to snmpd.service:
      [root@e1n1 ~]# grep -m 1 rocommunity /etc/snmp/snmpd.conf 
      rocommunity ****** <some_ip> default

      The community string is intentionally masked in this snippet.
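The two substeps above can be sketched as shell functions. This is an illustration only: snmpd_conf_path and first_rocommunity are hypothetical helper names, and the sketch assumes that the -c option, when present, is followed by the configuration path as a separate argument.

```shell
# Print the configuration file that snmpd reads, given its command line.
# Falls back to /etc/snmp/snmpd.conf when no -c option is present.
# (Hypothetical helper; the value after -c is printed as given.)
snmpd_conf_path() {
    # $1: command line, for example "/usr/sbin/snmpd -LS0-6d -f"
    prev=""
    for arg in $1; do
        if [ "$prev" = "-c" ]; then
            printf '%s\n' "$arg"
            return 0
        fi
        prev=$arg
    done
    printf '%s\n' /etc/snmp/snmpd.conf
}

# Print the community string of the first rocommunity directive in a file.
# Commented-out lines are skipped because the first field must be exactly
# "rocommunity". (Hypothetical helper.)
first_rocommunity() {
    # $1: path to the snmpd configuration file
    awk '$1 == "rocommunity" { print $2; exit }' "$1"
}
```

Reading the configuration file still requires superuser access, for example first_rocommunity "$(snmpd_conf_path '/usr/sbin/snmpd -LS0-6d -f')".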

  3. Start the SNMP sub-agent, which is responsible for responding to requests:

    Inventory retrieval is provided by the SNMP sub-agent called magneto-snmp-agent.service, which must be up and running on Cloud Pak for Data System. The main daemon, snmpd.service, delegates SNMP requests for the OIDs defined in the IBM-GTv2-MIB module to the sub-agent. Use the apsnmpagent utility to manage the state of the sub-agent. Note that it switches on or off both the sub-agent and the main snmpd.service. Use it carefully to avoid stopping snmpd.service by mistake.

    To enable and start only snmpd.service after you stop the services with apsnmpagent, run the command with the optional argument --snmpd_only or -s, as follows:
    apsnmpagent off && apsnmpagent on --snmpd_only
    In the above example, the first command stops and disables the magneto-snmp-agent and snmpd services, and the second command enables and starts snmpd.

    When called with the state argument, apsnmpagent reports whether magneto-snmp-agent is enabled on all nodes of hadomain1. If the state is inconsistent across nodes, a corresponding message is printed. apsnmpagent acts on all active nodes of hadomain1.

  4. Use the Net-SNMP snmptable application to retrieve the tables defined in IBM-GTv2-MIB.txt, which is located in /usr/share/snmp/mibs. You can use snmptranslate -Tp -IR IBM-GTv2-MIB::iias to see its structure as a tree. The node called applianceTables contains the tables that you can use to retrieve the system state that the ap command provides with different arguments. The following tree excerpt shows the mapping:
          |  +--moduleTables(2)
          |     +--hardwareTable(11)         -> ap hw -d
          |     +--softwareTable(21)         -> ap sw -d
          |     +--openIssuesTable(31)       -> ap issues -i
          |     +--closedIssuesTable(41)     -> ap issues -c
          |     +--eventsTable(51)           -> ap issues -e
          |     +--nodesTable(61)            -> ap node -d
          |     +--sharedFSTable(71)         -> ap df (Shared filesystem utilization section)
          |     +--localFSTable(81)          -> ap df (Node local files systems utilization section)
          |     +--gpfsTable(91)             -> ap fs (GPFS filesystems section)
          |     +--mountsTable(101)          -> ap fs (Mounts section)
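
For use in scripts, the mapping above can be written as a lookup. This is only a sketch that restates the tree; ap_equivalent is a hypothetical helper name:

```shell
# Print the ap command that corresponds to a table of IBM-GTv2-MIB,
# following the mapping listed above. (Hypothetical helper.)
ap_equivalent() {
    # $1: table name, for example openIssuesTable
    case "$1" in
        hardwareTable)     echo "ap hw -d" ;;
        softwareTable)     echo "ap sw -d" ;;
        openIssuesTable)   echo "ap issues -i" ;;
        closedIssuesTable) echo "ap issues -c" ;;
        eventsTable)       echo "ap issues -e" ;;
        nodesTable)        echo "ap node -d" ;;
        sharedFSTable|localFSTable) echo "ap df" ;;
        gpfsTable|mountsTable)      echo "ap fs" ;;
        *) echo "unknown table: $1" >&2; return 1 ;;
    esac
}
```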

    Examples:

    Using snmpget or snmpwalk with hwInventoryTable, you can get the details column, where useful information is gathered in a comma-separated format. For example, you can find the average power consumption of a node:
    [root@node0101 ~]# snmpwalk -v2c -c $(grep rocommunity -m 1 /etc/snmp/snmpd.conf | cut -d' ' -f2) address:port IBM-GTv2-MIB::hwUnitDetails.\"hadomain1\".\"node1\".\"\"
    
    IBM-GTv2-MIB::hwUnitDetails."hadomain1"."node1"."" = STRING: 
    cpu_clock_exp:3325.0MHz,inlet_temp_celsius:21,power:on,unrecoverable_events:0,cpu_clock_avg:3325.0MHz,
    led:on,base_temp1_celsius:28,cpu_clock_tuned:active,cpu_smt_config:SMT=4,memsize:512GB,cpu_cores_enab:24,
    position:P1,avgpwr:650 Watts,base_temp3_celsius:33,base_temp2_celsius:35
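
The details string can be split into individual values with standard shell tools. The following is a minimal sketch, assuming that the string holds comma-separated key:value pairs as in the output above; get_detail is a hypothetical helper name:

```shell
# Print the value of one key from a comma-separated list of key:value pairs.
# (Hypothetical helper; prints nothing when the key is absent.)
get_detail() {
    # $1: key, for example avgpwr
    # $2: details string as returned in hwUnitDetails
    printf '%s\n' "$2" | tr ',' '\n' |
        awk -F: -v key="$1" '$1 == key { sub(/^[^:]*:/, ""); print; exit }'
}
```

With the sample output above stored in a variable named details, get_detail avgpwr "$details" prints 650 Watts.
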
    In the following example, information from ap issues is retrieved:
    [root@e1n1 ~]# ap issues
    Open alerts (issues) and unacknowledged events
    +------+---------------------+--------------------+-----------------------------------------------------+----------------------+----------+--------------+
    | ID   |         Date (CEST) |               Type | Reason Code and Title                               | Target               | Severity | Acknowledged |
    +------+---------------------+--------------------+-----------------------------------------------------+----------------------+----------+--------------+
    | 1002 | 2021-10-08 04:42:42 | SW_NEEDS_ATTENTION | 451: Webconsole service is not ready                | sw://webconsole      |  WARNING |          N/A |
    | 1018 | 2021-10-08 13:12:29 | SW_NEEDS_ATTENTION | 436: Failed to collect status from resource manager | node@hw://e2n4.fbond |    MAJOR |          N/A |
    +------+---------------------+--------------------+-----------------------------------------------------+----------------------+----------+--------------+
    
    Generated: 2021-10-13 13:05:37
    You can use snmpget, snmpwalk and snmptable to get the same information as above:
    [root@e1n1 ~]# snmpget -c$(grep -m1 rocommunity /etc/snmp/snmpd.conf | awk '{ print $2 }') -v2c address:port IBM-GTv2-MIB::issueDate.1018
    IBM-GTv2-MIB::issueDate.1018 = STRING: 2021-10-08 13:12:29
    [root@e1n1 ~]# snmpget -c$(grep -m1 rocommunity /etc/snmp/snmpd.conf | awk '{ print $2 }') -v2c address:port IBM-GTv2-MIB::issueTarget.1018
    IBM-GTv2-MIB::issueTarget.1018 = STRING: node@hw://e2n4.fbond
    [root@e1n1 ~]# snmpget -c$(grep -m1 rocommunity /etc/snmp/snmpd.conf | awk '{ print $2 }') -v2c address:port IBM-GTv2-MIB::issueSeverity.1018
    IBM-GTv2-MIB::issueSeverity.1018 = STRING: MAJOR
    [root@e1n1 ~]#  
    [root@e1n1 ~]# snmpwalk -c$(grep -m1 rocommunity /etc/snmp/snmpd.conf | awk '{ print $2 }') -v2c address:port IBM-GTv2-MIB::openIssuesTable
    IBM-GTv2-MIB::issueDate.1002 = STRING: 2021-10-08 04:42:42
    IBM-GTv2-MIB::issueDate.1018 = STRING: 2021-10-08 13:12:29
    IBM-GTv2-MIB::issueType.1002 = STRING: SW_NEEDS_ATTENTION
    IBM-GTv2-MIB::issueType.1018 = STRING: SW_NEEDS_ATTENTION
    IBM-GTv2-MIB::issueReasonCode.1002 = Gauge32: 451
    IBM-GTv2-MIB::issueReasonCode.1018 = Gauge32: 436
    IBM-GTv2-MIB::issueTitle.1002 = STRING: Webconsole service is not ready
    IBM-GTv2-MIB::issueTitle.1018 = STRING: Failed to collect status from resource manager
    IBM-GTv2-MIB::issueTarget.1002 = STRING: sw://webconsole
    IBM-GTv2-MIB::issueTarget.1018 = STRING: node@hw://e2n4.fbond
    IBM-GTv2-MIB::issueSeverity.1002 = STRING: WARNING
    IBM-GTv2-MIB::issueSeverity.1018 = STRING: MAJOR
    IBM-GTv2-MIB::issueAcknowledged.1002 = STRING: N/A
    IBM-GTv2-MIB::issueAcknowledged.1018 = STRING: N/A
    [root@e1n1 ~]#
    [root@e1n1 ~]# snmptable -Ci -c$(grep -m1 rocommunity /etc/snmp/snmpd.conf | awk '{ print $2 }') -v2c address:port IBM-GTv2-MIB::openIssuesTable
    SNMP table: IBM-GTv2-MIB::openIssuesTable
    
     index           issueDate          issueType issueReasonCode                                     issueTitle          issueTarget issueSeverity issueAcknowledged
      1002 2021-10-08 04:42:42 SW_NEEDS_ATTENTION             451                Webconsole service is not ready      sw://webconsole       WARNING               N/A
      1018 2021-10-08 13:12:29 SW_NEEDS_ATTENTION             436 Failed to collect status from resource manager node@hw://e2n4.fbond         MAJOR               N/A
    [root@e1n1 ~]#
    
    
    Tip: To left-justify the snmptable output, add the optional -Cl argument to the command.
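
For scripting, the snmpwalk output of openIssuesTable can be filtered with standard tools. The following sketch assumes the line format shown in the examples above; issue_field and issues_with_severity are hypothetical helper names:

```shell
# Print the value of one column for one issue.
# Reads snmpwalk output of openIssuesTable on standard input.
issue_field() {
    # $1: column name, for example issueSeverity
    # $2: issue ID, for example 1018
    sed -n "s/^IBM-GTv2-MIB::$1\.$2 = [A-Za-z0-9]*: //p"
}

# Print the IDs of all issues that have the given severity.
# Reads snmpwalk output of openIssuesTable on standard input.
issues_with_severity() {
    # $1: severity, for example MAJOR
    sed -n "s/^IBM-GTv2-MIB::issueSeverity\.\([0-9][0-9]*\) = STRING: $1\$/\1/p"
}
```

For example, piping the snmpwalk command shown above into issues_with_severity MAJOR prints 1018 for the sample data.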