Monitoring

Universal Messaging provides a set of command-line tools for performing many common actions. Some of these tools can be used to monitor aspects of a realm that indicate its health. The tools and their usage are described below.

Environment State Check

Run the HealthChecker tool periodically (every minute) to check the environment status.

Command Details

Usage:
  runUMTool HealthChecker -rname=<rname> -check=EnvironmentStateCheck

Examples:
  runUMTool HealthChecker -rname=nsp://localhost:9000 
     -check=EnvironmentStateCheck

Required arguments:
  rname : Name of a realm to check.

Refer to the section Running a Configuration Health Check of the Universal Messaging Administration Guide for more information on the HealthChecker tool.
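
The periodic run described at the start of this section could be scheduled from Java roughly as follows. This is a minimal sketch under stated assumptions, not part of the product: the class name HealthCheckScheduler is hypothetical, and the location of the runUMTool script is passed in as the first argument because it depends on your installation. The tool arguments are the ones shown in the usage above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs one HealthChecker check every minute and prints its output.
final class HealthCheckScheduler {

    /** Builds the argument list for one run; toolPath is installation-specific (assumption). */
    static List<String> command(String toolPath, String rname, String check) {
        return Arrays.asList(toolPath, "HealthChecker", "-rname=" + rname, "-check=" + check);
    }

    /** Runs the command once and returns its combined stdout and stderr as lines. */
    static List<String> runOnce(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        process.waitFor();
        return lines;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: HealthCheckScheduler <path to runUMTool> <rname>");
            return;
        }
        List<String> command = command(args[0], args[1], "EnvironmentStateCheck");
        // A single-threaded scheduler keeps runs from overlapping.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                runOnce(command).forEach(System.out::println);
            } catch (IOException e) {
                System.err.println("Health check run failed: " + e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, 1, TimeUnit.MINUTES);
    }
}
```

An operating system scheduler (cron, Task Scheduler) would serve equally well; the Java form is shown only because it keeps the run and the parsing of its output in one process.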

The output of each run can be parsed to raise alerts for any line starting with WARN or ERROR. The tool also checks what percentage of the total heap memory is taken up by events. If this percentage is between 70 and 80, between 80 and 90, or above 90, a corresponding warning is displayed.

Sample output of the EnvironmentStateCheck command:

ENVIRONMENT STATE CHECK
Environment State [umserver] 
INFO: [umserver] Connections: 2 
INFO: [umserver] Queued threads: 0 
INFO: [umserver] Vended threads: 62 
INFO: [umserver] Total threads: 64 
INFO: [umserver] Total memory (MB): 981 
INFO: [umserver] Used memory (MB): 101 
INFO: [umserver] Free memory (MB): 880 (89.69%) 
INFO: [umserver] Total direct memory (MB): 1024 
INFO: [umserver] Used direct memory (MB): 0 
INFO: [umserver] Free direct memory (MB): 1024 (100%)
INFO: [umserver] Max heap memory (MB): 981 
INFO: [umserver] Memory allocated for events (MB): 0
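
If you want to record the event memory percentage yourself, for example to plot it over time, it can be derived from the INFO lines in the sample output above. The following is a sketch only. It assumes that "Max heap memory (MB)" is the denominator the tool uses and that the band boundaries are inclusive at the lower end; neither detail is confirmed by this guide. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives the event memory percentage from EnvironmentStateCheck output.
final class EventMemoryUsage {

    // Matches lines such as "INFO: [umserver] Max heap memory (MB): 981".
    private static final Pattern METRIC =
        Pattern.compile("^\\w+: \\[[^\\]]*\\] (.+?) \\(MB\\): (\\d+)");

    /** Returns the value of the named "(MB)" metric, or -1 if no such line exists. */
    static long metricMb(List<String> lines, String name) {
        for (String line : lines) {
            Matcher m = METRIC.matcher(line.trim());
            if (m.find() && m.group(1).equals(name)) {
                return Long.parseLong(m.group(2));
            }
        }
        return -1;
    }

    /** Event memory as a percentage of max heap memory (assumed denominator). */
    static double eventMemoryPercent(List<String> lines) {
        long maxHeap = metricMb(lines, "Max heap memory");
        long events = metricMb(lines, "Memory allocated for events");
        if (maxHeap <= 0 || events < 0) {
            throw new IllegalArgumentException("Required memory lines not found");
        }
        return 100.0 * events / maxHeap;
    }

    /** Maps a percentage to the bands named in the text (boundary handling is an assumption). */
    static String band(double percent) {
        if (percent > 90) return "above 90";
        if (percent >= 80) return "80 to 90";
        if (percent >= 70) return "70 to 80";
        return "below 70";
    }
}
```

For the sample output above, eventMemoryPercent returns 0.0, which falls in the "below 70" band.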

Store State Check

Run the HealthChecker tool periodically (every minute) to check the store status.

The following checks have to be enabled in the HealthChecker:

  • StoreMemoryCheck
  • StoreMismatchCheck
  • StoreWarningsCheck

Command Details

Usage:
runUMTool HealthChecker -rname=<rname>
  -check=StoreMemoryCheck, StoreMismatchCheck, StoreWarningsCheck

Examples:
runUMTool HealthChecker -rname=nsp://localhost:9000
  -check=StoreMemoryCheck, StoreMismatchCheck, StoreWarningsCheck

Required arguments:
rname : Name of a realm to check.

Refer to the section Running a Configuration Health Check of the Administration Guide for more information on the HealthChecker tool.

Output of each run can be parsed to raise alerts for any line starting with WARN or ERROR.
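
The alerting rule above (any line starting with WARN or ERROR) amounts to a simple filter over the tool's output. A minimal sketch in Java, assuming the output has already been captured as a list of lines; the class name HealthCheckAlerts is hypothetical, and what you do with the returned lines (mail, ticket, pager) is up to your monitoring setup.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: selects the HealthChecker output lines that should raise an alert.
final class HealthCheckAlerts {

    /** Returns the trimmed lines that start with WARN or ERROR, in their original order. */
    static List<String> alertLines(List<String> outputLines) {
        return outputLines.stream()
            .map(String::trim)
            .filter(line -> line.startsWith("WARN") || line.startsWith("ERROR"))
            .collect(Collectors.toList());
    }
}
```

An empty result means the run produced nothing to alert on.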

Cluster State

Checks the cluster state for a given RNAME that is part of a cluster.

Command Details

Usage:
runUMTool ClusterState -rname=<rname> [optional_args]

Examples:
runUMTool ClusterState -rname=nsp://localhost:9000

Required arguments:
rname : Name of a realm, which is part of a cluster.

Optional Parameters:
username : Your UM server username.
password : Your UM server password.

As seen in the sample output below, the statuses of the cluster nodes can be parsed and appropriate alerts can be raised.

Sample output of the ClusterState command

--------------------------------------
Cluster Name: Cluster1
-------------------------------------- 
Cluster Nodes: 
Node name: umserver (Master) 
Realm rnames: nhp://10.42.96.207:9000/
  nhp://fe80:0:0:0:2de9:64df:4bbf:d85c%14:9000/ 
Is node clustered: true 

Node name: umserver2 (Slave) 
Realm rnames: nhp://10.42.96.207:9000/
  nhp://fe80:0:0:0:2de9:64df:4bbf:d85c%14:9000/ nhp://10.42.96.207:9001/
  nhp://fe80:0:0:0:2de9:64df:4bbf:d85c%14:9001/ 
Is node clustered: true

-------------------------------------- 
Cluster Statuses
-------------------------------------- 
Server name: umserver 
Server status: online 
Cluster state: Master
Last broadcast time: 0 
Client Request size: 0 
Comms Queue size: 0 
Queue size: 0 
Last response time: 0

-------------------------------------- 
Server name: umserver2 
Server status: online 
Cluster state: Slave 
Last broadcast time: 0 
Client Request size: 0 
Comms Queue size: 0 
Queue size: 0 
Last response time: 0
--------------------------------------
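
One way to turn the "Cluster Statuses" part of this output into alerts is sketched below in Java. It relies only on the "Server name", "Server status" and "Cluster state" lines shown in the sample. The rules it applies are assumptions made for illustration: any status other than "online" is reported, and exactly one node is expected to be in the Master state. The class name ClusterStateAlerts is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives alert messages from ClusterState output.
final class ClusterStateAlerts {

    /** Returns one message per detected problem; an empty list means nothing to report. */
    static List<String> findProblems(List<String> outputLines) {
        List<String> problems = new ArrayList<>();
        String server = null;
        int servers = 0;
        int masters = 0;
        for (String raw : outputLines) {
            String line = raw.trim();
            if (line.startsWith("Server name:")) {
                server = valueOf(line);
                servers++;
            } else if (line.startsWith("Server status:") && server != null) {
                String status = valueOf(line);
                // Assumption: "online" is the only healthy status value.
                if (!"online".equalsIgnoreCase(status)) {
                    problems.add(server + " has status '" + status + "'");
                }
            } else if (line.startsWith("Cluster state:")
                    && "Master".equalsIgnoreCase(valueOf(line))) {
                masters++;
            }
        }
        // Assumption: a healthy cluster reports exactly one Master.
        if (servers > 0 && masters != 1) {
            problems.add("expected exactly one Master, found " + masters);
        }
        return problems;
    }

    /** Returns the text after the first colon, trimmed. */
    private static String valueOf(String line) {
        return line.substring(line.indexOf(':') + 1).trim();
    }
}
```

For the sample output above the method returns an empty list, because both servers are online and one of them is the Master.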
								

Alternative Java Sample

The sample code in <InstallDir>\UniversalMessaging\java\examples\com\pcbsys\nirvana\nAdminAPI\apps\nClusterWatch.java demonstrates how the Java Admin API can be used to monitor the cluster state.

Monitor Channels

Monitors the channels and queues in a realm and prints totals.

Command Details

Usage:
runUMTool MonitorChannels -rname=<rname> [optional_args]

Examples:

runUMTool MonitorChannels -rname=nsp://localhost:9000
  -channelname=channel0 -format=plaintext

runUMTool MonitorChannels -rname=nsp://localhost:9000
  -channelname=queue1 -format=plaintext

Required arguments:
rname : URL of the realm to monitor channels and queues for.

Optional Parameters:
channelname : Name of a specific channel or queue to monitor
format : Format to print output in (plaintext/xml/json)
username : Your UM server username.
password : Your UM server password.

As seen in the sample output below, channel and queue statuses of the cluster nodes can be parsed and appropriate alerts can be raised.

Sample output of the MonitorChannels command

Name : channel0
Total Events Published : 10000
Total Events Consumed : 0
Last Event ID : 10277
Current Connections : 0
Total Connections : 0
Used Space : 781K
Events : 10000
Memory Usage : 1M
% Free : 0%
Cache Hit : 0.0
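
The plaintext output above is a list of "key : value" lines, so it can be read into a map and compared against a threshold. The Java sketch below assumes that each channel or queue starts with a "Name" line, as in the sample, and that several stores simply follow one another when no channelname is given; the second point is an assumption. The class name MonitorChannelsParser and the depth limit are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses MonitorChannels plaintext output into one map per store.
final class MonitorChannelsParser {

    /** Splits "key : value" lines into maps, starting a new map at every "Name" line. */
    static List<Map<String, String>> parse(List<String> outputLines) {
        List<Map<String, String>> stores = new ArrayList<>();
        Map<String, String> current = null;
        for (String raw : outputLines) {
            int sep = raw.indexOf(" : ");
            if (sep < 0) {
                continue; // not a key/value line
            }
            String key = raw.substring(0, sep).trim();
            String value = raw.substring(sep + 3).trim();
            if (key.equals("Name")) {
                current = new LinkedHashMap<>();
                stores.add(current);
            }
            if (current != null) {
                current.put(key, value);
            }
        }
        return stores;
    }

    /** Returns the names of stores whose "Events" value exceeds maxEvents. */
    static List<String> storesOverDepth(List<Map<String, String>> stores, long maxEvents) {
        List<String> names = new ArrayList<>();
        for (Map<String, String> store : stores) {
            String events = store.get("Events");
            if (events != null && Long.parseLong(events) > maxEvents) {
                names.add(store.get("Name"));
            }
        }
        return names;
    }
}
```

With the sample output above and a limit of 5000 events, storesOverDepth returns channel0. If you need more robust parsing, the xml and json formats listed under the optional parameters are probably a better starting point than plaintext.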

Alternative Java Sample

The following sample code uses the Java Admin API to monitor channel and queue depths:

package com.pcbsys.nirvana.nAdminAPI.apps;
 
import com.pcbsys.nirvana.client.nSessionAttributes;
import com.pcbsys.nirvana.nAdminAPI.nContainer;
import com.pcbsys.nirvana.nAdminAPI.nLeafNode;
import com.pcbsys.nirvana.nAdminAPI.nNode;
import com.pcbsys.nirvana.nAdminAPI.nRealmNode;
 
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
 
/**
 * Scans the provided Realm for Channels and Queues and displays their
 * attributes (Current Depth, Total Published and Total Consumed) 
 * every 10 seconds.
 *
 * Expects Realm Name (nsp://hostname:port) as runtime argument
 */
public class GetChannelsAndQueuesInfo {
 
    private nRealmNode realmNode;
    private List<nLeafNode> channels = new ArrayList<>();
    private List<nLeafNode> queues = new ArrayList<>();
 
    public GetChannelsAndQueuesInfo(String realmName) 
    throws Exception {
        realmNode = new nRealmNode(new nSessionAttributes(realmName));
 
        scanRealmForChannelsAndQueues(realmNode.getNodes());
    }
 
    /**
     * Recursively scans the Realm namespace for Channels and Queues
     *
     * @param realmNamespaceNodes the child nodes of the realm or container to scan
     */
    private void scanRealmForChannelsAndQueues(
        final Enumeration realmNamespaceNodes) {
        while (realmNamespaceNodes.hasMoreElements()) {
            final nNode child = (nNode) realmNamespaceNodes.nextElement();
 
            if (child instanceof nLeafNode) {
                final nLeafNode leafNode = (nLeafNode) child;
 
                if (leafNode.isChannel()) {
                    channels.add(leafNode);
                } else if(leafNode.isQueue()) {
                    queues.add(leafNode);
                }
            }
            else if (child instanceof nContainer) {
                scanRealmForChannelsAndQueues(((nContainer) child).getNodes());
            }
        }
    }
 
    public nRealmNode getRealmNode() {
        return realmNode;
    }
 
    public List<nLeafNode> getChannels() {
        return channels;
    }
 
    public List<nLeafNode> getQueues() {
        return queues;
    }
 
    public static void main(String[] args) throws Exception {
        if(args.length == 0) {
            throw new Exception("Realm Name startup argument is missing.");
        }
 
        GetChannelsAndQueuesInfo getChannelsAndQueuesInfo = 
            new GetChannelsAndQueuesInfo(args[0]);
        System.out.println();
        System.out.println("Connected to Realm : " + 
           getChannelsAndQueuesInfo.getRealmNode().getRealm().getName());
        System.out.println();
 
 
        while(true) {
            StringBuilder displayString = new StringBuilder();
 
            displayString.append(
              "Channels (Name | Current Depth | Total Published | Total Consumed) \n");
            displayString.append(
              "------------------------------------------------------------------ \n");
            for (nLeafNode oneLeaf : getChannelsAndQueuesInfo.getChannels()) {
                printLeafNode(displayString, oneLeaf);
            }
 
            displayString.append(
              "\nQueues (Name | Current Depth | Total Published | Total Consumed) \n");
            displayString.append(
              "---------------------------------------------------------------- \n");
            for (nLeafNode oneLeaf : getChannelsAndQueuesInfo.getQueues()) {
                printLeafNode(displayString, oneLeaf);
            }
            displayString.append(
              "==================================================================");
            System.out.println();
 
            System.out.println(displayString);
 
            Thread.sleep(10000);
        }
    }
 
    private static void printLeafNode(StringBuilder displayString, 
        nLeafNode oneLeaf) {
        displayString.append(oneLeaf.getAbsolutePath())
                     .append(" | ")
                     .append(oneLeaf.getCurrentNumberOfEvents())
                     .append(" | ")
                     .append(oneLeaf.getTotalPublished())
                     .append(" | ")
                     .append(oneLeaf.getTotalConsumed())
                     .append("\n");
    }
}

Identify Large Durable Outstanding Events

Identifies channels containing Durables with a large number of outstanding events.

Command Details

Usage:
runUMTool IdentifyLargeDurableOutstandingEvents
  -rname=<rname> -threshold=<threshold>
  [optional_args]

Examples:
runUMTool IdentifyLargeDurableOutstandingEvents
  -rname=nsp://localhost:9000 -threshold=100

Required arguments:
rname : URL of the realm to list the details of all the channels within.
threshold : Long value representing the tolerated number of outstanding events.

Optional Parameters:
username : Your UM server username.
password : Your UM server password.

Periodic Logging of Server Status

The Universal Messaging server writes status information to the log file at regular intervals. The interval can be configured using the StatusBroadcast realm configuration property; the default value is 5 seconds.

For information on realm configuration properties, see the section Realm Configuration in the Enterprise Manager part of the Administration Guide.

Sample status log message

ServerStatusLog> Memory=3577, Direct=3925, EventMemory=0,
  Disk=277070, CPU=0.2, Scheduled=29, Queued=0, 
  Connections=5, BytesIn=12315, BytesOut=19876, 
  Published=413, Consumed=1254, QueueSize=0, 
  ClientsSize=0, CommQueueSize=0

The log file can be parsed to extract the server status and to take preemptive action if these parameters deviate from set thresholds.
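
A status message like the sample above can be split into name/value pairs with a few lines of Java. The sketch below assumes that each status message occupies a single line in the log file (the sample is wrapped here for display) and that all values are numeric, as they are in the sample. The class name ServerStatusLogParser and any thresholds you pass to it are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: extracts the metrics from one "ServerStatusLog>" log line.
final class ServerStatusLogParser {

    private static final String MARKER = "ServerStatusLog>";

    /** Returns the metrics of a status line, or an empty map for any other log line. */
    static Map<String, Double> parse(String logLine) {
        Map<String, Double> metrics = new LinkedHashMap<>();
        int start = logLine.indexOf(MARKER);
        if (start < 0) {
            return metrics;
        }
        String payload = logLine.substring(start + MARKER.length());
        for (String pair : payload.split(",")) {
            String[] keyValue = pair.trim().split("=", 2);
            if (keyValue.length != 2) {
                continue;
            }
            try {
                metrics.put(keyValue[0].trim(), Double.parseDouble(keyValue[1].trim()));
            } catch (NumberFormatException e) {
                // Skip values that are not numeric.
            }
        }
        return metrics;
    }

    /** True if the named metric is present and greater than the given limit. */
    static boolean exceeds(Map<String, Double> metrics, String name, double limit) {
        Double value = metrics.get(name);
        return value != null && value > limit;
    }
}
```

For the sample message, parse yields 15 entries, for example CPU=0.2 and Connections=5.0, and exceeds(metrics, "Connections", 4) is true.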

For more information, see the section Periodic Logging of Server Status of the Universal Messaging Concepts Guide.

Other Parameters to Monitor

In addition to the parameters mentioned above, the following system parameters should be monitored:

  • Disk utilization : Any rapid increase in disk usage should be tracked and should raise an alert. An appropriate threshold needs to be set and monitored.
  • CPU utilization
  • Memory utilization
  • Network activity
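
For the disk utilization item, both the absolute level and the rate of increase matter. The following Java sketch shows one possible way to compute them with the standard java.io.File API. It is an illustration under assumptions: the class name DiskUsageMonitor is hypothetical, the path you sample should be the partition that holds the realm's data directory, and the limits are values you have to choose for your own environment.

```java
import java.io.File;

// Hypothetical helper: disk usage level and growth rate for threshold-based alerting.
final class DiskUsageMonitor {

    /** Percentage of the partition containing 'path' that is currently in use. */
    static double usedPercent(File path) {
        long total = path.getTotalSpace();
        if (total == 0) {
            throw new IllegalArgumentException("Not on a valid partition: " + path);
        }
        return 100.0 * (total - path.getUsableSpace()) / total;
    }

    /** Growth between two samples, in percentage points per minute. */
    static double growthPerMinute(double earlierPercent, double laterPercent, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (laterPercent - earlierPercent) * 60_000.0 / elapsedMillis;
    }

    /** True if either the usage level or the growth rate is above its limit. */
    static boolean shouldAlert(double usedPercent, double growthPerMinute,
                               double maxUsedPercent, double maxGrowthPerMinute) {
        return usedPercent > maxUsedPercent || growthPerMinute > maxGrowthPerMinute;
    }
}
```

Sampling usedPercent at a fixed interval and passing consecutive samples to growthPerMinute gives the rate of increase. CPU, memory and network activity are usually better taken from the operating system's own monitoring tools.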