Technical Blog Post
Monitoring MSMQ queues in a clustered environment with the OS Agent
What is the best approach when you need to monitor the message count for all the MSMQ queues and send an alert if
the count exceeds a specific threshold in a clustered MSMQ environment?
Let's suppose the cluster consists of two systems: 1 active and 1 passive.
On both nodes we have the Windows OS Agent installed, and we can use the MSMQ Queue attribute group for this purpose.
Looking at the returned info, you can notice that some MSMQ queues are shown only on a single node, the active one.
Does this depend on the cluster role of MSMQ, or is the Windows OS Agent cluster-aware?
First of all, we have to consider that the Windows OS Agent monitors both resources that are cluster-related (disks, processes) and resources
that are node-specific (CPU, memory); for this reason it must run (and so be active) on all the nodes of the cluster.
It is not affected by the node status (active or passive) because it must always be running.
So it is not actually cluster-aware.
If a specific resource is not available on a node because it is active on the other cluster node, it simply will not be
shown in the list of information retrieved for the "passive" node: the related perfmon object will not return
the needed data, but the agent will continue to run and request data collection for the related attribute group.
So, if you don't see MSMQ queues on an OS Agent node, this is due to the MSMQ architecture; it does not depend on the Windows OS Agent configuration.
Usually in a two-node cluster, there are three instances of MSMQ.
Two local instances (one on each node) and the clustered instance, which runs on the active node.
The clustered instance fails over between nodes, from active to passive and vice versa.
On the active node, in perfmon, I would expect to see the queues for the local instance and the queues for the clustered instance.
The queues belonging to the clustered instance are prefixed with "os:"; the others (default queues) belong to the local instance.
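As a concrete illustration, this naming convention can be captured in a few lines (a minimal sketch: the queue names below are invented for the example; in a real cluster the instance list comes from the "MSMQ Queue" perfmon object):

```python
# Sketch: classify MSMQ perfmon queue instances by owning MSMQ instance.
# Queue names are hypothetical placeholders, not real cluster data.

def queue_owner(instance_name: str) -> str:
    """Clustered-instance queues are prefixed with 'os:'; the rest are local."""
    return "clustered" if instance_name.startswith("os:") else "local"

instances = [
    "computername\\private$\\order_queue",  # example local private queue
    "os:clustername\\private$\\billing",    # example clustered private queue
]

for name in instances:
    print(f"{name} -> {queue_owner(name)}")
```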
The MSMQ Queue attribute group retrieves the entries (perfmon instances) and the related metrics directly from Windows Performance Monitor (perfmon) objects; it does nothing with the data other than sending it to TEMS/TEPS for visualization.
If we want to be sure the agent is showing the correct data, we can check what the perfmon tool shows for the "MSMQ Queue" performance object on the active node.
Let's have a look at the perfmon output on the server holding the active role for MSMQ: launch perfmon from the command prompt by using the perfmon command.
Select Performance Monitor and click "+"
Scroll down the perfmon counters until you reach MSMQ Queue and select it:
In the Object instances panel you will see the same values shown in the Queue Instance column of the MSMQ Queue workspace view for that server.
The above image is taken from a non-clustered environment, so it shows only local queues.
In a clustered environment we would also see the cluster private queues here, i.e. the ones prefixed with "os:".
If we see here the same set of queue instances we see on TEP (I don't expect to see differences, knowing how data collection occurs),
then it means this is what the underlying MSMQ framework is providing to the Windows performance monitoring layer, and so the agent is showing the correct information.
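To make this check concrete, here is a small sketch comparing the instance set seen in perfmon with the one shown on TEP. Both lists are hypothetical placeholders; in practice you would copy them from the two tools:

```python
# Sketch: verify the agent (TEP) shows exactly the instances perfmon exposes.
# Instance names are made up for the example.
perfmon_instances = {"computername\\private$\\q1", "os:clustername\\private$\\q2"}
tep_instances     = {"computername\\private$\\q1", "os:clustername\\private$\\q2"}

missing_on_tep = perfmon_instances - tep_instances  # in perfmon but not on TEP
extra_on_tep   = tep_instances - perfmon_instances  # on TEP but not in perfmon

if not missing_on_tep and not extra_on_tep:
    print("Agent data matches perfmon")
else:
    print("Mismatch:", missing_on_tep, extra_on_tep)
```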
From the ITM perspective, we can still monitor the condition independently of the location of the queue instance.
If a queue instance belongs to the clustered instance, it will always appear on a single agent node only: the one running on the machine that holds the active role for the MSMQ cluster.
In that case, deploying the situation on both nodes means you do not have to worry about the real location of the queue:
the queue instance name will be the same regardless of its location, so you don't have to create multiple situations.
You can create just one situation and distribute it to all the desired OS Agent nodes.
If you want to monitor only the queues of the clustered instance, you can use situation functions to filter in only the queue instances beginning with "os:".
Something like this:
In this way the situation will fire only for queues belonging to the clustered instance, regardless of where it is running (primary or secondary node).
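The situation logic described above boils down to this predicate (a hedged sketch: a real situation is defined in the TEP situation editor, not in code, and the threshold and sample counts here are placeholders):

```python
# Sketch of the situation predicate: fire only for clustered-instance
# queues ("os:" prefix) whose message count exceeds a threshold.

THRESHOLD = 100  # example value; use whatever fits your environment

def should_fire(queue_instance: str, messages_in_queue: int,
                threshold: int = THRESHOLD) -> bool:
    return queue_instance.startswith("os:") and messages_in_queue > threshold

# Hypothetical sample of (queue instance -> Messages in Queue) readings.
samples = {
    "os:clustername\\private$\\billing": 250,  # clustered, over threshold
    "os:clustername\\private$\\audit":    10,  # clustered, under threshold
    "computername\\private$\\local_q":   999,  # local queue, filtered out
}

alerts = [q for q, count in samples.items() if should_fire(q, count)]
print(alerts)
```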
Hope it helps