Performance issues Part 2 Situations

Technical Blog Post

The major point here is to think about the situations that are being written and how the output is being used.

It is understood that in many environments different parts of the organization are allowed to write, or at least request, their own situations. However, there are a few simple guidelines that help:

1) It is tempting to think "we can report on all of this, so let's have situations for disk space at 50, 60, 70, 80 and 90% full". That is 5 possible events, but which ones are really going to be noticed and acted on?
With too many events coming in, there is a danger that some will be ignored or missed.

Situations need to be thought of as an information feed: with too much information, the important messages are drowned out.
So try to educate the teams that less is more.
What do they really need to know about? How often would this really change (which determines how frequently the event is raised)? How quickly will it change if it does change?

2) Make sure that different teams are not looking at essentially the same issue and each writing their own situation for it.
One situation may be able to serve both groups, or the message may really need to go to the one team that can solve the issue when it happens.

3) Frequency of checks: yes, you can get a situation to check for an issue every 3 minutes (some at even shorter intervals), but is it really that critical?
Some areas may be critical, but most issues do not need to be checked that often and can be on a 15 to 20 minute interval, or even longer.

4) What happens to the events that are triggered, and who is going to see them?
Is someone going to be watching a console, or is an email to be sent? How quickly does the issue need to be responded to?
It has been seen that ITM can get flooded by just one or two events that are raised at too high a frequency when something does go wrong, because the events are either ignored or the issue takes time to solve.

5) The same applies to situations fed by an agent reading a log file, such as the LO agent: thought needs to be given to which messages are picked from the file and sent as events.
If there is a critical issue in the log, work out which message is the most critical one, the one that says what is really wrong.
An agent that suddenly sends in every message from the file can overwhelm the people who are trying to find and fix the issue.
It also puts load on the ITM environment, so that other issues may go unnoticed or, in the worst case, a TEMS fails.
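
As a rough illustration of guideline 5, the sketch below picks only the message that identifies the root cause out of a burst of log lines. It is plain Python, not LO agent configuration; the log lines, the pattern and the function name select_events are all made up for the example.

```python
import re

# Hypothetical pattern: the one message that says what is really wrong.
CRITICAL_PATTERNS = [
    re.compile(r"ERROR .*database connection lost"),
]

def select_events(log_lines, patterns=CRITICAL_PATTERNS):
    """Return only the lines that match one of the chosen critical patterns."""
    return [line for line in log_lines if any(p.search(line) for p in patterns)]

# Hypothetical log excerpt: one root-cause message and a lot of follow-on noise.
sample_log = [
    "INFO  worker 3 started",
    "ERROR database connection lost",
    "WARN  retrying request 101",
    "WARN  retrying request 102",
    "WARN  retrying request 103",
]

events = select_events(sample_log)
print(f"{len(events)} of {len(sample_log)} lines forwarded as events")
```

In a real environment the selection would be done in the agent's own configuration; the point is only that one well-chosen message is easier to act on than every line in the file.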

So any situation should be looking for a non-normal occurrence in whatever it is monitoring. This means that it only fires when there is an issue.
The danger of it firing too often (apart from the load on the environment) is that the event is ignored because it has become normal to see.
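
To put a number on the disk space example from guideline 1, here is a small Python sketch that counts how many events one filling filesystem would raise with five thresholds compared with two. The readings are invented, and the model is a simplification: it assumes one event the first time each threshold is crossed, which is not necessarily how a given situation is set up to open and close events.

```python
def events_raised(readings, thresholds):
    """Count events, assuming one event the first time each threshold is crossed."""
    fired = set()
    count = 0
    for pct in readings:
        for t in thresholds:
            if pct >= t and t not in fired:
                fired.add(t)
                count += 1
    return count

# Hypothetical disk usage samples (% full) for one filesystem filling up.
readings = [45, 52, 61, 68, 74, 83, 91, 95]

many = events_raised(readings, [50, 60, 70, 80, 90])
few = events_raised(readings, [80, 90])
print(f"five thresholds: {many} events, two thresholds: {few} events")
```

Multiply the difference by the number of filesystems and servers being monitored and it becomes clear how quickly the early, rarely acted on thresholds turn into noise.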

Once a situation does fire, what happens next should also be thought through. Once the event is seen, is someone expected to take an action on the machine?
There could be an automatic Take Action as well as, or instead of, a manual process.
Either way, there should be some process for the event, so that the condition that made it fire is resolved.

Once it has been decided that a situation is needed, how it is written also makes a difference to the performance of the environment.

The basic rule here is to keep it simple.
If the situation is complex, it may need to be filtered at the TEMS rather than at the agent. That doesn't matter for one or two agents, but if you have a TEMS with 500 agents all running the same situation and needing TEMS power to evaluate it, the TEMS may take a performance hit.
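
A back-of-the-envelope calculation shows why the sampling interval matters so much in the 500 agent case. The Python sketch below assumes, purely for illustration, that every sample from every agent costs one evaluation at the TEMS; the function name and the one-to-one cost model are assumptions, not ITM internals.

```python
def evaluations_per_hour(agents, interval_minutes):
    """Evaluations per hour for one situation, assuming one per agent per sample."""
    return agents * 60 / interval_minutes

AGENTS = 500  # the figure used in the text above

every_3_min = evaluations_per_hour(AGENTS, 3)
every_15_min = evaluations_per_hour(AGENTS, 15)
print(f"3 minute interval:  {every_3_min:.0f} evaluations per hour")
print(f"15 minute interval: {every_15_min:.0f} evaluations per hour")
```

Under that assumption, moving a single situation from a 3 minute to a 15 minute interval cuts the work by a factor of five, and the saving repeats for every situation that has to be evaluated at the TEMS.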

The following article describes a report that can be run to check for situations like this:

https://www.ibm.com/developerworks/community/blogs/jalvord/entry/sitworld_itm_situation_audit?lang=en

This article talks about situation limits:

https://www.ibm.com/developerworks/community/blogs/jalvord/entry/sitworld_situation_limits?lang=en

This article talks about writing efficient situations:

https://www.ibm.com/developerworks/community/blogs/gwang/entry/writing_efficient_situations?lang=en

There are other articles available as well; there is a lot of advice on writing situations if you look.
