Netcool OMNIbus is in use in more than 2000 enterprises and service providers worldwide. The business issues that OMNIbus addresses differ, at least in detail, across such a range of users, but with so many deployments it is quite likely that any given issue has already been seen elsewhere and a solution put together. OMNIbus and the other Netcool products are customisable to a high degree, and tools and automations can be, and have been, written by partners and customers as well as by IBM's own Best Practice and development teams. Within IBM there are OMNIbus experts whose experience in implementing Netcool solutions goes back some 15 years and covers a wide range of industries. On the Network Management and Service Assurance blog we will gather and post some of that accumulated knowledge and experience in the hope that it may prove useful to readers.
If we can assist in the implementation of a solution by providing an install script or rules file modifications, these will be posted on Service Management Connect and a link will be provided from the blog.
Creating Event Containers
We've all seen the scenario where something happens and there are a number of alarms, but none from the root cause itself. For example, in a data centre a power breaker trips and as a result a dozen or two alarms from servers and UPS boxes hit the Active Event List. Generally, however, there is no alarm from the actual circuit board reporting the power trip. This means that our root cause analysis doesn't have a root cause event to pin the fault to, which can be a problem if our procedures require such an event to source the trouble ticket. That's a simple example that might apply to an enterprise, but as we move towards cloud computing or start to manage more and more "things", we are likely to encounter more and more situations where the root cause lies in a domain that is not visible to us. In those situations it would be useful to be able to synthetically create a container event which can be used as the root cause in problem management processes.
The scenario we initially looked at was a smart grid one. Smart meters can alarm on loss of power, and that is an important indication of service loss. However, since a power failure is likely to hit an entire street, many smart meters will report it, while the local substation will usually not be instrumented to generate an alarm. Even if it were, in the fragmented world of power distribution ordained by the authorities to encourage competition, the substation and the customers' meters may be owned and managed by different companies. So what we wanted was an event that could act as the container event for multiple smart meter events, be the one displayed in the higher level UIs, and be used to launch problem management processes. An automation was written that would:
- react when two or more smart meters with the same postcode (zip code) reported the same alarm
- if it is known that only one meter is installed in a postcode, create a container alarm for that single meter's alarm
- create a synthetic alarm with the postcode as the node name and recording the number of alarms covered by the container
- record the serial of the synthetic alarm in the reported alarms
- if Service Request Manager responds to a trouble ticket creation by inserting a TT number in the synthetic alarm, cascade that number down to the relevant reported alarms
- maintain currency of the synthetic alarm as new alarms come in or existing ones are cleared
- clear the synthetic alarm when all or all but one of the reported alarms in the container are cleared
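The steps above can be sketched in a few lines of Python to make the logic concrete. This is only an illustration under stated assumptions, not the posted automation, which runs inside the ObjectServer. Events are modelled as plain dictionaries. `Serial`, `Class`, `Location`, `Node`, `Summary` and `Severity` follow the usual alert field names, while `IsContainer`, `ContainerSerial`, `MemberCount` and `TicketNumber` are hypothetical names invented for this sketch, as is the function `sync_containers` itself.

```python
from collections import defaultdict
from itertools import count

# Stand-in for ObjectServer serial allocation (assumption for this sketch).
_next_serial = count(1_000_000)


def sync_containers(events, meter_class, single_meter_postcodes=frozenset()):
    """Run one pass of the container logic over a list of event dicts.

    The list and its dicts are modified in place. Severity 0 means "cleared".
    """
    # Existing synthetic alarms, keyed by (postcode, alarm text).
    containers = {(e["Node"], e["Summary"]): e
                  for e in events if e.get("IsContainer")}

    # Uncleared reported alarms of the chosen class, grouped the same way.
    active = defaultdict(list)
    for e in events:
        if (not e.get("IsContainer") and e["Class"] == meter_class
                and e["Severity"] > 0):
            active[(e["Location"], e["Summary"])].append(e)

    # Create or refresh a container wherever enough meters report the alarm.
    for key, members in active.items():
        postcode, summary = key
        threshold = 1 if postcode in single_meter_postcodes else 2
        if len(members) < threshold:
            continue
        container = containers.get(key)
        if container is None:
            container = {"Serial": next(_next_serial), "IsContainer": True,
                         "Node": postcode, "Summary": summary,
                         "Class": meter_class, "TicketNumber": ""}
            events.append(container)
            containers[key] = container
        container["Severity"] = max(m["Severity"] for m in members)
        container["MemberCount"] = len(members)
        for m in members:
            m["ContainerSerial"] = container["Serial"]

    # Clear containers whose active membership has dropped below the threshold.
    for key, container in containers.items():
        threshold = 1 if key[0] in single_meter_postcodes else 2
        remaining = len(active.get(key, []))
        if remaining < threshold:
            container["Severity"] = 0
            container["MemberCount"] = remaining

    # Cascade any ticket number from a container down to its reported alarms.
    tickets = {c["Serial"]: c["TicketNumber"]
               for c in containers.values() if c.get("TicketNumber")}
    for e in events:
        serial = e.get("ContainerSerial")
        if serial in tickets:
            e["TicketNumber"] = tickets[serial]
    return events
```

Calling `sync_containers` again after new alarms arrive, after alarms clear, or after a ticket number is written into a container keeps the synthetic alarm current, which is roughly what a periodic or trigger-driven automation would do. The class number passed as `meter_class` is whatever value identifies the alarms of interest in a given installation.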
Although the scenario here is one of smart meters, the automation relies only on the contents of standard fields such as @Class and @Location. This means that by changing the @Class filter to select a different range of alarms, the automation can be made to work for other alarm and equipment types.
The result can be seen in this screenshot.
The district alarm panel contains five synthetic alarms which summarise 42 individual meter alarms, and as shown, a work order number is cascaded down to the individual alarms covered.
The automation and its install script are posted here, along with a capture file and the necessary rules and lookups for a stdin probe, so that the automation can be trialled or demonstrated.