IBM Support

Starting and Stopping ITM situations using external operations

Question & Answer


Question

How can situations be stopped and started from non-ITM processing?

Answer

Overview
This document shows how to start and stop ITM situations via external operations according to a precise schedule. 'External operations' means some programmed process separate from ITM.

The Overseer Workflow Policy Pattern detailed in this technote

http://www.ibm.com/support/docview.wss?uid=swg21390602

achieves the goal of starting and stopping situations, but it has the imprecision of a sampled time situation. The precision can be increased to plus or minus 30 seconds, but with the side effect of increasing utilization on the TEMSs involved. With a less intensive sampling interval of 5 minutes, the action can occur up to 5 minutes after the desired time.
Using the external operations detailed here, if the ITM environment is behaving normally, situations will start or stop within a few seconds of the target time. This new scheme extends the Overseer scheme by adding precise timing sources. The Overseer scheme is still present: it is needed when the TEMS has just started and when the precise time source fails for some reason. However, the Overseer scheme is split into three slices so that a precise timing function can inject start and stop schedule requests.
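
The delay of a sampled situation can be worked out with simple arithmetic. The following is a minimal sketch, not part of ITM: the function name sample_delay is hypothetical, and it assumes samples are taken at a fixed interval starting from some offset within the day, with all values in seconds since midnight.

```shell
#!/bin/sh
# sample_delay TARGET INTERVAL OFFSET
# Prints how many seconds pass between TARGET and the first sample taken
# at or after it, when samples occur at OFFSET, OFFSET+INTERVAL, ...
sample_delay() {
    target=$1
    interval=$2
    offset=$3
    # Seconds elapsed since the most recent sample at or before TARGET.
    since=$(( (target - offset % interval + interval) % interval ))
    # Remaining time until the next sample (0 if TARGET is itself a sample).
    echo $(( (interval - since) % interval ))
}

# 08:00:00 is 28800 seconds after midnight. With a 5 minute interval whose
# samples fall at 2 minutes past each 5 minute boundary, the next sample
# is at 08:02:00.
sample_delay 28800 300 120   # prints 120
```

The result is always less than the interval, which is why a 5 minute sampling interval can delay the action by up to 5 minutes.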

The Overseer technote shows exactly how to create a workflow policy step by step. You can use it if you are new to authoring workflow policies.


Slices
A slice here means one part of the whole solution. A slice consists of one or two situations, and a workflow policy. In this example, the slices communicate using Universal Messages, but they could use other techniques like marker files or logs.

Situations used by workflow policies do not have to be started, and they do not need to generate events. They are used in an evaluation-only mode, and each result is processed by the workflow policy. The situations can still be started if they are needed for another purpose.

If you are unfamiliar with Universal Messages and the Universal Message Console, here is a technote which shows how to view them and how to set environment variables that increase the maximum rows from the default of 256:

http://www.ibm.com/support/docview.wss?uid=swg21377737

One quirk with the situation formula for universal messages is that the first test must be against the attribute “Originnode”. In the example, situations use a formula which checks to see that the attribute is not null, which satisfies the requirements for such cases.


Schedules and Situations
In production use, you will have multiple situations that are stopped and started at the same time on the same type of agent; we'll call this a "schedule." For this example a single schedule and a single situation will be controlled for simplicity.

If a situation is in started status, it can be started again with no problem. It is also fine to stop a situation already in stopped status. The start and stop process updates some TEMS tables and going from Start to Start or Stop to Stop does not influence the running of the situation. In addition, the start/stop situation process must be performed on the TEMS to which the agent is or may be connected. Thus one policy definition may be running on multiple TEMS's at the same time.

This example splits the Overseer solution into three slices. In addition to the Overseer start/stop process, platform specific precise timing services are used to start and stop situations. The Overseer solution acts as backup in case the precise timing facilities stop operating for any reason. Having both operating is no problem as explained above.


Goal 1 - Situations stopped or started using a workflow policy
The original Overseer design uses a time schedule situation and a workflow policy (policy diagram not reproduced).

See the Overseer technote for a complete discussion.

For this example, the Overseer solution is broken into three slices, each consisting of a situation and a workflow policy.


Slice 1 – Recognition of time schedule
The situation driving the workflow policy is named IBM_time_schedule1 and evaluates true when the schedule1 situations should have Start status, and false otherwise. It will be evaluated in the workflow policy and never run stand-alone. The following shows the situation should be running from 8am to 6pm every day.

Attribute Group: Local Time
Formula: Time >= 80000 AND Time < 180000
Sampling Interval: 5 minutes
Run at Startup: Off
Distribution: *ALL_CMS
Action: none
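
The window test in this formula can be tried outside ITM. The following is a minimal sketch, assuming the Time attribute is an HHMMSS value as the constants 80000 and 180000 suggest; the function name in_schedule1 is hypothetical.

```shell
#!/bin/sh
# in_schedule1 HHMMSS
# Exit status 0 when the time is at or after 08:00:00 and before 18:00:00,
# exit status 1 otherwise.
in_schedule1() {
    # Strip leading zeros so that "075959" is compared as decimal 75959.
    t=$(printf '%s' "$1" | sed 's/^0*//')
    [ -n "$t" ] || t=0
    [ "$t" -ge 80000 ] && [ "$t" -lt 180000 ]
}

# Usage: evaluate the current local time.
if in_schedule1 "$(date +%H%M%S)"; then
    echo "schedule1 situations should be started"
else
    echo "schedule1 situations should be stopped"
fi
```

Note that the upper bound is exclusive, so 18:00:00 itself already counts as outside the schedule.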

Practical time schedule situations will be more complex. Develop and test such schedule situations thoroughly. The workflow policy is named IBM_policy_schedule1 and has the following settings (policy diagram not reproduced):

Run on Startup: Yes
Restart: Yes
Distribution: each TEMS named separately
Correlated: Managed System

The first activity waits until the situation IBM_time_schedule1 evaluates True. When that happens a universal message is sent with the following characteristics:

Location of message writing: the Agent, which is the TEMS in this case.
Category: SCH
Severity: 0
Message text:
start schedule1 &WaitOnSituation1:Local_Time.Timestamp

A second activity waits until the situation IBM_time_schedule1 goes false.
When that happens a universal message is sent with the following characteristics.

Location of message writing: the Agent, which is the TEMS in this case.
Category: SCH
Severity: 0
Message text:
stop schedule1 &WaitOnSituation1:Local_Time.Timestamp

Note: the timer schedule situation and the workflow policy distributions must include the TEMSs to which the agents running the situation are connected. It doesn’t cost much to run them on all TEMSs, but please limit the situations and policies to just the required TEMSs if it is not all of them.

Output of Slice 1: A universal message of category SCH which indicates that the schedule1 should be started or stopped. That is produced on all TEMS.


Slice 2 – Start schedule1 situations
Situation IBM_start_schedule1 is a situation that waits for the SCH universal message with the correct text.

Attribute Group: Universal Message
Formula:
( Originnode != '' AND Category == SCH AND
SCAN(Message Text) == 'start schedule1')
Sampling Interval: 0 [Pure situation, not sampled]
Run at Startup: Off
Distribution: Each TEMS, hub and remote named individually
Action: none

The workflow policy is named IBM_schedule1_start and has the following settings (policy diagram not reproduced):

Run on Startup: Yes
Restart: Yes
Distribution: Each TEMS, hub and remote named individually
Correlated: Managed System

The first activity waits until the situation IBM_start_schedule1 evaluates True. That comes from the Universal message produced by the IBM_policy_schedule1 policy or from a precise timing universal message.

The second activity starts the situation. This will put the situation in 'Started' status for all the Agents connected to this TEMS. If the situation was previously stopped, it will now be started. If an agent connects later on, it will put the situation into 'Started' status if needed. Many situations can be started via added Start Situation activities linked from the initial wait for situation activity.

The situation being started in this example is IBM_uptime. It is against the Linux System Statistics attribute group and the formula is “System Uptime >= 1 Day”. Using a formula that is predictably true simplifies testing.

Output of Slice 2: situation started on agents and some messages.


Slice 3 – Stop schedule1 situations
Situation IBM_stop_schedule1 is a situation that waits for the SCH universal message. It also checks for schedule1 in the text.

Attribute Group: Universal Message
Formula:
( Originnode != '' AND Category == SCH AND
SCAN(Message Text) == 'stop schedule1')
Sampling Interval: 0 [Pure situation, not sampled]
Run at Startup: Off
Distribution: Each TEMS, hub and remote named individually
Action: none
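
The matching done by the start and stop situations can be pictured as a small dispatcher. The following is a minimal sketch, not ITM code: the function name sch_action is hypothetical, and it assumes both situations use a substring (SCAN-style) test, since the message text carries trailing data such as a timestamp or the word kshsoap.

```shell
#!/bin/sh
# sch_action CATEGORY TEXT
# Prints "start", "stop", or "ignore" for a universal message.
sch_action() {
    category=$1
    text=$2
    # Only messages in the SCH category are of interest.
    if [ "$category" != "SCH" ]; then
        echo ignore
        return 0
    fi
    # Substring match, so trailing text after the schedule name is allowed.
    case "$text" in
        *"start schedule1"*) echo start ;;
        *"stop schedule1"*)  echo stop ;;
        *)                   echo ignore ;;
    esac
}

# Usage
sch_action SCH "start schedule1 kshsoap"      # prints start
sch_action SCH "stop schedule1 <timestamp>"   # prints stop
sch_action OTHER "start schedule1 kshsoap"    # prints ignore
```

An exact equality test on the message text would not match either of the first two messages, because of the text that follows the schedule name.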

The workflow policy is named IBM_schedule1_stop and has the following settings (policy diagram not reproduced):

Run on Startup: Yes
Restart: Yes
Distribution: Each TEMS, hub and remote named individually
Correlated: Managed System

The first activity waits until the situation IBM_stop_schedule1 evaluates True. That comes from the Universal message produced by the IBM_policy_schedule1 policy or from a precise timing universal message.

The second activity stops the situation. This will put the situation in 'Stopped' status for all the Agents connected to this TEMS. If the situation was previously started, it will now be stopped. If an agent connects later on, it will put the situation into 'Stopped' status if needed. If multiple situations were started in Slice 2, the same situations should be stopped here.

Output of Slice 3: situation stopped on agents and some messages.


Summary so far
At this point we have just replicated the function of the Overseer policy, which by itself has not yet gained us anything. However, the logic has been broken into Slices so that the start/stop situation logic can be triggered through a second mechanism.


Goal 2 – Create a precise timing universal message
Both Windows and Linux/Unix platforms have job schedulers. Linux/Unix has the cron daemon and its accompanying crontab editor. Windows has the AT command. This example uses cron to run the kshsoap command, which creates a universal message at the correct time. kshsoap is documented in the Administrator's Guide:

http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/topic/com.ibm.itm.doc_6.2.2fp2/soapclient_kshsoap.htm

It takes two files as input: the first file contains the SOAP request and the second file contains the URL of the SOAP process on the hub TEMS. For this example, the files are stored in the /opt/IBM/ITM/HTML directory. The files for starting the situations associated with schedule1 are "schedule1_start.txt" and "url.txt". Here are the contents:

schedule1_start.txt:
<CT_WTO>
<data>start schedule1 kshsoap</data>
<category>SCH</category>
<severity>0</severity>
<userid>user</userid>
<password>password</password>
</CT_WTO>

url.txt:
http://<hub_tems_hostname>:1920///cms/soap

The files for stopping the situations associated with schedule1 are "schedule1_stop.txt" and the same "url.txt." Here is the content of "schedule1_stop.txt":

<CT_WTO>
<data>stop schedule1 kshsoap</data>
<category>SCH</category>
<severity>0</severity>
<userid>user</userid>
<password>password</password>
</CT_WTO>
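
Because the start and stop requests differ only in the action word, they can be generated from one template. The following is a minimal sketch: the function name write_wto_request and the variables SOAP_USER and SOAP_PASSWORD are hypothetical, and the defaults are the placeholder credentials shown in the files above.

```shell
#!/bin/sh
# write_wto_request ACTION SCHEDULE OUTFILE
# Writes a CT_WTO request whose <data> is "ACTION SCHEDULE kshsoap".
# Values are written as-is; XML special characters are not escaped.
write_wto_request() {
    action=$1
    schedule=$2
    outfile=$3
    cat > "$outfile" <<EOF
<CT_WTO>
<data>$action $schedule kshsoap</data>
<category>SCH</category>
<severity>0</severity>
<userid>${SOAP_USER:-user}</userid>
<password>${SOAP_PASSWORD:-password}</password>
</CT_WTO>
EOF
}

# Usage: generate both request files in a scratch directory.
outdir=$(mktemp -d)
write_wto_request start schedule1 "$outdir/schedule1_start.txt"
write_wto_request stop  schedule1 "$outdir/schedule1_stop.txt"
cat "$outdir/schedule1_start.txt"
```

Generating the files this way keeps the category and severity identical across all schedules, which the situations in Slices 2 and 3 depend on.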

Assuming CANDLEHOME is the standard "/opt/IBM/ITM", to execute the soap request, do the following:

1) export CH=/opt/IBM/ITM
2) cd into directory where kshsoap executable is located
3) ./kshsoap $CH/HTML/schedule1_start.txt $CH/HTML/url.txt >/dev/null 2>&1

Although the output is discarded here by ">/dev/null 2>&1", during development you should leave off the redirection and check the output for signs of failure. When the request works correctly, one of the last lines will be

</SOAP-CHK:Success></SOAP-ENV:Body></SOAP-ENV:Envelope>
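
During development it can help to keep the output in a log file and test it for the success marker. The following is a minimal sketch, assuming the marker quoted above appears in the output whenever the request succeeds; the function names and the log file argument are hypothetical.

```shell
#!/bin/sh
# soap_succeeded FILE
# Exit status 0 if the captured output in FILE contains the success marker.
soap_succeeded() {
    grep -q 'SOAP-CHK:Success' "$1"
}

# run_wto KSHSOAP REQUEST URLFILE LOGFILE
# Runs the command KSHSOAP with the two input files, keeps its output in
# LOGFILE instead of discarding it, and reports failure on standard error.
run_wto() {
    "$1" "$2" "$3" > "$4" 2>&1
    if soap_succeeded "$4"; then
        return 0
    fi
    echo "kshsoap did not report success; see $4" >&2
    return 1
}

# Example call (the log path is an assumption):
#   run_wto ./kshsoap "$CH/HTML/schedule1_start.txt" "$CH/HTML/url.txt" /tmp/schedule1_start.log
```

The non-zero return value lets a calling script or scheduler notice a failed request instead of silently losing it.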

The next testing step is to run the above command and observe the results. In my test environment, I have an added Enterprise level workspace which displays the Universal Message Console from the hub only, and the new message appeared there (screenshot not reproduced).

Nothing was waiting on the universal message, so there was no subsequent activity. Curiously, I had to navigate away from the display and then back to get it updated.

Now add the above command to your crontab table. To locate the "kshsoap" executable, issue:

find /opt/IBM/ITM -name kshsoap

Use crontab -e to get the cron table into edit mode and then add lines to it as follows for an every day, 8am to 6pm schedule. $KSHSOAP_PATH stands for the directory reported by the above find command; substitute the actual path. A cron table edited with crontab -e has no user field. A user name is placed before the command only in system-wide files such as /etc/crontab.

0 8 * * * $KSHSOAP_PATH/kshsoap /opt/IBM/ITM/HTML/schedule1_start.txt /opt/IBM/ITM/HTML/url.txt >/dev/null 2>&1
0 18 * * * $KSHSOAP_PATH/kshsoap /opt/IBM/ITM/HTML/schedule1_stop.txt /opt/IBM/ITM/HTML/url.txt >/dev/null 2>&1
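
When several schedules are needed, the cron entries can be generated rather than typed by hand. The following is a minimal sketch for a per-user cron table (five time fields followed by the command); the function name cron_lines is hypothetical, and it assumes the request files follow the <schedule>_start.txt and <schedule>_stop.txt naming used in this example.

```shell
#!/bin/sh
# cron_lines KSHSOAP_DIR HTML_DIR SCHEDULE START_HOUR STOP_HOUR
# Prints one crontab entry for the start request and one for the stop
# request. Each entry changes into the kshsoap directory before running it.
cron_lines() {
    kdir=$1
    hdir=$2
    sched=$3
    start=$4
    stop=$5
    for pair in "$start:start" "$stop:stop"; do
        hour=${pair%%:*}
        action=${pair##*:}
        printf '0 %s * * * cd %s && ./kshsoap %s/%s_%s.txt %s/url.txt >/dev/null 2>&1\n' \
            "$hour" "$kdir" "$hdir" "$sched" "$action" "$hdir"
    done
}

# Usage: print entries for an 8am to 6pm schedule. The first argument is a
# placeholder for the directory reported by the find command.
cron_lines /path/to/kshsoap/dir /opt/IBM/ITM/HTML schedule1 8 18
```

The printed lines can be reviewed and then pasted into the table opened by crontab -e.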

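For Windows the AT command is used.

schedule1.bat:
at 08:00 /every:M,T,W,Th,F,S,Su cmd /c "c: & cd \IBM\ITM\cms & kshsoap html\schedule1_start.txt html\url.txt"
at 18:00 /every:M,T,W,Th,F,S,Su cmd /c "c: & cd \IBM\ITM\cms & kshsoap html\schedule1_stop.txt html\url.txt"

The functionality is the same although the file references are different. The command sequence is wrapped in cmd /c "..." so that the whole sequence is scheduled; without the quotes, the batch file would schedule only the first command and run the rest immediately. The /every: option makes the job repeat on each listed day instead of running once. In Windows, a single "&" rather than a "&&" can be used to string multiple commands together.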


Goal 3 – Link the precise timing trigger to the start/stop
For the hub TEMS we are ready to test. The Universal Message will be seen by the start or stop schedule workflow policy. If remote TEMS's are involved, the universal message must be relayed from the hub to the remotes. Start with a situation IBM_watch_sched which monitors the Universal Message Console.

Attribute Group: Universal Message
Formula: ( Originnode != '' AND Category == SCH)
Sampling Interval: 0, pure situation
Run at Startup: Off
Distribution: *HUB
Action: none

When testing this situation, start it manually, then run a kshsoap command and see if it alerts. The workflow policy is named IBM_relay_schedule and contains a write message activity (the policy and activity screenshots are not reproduced).

For each of the remote TEMSs involved, a universal message is sent. The Category is SCH, the Severity is 0, and the Message text is copied from the incoming one from the situation event. This policy will have to be updated when more remote TEMSs are added. The Workflow Policy Take Action activities have an “Execute the action at” target option to let you do this.

In this way, the kshsoap universal message is relayed to the remote TEMS's. This Slice can be used for multiple schedules.


Conclusion
At this point, your implementation needs to be thoroughly tested to make sure all the pieces work together. In production environments, you will be defining multiple schedules – the times involved, the agents involved, the situation involved and then creating the situations and workflow policies and precise timing functions to make it all work smoothly.


Notes:
1) When agents are configured to report to multiple remote TEMSes, you must review the overseer workflow policy technote section “Agent Switching Issues.” As a quick summary, the situations considered for workflow policy start/stop activity are limited to the ones which were known to be needed at the remote TEMS at TEMS startup time. In an agent switching environment, the secondary remote TEMS may have no agents recorded at startup time. When the agent switches from primary remote TEMS to secondary remote TEMS, the situation start/stop activities will fail. The above technote section shows how to avoid that issue using dummy agents. [A future version of ITM is expected to eliminate this quirk.]
2) This scheme does not have to be timer driven. If you have another process that maintains an “event calendar” and you can cause a command line to be run, then the same scheme can be used to start and stop situations.
3) Workflow policies can also start and stop other workflow policies, which may be useful in some circumstances.
4) This technote has attached files which contain the files and situations and workflow policies. The distributions are empty. This is purely an example of a useful scheme. You will have to develop and test and maintain your own versions based on your own requirements.
5) This scheme makes heavy use of the Universal Message Console. Since this is an internal wrap-around table defaulting to 256 rows, you may want to increase the size. This testing used a size of 1024 rows. The environment variable is documented in the UMC technote mentioned earlier.
6) The kshsoap command does not have to run on the hub TEMS. That might, in fact, be impossible if the hub TEMS is on z/OS. It can run anywhere the program objects are present. This setup does reduce reliability since the alternate server or communications could fail and thus prevent the universal messages from being delivered. For high reliability, you could set the crontab or at commands to run on multiple other servers.
7) kshsoap is not the only option. You could use a Perl/SOAP solution to generate the universal message. kshsoap was chosen for this example to avoid creating an unnecessary prerequisite.

Document Information

Modified date: 17 June 2018

UID: swg21462251