Sitworld: An Efficient Design for Starting a Background Process
John Alvord, IBM Corporation
A customer wanted to start an agent-related background task. The Linux/Unix systems involved had an OS Agent installed, but the customer did not have system administrator rights. They could not log in and add things like crontab tasks. They could transfer scripts either with tacmd putfile or with a remote deploy process using a non-agent package.
The first design was to have an always-true situation, for example a formula like
The Local Time attribute is available in most agents and certainly in OS Agents. The sampling interval would be long, such as 999 days. The action command would start the process in the background using a trailing ampersand, and would be the same command described below.
The performance problem was that a situation event would be created for every agent, potentially 20,000+ agents. Even if the situation was hidden from the Portal Client display [not associated] and not sent to an event receiver like Omnibus, a record had to be kept, and several TEMS database tables would be a lot larger as a result. The TEMS process size would grow and SITMON processing would be slowed somewhat. The memory issue is probably more important than the CPU issue.
The sample program objects discussed below are here. The solution uses a workflow policy [Edit/Workflow Editor...]. That will be new to many, but it is a standard part of the Portal Client and works well. Don't be afraid of the unknown! This specific case is the simplest possible: two activities with a single link. The workflow policy export is part of the example program objects.
This efficient design uses a situation, IBM_start_ibmon123, which is not auto-started, and an auto-started workflow policy, IBM_policy_start_123, which runs one time [per agent connection]. The situation sampling interval is set to 999 days. The situation is distributed to all the agents where the background script should be running.
When a workflow policy waits for a situation true result, the situation is started at the agents according to the situation distribution, but only to evaluate it and deliver results to the workflow policy. The policy distribution is set to the same agents as the situation distribution. [The policy distribution can include more agents, but the policy activity is driven only by the incoming situation results.] This policy is correlated by managed system, policy auto start is on, and restart is off.
In this usage, the situation never creates an event [unless it is also started as a regular situation], and any situation action command is ignored. It only returns results according to its sampling interval. Since the sampling interval is 999 days, results are sent to the TEMS when the situation starts and then not again for 999 days, so the performance impact on the TEMS is minimal. There is a minor cost of maintaining the situation objects at the agent.
The performance saving is that no events are created or stored at the TEMSes.
Here is a screen capture of the workflow policy editor.
A typical workflow policy waits for a situation result and then follows a link to one or more activities, in this case a Take Action activity. This sort of action command is limited to 255 characters. By comparison, a situation action command run at the agent is limited to roughly 440 characters.
The action command in this case has three requirements:
- Test for process already running, if yes exit
- Test for script present in the expected file location, if not exit
- Start script in the background.
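The three requirements can be sketched as a readable script, which is easier to test in line mode before compressing it into a single action command. This is a minimal sketch, not the author's code: the function name start_ibmon123 and its return codes are hypothetical, and it uses test -f in place of the find | grep location check. If you save it to a file, do not put ibmon123 in the file name, or the ps check will find the wrapper itself.

```shell
#!/bin/sh
# Hypothetical helper implementing the three requirements in readable form.
# Assumes CANDLEHOME is set, as it is in the agent environment.
# Return codes (an assumption for testing):
#   0 = started, 1 = already running, 2 = script not present.
start_ibmon123() {
    script="$CANDLEHOME/tmp/ibmon123.pl"

    # 1. Already running? grep -v grep drops the grep processes themselves.
    if ps -e -o args | grep -v grep | grep ibmon123 >/dev/null; then
        echo "ibmon123 already running"
        return 1
    fi

    # 2. Script delivered? (test -f replaces the find | grep check.)
    if [ ! -f "$script" ]; then
        echo "ibmon123.pl not found under $CANDLEHOME/tmp"
        return 2
    fi

    # 3. Start in the background; the function returns immediately.
    perl "$script" &
    echo "started ibmon123.pl"
    return 0
}
```

In line mode you could source the file into your shell and then run start_ibmon123 followed by echo $? to see which branch was taken.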
For completeness, also run a *MISSING process situation to identify the cases where the background process is not running, perhaps because the script was not yet loaded into that environment.
Here is the example action command and then a detailed explanation. In this example, the process name includes “ibmon123”.
ps -e -o args | grep -v grep | grep ibmon123 || (find $CANDLEHOME/tmp/. | grep ibmon123\.pl && (perl $CANDLEHOME/tmp/ibmon123.pl &))
ps -e -o args | grep -v grep | grep ibmon123 ||
==> Check running processes for ibmon123 [grep -v grep drops the grep processes themselves].
==> If not present (||), continue; else exit.
(find $CANDLEHOME/tmp/. | grep ibmon123\.pl &&
==> Check the expected location for the ibmon123.pl script.
==> If present (&&), continue; else exit.
(perl $CANDLEHOME/tmp/ibmon123.pl &))
==> Run the Perl program in the background [trailing &].
==> The outer parentheses matter. In the shell, && and || have equal precedence and group left to right, so without them a successful ps check would skip the find but still reach the && and start a second copy of the script.
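Because a policy Take Action command is limited to 255 characters, it is worth checking the length before pasting a command into the workflow editor. This is a minimal sketch: check_action_len is a hypothetical helper, the sample text is the example command with the find and perl steps grouped in parentheses, and it assumes the limit applies to the text as typed rather than after variable expansion.

```shell
#!/bin/sh
# Hypothetical helper: report whether an action command fits the
# 255-character policy Take Action limit.
check_action_len() {
    len=${#1}               # length of the command text as typed
    if [ "$len" -le 255 ]; then
        echo "OK: $len characters"
        return 0
    fi
    echo "Too long: $len characters (limit 255)"
    return 1
}

# Single quotes keep $CANDLEHOME and & literal, so the count is of the
# text entered in the editor, not of the expanded command.
cmd='ps -e -o args | grep -v grep | grep ibmon123 || (find $CANDLEHOME/tmp/. | grep ibmon123\.pl && (perl $CANDLEHOME/tmp/ibmon123.pl &))'
check_action_len "$cmd"
```

The same check applies with a limit of roughly 440 for a situation action command run at the agent.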
There was no need to work out a solution for Windows. If you want to experiment, here is how to start a command in the background.
start /min cmd /c perl %CANDLE_HOME%\bin\ibmon123.pl
CANDLE_HOME is the Windows environment variable for the installation path; cmd references it as %CANDLE_HOME%.
Workflow Policy Notes
The policy receives results, not situation events. To model the logic of a situation event, the layout would look like this:
<wait for sit true>--><wait for sit false>
That is not important here, since the sampling interval is 999 days. However, if the situation sampling interval is 5 minutes, a series of true results will drive a series of action commands. If you want just one command, you need to wait for the sit false case. If you want a series of commands, leave it out.
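The difference between the two layouts can be illustrated by counting how many action commands a series of results would drive. This is a simulation only, not policy code: the result string is made up (T = true, F = false, one per sampling interval), and the second function assumes the policy starts a fresh pass after each completed true/false cycle.

```shell
#!/bin/sh
# Simulated results, one per sampling interval (made-up data).
results="T T T F F T T F"

# Layout 1: <wait for sit true> --> action
# Every true result drives another action command.
count_each_true() {
    n=0
    for r in $1; do
        [ "$r" = T ] && n=$((n + 1))
    done
    echo "$n"
}

# Layout 2: <wait for sit true> --> action --> <wait for sit false>
# Assumes the policy then starts a fresh pass, so a run of consecutive
# true results drives only one command, like a situation event.
count_once_per_run() {
    n=0
    armed=1
    for r in $1; do
        if [ "$r" = T ] && [ "$armed" -eq 1 ]; then
            n=$((n + 1))
            armed=0
        elif [ "$r" = F ]; then
            armed=1
        fi
    done
    echo "$n"
}

echo "each true result:  $(count_each_true "$results")"     # prints 5
echo "once per true run: $(count_once_per_run "$results")"  # prints 2
```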
The policy Take Action activity completes as soon as the command successfully starts. For example, if the command included a sleep 120, the command itself would not actually complete for two minutes plus the time of its other steps, but the policy would move on immediately. That matters if you need to perform two commands in a coordinated way. In that case, you could follow the first action command with a Take Action Delay of perhaps 150 seconds and then the second command, all connected together.
In your policy testing, it is very useful to review the Agent Operations log, e.g. <agentname>:KUX.LG0 on Unix. It shows the command being performed and the start status code. Status 0 is good and non-zero is some failure. That is NOT the command exit code. Usually you need to build out complicated commands slowly. Use the echo command to write to a /tmp log file and watch the results. On Linux/Unix, echo $? shows the exit code of the most recent command. It takes time to thoroughly test a complex action command in line mode and then as an action command. Even the one above, which looks pretty easy, had two errors during development. Slow and careful wins the day.
A final note: the workflow policy is sensitive to DisplayItem. If multiple results are returned, each is handled in a separate policy thread.
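The slow, careful approach can look like this in line mode: run each stage of the action command separately and log its exit code to a /tmp file. This is a minimal sketch; the log name /tmp/ibmon123_test.log is hypothetical, and the background perl stage is deliberately left out until the earlier stages behave as expected.

```shell
#!/bin/sh
# Test the action command one stage at a time, logging each stage's
# exit code. LOG is a hypothetical file name; change as needed.
LOG=/tmp/ibmon123_test.log
: > "$LOG"    # start with an empty log for this run

# Stage 1: is a process with ibmon123 in its arguments running?
ps -e -o args | grep -v grep | grep ibmon123 >/dev/null
echo "stage 1 (already running) exit=$?" >> "$LOG"   # grep: 0 = found, 1 = not found

# Stage 2: has the script been delivered to the expected location?
find "$CANDLEHOME/tmp/." 2>/dev/null | grep 'ibmon123\.pl' >/dev/null
echo "stage 2 (script present) exit=$?" >> "$LOG"    # grep: 0 = found, 1 = not found

cat "$LOG"
```

Each $? is expanded before its echo runs, so it reports the exit code of the last command in the preceding pipeline.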
This post showed how to start a background task on Linux/Unix using ITM facilities, without creating situation events.
Kudos to Bernie Garness at Mayo Clinic who called attention to this area of inefficiency.
Note: Double Rainbow on Coast Ridge Road - 15 March 2013
Triple and quadruple rainbows have been observed. See here for scientific background and other details.