Question & Answer
Question
HOST_PRE_EXEC is set in the queue to run a bash script on all execution hosts, and that script starts an additional background process. When the parent script finishes on each host, the background process is killed on every host except the first execution host. Why does this happen, when the background processes should continue running?
Example
- Your job is dispatched to host1, host2 and host3.
- Before the actual job runs, the parent bash script starts up and creates a background process on all 3 hosts.
#!/bin/bash
./background_process &
sleep 20
- After the parent script finishes sleeping, "background_process" is killed on host2 and host3, but still runs on host1.
Cause
The background process in HOST_PRE_EXEC is started by the job sbatchd. By LSF design:
1. On the server nodes, the job sbatchd kills its process group (including itself) after the pre_exec finishes, so the background process is also killed.
2. On the head node (the first execution host), sbatchd keeps running until the job finishes, so the background process survives while the job is running.
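The process-group relationship behind this behavior can be observed outside LSF. The following is a minimal sketch, not an LSF procedure: it assumes a Linux host with `ps`, and it uses `sleep 30` as a stand-in for `./background_process`. When bash runs a script non-interactively, job control is off, so a process started with `&` stays in the script's own process group and is hit by any signal sent to that group.

```shell
#!/bin/bash
# Stand-in for ./background_process (assumption: any long-running command works here).
sleep 30 &
child=$!

# Read the process group ID (pgid) of this script and of its background child.
parent_pgid=$(ps -o pgid= -p $$ | tr -d ' ')
child_pgid=$(ps -o pgid= -p "$child" | tr -d ' ')

echo "script pgid=$parent_pgid, background child pgid=$child_pgid"

# Clean up the stand-in process.
kill "$child"
```

If the two values match, the background process belongs to the same process group as the script that started it, which is the group the Cause section says is killed on the server nodes.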
Answer
Workarounds
- Start the background process in the job script instead of the pre_exec script.
- Use C shell instead of bash. csh disassociates the background process from the parent, so it is not treated as being in the same process group (pgid).
#!/bin/csh
./background_process &
sleep 20
Product: Platform LSF, Version: 9.1.3, Platform: Linux
Document Information
Modified date:
21 April 2021
UID
isg3T1024572