IBM Support

Background process started by HOST_PRE_EXEC exits when its parent exits on server nodes

Question & Answer


Question

HOST_PRE_EXEC is set in the queue to run a bash script on every execution host, and that script starts an additional background process. When the parent script finishes on each host, the background process is killed on every host except the first execution host, although it is expected to keep running on all of them.
Example
  1. Your job is dispatched to host1, host2 and host3.
  2. Before the actual job runs, the parent bash script starts up and creates a background process on all 3 hosts.
    #!/bin/bash
    ./background_process &
    sleep 20
  3. After the parent script finishes sleeping, "background_process" is killed on host2 and host3 but keeps running on host1.

Cause

The background process in HOST_PRE_EXEC is started by the job sbatchd. By LSF design:

1. On the server nodes (the execution hosts other than the first), the job sbatchd kills its process group, including itself, after the pre_exec finishes, so the background process is also killed.

2. On the head node (the first execution host), sbatchd keeps running until the job finishes, so the background process stays alive while the job is running.
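
The process-group relationship described above can be checked outside LSF. The following is a minimal diagnostic sketch, not an LSF tool: `pgid_of` is a hypothetical helper, `sleep 30` stands in for `./background_process`, and reading `/proc` assumes a Linux host.

```bash
#!/bin/bash
# Diagnostic sketch (assumption: Linux with /proc mounted; not part of LSF).

# Hypothetical helper: print the process group ID (PGID) of a PID.
pgid_of() {
    # /proc/<pid>/stat looks like "pid (comm) state ppid pgrp ...".
    # Strip everything through the closing parenthesis so a command name
    # containing spaces cannot shift the fields; pgrp is then field 3.
    sed 's/^.*) //' "/proc/$1/stat" | awk '{print $3}'
}

# Stand-in for ./background_process.
sleep 30 &
bg_pid=$!

script_pgid=$(pgid_of $$)
bg_pgid=$(pgid_of "$bg_pid")

echo "script     pid=$$ pgid=$script_pgid"
echo "background pid=$bg_pid pgid=$bg_pgid"

# Clean up the stand-in process.
kill "$bg_pid"
```

When this runs as a non-interactive bash script, job control is off, so both lines should report the same PGID. A signal sent to that process group therefore reaches the background process as well as the script.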

Answer

Workarounds
  • Start the background process in the job script instead of the pre_exec script.
  • Use C shell instead of bash. This disassociates the background process from the parent, so it is not treated as part of the same process group (pgid).
#!/bin/csh
./background_process &
sleep 20
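
If the pre_exec script has to stay in bash, a comparable separation might be achieved with `setsid`, which starts a command in a new session and therefore a new process group. This is only a sketch under assumptions: it is not one of the documented workarounds, it has not been verified against sbatchd's cleanup on LSF execution hosts, it assumes `setsid` from util-linux is installed, and `pgid_of` is a hypothetical helper. `sleep 30` again stands in for `./background_process`.

```bash
#!/bin/bash
# Sketch only: NOT a documented LSF workaround and untested against sbatchd.
# Assumes Linux with /proc mounted and setsid(1) from util-linux.

# Hypothetical helper: print the process group ID (PGID) of a PID.
pgid_of() {
    sed 's/^.*) //' "/proc/$1/stat" | awk '{print $3}'
}

# Start the stand-in for ./background_process in its own session.
setsid sleep 30 &
bg_pid=$!

# Give setsid a moment to create the new session before inspecting it.
sleep 1

script_pgid=$(pgid_of $$)
bg_pgid=$(pgid_of "$bg_pid")
echo "script pgid=$script_pgid background pgid=$bg_pgid"

# Clean up the stand-in process.
kill "$bg_pid"
```

Run as a non-interactive script, the two PGIDs should differ, which is the property the C shell workaround relies on. Whether this is enough to survive the pre_exec cleanup depends on how sbatchd locates the processes it signals, so it should be tested on a non-production queue first. A real background process would likely also need its standard input and output redirected away from the parent.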

Product: Platform LSF
Platform: Linux
Version: 9.1.3

Document Information

Modified date:
21 April 2021

UID

isg3T1024572