Troubleshooting
Problem
When the LSF clean period (CLEAN_PERIOD) elapses, LSF sends a spurious email reporting "Unable to remove spool file", even though the files under JOB_SPOOL_DIR were removed successfully when the job finished.
Symptom
1. Defined JOB_SPOOL_DIR in lsb.params
$: bparams -l | grep -i spool
JOB_SPOOL_DIR = /tmp/
2. Wrote a job script under /tmp/
$: cat /tmp/job.sh
#!/bin/sh
ls
sleep 20
3. Submitted the script as a spooled job command file, as shown below:
$: bsub -Zs /tmp/job.sh
The job finished successfully, and the spooled files were cleaned up as well.
4. When CLEAN_PERIOD was reached, the LSF administrator received an email like the one below:
$: mail
U22105 LSF Mon Apr 4 21:57 53/1773 "Job 1018: </tmp/job.sh> in cluster <cluster_name> Done"
>N22106 LSF Mon Apr 4 22:02 18/681 "mbatchd on <hostA> in cluster <cluster_name>: childRemoveSpoolFile: Unable to remove spool file:"
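To confirm that the email is spurious rather than a real cleanup failure, the spool directory can be inspected after the job finishes. The following is a minimal sketch, not an official LSF procedure: the helper name spool_dir_from_bparams is hypothetical, and it assumes that bparams -l prints the parameter in the "JOB_SPOOL_DIR = <path>" form shown in step 1.

```shell
#!/bin/sh
# Hypothetical helper: read `bparams -l`-style output on stdin and print
# the value of JOB_SPOOL_DIR. Assumes the "NAME = value" line format
# shown in step 1 of the Symptom section.
spool_dir_from_bparams() {
    awk -F'=' 'NF >= 2 && toupper($1) ~ /JOB_SPOOL_DIR/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim surrounding whitespace
        print $2
        exit
    }'
}

# Query the live cluster only when the LSF commands are on the PATH.
if command -v bparams >/dev/null 2>&1; then
    dir=$(bparams -l | spool_dir_from_bparams)
    echo "JOB_SPOOL_DIR: ${dir:-<not set>}"
    # List whatever remains under the spool directory for manual review.
    if [ -n "$dir" ]; then
        ls -la "$dir"
    fi
fi
```

If the listing shows no leftover spool files for the finished job, the "Unable to remove spool file" email matches the defect described here.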
Cause
It is a product defect.
Resolving The Problem
Download the patch from IBM Fix Central to fix the problem. The patch is available at the following link:
http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=Platform%2BComputing&product=ibm/Other+software/Platform+LSF&release=All&platform=All&function=fixId&fixids=lsf-9.1.3-build399905&includeSupersedes=0
Refer to the patch readme for installation instructions.
Document Information
Modified date: 17 June 2018
UID: isg3T1023900