Troubleshooting
Problem
Ideally, the log file is created as soon as the program starts and is updated as the job runs. The customer reports a significant delay between the time the job is triggered and the time the log file is created.
Symptom
According to the bhist output, the job started at 10:37, but the log file was not created until 10:49, which is the job completion time.
The delay occurs in only one environment (e.g., LEV1); the other (e.g., LEV2) works as expected.
Diagnosing The Problem
As long as the job is properly dispatched by LSF and completes successfully, a delay in creating the job log file is probably not caused by LSF.
To confirm whether the delay is related to LSF, run the same job command (date&bsub <job command>) outside LSF, directly from the console, on the same execution host that is shown in the bhist output of the problematic job, and check whether the job log file is created immediately when the job starts.
Also run "date&bsub <job command>" outside LSF in the other environment (LEV2) for one of the jobs that shows no delay, to further confirm that the job log is created immediately there.
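The check described above can be timed rather than judged by eye. The following is a minimal sketch, not part of the original procedure: the function name measure_log_delay and its parameters are hypothetical, and it assumes the job writes its log to a known path that does not exist before the run.

```python
import os
import subprocess
import time


def measure_log_delay(command, log_path, timeout=60.0, poll_interval=0.05):
    """Start `command` and return the seconds until `log_path` first exists.

    Returns None if the file has not appeared within `timeout` seconds.
    `command` is an argument list, as accepted by subprocess.Popen.
    """
    if os.path.exists(log_path):
        # A stale log from an earlier run would make the measurement meaningless.
        raise ValueError("log file already exists: " + log_path)

    start = time.monotonic()
    proc = subprocess.Popen(command)
    try:
        while time.monotonic() - start < timeout:
            if os.path.exists(log_path):
                # Delay between launching the job and the log becoming visible.
                return time.monotonic() - start
            time.sleep(poll_interval)
        return None
    finally:
        # Wait for the job to finish so no child process is left behind.
        proc.wait()
```

For example, calling measure_log_delay(["./run_job.sh"], "/path/to/job.log") on the LEV1 execution host and again on a LEV2 host (both the script name and the log path are placeholders) gives two numbers to compare. A result of a few seconds on LEV2 against several minutes or None on LEV1 would match the symptom reported here.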
In this case, the job itself creates the log file immediately in the good environment (LEV2), while log file creation is delayed in LEV1. The customer therefore needs to contact their system administrator to investigate further. For example, the delay may be caused by metadata caching that prevents writes from being flushed to disk immediately.
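If caching on a shared file system is suspected, a system administrator could measure how long a newly created file takes to become visible on another host. The sketch below is one possible probe under stated assumptions: the log directory is shared between the two hosts, their clocks are synchronized, and the names create_marker and wait_for_marker are hypothetical helpers, not LSF or operating system tools.

```python
import os
import time


def create_marker(path):
    """Write the current wall-clock time into a marker file and flush it to disk."""
    created = time.time()
    with open(path, "w") as f:
        f.write(repr(created))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to storage now
    return created


def wait_for_marker(path, timeout=120.0, poll_interval=0.1):
    """Poll until the marker is readable; return the visibility lag in seconds.

    Returns None if the marker does not become visible within `timeout`.
    The lag is only meaningful if both hosts have synchronized clocks.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(path) as f:
                text = f.read().strip()
            if text:
                return time.time() - float(text)
        except FileNotFoundError:
            pass  # not visible yet; keep polling
        time.sleep(poll_interval)
    return None
```

Start wait_for_marker on the host where the logs are viewed, then call create_marker with the same path on the execution host. A lag of many seconds or minutes in LEV1 but not in LEV2 would point to the file system configuration rather than to LSF or the job.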
Document Information
Modified date:
17 June 2018
UID
isg3T1027136