IBM® Spectrum LSF job directories
LSF provides the following types of directories for jobs to use during execution:
- Job current working directory
- Job spool directory
- Job temporary directory
- Job output directory
Job current working directory (JOBCWD)
The job current working directory (JOBCWD) is the directory in which job processes run. Typically, all job data and results are generated under JOBCWD. LSF sets JOBCWD before the job runs. By default, LSF uses the job submission directory as JOBCWD. If the submission directory does not exist, LSF uses /tmp as JOBCWD. Users can choose a JOBCWD other than the submission directory with the bsub -cwd option.
For example:
$ pwd
/home/user1
$ bsub -I pwd
Job <322> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on s1node1>>
/home/user1
$ bsub -I -cwd "/pcc/cust_data" pwd
Job <323> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on s1node1>>
/pcc/cust_data
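The default selection described above can be sketched in shell. This is only an illustration of the documented fallback, not LSF's own implementation. The function name default_jobcwd is hypothetical, and the sketch covers only the case where no -cwd option or JOBCWD parameter is set.

```shell
# Hypothetical helper: mimics the documented default JOBCWD choice.
# $1 = job submission directory
default_jobcwd() {
    if [ -d "$1" ]; then
        # The submission directory exists: use it as JOBCWD.
        printf '%s\n' "$1"
    else
        # The submission directory is missing: fall back to /tmp.
        printf '%s\n' "/tmp"
    fi
}

# The current directory exists, so it is echoed back unchanged.
default_jobcwd "$PWD"
```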
LSF also supports a dynamic JOBCWD, which LSF creates and manages based on configuration parameters and any dynamic patterns included in the path. To enable a dynamic JOBCWD, use bsub -cwd with a path that contains a dynamic pattern. LSF creates the JOBCWD dynamically when the path includes dynamic patterns, whether the path is absolute or relative. LSF cleans up a JOBCWD that it created according to the time-to-live value set in the JOB_CWD_TTL parameter, either in the application profile in lsb.applications or in lsb.params.
For example:
$ bsub -I -cwd "/tmp/%J_%I" pwd
Job <324> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on s1node1>>
/tmp/324_0
LSF also supports several other ways to define a static or dynamic JOBCWD:
- JOB_CWD in lsb.applications specifies the current working directory for the job in the application profile. The path can be absolute or relative to the submission directory. The path can include dynamic directory patterns.
- DEFAULT_JOB_CWD in lsb.params specifies the cluster-wide current working directory for the job. The path can be absolute or relative to the submission directory. The path can include dynamic directory patterns.
- The LSB_JOB_CWD environment variable specifies the directory on the execution host from which the job starts. The path can include dynamic directory patterns.
Job spool directory
LSF creates a job script file and redirects job standard output and standard error to files under the job spool directory. All files under this directory are temporary files and are deleted after the job completes. By default, LSF uses $HOME/.lsbatch as the job spool directory. In most cases, the $HOME/.lsbatch directory is located on a shared file system with limited disk quotas and performance constraints. It is therefore not suitable for a cluster environment where jobs generate a large amount of standard output.
Use JOB_SPOOL_DIR in lsb.params to define an alternative location for the job spool directory. This is a cluster-wide setting. When JOB_SPOOL_DIR is defined, LSF creates the job script file and redirects job standard output and standard error to files under the directory specified by JOB_SPOOL_DIR. If your site has a scalable parallel file system such as GPFS, you can set JOB_SPOOL_DIR to a directory on that file system. You can also define JOB_SPOOL_DIR=/tmp to use temporary disk space on the local host, which reduces I/O contention among multiple jobs. The directory defined in JOB_SPOOL_DIR must already exist and must be accessible by all compute nodes in the cluster.
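Because the directory must exist before LSF uses it, a quick pre-flight check on each host can catch problems early. The following is a minimal sketch, not an LSF tool: the function name check_spool_dir is hypothetical, and it assumes that "accessible" means the current user can read, write, and search the directory on the local host.

```shell
# Hypothetical pre-flight check for a candidate JOB_SPOOL_DIR.
# Returns 0 if the path is an existing directory that the current user
# can read, write, and search on this host; returns 1 otherwise.
check_spool_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "not accessible: $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}

# Check the directory used in this section before editing lsb.params.
check_spool_dir /tmp/smc || echo "create or fix /tmp/smc before setting JOB_SPOOL_DIR"
```

The check covers only the host where it runs, so it would need to be repeated on every compute node.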
The following example enables JOB_SPOOL_DIR=/tmp/smc:
- Add JOB_SPOOL_DIR=/tmp/smc in lsb.params, then run badmin mbdrestart and badmin hrestart all to restart mbatchd and sbatchd on all hosts. Use bparams -a to verify the setting:
$ bparams -a | grep JOB_SPOOL_DIR
JOB_SPOOL_DIR = /tmp/smc
- To verify that the JOB_SPOOL_DIR function works properly, submit a simple interactive shell on one host. When the job starts, list the contents of /tmp/smc to verify that job-specific output files and error files have been created as follows:
$ bsub -Is /bin/sh
Job <314> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on s1node1>>
sh-4.1$ cd /tmp/smc
sh-4.1$ ls
1379966620.314 1379966620.314.hostAffinityFile 1379966620.314.hostfile 1379966620.314.out
Job temporary directory (TMPDIR)
By default, LSF creates a job-specific temporary directory under $TMPDIR or /tmp for jobs to store their temporary output files and working files. LSF deletes this temporary directory after the job completes. LSF administrators can customize the top-level temporary directory by defining LSF_TMPDIR in lsf.conf. LSF creates the job-specific temporary directory as $LSF_TMPDIR/<jobID>.tmpdir for regular jobs and as $LSF_TMPDIR/<jobID>_<job_index>.tmpdir for each job array element.
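The naming scheme above can be written out as a small shell function. This is only an illustration of the two documented name formats. The function name job_tmpdir is hypothetical, and job IDs 315 and 316 and index 2 are sample values.

```shell
# Hypothetical helper: builds the job-specific temporary directory name
# from the documented naming scheme.
# $1 = top-level directory (LSF_TMPDIR), $2 = job ID, $3 = array index (optional)
job_tmpdir() {
    if [ -n "${3-}" ]; then
        printf '%s/%s_%s.tmpdir\n' "$1" "$2" "$3"   # job array element
    else
        printf '%s/%s.tmpdir\n' "$1" "$2"           # regular job
    fi
}

job_tmpdir /tmp/smctmp 315      # prints /tmp/smctmp/315.tmpdir
job_tmpdir /tmp/smctmp 316 2    # prints /tmp/smctmp/316_2.tmpdir
```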
To make the job-specific temporary directory available in the job execution environment, the LSF administrator must define LSB_SET_TMPDIR in lsf.conf. When LSB_SET_TMPDIR=Y, LSF overwrites the current value of the TMPDIR environment variable with the path of the job-specific temporary directory. If the parameter is set to the name of another environment variable (for example, MY_TMPDIR), LSF sets that environment variable to the job-specific temporary directory instead. Either way, user applications can use the environment variable within their code.
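A job script can then place its scratch files in the directory that the variable names. The fragment below is a sketch that assumes LSB_SET_TMPDIR=Y, so that TMPDIR points at the job-specific directory. The function name make_scratch_file and the scratch. file prefix are hypothetical, and the fallback to /tmp is a choice made for this sketch.

```shell
# Hypothetical job-script fragment: create a scratch file in the directory
# named by TMPDIR, falling back to /tmp when TMPDIR is unset or empty.
make_scratch_file() {
    mktemp "${TMPDIR:-/tmp}/scratch.XXXXXX"
}

scratch=$(make_scratch_file) || exit 1
echo "intermediate data" > "$scratch"
# ... the application would read and write "$scratch" here ...
rm -f "$scratch"
```

If LSB_SET_TMPDIR names another variable, such as MY_TMPDIR, the script would read that variable in place of TMPDIR.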
The following example enables an LSF job temporary directory:
- Add LSF_TMPDIR=/tmp/smctmp and LSB_SET_TMPDIR=LSF_TMPDIR in lsf.conf. Run badmin hrestart all to restart all sbatchd daemons in the cluster. Use badmin showconf sbd host_name to verify the configuration setting:
$ badmin showconf sbd s1node1 | grep TMPDIR
LSB_SET_TMPDIR = LSF_TMPDIR
LSF_TMPDIR = /tmp/smctmp
- To verify that the LSF job temporary directory works properly, submit a simple interactive shell on a host. When the job starts to run, use echo $LSF_TMPDIR to verify that LSF set the job-specific temporary directory, then change to $LSF_TMPDIR to verify that the directory exists:
$ bsub -Is /bin/sh
Job <315> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on s1node1>>
sh-4.1$ echo $LSF_TMPDIR
/tmp/smctmp/315.tmpdir
sh-4.1$ cd $LSF_TMPDIR
sh-4.1$ pwd
/tmp/smctmp/315.tmpdir
Job output directory
By default, when a job requests that files be copied with the bsub -f option after the job completes, LSF copies the files from the job current working directory to the job submission directory if the destination is not a full path. You can create and manage job-specific output directories with the bsub -outdir submission option and the DEFAULT_JOB_OUTDIR parameter in lsb.params. This feature is useful if you run applications that have specific job output directory requirements.
The directory path defined in DEFAULT_JOB_OUTDIR or bsub -outdir can be absolute or relative to the submission directory and can include dynamic directory patterns. The bsub -outdir option overrides the setting in the DEFAULT_JOB_OUTDIR parameter. Once an output directory is specified, LSF creates it at the start of the job on the submission host, with permission 700 and owned by the submission user. If LSF fails to create the directory, it deletes all the directories that it created and uses the submission directory for file copying.
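The path rules in this paragraph can be sketched in shell. This illustrates the documented behavior and is not LSF's implementation. The function names resolve_outdir and create_outdir are hypothetical, and the sketch leaves out pattern expansion and the cleanup that LSF performs when directory creation fails.

```shell
# Hypothetical helpers that imitate the documented output-directory rules.
# resolve_outdir SUBMIT_DIR OUTDIR_OPTION DEFAULT_OUTDIR
#   OUTDIR_OPTION stands in for bsub -outdir,
#   DEFAULT_OUTDIR stands in for DEFAULT_JOB_OUTDIR.
resolve_outdir() {
    # The submission option takes precedence over the cluster default.
    choice=${2:-$3}
    case $choice in
        "") printf '%s\n' "$1" ;;           # nothing set: submission directory
        /*) printf '%s\n' "$choice" ;;      # absolute path: use as given
        *)  printf '%s\n' "$1/$choice" ;;   # relative path: under the submission directory
    esac
}

# create_outdir DIR: create DIR with permission 700.
# mkdir -m applies the mode to the final directory only; any parent
# directories created by -p receive the default mode.
create_outdir() {
    mkdir -p -m 700 "$1"
}

resolve_outdir /tmp "outputdir/320_0" ""    # prints /tmp/outputdir/320_0
resolve_outdir /tmp "" "/data/out"          # prints /data/out
```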
LSF supports the following dynamic directory patterns:
- %J: Job ID
- %JG: Job group (if not specified, it will be ignored)
- %I: Index (default value is 0)
- %EJ: Execution job ID
- %EI: Execution index
- %P: Project name
- %U: User name
- %G: User group
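To make the substitution concrete, the following sketch expands a few of these patterns in plain shell. LSF performs the expansion internally, so this is only an illustration. The function names replace_all and expand_pattern are hypothetical, only %JG, %J, %I, and %U are handled, and treating an unspecified job group as an empty string is an assumption.

```shell
# Hypothetical helpers that imitate dynamic pattern expansion.
# replace_all STRING TOKEN VALUE: replace every literal TOKEN in STRING.
replace_all() {
    rest=$1
    out=
    while :; do
        case $rest in
            *"$2"*)
                out=$out${rest%%"$2"*}$3   # text before the token, then the value
                rest=${rest#*"$2"}         # continue after the token
                ;;
            *) break ;;
        esac
    done
    printf '%s\n' "$out$rest"
}

# expand_pattern PATH JOBID INDEX USER [JOBGROUP]
expand_pattern() {
    # %JG must be replaced before %J, because %J is a prefix of %JG.
    p=$(replace_all "$1" "%JG" "${5-}")
    p=$(replace_all "$p" "%J" "$2")
    p=$(replace_all "$p" "%I" "$3")
    p=$(replace_all "$p" "%U" "$4")
    printf '%s\n' "$p"
}

expand_pattern "/tmp/%J_%I" 324 0 user1         # prints /tmp/324_0
expand_pattern "outputdir/%J_%I" 320 0 user1    # prints outputdir/320_0
```

Replacing the longer token first is the one ordering constraint here; the other patterns in the list do not contain one another.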
The following example uses a job output directory:
- Check the current submission directory.
$ pwd
/tmp
- Submit a job that specifies a job output directory and copies files back to the output directory.
$ bsub -outdir "outputdir/%J_%I" -o outputfile -f " outputfile < outputfile" hostname
Job <320> is submitted to default queue <normal>.
- After the job completes, check the directory and files.
$ bjobs 320
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
320 user1 DONE normal master s1node1 hostname Sep 23 21:02
$ cd outputdir/320_0
$ pwd
/tmp/outputdir/320_0
$ ls
outputfile