Transferring data requirement files with bsub -f

LSF supports the transfer of job data between the submission host and the execution host as part of the job through the -f option of the bsub command. If your workflow already uses -f for data transfer, you can use LSF Data Manager to transfer files instead of the default lsrcp command.

Before you begin

Check that the following files are installed to support the bsub -f command through LSF Data Manager:
  • An external submission (esub) script esub.datamanager is installed in the LSF_SERVERDIR directory. This esub converts -f options to -data options.
  • A wrapper script, lsrcp.wrapper.datamanager, is installed in the LSF_BINDIR directory. This script runs either the original lsrcp binary or the bstage command.
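One way to confirm that both files are present is a small shell check such as the following. It is a sketch, not an LSF-provided tool: the function name check_dm_f_support is hypothetical, and it assumes that LSF_SERVERDIR and LSF_BINDIR are already set in your environment (for example, after you source the LSF profile file).

```shell
# Hypothetical helper: report whether the two LSF Data Manager files that
# support bsub -f exist. Assumes LSF_SERVERDIR and LSF_BINDIR are set.
check_dm_f_support() {
    if [ -z "${LSF_SERVERDIR:-}" ] || [ -z "${LSF_BINDIR:-}" ]; then
        echo "LSF_SERVERDIR and LSF_BINDIR must be set" >&2
        return 2
    fi
    cs_status=0
    for cs_file in "$LSF_SERVERDIR/esub.datamanager" \
                   "$LSF_BINDIR/lsrcp.wrapper.datamanager"; do
        if [ -f "$cs_file" ]; then
            echo "found: $cs_file"
        else
            echo "missing: $cs_file" >&2
            cs_status=1
        fi
    done
    # 0 = both files found, 1 = at least one missing, 2 = environment not set
    return "$cs_status"
}

# Usage: check_dm_f_support || echo "bsub -f support is incomplete"
```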

About this task

The esub script can be run manually or automatically. To run esub.datamanager automatically, configure the LSB_ESUB_METHOD="datamanager" parameter in the lsf.conf file. To run the esub manually, or to override the current LSB_ESUB_METHOD configuration, specify bsub -a datamanager when you submit the job. The esub must be named esub.datamanager. The following steps assume that LSB_ESUB_METHOD="datamanager" is configured, so no -a option is needed.
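For reference, the two ways to enable the esub look like this side by side. This is a configuration fragment, not a runnable script; myjob and the file names are placeholders.

```shell
# In lsf.conf: run esub.datamanager automatically for every job submission.
LSB_ESUB_METHOD="datamanager"

# Alternatively, request the esub for a single job when you submit it.
# This also overrides the current LSB_ESUB_METHOD setting.
bsub -a datamanager -f "/data/data3 > data3" myjob data3
```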

Important: You cannot specify a data specification file with -f. You must use -data with data specification files.

Procedure

Use the bsub -f command to submit your job.
bsub -f "[local_file operator [remote_file]]"
To specify multiple files, repeat the -f option.
  • local_file is the file in the staging area.
  • remote_file is the file on the execution host.

local_file and remote_file can be absolute or relative file path names. You must specify at least one file name. When the file remote_file is not specified, it is assumed to be the same as local_file.
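The syntax rules above can be illustrated with a small parser for one -f argument. This is a sketch under stated assumptions: the function name parse_f_spec is hypothetical, and it assumes that the fields are separated by whitespace (as in the examples in this topic) and that file names contain no spaces. It prints the normalized specification, filling in remote_file from local_file when it is omitted.

```shell
# Hypothetical parser: split one bsub -f argument into
# "local_file operator remote_file". Assumes whitespace-separated fields.
parse_f_spec() {
    read -r pf_local pf_op pf_remote pf_extra <<EOF
$1
EOF
    # At least one file name is required, and at most three fields are allowed.
    if [ -z "$pf_local" ] || [ -n "$pf_extra" ]; then
        echo "invalid -f specification: $1" >&2
        return 1
    fi
    case $pf_op in
        '>'|'@>'|'<'|'@<'|'<<'|'><'|'<>') ;;
        *) echo "invalid operator in: $1" >&2; return 1 ;;
    esac
    # When remote_file is omitted, it is the same as local_file.
    printf '%s %s %s\n' "$pf_local" "$pf_op" "${pf_remote:-$pf_local}"
}

# Usage: parse_f_spec "/data/out3 <"   prints "/data/out3 < /data/out3"
```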

The following values are valid for operator:
>, @>

The > operator copies the local_file in the staging area to remote_file on the execution host before job execution. remote_file is overwritten if it exists. The @> operator creates a symbolic link from local_file in the staging area to remote_file on the execution host before job execution.

With the > operator, the following bstage command is used:
bstage in -src src_file -dst dst_file
With the @> operator, the following bstage command is used:
bstage in -src src_file -dst dst_file -link
<, @<

The < operator copies remote_file on the execution host to local_file in the staging area after the job completes. local_file is overwritten if it exists. The @< operator creates a symbolic link from remote_file on the execution host to local_file in the staging area after the job completes.

With the < operator, the following bstage command is used:
bstage out -src src_file -dst dst_file
With the @< operator, the following bstage command is used:
bstage out -src src_file -dst dst_file -link
<<

The << operator appends remote_file to local_file after the job completes. If local_file does not exist, it is created.

With the << operator, LSF runs the lsrcp -a command instead of bstage.

><, <>

The >< operator copies local_file to remote_file before the job runs, and then copies remote_file back to local_file, overwriting it, after the job completes. The <> operator is the same as ><.

With these operators, LSF runs the bstage in -src src_file -dst dst_file command before the job starts, and runs the bstage out -src src_file -dst dst_file command after the job completes.
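The operator rules above can be summarized as a dry-run lookup. This is a sketch, not the logic LSF uses internally: the function f_op_commands is hypothetical and only prints commands, the "before:" and "after:" labels are added for readability, and the lsrcp -a line omits any host-name prefixes that a real lsrcp call would need. The source and destination follow the direction of each operator: local_file is the source for transfers into the job, and remote_file is the source for transfers out of it.

```shell
# Hypothetical dry run: print the transfer command(s) associated with each
# bsub -f operator. Arguments: operator local_file [remote_file].
f_op_commands() {
    fo_op=$1 fo_local=$2 fo_remote=${3:-$2}
    case $fo_op in
        '>')  echo "before: bstage in -src $fo_local -dst $fo_remote" ;;
        '@>') echo "before: bstage in -src $fo_local -dst $fo_remote -link" ;;
        '<')  echo "after: bstage out -src $fo_remote -dst $fo_local" ;;
        '@<') echo "after: bstage out -src $fo_remote -dst $fo_local -link" ;;
        # Appends do not go through bstage; host prefixes omitted here.
        '<<') echo "after: lsrcp -a $fo_remote $fo_local" ;;
        '><'|'<>')
              echo "before: bstage in -src $fo_local -dst $fo_remote"
              echo "after: bstage out -src $fo_remote -dst $fo_local" ;;
        *)    echo "unknown operator: $fo_op" >&2; return 1 ;;
    esac
}
```

Keeping the lookup in a single case statement makes it easy to see that only the << operator falls back to lsrcp, while every other operator maps onto bstage in, bstage out, or both.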

Example

The following example assumes that LSF Data Manager is enabled and running properly, and that LSB_ESUB_METHOD="datamanager" is configured in lsf.conf as the default esub method, so -a datamanager is not needed.

To submit the job myjob to LSF with input from the file /data/data3, and with output copied back to /data/out3, run the following command:
bsub -f "/data/data3 > data3" -f "/data/out3 < out3" myjob data3 out3

For transfer from the staging area to the execution host before job execution (with >), esub.datamanager converts the -f options to the appropriate -data options.
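The conversion can be pictured with the following hypothetical sketch. The exact string that esub.datamanager produces is not documented in this topic, so the function f_specs_to_data, the host:path form of each requirement, and the resolution of relative paths against the current directory are all assumptions. It only prints a -data option built from the specifications that use the > operator.

```shell
# Hypothetical sketch of the -f to -data conversion for input transfers.
# Arguments: submission_host spec [spec ...]. Output format is assumed.
f_specs_to_data() {
    fd_host=$1
    shift
    fd_reqs=""
    for fd_spec in "$@"; do
        read -r fd_local fd_op fd_rest <<EOF
$fd_spec
EOF
        # Only plain ">" (input) transfers are converted in this sketch.
        [ "$fd_op" = ">" ] || continue
        case $fd_local in
            /*) ;;                              # already absolute
            *) fd_local="$PWD/$fd_local" ;;     # assumed: relative to cwd
        esac
        fd_reqs="${fd_reqs:+$fd_reqs }$fd_host:$fd_local"
    done
    if [ -n "$fd_reqs" ]; then
        printf '%s\n' "-data \"$fd_reqs\""
    fi
}
```

For the example command, f_specs_to_data "$(hostname)" "/data/data3 > data3" "/data/out3 < out3" yields one requirement for /data/data3 and none for /data/out3, because the < transfer happens only after the job completes.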

The esub also sets LSB_LSRCP_DO_BSTAGE="Y" in the job execution environment.

If the esub finds no errors, the job submission is accepted and mbatchd sends the data requirement to the data manager to start a transfer job. After the transfer job completes, the user job is scheduled and dispatched.

LSF sets up the job environment on the execution host and runs the lsrcp.wrapper.datamanager script before the actual job runs. The script runs the bstage in command to copy /data/data3 from the data manager cache (the staging area) to data3 on the execution host before the job starts.

For transfer from the execution host back to the submission host (with the < operator), no -data option is needed, because the < operator copies a file from the execution host after the job completes rather than declaring an input that must be staged before the job starts. The lsrcp.wrapper.datamanager script also runs after the job completes. The script runs the bstage out command to copy out3 from the execution host to /data/out3 on the submission host.
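The role of the wrapper in this example can be sketched as follows. This is a hypothetical illustration: the real script's interface is not documented here, so lsrcp_wrapper_sketch takes an explicit direction (in, out, or append) and assumes that the wrapper selects bstage when LSB_LSRCP_DO_BSTAGE is Y and the original lsrcp otherwise. It prints commands instead of running them.

```shell
# Hypothetical dry run of the wrapper's choice between bstage and lsrcp.
# Arguments: direction (in | out | append) source destination.
# Assumption: the choice is driven by LSB_LSRCP_DO_BSTAGE, set by the esub.
lsrcp_wrapper_sketch() {
    lw_dir=$1 lw_src=$2 lw_dst=$3
    if [ "$lw_dir" = "append" ]; then
        # The << operator uses lsrcp -a instead of bstage.
        echo "lsrcp -a $lw_src $lw_dst"
    elif [ "${LSB_LSRCP_DO_BSTAGE:-}" = "Y" ]; then
        echo "bstage $lw_dir -src $lw_src -dst $lw_dst"
    else
        # Without the esub setting, fall back to the original lsrcp.
        echo "lsrcp $lw_src $lw_dst"
    fi
}
```

With LSB_LSRCP_DO_BSTAGE=Y, the two transfers in this example would print bstage in -src /data/data3 -dst data3 before the job and bstage out -src out3 -dst /data/out3 after it.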