Transferring data requirement files with bsub -f
LSF supports the transfer of job data between the submission host and the execution host as part of the job through the -f option of the bsub command. If your workflow already includes data transfer with -f, you can use LSF Data Manager to transfer files instead of the default lsrcp command.
Before you begin
- An external submission (esub) script esub.datamanager is installed in the LSF_SERVERDIR directory. This esub converts -f options to -data options.
- A wrapper script, lsrcp.wrapper.datamanager, which runs either the original lsrcp binary or the bstage command, is in the LSF_BINDIR directory.
About this task
The esub script can be run manually or automatically. To run esub.datamanager automatically, configure the LSB_ESUB_METHOD="datamanager" parameter in the lsf.conf file. To run the esub manually, or to override the current LSB_ESUB_METHOD configuration, specify bsub -a datamanager when you submit the job. The esub must be named esub.datamanager. The following steps assume that LSB_ESUB_METHOD="datamanager" is configured, so no -a option is needed.
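As a brief illustration of the two ways to enable the esub, the fragment below shows the lsf.conf setting and the per-job alternative. It is a sketch only: the file names and myjob are placeholders borrowed from the example later in this topic.

```shell
# In lsf.conf: run esub.datamanager automatically for every submission
LSB_ESUB_METHOD="datamanager"

# Per job: run the esub manually, or override the configured LSB_ESUB_METHOD
bsub -a datamanager -f "/data/data3 > data3" myjob data3
```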
Procedure
bsub -f "[local_file operator [remote_file]]"
- local_file is the file in the staging area.
- remote_file is the file on the execution host.
local_file and remote_file can be absolute or relative file path names. You must specify at least one file name. When the file remote_file is not specified, it is assumed to be the same as local_file.
- >, @>
The > operator copies the local_file in the staging area to remote_file on the execution host before job execution. remote_file is overwritten if it exists. The @> operator creates a symbolic link from local_file in the staging area to remote_file on the execution host before job execution.
With the > operator, the following bstage command is used:
bstage in -src src_file -dst dstfile
With the @> operator, the following bstage command is used:
bstage in -src src_file -dst dstfile -link
- <, @<
The < operator copies remote_file on the execution host to local_file in the staging area after the job completes. local_file is overwritten if it exists. The @< operator creates a symbolic link from remote_file on the execution host to local_file in the staging area after the job completes.
With the < operator, the following bstage command is used:
bstage out -src src_file -dst dstfile
With the @< operator, the following bstage command is used:
bstage out -src src_file -dst dstfile -link
- <<
The << operator appends remote_file to local_file after the job completes. If local_file does not exist, it is created.
For this operator, the lsrcp -a command is used instead of bstage.
- ><, <>
The >< operator copies local_file to remote_file before the job runs, and copies remote_file back, overwriting local_file, after the job completes. The <> operator is the same as ><.
The following bstage command is used before the job starts:
bstage in -src src_file -dst dstfile
The following bstage command is used after the job completes:
bstage out -src src_file -dst dstfile
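The operator list above can be condensed into a small dry-run helper. The following is a minimal sketch, not part of LSF. The function name f_to_transfer is hypothetical. The sketch assumes that src_file is local_file for bstage in and remote_file for bstage out, which the list above does not state explicitly. The lsrcp -a line likewise assumes a plain source-then-target argument order and omits host names. The function only prints commands; it does not run them.

```shell
#!/bin/sh
# Dry-run sketch: print the transfer command(s) that the operator list
# associates with one bsub -f operator. Nothing is executed.
# f_to_transfer is a hypothetical helper name, not an LSF command.
f_to_transfer() {
    local_file=$1
    op=$2
    # When remote_file is not specified, it defaults to local_file.
    remote_file=${3:-$local_file}

    case $op in
        '>')
            echo "bstage in -src $local_file -dst $remote_file" ;;
        '@>')
            echo "bstage in -src $local_file -dst $remote_file -link" ;;
        '<')
            # Assumption: for bstage out, the source is the file on the execution host.
            echo "bstage out -src $remote_file -dst $local_file" ;;
        '@<')
            echo "bstage out -src $remote_file -dst $local_file -link" ;;
        '<<')
            # Append uses lsrcp -a instead of bstage; host prefixes are omitted here.
            echo "lsrcp -a $remote_file $local_file" ;;
        '><'|'<>')
            echo "bstage in -src $local_file -dst $remote_file"
            echo "bstage out -src $remote_file -dst $local_file" ;;
        *)
            echo "unknown operator: $op" >&2
            return 1 ;;
    esac
}
```

For example, f_to_transfer /data/data3 '>' data3 prints bstage in -src /data/data3 -dst data3. Quote the operator so that the shell does not treat it as a redirection.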
Example
The following example assumes that LSF Data Manager is enabled and running properly, and that LSB_ESUB_METHOD="datamanager" is configured in lsf.conf as the default esub method, so -a datamanager is not needed.
bsub -f "/data/data3 > data3" -f "/data/out3 < out3" myjob data3 out3
For transfer from the staging area to the execution host before job execution (with >), esub.datamanager converts the -f options to the appropriate -data options.
The esub modifies the job environment variables by setting LSB_LSRCP_DO_BSTAGE="Y" in the job execution environment.
If the esub finds no errors, the job submission is accepted and mbatchd sends the data requirement to the data manager to start a transfer job. After the transfer job completes, the user job is scheduled and dispatched.
LSF sets the job environment on the execution host, and runs the lsrcp.wrapper.datamanager script before the actual job runs. The lsrcp.wrapper.datamanager script runs the bstage in command to copy the file /data/data3 to the data manager cache before the job starts.
For transfer from the execution host back to the submission host (with the < operator), no -data option is needed, because the < operator copies a file from the execution host to the submission host. The lsrcp.wrapper.datamanager script runs after the job completes. The script runs the bstage out command to copy the out3 file from the execution host to /data/out3 on the submission host.
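The wrapper's choice between lsrcp and bstage might be pictured as follows. This is an illustrative sketch only, not the contents of lsrcp.wrapper.datamanager. It assumes that the wrapper decides based on the LSB_LSRCP_DO_BSTAGE variable that the esub sets, which this topic implies but does not state. The name choose_transfer_tool is hypothetical.

```shell
#!/bin/sh
# Sketch of the decision the wrapper script is assumed to make.
# choose_transfer_tool is a hypothetical name, not an LSF command.
choose_transfer_tool() {
    # Assumption: the esub sets LSB_LSRCP_DO_BSTAGE="Y" to request bstage.
    if [ "${LSB_LSRCP_DO_BSTAGE:-}" = "Y" ]; then
        echo "bstage"
    else
        # Otherwise fall back to the original lsrcp binary.
        echo "lsrcp"
    fi
}
```

With the variable unset or set to any other value, the sketch falls back to lsrcp, which matches the default transfer behavior without the esub.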