bstage in

Stages in data files for jobs with data requirements. bstage copies or symbolically links files from the data manager staging area to the job execution host.

Synopsis

bstage in -all [-dst path] [-link]
bstage in -src "[host_name:]/abs_file_path/file_name" [-dst path[/file_name]] [-link]
bstage in -src "[host_name:]/abs_folder_path/[*]" [-dst path[/file_name]] [-link]
bstage in -tag tag_name [-u user_name] [-dst path] [-link]

Description

Copy or symbolically link files from the data manager staging cache to the job execution host. You must specify one of the following options: -all, -src, or -tag tag_name.

If the containing job is an array element, the bstage command checks that a subdirectory exists in the job staging area that corresponds to the array index of the containing job.

By default, the required files are staged into the local staging area cache as soon as the job is submitted. The bstage in command inside the job finds the location of the file in the cache. bstage in copies (cp or scp) or links (ln) the file from the cache location to the job current working directory.

Options

-all

Copy all the files that are requested with the job submission to the job current working directory. The command finds the location of each requested stage-in file in the cache. All files are copied into the destination folder in a flat directory structure, so input files with the same name overwrite one another.

Essentially, this option is a shortcut for running the following command for each requested file:
bstage in -src "host_name:/abs_file_path/file_name" -dst path/file_name

To copy entire folders but preserve the directory structure, use the -src option with a directory wildcard (either / or /*).

When you use the asterisk character (*) at the end of the path, the data requirements string must be in quotation marks.
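Because -all copies everything into one flat folder, it can help to check the requested files for clashing names before relying on it. The following is a minimal sketch, not part of LSF: the helper name duplicate_basenames is hypothetical, and the file specifications are assumed to be passed as arguments in the [host_name:]/path form.

```shell
# Print every base file name that occurs more than once in the given
# file specifications. With "bstage in -all", such files would be copied
# into the same flat folder and overwrite one another.
# (duplicate_basenames is a hypothetical helper, not an LSF command.)
duplicate_basenames() {
    for path in "$@"; do
        # Strip an optional "host_name:" prefix, then the directory part.
        file=${path#*:}
        printf '%s\n' "${file##*/}"
    done | sort | uniq -d
}
```

If the function prints nothing, no two arguments share a base name. Any name that it does print is a candidate for staging with -src and a distinct -dst instead.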

-dst path

The destination folder for the staged files.

The target of the copy can be an absolute path or a path relative to the job current working directory. If any directories in the path do not exist, bstage in attempts to create them. If you do not specify -dst, the default is the job execution current working directory.

If the path does not exist and -src specifies a single file, the path is interpreted as the destination file to copy to.

If the path exists and -src is a single file, the file is either copied or replaced:
  • If path is a file, the file is replaced with the new file.
  • If path is a directory, the file is copied into the directory under its original name.
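The single-file rules above can be modeled as a small shell function. This is only an illustration of the documented behavior, not the logic that bstage in itself runs; the function name resolve_single_file_dst is hypothetical.

```shell
# Model of how the -dst path is interpreted when -src names a single file.
# Arguments: $1 = the -dst path, $2 = the base name of the source file.
# Prints the path at which the staged file would end up.
# (resolve_single_file_dst is a hypothetical helper, not an LSF command.)
resolve_single_file_dst() {
    dst=$1
    name=$2
    if [ -d "$dst" ]; then
        # Existing directory: the file keeps its original name inside it.
        printf '%s/%s\n' "${dst%/}" "$name"
    else
        # Existing file (it is replaced) or nonexistent path (it is taken
        # as the destination file name): the path itself is the target.
        printf '%s\n' "$dst"
    fi
}
```

For example, resolve_single_file_dst results input1.dat prints results/input1.dat when results is an existing directory, and prints results otherwise.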

If you specify -tag or -all, or you specify -src with a directory wildcard, the destination is interpreted as a folder name relative to the job current working directory. If this directory does not exist, LSF attempts to create it.

-src "[host_name:]/abs_file_path/file_name"

Copy only the file that is requested with the host_name:/abs_file_path/file_name option in the job submission to your current working directory. The host and file path specification must match the requirement that was specified when the job was submitted. Use the bjobs -data command to see the exact host and file path specification:
bjobs -data 1962
JOBID   USER    STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
1962    user1   PEND  normal hostA                 *p 1000000 Sep 20 16:31 
FILE                                 SIZE   MODIFIED 
datahost:/proj/user1/input1.dat      500 M   Jun 27 16:37:52 
datahost:/proj/user1/input2.dat      100 M   Jun 27 16:37:52 
datahost:/proj/user1/input3.dat      -      -

You can omit the host name to request files locally accessible on the submission host.
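To avoid retyping the host and path specification, the FILE column can be pulled out of the bjobs -data output. The sketch below assumes that the output has the layout shown above (a FILE header line followed by one requirement per line) and that paths contain no white space; the helper name list_data_requirements is hypothetical.

```shell
# Read "bjobs -data" output on standard input and print the first column
# of every non-empty row that follows the FILE header line.
# (list_data_requirements is a hypothetical helper; the parsing relies on
# the output layout shown in the example above.)
list_data_requirements() {
    awk '$1 == "FILE" { in_files = 1; next }
         in_files && NF > 0 { print $1 }'
}
```

For example, bjobs -data 1962 | list_data_requirements would print one specification per line, each of which can be passed in quotation marks to bstage in -src.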

-src "[host_name:]/abs_folder_path/[*]"

Copy the contents of the folder that is requested with the host_name:/abs_folder_path/ option in the job submission to your current working directory. The host and file path specification must match the requirement that was specified when the job was submitted.

You can omit the host name to request files locally accessible on the submission host.

If you specify a folder name without a file name, the absolute path must terminate in a directory (/*) or recursive directory (/) wildcard character. In this case, the -dst option is interpreted as a folder, and all files are downloaded to the appropriate subdirectories, replicating the underlying structure.

When you use the asterisk character (*) at the end of the path, the data requirements string must be in quotation marks.
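The quotation marks matter because an unquoted asterisk is subject to shell file name expansion before the command sees it. The following sketch makes the difference visible with a hypothetical count_args helper and a temporary directory; it does not call any LSF command.

```shell
# Print the number of arguments received, to make shell expansion visible.
# (count_args is a hypothetical helper used only for this demonstration.)
count_args() {
    printf '%s\n' "$#"
}

demo=$(mktemp -d)        # scratch directory for the demonstration
: > "$demo/a.dat"
: > "$demo/b.dat"

count_args $demo/*       # unquoted: the shell expands * to both files, prints 2
count_args "$demo/*"     # quoted: the pattern arrives as one argument, prints 1

rm -rf "$demo"
```

In the quoted form the pattern reaches the command unchanged, which is what the data requirement string needs.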

For example, the following job has a data requirement that requests a recursive directory:

bsub -data "hostA:/tmp/" ...

LSF stages the entire /tmp directory and all subdirectories on hostA. Your job can then call that directory with the bstage in command:

bstage in -src "hostA:/tmp/" -dst directory

LSF replicates the entire subdirectory structure under directory in the job execution current working directory.

-link

Create symbolic links to the requested source files from the staging area cache location instead of copying them. Use the -link option to avoid unnecessary file copying between the execution host and the staging area. The staging area must be directly mounted on the job execution host to create the link.

-tag tag_name

Copy all files in the local cache that are associated with the specified tag name to the folder specified by the destination option (-dst). If the -dst option is specified, the destination is interpreted as a folder, and the entire directory structure under this tag folder is replicated in the destination.

Use the -tag option when a job uses an intermediate data file that is created by an earlier job. You must have read permission on the tag directory.

Valid tag names can contain only alphanumeric characters ([A-Z|a-z|0-9]), periods (.), underscores (_), and dashes (-). The tag name cannot contain the special operating system names for parent directory (../), current directory (./), or user home directory (~/). Tag names cannot contain spaces and cannot begin with a dash (-).
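Since LSF does not validate tag names, a job script can apply the naming rules itself before using a tag. The following is a minimal sketch with a hypothetical helper name, is_valid_tag. Rejecting the empty string and the bare names "." and ".." is an extra precaution assumed here, not a rule stated above.

```shell
# Return 0 if $1 satisfies the tag naming rules, 1 otherwise.
# (is_valid_tag is a hypothetical helper, not an LSF command.)
# Ranges such as A-Z are interpreted in the current locale; run the script
# under LC_ALL=C for strict ASCII matching.
is_valid_tag() {
    case $1 in
        '')                return 1 ;;  # empty name (assumed invalid)
        .|..)              return 1 ;;  # extra precaution, not a stated rule
        -*)                return 1 ;;  # must not begin with a dash
        *[!A-Za-z0-9._-]*) return 1 ;;  # any character outside the allowed set
    esac
    return 0
}
```

The character check also covers spaces and the ../, ./, and ~/ sequences, because the slash and tilde characters are outside the allowed set.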

Use the bdata tags clean command to remove tags.

Important: You are responsible for the name space of your tags. LSF does not check whether a tag is valid. Use strings like the job ID, array index, and cluster name as part of your tag names to make sure that your tag is unique.
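One way to follow this advice is to assemble the tag from those pieces in the job script. The sketch below is an assumption-laden illustration: make_tag is a hypothetical helper, the underscore separator is an arbitrary choice, and the cluster name is passed in by the caller.

```shell
# Build a tag name from a cluster name, a job ID, and an optional array index.
# $1 = cluster name, $2 = job ID, $3 = array index (empty or 0 for non-array jobs)
# (make_tag is a hypothetical helper, not an LSF command.)
make_tag() {
    cluster=$1
    job_id=$2
    index=${3:-0}
    if [ "$index" -gt 0 ]; then
        printf '%s_%s_%s\n' "$cluster" "$job_id" "$index"
    else
        printf '%s_%s\n' "$cluster" "$job_id"
    fi
}
```

Inside a job, a call might look like make_tag cluster1 "$LSB_JOBID" "$LSB_JOBINDEX", assuming that the LSB_JOBID and LSB_JOBINDEX environment variables are set in the job environment and that cluster1 stands in for your cluster name.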
-u user_name

By default, your job can stage in files that are associated only with your own tags. Use the -u option to stage in files that are associated with tags that belong to another user name. The CACHE_ACCESS_CONTROL = Y parameter must be configured in the lsf.datamanager file to use the -u option.

You must make sure that the tag exists and that you have appropriate permission to use the files associated with that tag before you submit your job.