Staging data in and using it in your job

The bstage in subcommand copies or symbolically links data files from the job cache into the job's execution environment.

About this task

Use the bstage in subcommand to copy or symbolically link files that were pre-staged into the job cache to the job’s execution environment.

Procedure

  • If a job was submitted with a data file requirement, use the same data file path in the bstage in -src option.
    bstage in -src host_name:/file_path -dst file_path

    The host name and file path in the -src option must match the data requirement that is defined in the bsub -data option when you submit the job. After the bstage in command runs, the job can use the requested data file at the location that is specified in the -dst option.

    For example, if you submitted the job with the following data requirement:
    bsub -data "hostA:/data/file1.dat"
    Run the following command in your job script or pre-execution script to copy the requested file from the staging area cache into a known location in the job's execution environment:
    bstage in -src "hostA:/data/file1.dat" -dst /tmp/file1.dat

    The job can then use the data file at /tmp/file1.dat.

  • If a job was submitted with a directory as the data file requirement, use the same directory path in the bstage in -src option.

    bstage in -src host_name:/dir_path -dst dir_path

    After the bstage in command runs, the job can use all data files in the directory at the location that is specified in the -dst option.

    For example, if you submitted the job with the following data requirement:
    bsub -data "hostA:/data/dir1"
    Run the following command in your job script or pre-execution command to copy the requested directory from the staging area cache into a known location in the job's execution environment:
    bstage in -src "hostA:/data/dir1" -dst /tmp/dir1

    LSF interprets the path in the -dst option as a directory, and all staged files from the directory in the cache are copied into it. The original directory structure is preserved.

  • Copy all files in the job's submission-time data requirements into the job's execution environment by using the bstage in -all option.
    bstage in -all -dst dir_path

    This command copies all files in the job's submission-time data requirements into the location that is specified in the -dst option. LSF interprets the path in the -dst option as a directory, and all files are copied into the destination in a flat structure that uses their original file names. If multiple files have the same name, they might overwrite one another.
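
Because bstage in -all flattens the directory structure, it can help to check for clashing file names before you rely on it. The following is a minimal sketch, not part of LSF: duplicate_names is a hypothetical helper, and the paths are illustrative. It takes data requirements in the same form as the bsub -data option and prints every file name that occurs more than once.

```shell
#!/bin/sh
# Hypothetical helper (not an LSF command): print each file name that
# appears more than once among the given data requirement paths.
# Arguments use the same [host_name:]/file_path form as bsub -data.
duplicate_names() {
    for req in "$@"; do
        # Remove everything up to the last slash to keep only the file name.
        printf '%s\n' "${req##*/}"
    done | sort | uniq -d
}

# Illustrative paths: two requirements share the name file1.dat.
duplicate_names "hostA:/data/a/file1.dat" \
                "hostA:/data/b/file1.dat" \
                "hostA:/data/file2.dat"
# prints: file1.dat
```

If the helper prints any names, staging those files individually with the -src and -dst options avoids the overwrite.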

What to do next

Note:
  • If you do not specify a host name in the -src option, LSF uses the job's submission host by default.
  • You can use the -link option to create a symbolic link to the files in the cache instead of making an extra copy. The staging area must be directly mounted on the job execution host to create the link.
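
Since the -link option works only when the staging area is mounted on the execution host, a job script might try a link first and fall back to a copy. The sketch below shows one way to do that. It assumes that bstage in exits with a non-zero status on failure, which this topic does not state. stage_in_link_or_copy is a hypothetical helper, and the BSTAGE variable exists only so that the sketch can be tried with a stand-in command outside an LSF job.

```shell
#!/bin/sh
# BSTAGE names the command to run. Overriding it is a convenience for
# trying this sketch outside an LSF job, not an LSF feature.
BSTAGE=${BSTAGE:-bstage}

# Hypothetical helper: try to link the cached file, then fall back to a copy.
# Assumption: bstage in returns a non-zero exit status when it fails.
stage_in_link_or_copy() {
    src=$1
    dst=$2
    if "$BSTAGE" in -link -src "$src" -dst "$dst"; then
        echo "linked $src to $dst"
    elif "$BSTAGE" in -src "$src" -dst "$dst"; then
        echo "copied $src to $dst"
    else
        echo "failed to stage in $src" >&2
        return 1
    fi
}
```

In a job script, a call such as stage_in_link_or_copy "hostA:/data/file1.dat" /tmp/file1.dat would replace the plain bstage in command from the first example.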