Use LSF Data Manager for data staging
LSF Data Manager is an LSF add-on that stages required data as closely as possible to the site of the application.
When computations require large amounts of data, your applications must be able to access that data regardless of where it is located relative to the application execution environment. LSF Data Manager can stage input data from an external source storage repository to a cache that is accessible to the cluster execution hosts. After job completion, LSF Data Manager stages output data from the cache asynchronously, so that no job has to wait on the transfer. Data transfers run separately from the job allocation, which means that more jobs can request data without consuming resources while they wait for large data transfers.
In an LSF multicluster capability environment, remote execution cluster selection and cluster affinity are based on data availability, and LSF Data Manager transfers the required data to the cluster that the job was forwarded to. This method of moving data files is best suited to situations that involve large amounts of data and offer opportunities for data reuse.
LSF Data Manager is also useful when moving data between clusters (for example, from a local on-premises cluster to a cluster in the cloud). In this scenario, which relies on the LSF multicluster capability, LSF Data Manager acts as a data gateway that makes data available to virtual machines in the public cloud when you use the LSF resource connector. Set up this scenario as follows:
- Install an on-cloud LSF cluster with one management host.
- Connect the local LSF cluster to the on-cloud LSF cluster using the LSF multicluster capability.
- Configure the LSF resource connector in the on-cloud LSF cluster only.
The on-cloud LSF cluster grows or shrinks using the LSF resource connector based on demand.
- Install LSF Data Manager on both the local and the on-cloud LSF clusters. LSF Data Manager ensures data availability.
You can also configure LSF so that, if LSF Data Manager is installed and a user runs the bsub -f command option, LSF uses LSF Data Manager to transfer the files instead of its default file transfer mechanism. For more details on how to use bsub -f with LSF Data Manager, refer to Transferring data requirement files with bsub -f.
When using LSF with LSF Data Manager, you must enable passwordless SSH between the I/O nodes in the transfer queue and all file servers (source hosts for stage in, and destination hosts for stage out). Any compute node that does not directly mount the staging area must also have passwordless SSH access to the staging area's configured file servers.
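One way to confirm the passwordless SSH requirement is a small check run from each I/O node (and from any compute node that does not mount the staging area). The following is a minimal sketch and not part of LSF: the function name check_passwordless_ssh, the SSH_CMD override variable, and the host names are hypothetical. It relies only on the standard OpenSSH options BatchMode and ConnectTimeout.

```shell
#!/bin/sh
# Sketch: report whether each listed file server accepts non-interactive SSH logins.
# SSH_CMD is a hypothetical override hook (not an LSF setting) so that the check
# can be stubbed or pointed at a different ssh binary.
SSH_CMD="${SSH_CMD:-ssh}"

check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password,
        # so a successful "true" means key-based login works for this host.
        if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
               </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    # Non-zero exit status if any host rejected the non-interactive login.
    return "$failed"
}

# Example call with hypothetical file server names:
# check_passwordless_ssh fileserver1 fileserver2
```

Run the check as the user account that performs the transfers, because SSH keys are configured per user. Any host reported as FAIL needs key-based authentication set up before it can act as a stage-in source or stage-out destination.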