Cluster selection for remote jobs with data requirements
A job with an input data requirement can be forwarded only to a remote cluster that has LSF data manager enabled. To select the execution cluster for such a job, LSF considers the following information:
- The candidate execution clusters for the job, as determined by the job-level cluster requirement (the bsub -clusters option) and the queue-level forwarding destinations (the SNDJOBS_TO parameter in the lsb.queues file)
- Data availability information from all candidate clusters for the job
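As a rough illustration of how the candidate set might be narrowed, the following Python sketch intersects the job-level and queue-level cluster lists and then drops remote clusters without LSF data manager when the job has input data. The intersection rule, the treatment of the local cluster, and every function and parameter name here are assumptions for illustration, not LSF's implementation.

```python
from typing import Optional, Set


def candidate_clusters(
    job_clusters: Optional[Set[str]],  # from bsub -clusters; None if not specified (assumption)
    queue_clusters: Set[str],          # clusters reachable via SNDJOBS_TO, assumed to include the local cluster
    dm_enabled: Set[str],              # clusters that run LSF data manager
    local_cluster: str,
    has_input_data: bool,
) -> Set[str]:
    """Hypothetical sketch of candidate-cluster filtering."""
    # Assumption: without a job-level requirement, only the queue-level destinations apply;
    # otherwise a cluster must satisfy both the job-level and the queue-level lists.
    if job_clusters is None:
        candidates = set(queue_clusters)
    else:
        candidates = job_clusters & queue_clusters
    if has_input_data:
        # A job with input data can be forwarded only to remote clusters with data manager enabled.
        candidates = {c for c in candidates if c == local_cluster or c in dm_enabled}
    return candidates


print(sorted(candidate_clusters(
    {"cluster1", "cluster2", "cluster3"},
    {"cluster1", "cluster2", "cluster3", "cluster4"},
    {"cluster1", "cluster2"},
    "cluster1",
    True,
)))  # ['cluster1', 'cluster2']
```

In this example, cluster3 is requested by the job and reachable from the queue, but it is removed because it does not run LSF data manager and the job has input data.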
If the local cluster is one of the candidate clusters, LSF always tries to dispatch the job locally first. If the job has an input data requirement, local data readiness is mandatory, and the local data manager triggers the data transfer as it does on a single cluster. Data availability information is sent to LSF until the local data is cached.
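The local-first rule can be sketched as a simple decision function. This is a hypothetical model only: it ignores slots, resource requirements, and every other scheduling constraint, and `request_local_transfer` stands in for whatever mechanism the local data manager actually uses to stage data.

```python
from typing import Callable, Set


def try_local_dispatch(
    candidates: Set[str],
    local_cluster: str,
    has_input_data: bool,
    local_data_cached: bool,
    request_local_transfer: Callable[[], None],  # hypothetical hook for the local data manager
) -> bool:
    """Hypothetical sketch: return True if the job can be dispatched locally now."""
    if local_cluster not in candidates:
        # The local cluster is not a candidate, so only remote clusters are considered.
        return False
    if has_input_data and not local_data_cached:
        # Local data readiness is mandatory: ask the local data manager to stage the
        # data, as on a single cluster, and do not dispatch locally yet.
        request_local_transfer()
        return False
    return True
```

For example, a job with input data whose files are not yet cached locally triggers one transfer request and is not dispatched locally on that pass.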
The local data manager queries data availability from each remote data manager through the connections that are configured in the RemoteDataManagers section of the lsf.datamanager file. The REMOTE_CACHE_REFRESH_INTERVAL parameter controls how long the local data manager daemon caches the results of these queries.
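The caching behavior that REMOTE_CACHE_REFRESH_INTERVAL controls resembles a time-to-live (TTL) cache. The following Python sketch models it under stated assumptions: the interval is in seconds, results are cached per remote cluster and data specification, and `query` is a hypothetical stand-in for the request to a remote data manager. It is not the dmd implementation.

```python
import time
from typing import Callable, Dict, Tuple


class RemoteAvailabilityCache:
    """Hypothetical TTL cache modeling REMOTE_CACHE_REFRESH_INTERVAL (seconds assumed)."""

    def __init__(
        self,
        refresh_interval: float,
        query: Callable[[str, str], int],  # hypothetical: asks a remote data manager about data
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.refresh_interval = refresh_interval
        self.query = query
        self.clock = clock
        # (cluster, data_spec) -> (time of last query, cached result)
        self._entries: Dict[Tuple[str, str], Tuple[float, int]] = {}

    def get(self, cluster: str, data_spec: str) -> int:
        key = (cluster, data_spec)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.refresh_interval:
            return entry[1]  # cached result is still fresh; no remote query
        value = self.query(cluster, data_spec)  # stale or missing: query the remote data manager
        self._entries[key] = (now, value)
        return value
```

Passing a fake clock makes the expiry easy to observe: repeated lookups within the interval reuse the cached result, and the first lookup after the interval expires queries the remote side again.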
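The data availability information is posted to the job as a message, which you can display with the bread command. For example, for job 128:
bread 128
JOBID  MSG_ID  FROM  POST_TIME    DESCRIPTION
128    0       root  Sep 3 16:23  DATA AVAILABILITY: cluster2 100 cluster3 0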
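If you want to post-process this message in a script, a small parser can turn the DESCRIPTION field into a per-cluster mapping. This sketch assumes the message always has the form shown above, a `DATA AVAILABILITY:` prefix followed by alternating cluster names and integer values; what the integers mean is not defined here, and the function name is hypothetical.

```python
from typing import Dict

PREFIX = "DATA AVAILABILITY:"


def parse_data_availability(line: str) -> Dict[str, int]:
    """Extract cluster -> value pairs from a bread output line (format assumed)."""
    idx = line.find(PREFIX)
    if idx < 0:
        raise ValueError("line does not contain a data availability message")
    tokens = line[idx + len(PREFIX):].split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected cluster/value pairs")
    # Tokens alternate: cluster name, value, cluster name, value, ...
    return {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens), 2)}


line = "128    0       root  Sep 3 16:23  DATA AVAILABILITY: cluster2 100 cluster3 0"
print(parse_data_availability(line))  # {'cluster2': 100, 'cluster3': 0}
```

Because the parser searches for the prefix, it accepts either a full bread output line or only the DESCRIPTION text.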
When the mbatchd daemon receives the data availability information from the local LSF data manager, LSF schedules the job. The data availability information is treated as a data preference: the IBM® Spectrum LSF multicluster capability combines this preference with other job forwarding policies to decide which cluster to send the job to.
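To illustrate the difference between a preference and a hard requirement, the following sketch ranks candidate clusters instead of filtering them. The ordering (data availability first, then a hypothetical `policy_score` from other forwarding policies, then cluster name as a stable tie-breaker) is an assumption chosen for clarity. The real multicluster forwarding logic is more involved and is not reproduced here.

```python
from typing import Dict, List


def rank_clusters(availability: Dict[str, int], policy_score: Dict[str, float]) -> List[str]:
    """Hypothetical ranking: data availability is a preference, not a filter."""
    clusters = set(availability) | set(policy_score)
    # Higher availability first; ties broken by the (hypothetical) policy score, then by name.
    return sorted(
        clusters,
        key=lambda c: (-availability.get(c, 0), -policy_score.get(c, 0.0), c),
    )


print(rank_clusters({"cluster2": 100, "cluster3": 0}, {"cluster2": 0.2, "cluster3": 0.9}))
# ['cluster2', 'cluster3']
```

Note that cluster3 stays in the result even though it reports 0: a cluster with less data available is only less preferred, not excluded.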
LSF contacts the local dmd daemon to refresh the data availability information only when the job has not been forwarded or when the mbatchd daemon is restarted. Otherwise, the information is not updated and might not reflect the latest cache status.
For example, when the execution cluster with LSF data manager runs in a public-cloud infrastructure, the data that the forwarded job requires is pushed from the submission cluster to the execution cluster, and the job's output is transferred back to the submission cluster. The execution cluster requests access to the submission cluster to handle the data transfer.