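LSF HPC features
The HPC features are installed on UNIX or Linux hosts as part of the PARALLEL template.
Note: The HPC integration is now deprecated and might be removed in a future version of LSF.
When you install, some changes are made for you automatically. Add the appropriate resource names under the RESOURCES column of the Host section of the lsf.cluster.cluster_name file.
The HPC feature installation automatically configures the following files: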
- lsb.modules
- lsb.resources
- lsb.queues
- lsf.cluster.cluster_name
- lsf.conf
- lsf.shared
lsb.modules
The HPC feature installation adds the external scheduler plug-in module names to the PluginModule section of the lsb.modules file:
Begin PluginModule
SCH_PLUGIN RB_PLUGIN SCH_DISABLE_PHASES
schmod_default () ()
schmod_fcfs () ()
schmod_fairshare () ()
schmod_limit () ()
schmod_parallel () ()
schmod_reserve () ()
schmod_mc () ()
schmod_preemption () ()
schmod_advrsv () ()
schmod_ps () ()
schmod_affinity () ()
#schmod_dc () ()
#schmod_demand () ()
schmod_aps () ()
schmod_cpuset () ()
End PluginModule
Note: The HPC plug-in names must be configured after the standard LSF plug-in names in the PluginModule list.
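The ordering rule in the note can be checked mechanically. The following Python sketch is illustrative only: the helper names are hypothetical, and because this page does not say which entries count as HPC plug-ins, the caller supplies that set (here `schmod_cpuset` is assumed to be one).

```python
def plugin_names(lsb_modules_text):
    """Return the active plug-in names from the PluginModule section, in file order."""
    names = []
    in_section = False
    for raw in lsb_modules_text.splitlines():
        # Drop comments, which also skips commented-out plug-ins such as #schmod_dc.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line == "Begin PluginModule":
            in_section = True
        elif line == "End PluginModule":
            in_section = False
        elif in_section and not line.startswith("SCH_PLUGIN"):
            names.append(line.split()[0])
    return names


def hpc_plugins_out_of_order(names, hpc_plugins):
    """Return the HPC plug-ins that are listed before some standard plug-in."""
    last_standard = max(
        (i for i, n in enumerate(names) if n not in hpc_plugins), default=-1
    )
    return [n for i, n in enumerate(names) if n in hpc_plugins and i < last_standard]


SAMPLE = """\
Begin PluginModule
SCH_PLUGIN RB_PLUGIN SCH_DISABLE_PHASES
schmod_default () ()
schmod_fcfs () ()
#schmod_dc () ()
schmod_aps () ()
schmod_cpuset () ()
End PluginModule
"""

names = plugin_names(SAMPLE)
# Assumption: schmod_cpuset is treated as the HPC plug-in for this check.
print(hpc_plugins_out_of_order(names, {"schmod_cpuset"}))  # prints []
```

An empty result means that every assumed HPC plug-in follows the standard plug-ins. A non-empty result lists the entries to move toward the end of the section.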
lsb.resources
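For IBM POE jobs, the HPC feature installation configures the ReservationUsage section in the lsb.resources file to reserve HPS resources on a per-slot basis, expressed with the PER_TASK method.
Resource usage that is defined in the ReservationUsage section overrides the cluster-wide RESOURCE_RESERVE_PER_TASK parameter that is defined in the lsb.params file, if that parameter is also set.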
Begin ReservationUsage
RESOURCE METHOD
adapter_windows PER_TASK
nrt_windows PER_TASK
End ReservationUsage
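To make the override rule concrete, the following Python sketch reads a ReservationUsage section into a dictionary and then resolves the method for a resource. The function names and the `cluster_wide` argument are hypothetical; `cluster_wide` stands in for whatever method the RESOURCE_RESERVE_PER_TASK setting in lsb.params implies, which this sketch does not try to derive.

```python
def reservation_methods(lsb_resources_text):
    """Map each resource in the ReservationUsage section to its METHOD value."""
    methods = {}
    in_section = False
    for raw in lsb_resources_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue
        if fields[:2] == ["Begin", "ReservationUsage"]:
            in_section = True
        elif fields[:2] == ["End", "ReservationUsage"]:
            in_section = False
        elif in_section and fields[0] != "RESOURCE" and len(fields) >= 2:
            methods[fields[0]] = fields[1]
    return methods


def effective_method(resource, methods, cluster_wide):
    """A per-resource entry wins; otherwise fall back to the cluster-wide method."""
    return methods.get(resource, cluster_wide)


SAMPLE = """\
Begin ReservationUsage
RESOURCE METHOD
adapter_windows PER_TASK
nrt_windows PER_TASK
End ReservationUsage
"""

methods = reservation_methods(SAMPLE)
print(methods)  # {'adapter_windows': 'PER_TASK', 'nrt_windows': 'PER_TASK'}
# "PER_JOB" and "licenses" are placeholder values chosen for illustration.
print(effective_method("nrt_windows", methods, "PER_JOB"))  # PER_TASK
print(effective_method("licenses", methods, "PER_JOB"))     # PER_JOB
```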
lsb.queues
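The HPC feature installation configures the hpc_ibm queue for IBM POE jobs and the hpc_ibm_tv queue for debugging IBM POE jobs. The listing also defines the hpc_linux queue and its debug counterpart, hpc_linux_tv: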
Begin Queue
QUEUE_NAME = hpc_linux
PRIORITY = 30
NICE = 20
#RUN_WINDOW = 5:19:00-1:8:30 20:00-8:30
#r1m = 0.7/2.0 # loadSched/loadStop
#r15m = 1.0/2.5
#pg = 4.0/8
#ut = 0.2
#io = 50/240
#CPULIMIT = 180/hostA # 3 hours of host hostA
#FILELIMIT = 20000
#DATALIMIT = 20000 # jobs data segment limit
#CORELIMIT = 20000
#TASKLIMIT = 5 # job processor limit
#USERS = all # users who can submit jobs to this queue
#HOSTS = all # hosts on which jobs in this queue can run
#PRE_EXEC = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC = /usr/local/lsf/misc/testq_post |grep -v Hey
DESCRIPTION = IBM Spectrum LSF 10.1 for linux.
End Queue
Begin Queue
QUEUE_NAME = hpc_linux_tv
PRIORITY = 30
NICE = 20
#RUN_WINDOW = 5:19:00-1:8:30 20:00-8:30
#r1m = 0.7/2.0 # loadSched/loadStop
#r15m = 1.0/2.5
#pg = 4.0/8
#ut = 0.2
#io = 50/240
#CPULIMIT = 180/hostA # 3 hours of host hostA
#FILELIMIT = 20000
#DATALIMIT = 20000 # jobs data segment limit
#CORELIMIT = 20000
#TASKLIMIT = 5 # job processor limit
#USERS = all # users who can submit jobs to this queue
#HOSTS = all # hosts on which jobs in this queue can run
#PRE_EXEC = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC = /usr/local/lsf/misc/testq_post |grep -v Hey
TERMINATE_WHEN = LOAD PREEMPT WINDOW
RERUNNABLE = NO
INTERACTIVE = NO
DESCRIPTION = IBM Spectrum LSF 10.1 for linux debug queue.
End Queue
Begin Queue
QUEUE_NAME = hpc_ibm
PRIORITY = 30
NICE = 20
#RUN_WINDOW = 5:19:00-1:8:30 20:00-8:30
#r1m = 0.7/2.0 # loadSched/loadStop
#r15m = 1.0/2.5
#pg = 4.0/8
#ut = 0.2
#io = 50/240
#CPULIMIT = 180/hostA # 3 hours of host hostA
#FILELIMIT = 20000
#DATALIMIT = 20000 # jobs data segment limit
#CORELIMIT = 20000
#TASKLIMIT = 5 # job processor limit
#USERS = all # users who can submit jobs to this queue
#HOSTS = all # hosts on which jobs in this queue can run
#PRE_EXEC = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC = /usr/local/lsf/misc/testq_post |grep -v Hey
RES_REQ = select[ poe > 0 ]
EXCLUSIVE = Y
REQUEUE_EXIT_VALUES = 133 134 135
DESCRIPTION = IBM Spectrum LSF 10.1 for IBM. This queue is to run POE jobs ONLY.
End Queue
Begin Queue
QUEUE_NAME = hpc_ibm_tv
PRIORITY = 30
NICE = 20
#RUN_WINDOW = 5:19:00-1:8:30 20:00-8:30
#r1m = 0.7/2.0 # loadSched/loadStop
#r15m = 1.0/2.5
#pg = 4.0/8
#ut = 0.2
#io = 50/240
#CPULIMIT = 180/hostA # 3 hours of host hostA
#FILELIMIT = 20000
#DATALIMIT = 20000 # jobs data segment limit
#CORELIMIT = 20000
#TASKLIMIT = 5 # job processor limit
#USERS = all # users who can submit jobs to this queue
#HOSTS = all # hosts on which jobs in this queue can run
#PRE_EXEC = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC = /usr/local/lsf/misc/testq_post |grep -v Hey
RES_REQ = select[ poe > 0 ]
REQUEUE_EXIT_VALUES = 133 134 135
TERMINATE_WHEN = LOAD PREEMPT WINDOW
RERUNNABLE = NO
INTERACTIVE = NO
DESCRIPTION = IBM Spectrum LSF 10.1 for IBM debug queue. This queue is to run POE jobs ONLY.
End Queue
lsf.cluster.cluster_name
For IBM POE jobs, the HPC feature installation configures the ResourceMap section of the lsf.cluster.cluster_name file to map the following shared resources for POE jobs to all hosts in the cluster:
Begin ResourceMap
RESOURCENAME LOCATION
poe [default]
adapter_windows [default]
nrt_windows [default]
dedicated_tasks (0@[default])
ip_tasks (0@[default])
us_tasks (0@[default])
End ResourceMap
lsf.conf
The HPC feature installation defines the following parameters in the lsf.conf file:
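- LSB_SUB_COMMANDNAME=Y: Enables the LSF_SUB_COMMANDLINE environment variable that is required by the esub script.
- LSF_ENABLE_EXTSCHEDULER=Y: LSF uses an external scheduler for topology-aware external scheduling.
- LSB_CPUSET_BESTCPUS=Y: LSF schedules jobs based on the shortest CPU radius in the processor topology, using a best-fit algorithm.
- LSF_VPLUGIN="/opt/mpi/lib/pa1.1/libmpirm.sl": On HP-UX hosts, sets the full path to the HP vendor MPI library (libmpirm.sl).
- LSB_RLA_PORT=port_number: Defines the TCP port that is used for communication between the LSF HPC topology adapter (RLA) and the sbatchd daemon. The default port number is 6883.
- LSB_SHORT_HOSTLIST=1: Displays an abbreviated host list in the output of the bjobs and bhist commands for a parallel job that runs multiple processes on the same host. Multiple processes are displayed in the format processes*hostA.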
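The processes*hostA format can be illustrated with a short Python sketch that collapses a per-process host list into the abbreviated form. It only mimics the notation; it does not reproduce actual bjobs or bhist output. The function name, the space separator, and the choice to print a single-process host without a count are assumptions.

```python
from collections import Counter


def short_hostlist(exec_hosts):
    """Collapse repeated hosts into count*host tokens, keeping first-seen order."""
    counts = Counter(exec_hosts)  # dict-based, so insertion order is preserved
    return " ".join(
        host if n == 1 else f"{n}*{host}" for host, n in counts.items()
    )


# Four processes on hostA and one on hostB.
print(short_hostlist(["hostA", "hostA", "hostA", "hostA", "hostB"]))  # 4*hostA hostB
```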
lsf.shared
The HPC feature installation defines the following shared resources, which the HPC features require, in the lsf.shared file:
Begin Resource
RESOURCENAME TYPE INTERVAL INCREASING DESCRIPTION # Keywords
slurm Boolean () () (SLURM)
cpuset Boolean () () (CPUSET)
mpich_gm Boolean () () (MPICH GM MPI)
lammpi Boolean () () (LAM MPI)
mpichp4 Boolean () () (MPICH P4 MPI)
mvapich Boolean () () (Infiniband MPI)
sca_mpimon Boolean () () (SCALI MPI)
ibmmpi Boolean () () (IBM POE MPI)
hpmpi Boolean () () (HP MPI)
intelmpi Boolean () () (Intel MPI)
crayxt3 Boolean () () (Cray XT3 MPI)
crayx1 Boolean () () (Cray X1 MPI)
fluent Boolean () () (fluent availability)
ls_dyna Boolean () () (ls_dyna availability)
nastran Boolean () () (nastran availability)
pvm Boolean () () (pvm availability)
openmp Boolean () () (openmp availability)
ansys Boolean () () (ansys availability)
blast Boolean () () (blast availability)
gaussian Boolean () () (gaussian availability)
lion Boolean () () (lion availability)
scitegic Boolean () () (scitegic availability)
schroedinger Boolean () () (schroedinger availability)
hmmer Boolean () () (hmmer availability)
adapter_windows Numeric 30 N (free adapter windows on css0 on IBM SP)
nrt_windows Numeric 30 N (The number of free nrt windows on IBM systems)
poe Numeric 30 N (poe availability)
css0 Numeric 30 N (free adapter windows on css0 on IBM SP)
csss Numeric 30 N (free adapter windows on csss on IBM SP)
dedicated_tasks Numeric () Y (running dedicated tasks)
ip_tasks Numeric () Y (running IP tasks)
us_tasks Numeric () Y (running US tasks)
End Resource
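The Resource table mixes Boolean flags with Numeric resources that carry an update interval and an increasing flag. As a rough sketch of how the columns fit together, the following Python code parses rows of this shape into records. The `Resource` type and `parse_resources` function are hypothetical, and the parser assumes that every row ends with a parenthesized description, as in the listing above.

```python
import re
from typing import NamedTuple, Optional


class Resource(NamedTuple):
    name: str
    type: str
    interval: Optional[int]     # seconds; None when the column is ()
    increasing: Optional[bool]  # True for Y, False for N, None for ()
    description: str


# name, type, interval, increasing, then a parenthesized description.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+\((.*)\)$")


def parse_resources(lsf_shared_text):
    """Parse the rows of the Resource section; the header row does not match ROW."""
    resources = []
    in_section = False
    for raw in lsf_shared_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line == "Begin Resource":
            in_section = True
        elif line == "End Resource":
            in_section = False
        elif in_section:
            m = ROW.match(line)
            if m:
                name, rtype, interval, increasing, desc = m.groups()
                resources.append(Resource(
                    name,
                    rtype,
                    None if interval == "()" else int(interval),
                    None if increasing == "()" else increasing == "Y",
                    desc,
                ))
    return resources


SAMPLE = """\
Begin Resource
RESOURCENAME TYPE INTERVAL INCREASING DESCRIPTION # Keywords
slurm Boolean () () (SLURM)
poe Numeric 30 N (poe availability)
us_tasks Numeric () Y (running US tasks)
End Resource
"""

for r in parse_resources(SAMPLE):
    print(r.name, r.type, r.interval, r.increasing)
```

Splitting the result by `type` then gives the Boolean resources that can be listed in the RESOURCES column of a host entry, separate from the Numeric ones that are reported as load values.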