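LSF HPC features

The HPC features are installed on UNIX or Linux hosts as part of the PARALLEL template.

Note: The HPC integration is now deprecated and might be removed in a future version of LSF.

Installation makes some changes for you automatically. Add the appropriate resource names under the RESOURCES column of the Host section of the lsf.cluster.cluster_name file.

The HPC feature installation automatically configures the following files: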

  • lsb.modules
  • lsb.resources
  • lsb.queues
  • lsf.cluster.cluster_name
  • lsf.conf
  • lsf.shared

lsb.modules

  • The HPC feature installation adds the external scheduler plug-in module names to the PluginModule section of the lsb.modules file:
Begin PluginModule
SCH_PLUGIN          RB_PLUGIN   SCH_DISABLE_PHASES 
schmod_default        ()                 ()
schmod_fcfs           ()                 ()
schmod_fairshare      ()                 ()
schmod_limit          ()                 ()
schmod_parallel       ()                 ()
schmod_reserve        ()                 ()
schmod_mc             ()                 ()
schmod_preemption     ()                 ()
schmod_advrsv         ()                 ()
schmod_ps             ()                 ()
schmod_affinity       ()                 ()
#schmod_dc            ()                 ()
#schmod_demand        ()                 ()
schmod_aps            ()                 ()
schmod_cpuset         ()                 ()
End PluginModule
Note: The HPC plug-in names must be configured after the standard LSF plug-in names in the PluginModule list.
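
The ordering rule for plug-in names can be checked mechanically. The following Python sketch is one possible way to do that and is not an LSF tool: the function names are hypothetical, and treating schmod_cpuset as the HPC plug-in is an assumption made for the example only.

```python
# Sketch: check that HPC plug-ins come after the standard LSF plug-ins
# in the PluginModule section of lsb.modules.

def plugin_order(text):
    """Return the uncommented SCH_PLUGIN names in file order."""
    names = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Begin PluginModule":
            in_section = True
        elif line == "End PluginModule":
            in_section = False
        elif in_section and line and not line.startswith("#"):
            name = line.split()[0]
            if name != "SCH_PLUGIN":  # skip the column header row
                names.append(name)
    return names


def hpc_plugins_last(names, hpc_names):
    """Return True if no standard plug-in appears after an HPC plug-in."""
    seen_hpc = False
    for name in names:
        if name in hpc_names:
            seen_hpc = True
        elif seen_hpc:
            return False
    return True


# Assumption for illustration: schmod_cpuset is the only HPC plug-in.
HPC_NAMES = {"schmod_cpuset"}

SAMPLE = """\
Begin PluginModule
SCH_PLUGIN          RB_PLUGIN   SCH_DISABLE_PHASES
schmod_default        ()                 ()
schmod_parallel       ()                 ()
#schmod_dc            ()                 ()
schmod_aps            ()                 ()
schmod_cpuset         ()                 ()
End PluginModule
"""

order = plugin_order(SAMPLE)
print(order)
print(hpc_plugins_last(order, HPC_NAMES))
```

Commented-out entries such as #schmod_dc are ignored, so only the plug-ins that are actually configured take part in the check.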

lsb.resources

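For IBM POE jobs, the HPC feature installation configures the ReservationUsage section in the lsb.resources file to reserve HPS resources on a per-slot basis.

Resource usage that is defined in the ReservationUsage section overrides the cluster-wide RESOURCE_RESERVE_PER_TASK parameter that is defined in the lsb.params file, if that parameter is also set.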

Begin ReservationUsage
RESOURCE           METHOD
adapter_windows    PER_TASK
nrt_windows        PER_TASK
End ReservationUsage
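
The override rule can be pictured as a lookup with a fallback. The Python sketch below is an illustration only, not LSF code: the function names are hypothetical, and the PER_HOST fallback when RESOURCE_RESERVE_PER_TASK is not enabled is an assumption about the cluster-wide default.

```python
# Sketch: a ReservationUsage entry takes precedence over the
# cluster-wide RESOURCE_RESERVE_PER_TASK setting from lsb.params.

def parse_reservation_usage(text):
    """Map each resource name to its METHOD value."""
    usage = {}
    in_section = False
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop trailing comments
        if fields[:2] == ["Begin", "ReservationUsage"]:
            in_section = True
        elif fields[:2] == ["End", "ReservationUsage"]:
            in_section = False
        elif in_section and len(fields) >= 2 and fields[0] != "RESOURCE":
            usage[fields[0]] = fields[1]
    return usage


def effective_method(resource, usage, reserve_per_task=False):
    """Return the reservation method that applies to a resource."""
    if resource in usage:
        return usage[resource]  # the per-resource entry wins
    # Assumption: without an entry, the cluster-wide flag decides.
    return "PER_TASK" if reserve_per_task else "PER_HOST"


SAMPLE = """\
Begin ReservationUsage
RESOURCE           METHOD
adapter_windows    PER_TASK
nrt_windows        PER_TASK
End ReservationUsage
"""

usage = parse_reservation_usage(SAMPLE)
print(effective_method("adapter_windows", usage, reserve_per_task=False))
print(effective_method("mem", usage, reserve_per_task=False))
```

Here adapter_windows resolves to PER_TASK even though the cluster-wide flag is off, while mem, which has no entry, falls back to the assumed cluster-wide behavior.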

lsb.queues

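The HPC feature installation configures the hpc_ibm queue for IBM POE jobs and the hpc_ibm_tv queue for debugging IBM POE jobs. The listing that follows also shows the hpc_linux and hpc_linux_tv queues: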
Begin Queue
QUEUE_NAME   = hpc_linux
PRIORITY     = 30
NICE         = 20
#RUN_WINDOW   = 5:19:00-1:8:30 20:00-8:30
#r1m         = 0.7/2.0   # loadSched/loadStop
#r15m         = 1.0/2.5
#pg           = 4.0/8
#ut           = 0.2
#io           = 50/240
#CPULIMIT     = 180/hostA # 3 hours of host hostA
#FILELIMIT    = 20000
#DATALIMIT    = 20000     # jobs data segment limit
#CORELIMIT    = 20000
#TASKLIMIT    = 5         # job processor limit
#USERS        = all       # users who can submit jobs to this queue
#HOSTS        = all       # hosts on which jobs in this queue can run
#PRE_EXEC     = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC    = /usr/local/lsf/misc/testq_post |grep -v Hey
DESCRIPTION  = IBM Spectrum LSF 10.1 for linux.
End Queue
 
Begin Queue
QUEUE_NAME   = hpc_linux_tv
PRIORITY     = 30
NICE         = 20
#RUN_WINDOW   = 5:19:00-1:8:30 20:00-8:30
#r1m         = 0.7/2.0    # loadSched/loadStop
#r15m         = 1.0/2.5
#pg           = 4.0/8
#ut           = 0.2
#io           = 50/240
#CPULIMIT     = 180/hostA # 3 hours of host hostA
#FILELIMIT    = 20000
#DATALIMIT    = 20000     # jobs data segment limit
#CORELIMIT    = 20000
#TASKLIMIT    = 5         # job processor limit
#USERS        = all       # users who can submit jobs to this queue
#HOSTS        = all       # hosts on which jobs in this queue can run
#PRE_EXEC     = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC    = /usr/local/lsf/misc/testq_post |grep -v Hey
TERMINATE_WHEN = LOAD PREEMPT WINDOW
RERUNNABLE = NO
INTERACTIVE = NO
DESCRIPTION  = IBM Spectrum LSF 10.1 for linux debug queue.
End Queue
 
Begin Queue
QUEUE_NAME   = hpc_ibm
PRIORITY     = 30
NICE         = 20
#RUN_WINDOW   = 5:19:00-1:8:30 20:00-8:30
#r1m         = 0.7/2.0    # loadSched/loadStop
#r15m         = 1.0/2.5
#pg           = 4.0/8
#ut           = 0.2
#io           = 50/240
#CPULIMIT     = 180/hostA  # 3 hours of host hostA
#FILELIMIT    = 20000
#DATALIMIT    = 20000      # jobs data segment limit
#CORELIMIT    = 20000
#TASKLIMIT    = 5          # job processor limit
#USERS        = all        # users who can submit jobs to this queue
#HOSTS        = all        # hosts on which jobs in this queue can run
#PRE_EXEC     = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC    = /usr/local/lsf/misc/testq_post |grep -v Hey
RES_REQ = select[ poe > 0 ]
EXCLUSIVE = Y
REQUEUE_EXIT_VALUES = 133 134 135
DESCRIPTION  = IBM Spectrum LSF 10.1 for IBM. This queue is to run POE jobs ONLY.
End Queue
 
Begin Queue
QUEUE_NAME   = hpc_ibm_tv
PRIORITY     = 30
NICE         = 20
#RUN_WINDOW   = 5:19:00-1:8:30 20:00-8:30
#r1m         = 0.7/2.0    # loadSched/loadStop
#r15m         = 1.0/2.5
#pg           = 4.0/8
#ut           = 0.2
#io           = 50/240
#CPULIMIT     = 180/hostA  # 3 hours of host hostA
#FILELIMIT    = 20000
#DATALIMIT    = 20000      # jobs data segment limit
#CORELIMIT    = 20000
#TASKLIMIT    = 5          # job processor limit
#USERS        = all        # users who can submit jobs to this queue
#HOSTS        = all        # hosts on which jobs in this queue can run
#PRE_EXEC     = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC    = /usr/local/lsf/misc/testq_post |grep -v Hey
RES_REQ = select[ poe > 0 ]
REQUEUE_EXIT_VALUES = 133 134 135
TERMINATE_WHEN = LOAD PREEMPT WINDOW
RERUNNABLE = NO
INTERACTIVE = NO
DESCRIPTION  = IBM Spectrum LSF 10.1 for IBM debug queue. This queue is to run POE jobs ONLY.
End Queue

lsf.cluster.cluster_name

For IBM POE jobs, the HPC feature installation configures the ResourceMap section of the lsf.cluster.cluster_name file to map the following shared resources for POE jobs to all hosts in the cluster:

Begin ResourceMap
RESOURCENAME        LOCATION
poe                 [default]
adapter_windows     [default]
nrt_windows         [default]
dedicated_tasks     (0@[default])
ip_tasks            (0@[default])
us_tasks            (0@[default])
End ResourceMap

lsf.conf

The HPC feature installation defines the following parameters in the lsf.conf file:

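LSB_SUB_COMMANDNAME=Y
Enables the LSF_SUB_COMMANDLINE environment variable that is required by the esub script.
LSF_ENABLE_EXTSCHEDULER=Y
LSF uses an external scheduler for topology-aware external scheduling.
LSB_CPUSET_BESTCPUS=Y
LSF schedules jobs based on the shortest CPU radius in the processor topology by using a best-fit algorithm.
On HP-UX hosts, sets the full path to the HP vendor MPI library (libmpirm.sl): LSF_VPLUGIN="/opt/mpi/lib/pa1.1/libmpirm.sl"
LSB_RLA_PORT=port_number
Defines the TCP port that is used for communication between the LSF HPC topology adapter (RLA) and the sbatchd daemon. The default port number is 6883.
LSB_SHORT_HOSTLIST=1
Displays an abbreviated list of hosts in the bjobs and bhist commands for a parallel job where multiple processes of the job are running on a host. Multiple processes are displayed in the format processes*hostA.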
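
To make the processes*hostA format concrete, the Python sketch below collapses a per-process host list into that notation. It is an illustration of the format only, not the code that bjobs or bhist runs: the function name is hypothetical, and showing a host with a single process as a bare host name is an assumption.

```python
# Sketch: abbreviate a per-process host list as processes*host.
from collections import Counter


def short_hostlist(hosts):
    """Collapse repeated host names into count*host entries."""
    counts = Counter(hosts)  # keeps first-seen order on Python 3.7+
    parts = []
    for host, n in counts.items():
        # Assumption: a host with one process is shown without a count.
        parts.append(f"{n}*{host}" if n > 1 else host)
    return " ".join(parts)


print(short_hostlist(["hostA", "hostA", "hostA", "hostB"]))
```

For the sample input, the three processes on hostA collapse into a single 3*hostA entry.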

lsf.shared

The HPC feature installation defines the following shared resources that are required by the HPC features in the lsf.shared file:

Begin Resource
RESOURCENAME    TYPE    INTERVAL INCREASING  DESCRIPTION       # Keywords
slurm           Boolean    ()    ()          (SLURM)
cpuset          Boolean    ()    ()          (CPUSET)
mpich_gm        Boolean    ()    ()          (MPICH GM MPI)
lammpi          Boolean    ()    ()          (LAM MPI)
mpichp4         Boolean    ()    ()          (MPICH P4 MPI)
mvapich         Boolean    ()    ()          (Infiniband MPI)
sca_mpimon      Boolean    ()    ()          (SCALI MPI)
ibmmpi          Boolean    ()    ()          (IBM POE MPI)
hpmpi           Boolean    ()    ()          (HP MPI)
intelmpi        Boolean    ()    ()          (Intel MPI)
crayxt3         Boolean    ()    ()          (Cray XT3 MPI)
crayx1          Boolean    ()    ()          (Cray X1 MPI)
fluent          Boolean    ()    ()          (fluent availability)
ls_dyna         Boolean    ()    ()          (ls_dyna availability)
nastran         Boolean    ()    ()          (nastran availability)
pvm             Boolean    ()    ()          (pvm availability)
openmp          Boolean    ()    ()          (openmp availability)
ansys           Boolean    ()    ()          (ansys availability)
blast           Boolean    ()    ()          (blast availability)
gaussian        Boolean    ()    ()          (gaussian availability)
lion            Boolean    ()    ()          (lion availability)
scitegic        Boolean    ()    ()          (scitegic availability)
schroedinger    Boolean    ()    ()          (schroedinger availability)
hmmer           Boolean    ()    ()          (hmmer availability)
adapter_windows Numeric    30    N           (free adapter windows on css0 on IBM SP)
nrt_windows     Numeric    30    N           (The number of free nrt windows on IBM systems)
poe             Numeric    30    N           (poe availability)
css0            Numeric    30    N           (free adapter windows on css0 on IBM SP)
csss            Numeric    30    N           (free adapter windows on csss on IBM SP)
dedicated_tasks Numeric    ()    Y           (running dedicated tasks)
ip_tasks        Numeric    ()    Y           (running IP tasks)
us_tasks        Numeric    ()    Y           (running US tasks)
End Resource
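
The Resource section is a whitespace-separated table, so it can be read into records for inspection. The Python sketch below shows one way to do that, assuming every row has the five columns shown above. The function name and record keys are hypothetical, and this is not how LSF itself reads the file.

```python
# Sketch: read the Resource section of lsf.shared into a list of records.

def parse_resources(text):
    """Return one dict per resource row in the Resource section."""
    resources = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Begin Resource":
            in_section = True
        elif line == "End Resource":
            in_section = False
        elif in_section and line and not line.startswith("#"):
            fields = line.split(None, 4)  # description keeps its spaces
            if len(fields) < 5 or fields[0] == "RESOURCENAME":
                continue  # skip the header row and malformed rows
            name, rtype, interval, increasing, description = fields
            resources.append({
                "name": name,
                "type": rtype,
                "interval": None if interval == "()" else int(interval),
                "increasing": None if increasing == "()" else increasing,
                "description": description.strip("()"),
            })
    return resources


SAMPLE = """\
Begin Resource
RESOURCENAME    TYPE    INTERVAL INCREASING  DESCRIPTION       # Keywords
ibmmpi          Boolean    ()    ()          (IBM POE MPI)
poe             Numeric    30    N           (poe availability)
us_tasks        Numeric    ()    Y           (running US tasks)
End Resource
"""

resources = parse_resources(SAMPLE)
numeric = [r["name"] for r in resources if r["type"] == "Numeric"]
print(numeric)
```

Empty () columns are stored as None, which keeps resources that have no update interval, such as us_tasks, distinct from resources with a 30-second interval such as poe.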