Topic
  • 4 replies
  • Latest Post - ‏2008-08-01T18:17:39Z by lcham
michael-t
28 Posts

Pinned topic Some basic question for LL scheduling

2008-07-29T22:23:30Z
We have installed LoadL.full 3.4.1.2 on our 40 node 575 cluster. I am trying to verify and clarify the behavior of LL based on some basic job and resource conditions.

*Q1*) With

SCHEDULE_BY_RESOURCES = ConsumableCpus ConsumableMemory RDMA
ENFORCE_RESOURCE_USAGE = ConsumableCpus ConsumableMemory
ENFORCE_RESOURCE_MEMORY = true

and

ENFORCE_RESOURCE_POLICY = soft

will a job that launches more threads per task than its ConsumableCpus be throttled by WLM to the CPU portion corresponding to its requested resources (Total_Processors_in_Node/tasks_node X ConsumableCpus), or is no resource consumption enforced by WLM?
What is the difference between the ENFORCE_RESOURCE_POLICY values "shares" and "soft"?
*Q2*) The LL Job script manual states that "Max_processors" is the maximum number of nodes that can be requested. However, common sense and discussions from IBM support state that this number "specifies the maximum number of processors for a parallel job in a job command file using the min_processors and max_processors keywords".

Is the description in the manual a "typo"?
*Q3*) Is the 'Nproc_limit' a per-node, per-userid process limit OR a total per-job, per-userid limit?
That is, if a user launches a multi-node POE job, will the Nproc_limit apply to all processes on all nodes for this job, or only to the processes on EACH of the nodes used by this job?
*Q4*) Given

DEFAULT_CLASS = smp1 smp2 mpi32 mpi64 mpi128 mpi256 mpi592

and LL Job scripts which DO NOT request a specific class, I would like to ensure that when LL encounters all three keywords in an LL Job script (node, tasks_per_node and ConsumableCpus) it will allocate a collection of nodes where ALL three values can be correctly satisfied and assign the job to the proper class.

Example:

#@ node = 2
#@ tasks_per_node = 3
#@ resources = ConsumableCpus(5)


I would like to ensure that

1) LL will select two nodes and set aside 15 = 3 X 5 processors in each one of these nodes for this job. And that

2) LL will assign this job to one of the classes in default_class which allows this combination of resources to be allocated.

Thanks!

Michael/SC/TAMU
  • lcham
    7 Posts

    Re: Some basic question for LL scheduling

    ‏2008-07-31T15:20:59Z  
    Hi Michael,

    A1) When the consumable resources are set in the SCHEDULE_BY_RESOURCES and ENFORCE_RESOURCE_USAGE in the config file and the consumable resources are defined in the admin file, this enables enforcement of these consumable resources by AIX WLM.
    Enforcement is done at the task level, not the thread level. All the threads of a task are constrained/throttled by the CPU portion that WLM enforces for that task.

    Setting ENFORCE_RESOURCE_MEMORY = true, allows AIX WLM to limit the real memory usage of a WLM class as precisely as possible.
    When a class exceeds its limit, all processes in the class are killed.
    The ENFORCE_RESOURCE_MEMORY keyword overrides the policy set through the ENFORCE_RESOURCE_POLICY keyword for ConsumableMemory only.
    The ENFORCE_RESOURCE_POLICY keyword value still applies for ConsumableCpus.

    ENFORCE_RESOURCE_POLICY specifies what type of resource entitlements will be assigned to the AIX Workload Manager classes.
    If the value specified is shares, a share value is assigned to the class based on the job step’s requested resources (one unit of resource equals one share). This is the default policy.
    If the value specified is soft, a percentage value (ConsumableCpus/Total_Processors_in_Node X 100) is assigned to the class based on the job step’s requested resources and the total machine resources. This percentage can be exceeded if there is no contention for the resource.
    If the value specified is hard, a percentage value is assigned to the class in the same way, based on the job step’s requested resources and the total machine resources, but this percentage cannot be exceeded even if there is no contention for the resource.

    A2) In the job command file, max_processors specifies the maximum number of nodes requested for a parallel job, regardless of the number of processors contained in each node. This keyword is equivalent to the maximum value you specify on the node keyword.
    In any new job command files you create for parallel jobs, you should use the node keyword to request nodes/processors.
    The max_processors keyword is supported by the LL_DEFAULT scheduler.

    In the admin file, max_processors specifies the maximum number of processors that can be requested for a particular class or by a particular user or group for a parallel job.

    A3) nproc_limit specifies the hard limit, soft limit, or both for the number of processes that can be created for the real user ID of the submitted job. This limit is a per process limit.

    A4)
    1) True

    2) The default_class list is checked to see which class in the list has the limits the job needs and whether the user can run in that class. See sections "Defining Classes"->"Using limit keywords" and "Allowing users to use a class".
    Consumable resources and initiators are not part of the calculation when picking a class from the default_class list.

    • Please see the TWS LoadLeveler Using and Administering Guide for more detailed information on all of the questions above.
    http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.loadl.doc/llbooks.html

    Regards,
    Linda Cham
  • michael-t
    28 Posts

    Re: Some basic question for LL scheduling

    ‏2008-07-31T20:47:21Z  
    Dear Linda, thanks for the useful clarifications.

    In A1) when ENFORCE_RESOURCE_POLICY is "shares", how does WLM implement resource enforcement when there is contention and when there is no contention?

    When it is set to "soft" and there is no contention there will be no resource consumption enforcement by WLM. How does WLM handle the situation when contention is present? How does it use the ratio ConsumableCpus_job / total_node_processors ?

    What is the difference in implementation between these two enforcement policies? From your description and the manual, they appear to be very close in essence.
    In A3) suppose I have a job J which has launched N_1, N_2 and N_3 tasks in nodes A, B and C, respectively.
    Does Nproc_limit bound N_1 + N_2 + N_3 OR EACH one of the N_1, N_2 and N_3?
    In A4-1) LL considers ALL three computation units (nodes, tasks/node, threads/task) to find a "valid" set of nodes that have a compatible collection of resources to match the job's requests with respect to all three units.

    But in A4-2) it ignores the consumablecpus, right ? So then it seems that it considers ONLY num of nodes and tasks per node? So it does NOT consider ALL three requested units from A4-1) above?

    Is the default_class behavior "broken" ? If it only considers two out of three resource units then it does not work according to the manual.

    Can you help me understand and reconcile the differences in A4-1) and A4-2) ?
    Thanks for all the answers!

    Michael
  • lcham
    7 Posts

    Re: Some basic question for LL scheduling

    ‏2008-08-01T18:07:50Z  
    Hi Mike,
    A1) LoadLeveler dynamically generates WLM classes with specific resource entitlements.
    A single WLM class is created for each job step and the process id of that job step is assigned to that class. This is done for each node that a job step is assigned to execute on. LoadLeveler then defines resource shares or limits for that class depending on the LoadLeveler enforcement policy defined. These resource shares or limits represent the job’s requested resource usage in relation to the amount of resources available on the machine.
    When the enforcement policy is shares, LoadLeveler assigns a share value to the class based on the resources requested for the job step (one unit of resource equals one share). When the job step process is executing, AIX WLM dynamically calculates a desired resource entitlement based on the WLM class share value of the job step and the total number of shares requested by all active WLM classes. It is important to note that AIX WLM will only enforce these target percentages when the resource is under contention.
    When the enforcement policy is limits (soft or hard), LoadLeveler assigns a percentage value to the class based on the resources requested for the job step and the total machine resources. This resource percentage is enforced regardless of any other active WLM classes. A soft limit indicates the maximum amount of the resource that can be made available when there is contention for the resources. This maximum can be exceeded if no one else requires the resource. A hard limit indicates the maximum amount of the resource that can be made available even if there is no contention for the resources.
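The difference between the two kinds of entitlement described above can be illustrated with a little arithmetic. This is only a rough sketch, not how WLM is implemented: the function names, the step names, and the 16-processor node are hypothetical, and real WLM target calculations involve more than this simple ratio.

```python
def share_targets(shares_by_class):
    """Shares policy: each active class's target is its share value
    divided by the total shares of all active classes."""
    total = sum(shares_by_class.values())
    return {name: shares / total for name, shares in shares_by_class.items()}


def limit_percentage(consumable_cpus, total_processors_in_node):
    """Soft/hard policy: a fixed percentage that depends only on the
    request and the machine size, not on other active classes."""
    return 100.0 * consumable_cpus / total_processors_in_node


# Hypothetical example: two job steps active on the same node,
# requesting 5 and 3 units of ConsumableCpus (one unit = one share).
targets = share_targets({"step_a": 5, "step_b": 3})
print(targets)  # step_a gets 5/8 of the CPU under contention, step_b 3/8

# Hypothetical 16-processor node: the same 5-CPU request as a percentage.
pct = limit_percentage(5, 16)
print(pct)
```

Under shares, the target for step_a would change as other job steps start and finish on the node; under soft or hard limits, the percentage for a given request on a given machine stays the same.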

    Your questions are more about what AIX WLM does with each setting. For more information on AIX WLM Resource entitlements please see:
    http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/mem_usage_vmstat.htm

    A3) This limit is enforced only under Linux. The keyword is the same as RLIMIT_NPROC on the system:
    the maximum number of threads that can be created for the real user ID of the calling process. This limit applies on each node, for one process.
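As a quick way to see what RLIMIT_NPROC currently is on a given Linux node, the value can be read from inside a process. This sketch uses Python's standard resource module, which is not part of LoadLeveler; the describe helper is a hypothetical name used only for printing. Because the value is read per process on the local machine, it matches the "on each node" behavior described above.

```python
import resource

# Soft and hard RLIMIT_NPROC values for the calling process on this node.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


print("nproc soft limit:", describe(soft))
print("nproc hard limit:", describe(hard))
```

Running this inside a job step on each allocated node would show the limit in effect there.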

    A4-1) This is a job command file. Depending on the keyword, some syntax checking can be done during submission while calculations are done during scheduling. The fields you stated are considered during scheduling.
    A4-2) The default_class keyword is in the admin file. During job submission, the class stanzas provide the limits, users, and groups used to determine which class to pick for the job. ConsumableCpus is just a resource consumed by each task running on a node, so it is considered during scheduling, to determine whether the machine has enough resources for the job.
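Since ConsumableCpus is consumed per task, the per-node requirement for the example job in the original question (tasks_per_node = 3, ConsumableCpus(5)) works out as below. This is a sketch of the arithmetic only, not of the scheduler; the function names and the free-CPU counts are hypothetical.

```python
def cpus_needed_per_node(tasks_per_node, consumable_cpus):
    """Each task on a node consumes its own ConsumableCpus request."""
    return tasks_per_node * consumable_cpus


def node_fits(free_cpus, tasks_per_node, consumable_cpus):
    """True if a node with free_cpus available can host the job's tasks."""
    return cpus_needed_per_node(tasks_per_node, consumable_cpus) <= free_cpus


need = cpus_needed_per_node(tasks_per_node=3, consumable_cpus=5)
print(need)  # 15 CPUs must be free on each of the 2 nodes

# Hypothetical nodes with 16 and 12 ConsumableCpus still available.
print(node_fits(16, 3, 5))
print(node_fits(12, 3, 5))
```

This check belongs to scheduling; as noted above, it plays no part in picking a class from the default_class list.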

    Linda
  • lcham
    7 Posts

    Re: Some basic question for LL scheduling

    ‏2008-08-01T18:17:39Z  
    Sorry. The link is incorrect.

    For more information on AIX WLM Resource entitlements go to:
    http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/
    Use the search field on the left side to search for "WLM Resource entitlements",
    then click on "Resource entitlements" in the left panel.