Topic
  • 2 replies
  • Latest Post - 2011-06-06T19:41:05Z by uk086242
uk086242

Configuring LoadLeveler classes to run on ONE SLES (on POWER7) only.

2011-05-25T13:19:01Z
Hello.

I have recently configured TWS LoadLeveler v3.5 for use with a Blue Gene/P (BG/P) system. The system comprises a Front End Node (FEN), which is a POWER7 p755 server running SUSE Linux Enterprise Server (SLES) as a full system partition, and a Service Node (SN). The FEN was originally a submission node only, with the SN handling all jobs for the BG/P racks, and this all works fine.

The customer has now asked for the FEN to also be used as a general-purpose parallel processing node for non-BG/P users, and I am not entirely sure how to proceed. My previous LoadLeveler experience has been with systems where the customer had multiple nodes (servers) that mapped onto the defined job classes, e.g. 1 node for 12 hours, 2 nodes for 12 hours, and so on. How do I define job classes that map to a number of CPUs (and an amount of memory) instead of whole nodes? Am I looking at a 'Consumable Resources' RTFM?

Many thanks in advance.
Updated on 2011-06-06T19:41:05Z by uk086242
  • SystemAdmin

    Re: Configuring LoadLeveler classes to run on ONE SLES (on POWER7) only.

    2011-06-02T20:28:46Z
    Hi uk086242,

    I'll post an example of how you can configure LoadLeveler to treat CPUs and memory on the FEN as consumable resources.

    Keep in mind that on Linux, LoadLeveler does not enforce consumable resource usage; on AIX, enforcement is done through WLM (AIX Workload Manager).
    For example, if LoadLeveler is configured so that each job uses only 2 CPUs and a job actually starts using 4 CPUs, LoadLeveler on Linux is not aware of it.

    Here's a sample configuration that allocates 8 CPUs and 128000 MB of memory on the FEN for the job class "fenjob" to use. The hostname of my system is bgpsys1.ibm.com.
    By default, each job submitted to the fenjob class consumes 1 CPU and 16000 MB of memory. You can also add further limits to the job class, for example to restrict memory use.
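    As far as I know, ConsumableMemory values given without a unit are read as megabytes, so the 128000 and 16000 in the LoadL_admin stanzas are a little less than 128 GB and 16 GB. The sketch below assumes llstatus converts to gigabytes by dividing by 1024, which is consistent with the 125.000 gb it reports for this node; the helper name mb_to_gb is hypothetical and only for illustration.

```shell
#!/bin/sh
# mb_to_gb (hypothetical helper): convert a ConsumableMemory value, assumed to
# be in megabytes, to the gigabyte figure shown by llstatus -R
# (MB / 1024, printed with three decimals).
mb_to_gb() {
    awk -v mb="$1" 'BEGIN { printf "%.3f\n", mb / 1024 }'
}

mb_to_gb 128000   # machine total from the resources keyword: 125.000
mb_to_gb 16000    # per-job default from default_resources:   15.625
```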

    In the LoadL_admin file:
    
    bgpsys1.ibm.com: type = machine
                     central_manager = true
                     schedd_host = true
                     resources = ConsumableCpus(8) ConsumableMemory(128000)

    fenjob: type = class
            class_comment = "submit jobs to run on front end node"
            default_resources = ConsumableCpus(1) ConsumableMemory(16000)
    

    In the LoadL_config file, add the SCHEDULE_BY_RESOURCES keyword:
    
    SCHEDULE_BY_RESOURCES = ConsumableCpus ConsumableMemory
    

    In the LoadL_config.local file on bgpsys1.ibm.com, add the fenjob class to the CLASS list (the 8 in parentheses lets up to eight fenjob steps run on this machine at once):
    
    CLASS = .... fenjob(8)
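    With these settings, the number of fenjob steps that can run at the same time is bounded by whichever runs out first: the class slots, the CPUs, or the memory. The back-of-the-envelope sketch below (hypothetical function name; it assumes every job uses only the class default_resources and ignores other limits such as MAX_STARTERS) shows that with the values above all three limits come to 8.

```shell
#!/bin/sh
# max_concurrent (hypothetical helper): upper bound on simultaneously running
# steps of one class, assuming each step consumes only the class defaults.
# Args: class slots, machine CPUs, machine memory (MB), CPUs per job, MB per job.
max_concurrent() {
    slots=$1; cpus=$2; mem_mb=$3; job_cpus=$4; job_mem_mb=$5
    by_cpu=$(( cpus / job_cpus ))       # whole jobs that fit in the CPUs
    by_mem=$(( mem_mb / job_mem_mb ))   # whole jobs that fit in the memory
    n=$slots
    if [ "$by_cpu" -lt "$n" ]; then n=$by_cpu; fi
    if [ "$by_mem" -lt "$n" ]; then n=$by_mem; fi
    echo "$n"
}

# fenjob(8), ConsumableCpus(8), ConsumableMemory(128000), default 1 CPU / 16000 MB
max_concurrent 8 8 128000 1 16000   # prints 8
# If each job defaulted to 32000 MB instead, memory would become the limit:
max_concurrent 8 8 128000 1 32000   # prints 4
```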
    

    Running llstatus -R shows the consumable resources we defined:

    
    > llstatus -R
    Machine                        Consumable Resource(Available, Total)
    ------------------------------ -------------------------------------------------
    bgpsys1.ibm.com                ConsumableCpus(8,8) ConsumableMemory(125.000 gb,125.000 gb)
    


    Submit a job to LoadLeveler in the fenjob class.
    
    > cat fenjob.cmd
    #!/bin/sh
    #
    #
    # @ class  = fenjob
    # @ error  = fenjob.$(Host).$(Cluster).$(Process).err
    # @ output = fenjob.$(Host).$(Cluster).$(Process).out
    # @ queue
    sleep 60
    

    
    bgpsys1:loadl/bgl/testdir > llq
    llq: There is currently no job status to report.
    bgpsys1:loadl/bgl/testdir > llsubmit fenjob.cmd
    llsubmit: The job "bgpsys1.ibm.com.14" has been submitted.
    bgpsys1:loadl/bgl/testdir > llsubmit fenjob.cmd
    llsubmit: The job "bgpsys1.ibm.com.15" has been submitted.
    bgpsys1:loadl/bgl/testdir > llq
    Id                       Owner      Submitted   ST PRI Class        Running On
    ------------------------ ---------- ----------- -- --- ------------ -----------
    bgpsys1.14.0             vhu        6/1 10:45   R  50  fenjob       bgpsys1
    bgpsys1.15.0             vhu        6/1 10:45   R  50  fenjob       bgpsys1

    2 job step(s) in queue, 0 waiting, 0 pending, 2 running, 0 held, 0 preempted
    

    Verify that each running job has consumed its share of the resources:
    
    bgpsys1:loadl/bgl/testdir > llstatus -R
    Machine                        Consumable Resource(Available, Total)
    ------------------------------ -------------------------------------------------
    bgpsys1.ibm.com                ConsumableCpus(6,8) ConsumableMemory(93.750 gb,125.000 gb)
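    The drop in available resources is exactly two times the fenjob default_resources. As a quick cross-check, the sketch below predicts the bgpsys1.ibm.com line of llstatus -R after N fenjob steps start, assuming each step consumes only the class defaults and memory is given in MB and shown as MB / 1024. The function name is hypothetical; the constants mirror the stanzas above.

```shell
#!/bin/sh
# expected_llstatus (hypothetical helper): predicted consumable-resource line
# after N fenjob steps are running, assuming each uses the class defaults.
TOTAL_CPUS=8          # resources = ConsumableCpus(8)
TOTAL_MEM_MB=128000   # resources = ConsumableMemory(128000)
JOB_CPUS=1            # default_resources = ConsumableCpus(1)
JOB_MEM_MB=16000      # default_resources = ConsumableMemory(16000)

expected_llstatus() {
    n=$1
    cpus_left=$(( TOTAL_CPUS - n * JOB_CPUS ))
    mem_left_mb=$(( TOTAL_MEM_MB - n * JOB_MEM_MB ))
    # Memory is printed in GB (MB / 1024) with three decimals, like llstatus.
    awk -v c="$cpus_left" -v t="$TOTAL_CPUS" -v m="$mem_left_mb" -v tm="$TOTAL_MEM_MB" \
        'BEGIN { printf "ConsumableCpus(%d,%d) ConsumableMemory(%.3f gb,%.3f gb)\n", c, t, m / 1024, tm / 1024 }'
}

expected_llstatus 0   # ConsumableCpus(8,8) ConsumableMemory(125.000 gb,125.000 gb)
expected_llstatus 2   # ConsumableCpus(6,8) ConsumableMemory(93.750 gb,125.000 gb)
```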
    
  • uk086242

    Re: Configuring LoadLeveler classes to run on ONE SLES (on POWER7) only.

    2011-06-06T19:41:05Z
    Thanks for the reply.

    Yes, I was aware that consumable resource usage is enforced through WLM on AIX. I had recommended that the BG/P FEN be partitioned into LPARs: a Linux FEN for BG/P job submission and one or more AIX LPARs for general parallel processing under LoadLeveler. Unfortunately, this did not happen.

    I guess my solution lies in a job submit filter that enforces the use of the defined job classes only, so that users cannot request resources outside that set of classes.
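    A minimal sketch of such a filter, assuming LoadLeveler's SUBMIT_FILTER mechanism in LoadL_config works as I understand it: llsubmit runs the named program with the job command file on standard input, submits whatever the program writes to standard output, and cancels the submission if the program exits non-zero. The script name fenjob_filter.sh and the ALLOWED_CLASSES list are hypothetical. It only rejects the resources keyword and checks class lines, and its keyword matching is case-sensitive, so a real filter would need to be more thorough.

```shell
#!/bin/sh
# fenjob_filter.sh (hypothetical name): sketch of a LoadLeveler submit filter.
# Assumed contract: job command file on stdin, filtered file on stdout,
# non-zero exit status cancels the submission.

ALLOWED_CLASSES="fenjob"   # hypothetical: space-separated list of permitted classes

check_job() {
    job=$(cat)

    # Users must not override the class default_resources with "# @ resources".
    if printf '%s\n' "$job" | grep -Eq '^#[[:space:]]*@[[:space:]]*resources[[:space:]]*='; then
        echo "submit filter: the resources keyword is not allowed, pick a class instead" >&2
        return 1
    fi

    # Every "# @ class = name" line must name a permitted class.
    classes=$(printf '%s\n' "$job" |
        sed -n 's/^#[[:space:]]*@[[:space:]]*class[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p')
    if [ -z "$classes" ]; then
        echo "submit filter: a class keyword is required" >&2
        return 1
    fi
    for c in $classes; do
        case " $ALLOWED_CLASSES " in
            *" $c "*) ;;
            *) echo "submit filter: class '$c' is not permitted" >&2; return 1 ;;
        esac
    done

    # Accepted: pass the job command file through unchanged.
    printf '%s\n' "$job"
}

# Only act as a filter when invoked under the (hypothetical) installed name.
case "$0" in
    *fenjob_filter*) check_job ;;
esac
```

    It would be hooked up with a line such as SUBMIT_FILTER = /path/to/fenjob_filter.sh in LoadL_config. Since nothing on Linux enforces the limits at run time, this only controls what users can request, not what their jobs actually use.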

    Thanks again.