14 replies Latest Post - ‏2008-09-17T23:11:18Z by ezhong
michael-t
28 Posts

Basic LoadLeveler (3.4+) Questions

‏2008-07-09T22:20:53Z |
Assume that the BACKFILL scheduler is used.

1) If the SYSPRIO variable is not defined in LoadL_config, does the negotiator use any default setting? If yes, what is this setting (e.g., --QDate)?

2) Is the pending job queue always reordered at least when a new job enters or exits the system? If NEGOTIATOR_RECALCULATE_SYSPRIO_INTERVAL = 0, does queue ordering take place upon new job arrival?

3) If we set SYSPRIO to an expression which does NOT include 'QDate', is there any way to include submission order in the SYSPRIO calculation? If 'QDate' is the only factor that makes LL consider submission order, then very old jobs may never get selected for dispatching, right?

4) Is it possible for a job's resource requirements to be considered in the computation of its SYSPRIO value?
For instance, suppose I would like to favor large CPU count jobs over smaller ones. One way would be to use, say, the number of nodes, or the number of nodes x tasks_per_node, as input to the SYSPRIO calculation. I assume that this is not currently available, but is it being considered for a future LL release?

5) Is there any default "aging" mechanism to allow "unfavorable" jobs to eventually get dispatched?

The problem is that QDate is a number which cannot be SCALED to levels compatible with those of the other SYSPRIO input variables (e.g., used shares). So as QDate monotonically increases (+1 / second), it eventually overshadows all other inputs.

However, if we do not use QDate as input to SYSPRIO, then in the absence of other default aging mechanisms some jobs may get scheduled only after long delays or may even be postponed indefinitely. Right?

thanks
Michael
Updated on 2008-09-17T23:11:18Z at 2008-09-17T23:11:18Z by ezhong
  • SystemAdmin
    46 Posts

    Re: Basic LoadLeveler (3.4+) Questions

    ‏2008-07-15T22:10:58Z  in response to michael-t
    I forwarded your questions to LL team.

    HPC central Admin.
  • ezhong
    11 Posts

    Re: Basic LoadLeveler (3.4+) Questions

    ‏2008-07-16T16:01:16Z  in response to michael-t
    Hi Michael,

    I'll attempt to answer your questions based mainly on my own impressions, but I cannot guarantee that everything I say is 100% accurate or will never change in the future.

    Assume that the BACKFILL scheduler is used.

    1) If the SYSPRIO variable is not defined in LoadL_config, does the negotiator use any default setting? If yes, what is this setting (e.g., --QDate)?

    There is no default setting, or the default is 0.

    2) Is the pending job queue always reordered at least when a new job enters or exits the system? If NEGOTIATOR_RECALCULATE_SYSPRIO_INTERVAL = 0, does queue ordering take place upon new job arrival?

    When a new job enters the system, it will be put at the right place in the queue according to its q_sysprio value and some other information. I don't believe the queue is completely reordered.

    3) If we set SYSPRIO to an expression which does NOT include 'QDate', is there any way to include submission order in the SYSPRIO calculation? If 'QDate' is the only factor that makes LL consider submission order, then very old jobs may never get selected for dispatching, right?

    I believe the submission order is used to place jobs in the queue if no other information, such as q_sysprio, overrides it. That's what I saw when I looked into the matter several years ago. You can run a test to see whether that's still the case. It's not documented and thus not guaranteed to be the behavior in the future.

    4) Is it possible for a job's resource requirements to be considered in the computation of its SYSPRIO value?
    For instance, suppose I would like to favor large CPU count jobs over smaller ones. One way would be to use, say, the number of nodes, or the number of nodes x tasks_per_node, as input to the SYSPRIO calculation. I assume that this is not currently available, but is it being considered for a future LL release?

    You are right that it's not directly available.

    However, you can put large CPU count jobs in a job class that allows a higher CPU limit and set the priority of that job class higher than that of the other job classes. Then you can use ClassSysprio in the SYSPRIO expression.

    You can also send your new requirements to IBM; one channel I know of is SP-XXL.

    5) Is there any default "aging" mechanism to allow "unfavorable" jobs to eventually get dispatched?

    The problem is that QDate is a number which cannot be SCALED to levels compatible with those of the other SYSPRIO input variables (e.g., used shares). So as QDate monotonically increases (+1 / second), it eventually overshadows all other inputs.

    However, if we do not use QDate as input to SYSPRIO, then in the absence of other default aging mechanisms some jobs may get scheduled only after long delays or may even be postponed indefinitely. Right?

    QDate is the number of seconds since the LoadL_negotiator daemon was last started. As one day consists of 24 * 3600 = 86400 seconds, the value of QDATE increases rapidly.

    You could use
    SYSPRIO: (0 - QDATE / 60)
    so that the QDATE term changes by 1 every minute, or
    SYSPRIO: (0 - QDATE / 600)
    so that it changes by 1 every 10 minutes.
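
As a rough illustration of the scaling described above, the following Python sketch evaluates both expressions for two QDate values. It assumes real-number division; how LoadLeveler itself rounds the result is not stated in this thread. The helper name `scaled_sysprio` is hypothetical.

```python
def scaled_sysprio(qdate_seconds, divisor):
    """Evaluate (0 - QDATE / divisor) using real-number division (an assumption)."""
    return 0 - qdate_seconds / divisor


# QDate values one hour and one day after the reference time.
for qdate in (3600, 86400):
    per_minute = scaled_sysprio(qdate, 60)        # changes by 1 every minute
    per_ten_minutes = scaled_sysprio(qdate, 600)  # changes by 1 every 10 minutes
    print(qdate, per_minute, per_ten_minutes)
```

With a divisor of 600, a full day of waiting moves the term by only 144 units, which is much easier to balance against share or class priority values than 86400.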

    Hope this helps.

    Regards,
    Enci Zhong
    LoadLeveler Development
    • michael-t
      28 Posts

      Re: Basic LoadLeveler (3.4+) Questions

      ‏2008-07-16T16:25:49Z  in response to ezhong
      Enci,

      many thanks for the reply ....

      One remaining question is the following: given max total shares and shares/user, HOW are used shares charged ? That is, what formula/method does LL use to translate total CPU computation time into share units / user (or / group) ?
      As for our SYSPRIO:

      I have decided to use the following formula for SYSPRIO in our system:

      SYSPRIO: 86400 * ( S_r * 7/10 + P_c * 1/50 ) - QDate

      where, S_r = remaining shares / user (initial shares is 10 / user), and
      P_c = system class priority

      Our decay period is 7 days. The intent of the formula above is to scale remaining shares / user and class priority to 1 week, since we decay used shares weekly.
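
To make the scaling concrete, here is a small Python sketch of the formula as written, under the assumption that it is evaluated with real-number arithmetic. The function name `sysprio` and the sample inputs are illustrative only.

```python
SECONDS_PER_DAY = 86400


def sysprio(remaining_shares, class_priority, qdate):
    """86400 * (S_r * 7/10 + P_c * 1/50) - QDate, with real-number arithmetic (assumed)."""
    weighted = remaining_shares * 7 / 10 + class_priority * 1 / 50
    return SECONDS_PER_DAY * weighted - qdate


# A user with all 10 initial shares left versus a user with none left,
# same class priority and same QDate.
full_shares = sysprio(remaining_shares=10, class_priority=50, qdate=0)
no_shares = sysprio(remaining_shares=0, class_priority=50, qdate=0)
print(full_shares - no_shares)
```

The gap between the two users is 86400 * 7 = 604800, the number of seconds in one week, so a full allocation of shares is worth exactly one week of QDate.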

      Thanks a lot again...

      Michael
      SC/TAMU
      • ezhong
        11 Posts

        Re: Basic LoadLeveler (3.4+) Questions

        ‏2008-07-16T18:25:11Z  in response to michael-t
        Michael,

        Thanks for sharing your SYSPRIO formula.

        One remaining question is the following: given max total shares and shares/user, HOW are used shares charged ? That is, what formula/method does LL use to translate total CPU computation time into share units / user (or / group) ?

        Please take a look at this
        http://www.freepatentsonline.com/y2007/0256077.html
        and let me know if you still have unanswered questions.

        Regards,
        Enci
        • michael-t
          28 Posts

          Re: Basic LoadLeveler (3.4+) Questions

          ‏2008-07-29T21:58:24Z  in response to ezhong
          Thanks for the pointer. Patents are the least entertaining thing to read, but at least they go into some technical depth. If you have published anything based on this, can you send me some references?

          Mike
  • ezhong
    11 Posts

    Re: Basic LoadLeveler (3.4+) Questions

    ‏2008-07-30T18:47:56Z  in response to michael-t
    Calculation for CPU resources per share:

    CPU resource per share =
    effective total CPU resources / total number of shares

    effective total CPU resources = number of processors / decay constant

    decay constant = 3 / fair share interval (in seconds)

    CPU resource usage is collected at the end of a step. It is added, with decay taken into account, to the total effective CPU usage of both the step owner and the step group.

    When it is time to calculate the used shares of a user or group, for example when the llfs command is issued, the total effective CPU usage decayed to the current time is divided by the CPU resource per share. The result, truncated to a whole number, is the used share value.
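
The calculation above can be sketched in Python as follows. The three formulas come from this post; the exponential form of the decay, exp(-decay constant * elapsed seconds), is an assumption, since the post only says that decay is considered. The function name `used_shares` and the sample cluster (100 processors, 1000 total shares, a 7-day interval) are hypothetical.

```python
import math


def used_shares(effective_cpu_usage, seconds_since_update, processors,
                total_shares, fair_share_interval):
    """Used shares from accumulated CPU usage (CPU-seconds), per the formulas above."""
    decay_constant = 3 / fair_share_interval             # per second
    effective_total_cpu = processors / decay_constant    # CPU-seconds
    cpu_per_share = effective_total_cpu / total_shares   # CPU-seconds per share
    # Assumption: usage decays exponentially with the decay constant.
    decayed_usage = effective_cpu_usage * math.exp(
        -decay_constant * seconds_since_update)
    # Only the whole-number part is kept.
    return int(decayed_usage / cpu_per_share)


WEEK = 7 * 86400
print(used_shares(1_000_000, 0, 100, 1000, WEEK))     # right after the usage is recorded
print(used_shares(1_000_000, WEEK, 100, 1000, WEEK))  # one fair share interval later
```

Under this assumption, usage recorded one full interval ago counts for about 5% (e^-3) of its original weight.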

    Hope this helps somewhat. :)
    • michael-t
      28 Posts

      Re: Basic LoadLeveler (3.4+) Questions

      ‏2008-07-30T20:17:55Z  in response to ezhong
      Yes, it helps ...

      thanks
    • michael-t
      28 Posts

      Re: Basic LoadLeveler (3.4+) Questions

      ‏2008-07-30T20:19:56Z  in response to ezhong
      BTW, do you see any problems with the proposed SYSPRIO formula I posted a few messages up? The majority of our jobs finish within 1 week.

      thanks
      Michael
  • ezhong
    11 Posts

    Re: Basic LoadLeveler (3.4+) Questions

    ‏2008-07-30T21:07:24Z  in response to michael-t
    Michael,

    Your SYSPRIO expression looked good.

    Regards,
    Enci
    • michael-t
      28 Posts

      Re: Basic LoadLeveler (3.4+) Questions

      ‏2008-09-15T21:17:48Z  in response to ezhong
      I was looking at the 'llfs -l' and 'llq -l' outputs, but even though the 'Used Shares' column is updated, the 'System Priority', 'q_sysprio' and 'Previous q_sysprio' values are not updated accordingly.

      We have set

      NEGOTIATOR_INTERVAL = 60
      NEGOTIATOR_PARALLEL_DEFER = 300
      NEGOTIATOR_PARALLEL_HOLD = 300
      NEGOTIATOR_REDRIVE_PENDING = 90
      NEGOTIATOR_RESCAN_QUEUE = 90
      NEGOTIATOR_REMOVE_COMPLETED = 0
      NEGOTIATOR_CYCLE_DELAY = 0
      NEGOTIATOR_CYCLE_TIME_LIMIT = 0
      NEGOTIATOR_RECALCULATE_SYSPRIO_INTERVAL = 120
      and
      SYSPRIO: 8640 * ( (7/10) * $(UserRemainingShares) + (ClassSysprio / 5 ) ) - QDate

      We are running LoadL.full 3.4.1.2.

      Any hint will be greatly appreciated.

      Michael
      • ezhong
        11 Posts

        Re: Basic LoadLeveler (3.4+) Questions

        ‏2008-09-16T21:07:37Z  in response to michael-t
        The setup looks fine.

        What are the values for UserRemainingShares, ClassSysprio, QDate and current and previous q_sysprio values?

        Could it be that the change in UserRemainingShares is too small to affect the result of the following expression?

        SYSPRIO: 8640 * ( (7/10) * $(UserRemainingShares) + (ClassSysprio / 5 ) ) - QDate

        To make q_sysprio more sensitive to UserRemainingShares, you could try increasing the value of FAIR_SHARE_TOTAL_SHARES and removing (7/10) from the above expression.
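
As a rough check of how sensitive the expression is, the Python sketch below compares two jobs that differ by one remaining share, first with the (7/10) factor and then without it. It assumes real-number arithmetic, which the thread does not confirm; the function name `sysprio` and the input values are made up for illustration.

```python
def sysprio(user_remaining_shares, class_sysprio, qdate, share_weight=7 / 10):
    """8640 * (share_weight * UserRemainingShares + ClassSysprio / 5) - QDate."""
    return 8640 * (share_weight * user_remaining_shares + class_sysprio / 5) - qdate


# Same class priority and QDate, one share apart.
with_weight = sysprio(50, 60, 100000) - sysprio(49, 60, 100000)
without_weight = (sysprio(50, 60, 100000, share_weight=1.0)
                  - sysprio(49, 60, 100000, share_weight=1.0))
print(with_weight, without_weight)
```

Under that assumption, one share is worth about 6048 priority units with the factor (8640 * 7/10) and 8640 without it, so a change of even a single share should be visible in q_sysprio if the expression is being re-evaluated.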

        Hope this helps.

        Enci
        • michael-t
          28 Posts

          Re: Basic LoadLeveler (3.4+) Questions

          ‏2008-09-17T20:41:53Z  in response to ezhong
          Hi Enci,

          do you know how the arithmetic in calculating SYSPRIO is done? Is it single- or double-precision floating point, or 4- or 8-byte integer?

          We have 180 users and I just gave 100 shares to each one.

          The intent of my formula is to weight UserRemainingShares and ClassSysprio to one week's worth of seconds after normalizing their values to the range [0, 1]. We let the decay period be one week. We roughly expect no job to remain in the system (queuing or processing) for more than a week.

          I can rewrite the formula to avoid normalizing UserRemainingShares and ClassSysprio, and thereby avoid round-off errors or, worse, integer arithmetic.
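
The concern about integer arithmetic can be illustrated with a short Python sketch. Whether LoadLeveler actually evaluates SYSPRIO with integer division is not established anywhere in this thread; the sketch only shows what would happen if it did. All three function names are hypothetical.

```python
def sysprio_float(urs, cs, qdate):
    """The expression evaluated with real-number division."""
    return 8640 * ((7 / 10) * urs + (cs / 5)) - qdate


def sysprio_int(urs, cs, qdate):
    """The same expression if every division truncated to an integer (assumed)."""
    return 8640 * ((7 // 10) * urs + (cs // 5)) - qdate


def sysprio_rewritten(urs, cs, qdate):
    """Algebraically equivalent form that contains no division at all."""
    return 864 * (7 * urs + 2 * cs) - qdate


# Two users in the same class, one with 100 remaining shares and one with 0.
print(sysprio_int(100, 60, 0), sysprio_int(0, 60, 0))
print(sysprio_rewritten(100, 60, 0), sysprio_rewritten(0, 60, 0))
```

With truncating division, 7/10 becomes 0 and the UserRemainingShares term disappears entirely, so both users get the same priority. The rewritten form gives the same result as the real-number version without relying on how division is carried out.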

          Now specifically:

          Class priorities range from 0 to 100, with 50 as the midpoint.
          The specific priorities are 40,50,55,55,60,60,65,70,75,85,85,85,90,
          assigned roughly in increasing order of the maximum number of tasks for each class.

          Current remaining shares of users (ordered ):

          -8645,-1783,-1766,-893,-494,-420,-274,-272,-240,-232,-160,-150,-127,0,3,12,31,33,48,59,66,71,71,75,76,86,86,92,94,95,96,96,98,98,98,100,100,100,100,100,100,100,100,100,100,100,100,100

          My problem is that I don't see q_sysprio and 'System Priority' in llq -l changing, even though UserRemainingShares changes, as I can see from llfs.
          • michael-t
            28 Posts

            Re: Basic LoadLeveler (3.4+) Questions

            ‏2008-09-17T22:38:54Z  in response to michael-t
            Forgot to show the 'System Priority' from llq -l:

            -98676,-96620,-81767,-74539,-72103,-69049,-68581,-68005,-62567,-62357,-61421,-61414,-58884,-57591,-57307,-57300,-57295,-57274,-56962,-54854,-52083,-40426,-36020,-15630,-7065,-96,8440,10286,11481,12154,12722,16252,26798,30377,34702,38438,44563,62073,76841,87813,90653,90672,92818,106014,106027,106037,106047,106066,106077,106092,106117,119610,158160,164623,167049,185547

            just to get an idea on the values

            thanks
            Michael
  • ezhong
    11 Posts

    Re: Basic LoadLeveler (3.4+) Questions

    ‏2008-09-17T23:11:18Z  in response to michael-t
    Please try the following SYSPRIO expressions:
    SYSPRIO: $(UserRemainingShares)
    SYSPRIO: UserUsedShares
    SYSPRIO: UserTotalShares
    and see whether q_sysprio values match the llfs output. We did those basic tests while developing the function. :)

    We can decide what to do next after seeing the test results. If you can give me access to your system, I can take a look at it as well.

    Thanks,
    Enci