Jon_Seymour

Understanding LPAR records

‏2013-05-12T09:15:59Z |

G'day,

I have a question about how AIX and/or nmon report the EC_User% statistic.

The extract below shows two consecutive rows on the LPAR sheet in a circumstance where the frame rapidly lost idle CPUs because of some unknown event.

At 12:15:50, nmon reported EC_User% + EC_Sys% + EC_Wait% + EC_Idle% as 683.8%, which matches PhysicalCPU/entitled = 2.051/0.3 = 6.837.

My question relates to the values reported at 12:15:40.

At that time EC_User% + EC_Sys% + EC_Wait% + EC_Idle% is reported as 100.0%, even though the physical CPU usage is higher (2.236).

What is this 100% a percentage of, and why does the basis for calculating it change when the shared pool is exhausted? How do I determine, by inspection of the nmon data, when the basis has changed?

 

LPAR sheet for Logical Partition udecommwcs1 (one column per snapshot; the last two rows are unlabelled in the original extract):

Column                              12:15:40    12:15:50
PhysicalCPU                         2.236       2.051
virtualCPUs                         3           3
logicalCPUs                         12          12
poolCPUs                            48          48
entitled                            0.3         0.3
weight                              100         100
PoolIdle                            3.75        0.7
usedAllCPU%                         4.66        4.27
usedPoolCPU%                        4.66        4.27
SharedCPU                           1           1
Capped                              0           0
EC_User%                            33.33       469.57
EC_Sys%                             33.33       155.4
EC_Wait%                            0.02        0.61
EC_Idle%                            33.33       58.17
VP_User%                            47.65       46.96
VP_Sys%                             14.16       15.54
VP_Wait%                            0.01        0.06
VP_Idle%                            12.73       5.82
Folded                              0           0
Pool_id                             0           0
Unfolded VPs                        3.0         3.0
OtherLPARs                          42.0        45.2
(unlabelled) sum of EC_*%           100.0       683.8
(unlabelled) PhysicalCPU/entitled               6.837
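The arithmetic in the question can be rechecked with a short Python sketch. It uses only the values from the two rows above. The function name `check_row` and the 5-point tolerance are assumptions made for illustration; this is a heuristic for spotting rows where the EC_* figures do not match PhysicalCPU/entitled, not a description of how nmon computes them.

```python
# Values copied from the two LPAR rows in the question.
rows = {
    "12:15:40": {
        "PhysicalCPU": 2.236, "virtualCPUs": 3, "entitled": 0.3,
        "EC": [33.33, 33.33, 0.02, 33.33],   # EC_User%, EC_Sys%, EC_Wait%, EC_Idle%
        "VP": [47.65, 14.16, 0.01, 12.73],   # VP_User%, VP_Sys%, VP_Wait%, VP_Idle%
    },
    "12:15:50": {
        "PhysicalCPU": 2.051, "virtualCPUs": 3, "entitled": 0.3,
        "EC": [469.57, 155.4, 0.61, 58.17],
        "VP": [46.96, 15.54, 0.06, 5.82],
    },
}


def check_row(row, tolerance=5.0):
    """Compare the EC_* and VP_* sums with PhysicalCPU-based expectations.

    tolerance is in percentage points and is an arbitrary choice.
    """
    ec_sum = sum(row["EC"])
    vp_sum = sum(row["VP"])
    ec_expected = 100.0 * row["PhysicalCPU"] / row["entitled"]
    vp_expected = 100.0 * row["PhysicalCPU"] / row["virtualCPUs"]
    return {
        "ec_sum": round(ec_sum, 2),
        "ec_expected": round(ec_expected, 2),
        "ec_consistent": abs(ec_sum - ec_expected) <= tolerance,
        "vp_sum": round(vp_sum, 2),
        "vp_expected": round(vp_expected, 2),
        "vp_consistent": abs(vp_sum - vp_expected) <= tolerance,
    }


results = {time: check_row(row) for time, row in rows.items()}
for time, result in results.items():
    print(time, result)
```

On these two rows the VP_* sums track PhysicalCPU/virtualCPUs in both snapshots, while the EC_* sum only tracks PhysicalCPU/entitled at 12:15:50. A check like this could be one way to flag rows where the EC_* basis looks different.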

 

 

  • nagger

    Re: Understanding LPAR records

    ‏2013-05-15T17:55:07Z  

    Hi,

    The idea of EC_xxxxx is the regular utilisation stats expressed relative to the CPU Entitlement of the VM.

    It never seems to work well with Shared Processor VMs.

    I recommend you stop looking at this statistic for inspiration or problem diagnosis.

    Most of the time, monitor Physical CPU Used and look at Utilisation only to make sure you have high User time compared to System time.

    I don't see the Entitlement or the AIX version, but I would remind everyone to be on the latest service packs and a minimum of AIX 6 TL5, or you may be looking at bugs and not stats!

     

    Cheers, Nigel Griffiths

  • Jon_Seymour

    Re: Understanding LPAR records

    ‏2013-05-15T22:56:20Z  

    Nigel,

    Thanks for the reply.

    The Entitlement was 0.3, representing 10% of 3 online Virtual CPUs.

    The problem with the physical CPU measure is that it isn't well correlated with the radical (10x) degradation of our 95th percentile response times. What is well correlated with that degradation is a drop in the PoolIdle statistic to low figures (< 3) from a typical value of 25. During these events the Physical CPU remains around 2. At these times the reported EC_xxx figures jump in a way that makes them consistent with Physical CPU/Entitlement; at other times they don't seem correlated at all.

    To be honest, I was expecting the physical CPU figure to drop under these low PoolIdle conditions, but this did not occur, even though in practice it does appear that the LPAR was getting only 10% of the CPU it was getting previously. I presume this may be an artifact of the sampling interval or other such considerations.

    For the record:

    $ oslevel -g

     

    Fileset                                 Actual Level        Maintenance Level
    -----------------------------------------------------------------------------
    bos.rte                                 6.1.7.15            6.1.0

    I take your point that the EC_xxxx measures are unreliable. For now I am just looking for a signature in the nmon data that lets me exclude known bad periods of execution from analysis, and low values of PoolIdle seem to be the best indicator of this.

    jon.
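A minimal Python sketch of the PoolIdle signature described above. It assumes the raw .nmon file stores the LPAR sheet as comma-separated lines tagged `LPAR`, with the first such line naming the columns and later lines carrying a snapshot tag such as `T0001`. The sample below is trimmed to the first few columns of the two rows in the question; the snapshot tags are made up, and the threshold of 3 is the figure mentioned in the post. The function name `low_pool_idle_snapshots` is hypothetical.

```python
import csv

# Trimmed sample in the assumed raw layout (tags T0001/T0002 are illustrative).
SAMPLE = """\
LPAR,Logical Partition udecommwcs1,PhysicalCPU,virtualCPUs,logicalCPUs,poolCPUs,entitled,weight,PoolIdle
LPAR,T0001,2.236,3,12,48,0.3,100,3.75
LPAR,T0002,2.051,3,12,48,0.3,100,0.7
"""


def low_pool_idle_snapshots(lines, threshold=3.0):
    """Return (snapshot tag, PoolIdle) for LPAR rows with PoolIdle below threshold."""
    header = None
    flagged = []
    for fields in csv.reader(lines):
        if not fields or fields[0] != "LPAR":
            continue  # skip other sections of the file
        if header is None:
            header = fields  # first LPAR line names the columns
            continue
        row = dict(zip(header[2:], fields[2:]))
        pool_idle = float(row["PoolIdle"])
        if pool_idle < threshold:
            flagged.append((fields[1], pool_idle))
    return flagged


flagged = low_pool_idle_snapshots(SAMPLE.splitlines())
print(flagged)
```

The flagged tags can then be matched against the ZZZZ timestamp records to find the wall-clock periods to exclude.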

  • nagger

    Re: Understanding LPAR records

    ‏2013-05-20T09:23:34Z  

    This is a classic problem we are finding elsewhere.

    With an Entitlement of 0.3 you are saying you expect the LPAR to get all its work done in this much CPU time, and you commit only this much CPU time to the LPAR. That much is guaranteed, but the 3 VPs are only available if the Pool Idle is large enough. But then you say it normally takes 2 CPUs. The LPAR is only getting the other 1.7 CPUs of time by the luck of idle time, and it has to compete for those CPU cycles with other LPARs.

    In high Pool Idle times you get away with this, as the LPAR can dominate the spare CPUs. But when the pool gets low, your LPAR is thrown off the CPU after 3 milliseconds so that all the other LPARs can run. Then it may get 1 more millisecond on the CPU, then is thrown off again for all the other LPARs to run, then gets another 1 millisecond, and so on. This is not an efficient use of the CPU. But you asked for this. I have customers using a very high VP to E ratio and having applications look like they have hung. But you are getting exactly what you asked for: you asked for the LPAR to run in 15% of the CPU it needs. If the LPAR is important, set the Entitlement to what it needs to run.
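The figures in this argument follow from three numbers given in the thread: an Entitlement of 0.3, 3 Virtual CPUs, and typical usage of about 2 physical CPUs. A small Python sketch of the arithmetic (the variable names are illustrative only):

```python
# Figures quoted in the thread.
entitled = 0.3               # guaranteed CPU (Entitlement)
virtual_cpus = 3             # ceiling the LPAR can reach when the pool has idle CPUs
typical_physical_cpu = 2.0   # what the workload normally consumes

# Share of the normal workload that the Entitlement actually guarantees.
guaranteed_share = entitled / typical_physical_cpu

# CPU the LPAR gets only when other LPARs leave the pool idle.
borrowed_cpu = typical_physical_cpu - entitled

# Virtual Processor to Entitlement ratio.
vp_to_e_ratio = virtual_cpus / entitled

print(f"guaranteed share of normal usage: {guaranteed_share:.0%}")
print(f"CPU borrowed from pool idle time: {borrowed_cpu:.1f}")
print(f"VP to E ratio: {vp_to_e_ratio:.0f}:1")
```

So the guarantee covers about 15% of normal usage, roughly 1.7 CPUs come from pool idle time, and the VP to E ratio is about 10 to 1.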

    Have you watched my YouTube video on this topic? https://www.youtube.com/watch?v=1W1M114ppHQ

    Check the ZZZZ records in the nmon file, as you might find nmon was regularly failing to get CPU time because the LPAR is CPU starved.
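One way to check the ZZZZ records is to look for snapshots that arrive later than the capture interval. The Python sketch below assumes each record has the layout `ZZZZ,T0001,HH:MM:SS,DD-MON-YYYY`. The sample timestamps are made up for illustration, the 10-second interval matches the spacing of the two rows in the question, and the function names and 2-second slack are arbitrary choices.

```python
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_zzzz(line):
    """Parse one record in the assumed layout ZZZZ,T0001,HH:MM:SS,DD-MON-YYYY."""
    _, tag, clock, date = line.strip().split(",")[:4]
    day, month, year = date.split("-")
    hour, minute, second = (int(part) for part in clock.split(":"))
    stamp = datetime(int(year), MONTHS[month.upper()], int(day), hour, minute, second)
    return tag, stamp


def find_late_snapshots(lines, interval_seconds, slack_seconds=2):
    """Return (tag, gap in seconds) for snapshots that arrived late."""
    stamps = [parse_zzzz(line) for line in lines if line.startswith("ZZZZ,")]
    late = []
    for (_, previous), (tag, current) in zip(stamps, stamps[1:]):
        gap = (current - previous).total_seconds()
        if gap > interval_seconds + slack_seconds:
            late.append((tag, gap))
    return late


# Made-up sample: the third snapshot arrives 17 seconds after the second.
SAMPLE = [
    "ZZZZ,T0001,12:15:30,01-JAN-2013",
    "ZZZZ,T0002,12:15:40,01-JAN-2013",
    "ZZZZ,T0003,12:15:57,01-JAN-2013",
    "ZZZZ,T0004,12:16:00,01-JAN-2013",
]

late = find_late_snapshots(SAMPLE, interval_seconds=10)
print(late)
```

Late snapshots found this way would suggest that nmon itself was waiting for CPU during those intervals.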

    Hope this helps, Nigel Griffiths

  • Murali_Y

    Re: Understanding LPAR records

    ‏2013-10-04T18:34:47Z  

    Hi Nigel.

     

    Could you please help me produce reports showing CPU and memory usage with nmon? I have never done this and don't know how. I tried to find out how to do it but couldn't find the right information. We have different systems running AIX 5.2, 5.3, 6.1 and 7.1. I am not sure which version of nmon needs to run, where the output is collected, or where it is stored. Any help on collecting CPU and memory usage reports with nmon would be appreciated.

  • nagger

    Re: Understanding LPAR records

    ‏2013-10-04T22:14:51Z  

    Perhaps Google does not work in your country

    The nmon wiki page is here and has lots of information

    I recommend you watch two videos on YouTube

    To collect the data all day run: nmon -x

    Or for a busy hour: nmon -X

    A file ending in .nmon will be in your current directory.

    Then FTP the .nmon file in binary mode to your Windows workstation. Find the nmon Analyser Excel spreadsheet, start it, agree to enable macros, then click the Analyse nmon data button and select the .nmon file.

    Download the nmon Analyser here - https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/nmon_analyser

    I hope that gets you started, cheers Nigel Griffiths

  • Murali_Y

    Re: Understanding LPAR records

    ‏2013-10-05T21:36:16Z  

    Nigel,

    Thanks for the reply.

    Actually, I searched Google for notes on nmon and looked at many blogs. As I said, this is the first time I have wanted to use nmon to obtain memory and CPU statistics, and I am not sure which information is correct. To be frank, I am totally new to nmon.

    We recently had slowness on a couple of servers, and the application team asked us to share how much CPU and memory was in use at particular times. We don't have this data, because the request was for particular times on the previous day.

    We want these reports from servers running AIX 6.1 and 7.1. I am also confused about which version of nmon should run on AIX 6.1 and its different TLs, and likewise for AIX 7.1 and its TLs.

    Thanks in advance for your suggestions and information on this.

  • nagger

    Re: Understanding LPAR records

    ‏2013-10-10T16:39:14Z  

    The top of the nmon wiki page says when nmon started to be shipped with AIX. Could you actually read that page?

    To find your AIX level type in: oslevel -s

    If you type in: whence nmon

    and it gives you /usr/bin/nmon

    Then you have nmon so start using it.

    Assuming your AIX user name is Murali, add the command nmon -x -m /home/Murali to your crontab file and then you will have the stats the next time you get asked.

    Best of Luck, Nigel