6 replies Latest Post - ‏2013-12-06T04:57:51Z by fbenfredj
fbenfredj
39 Posts

NMON CPU not correct on P795

2013-05-17T10:11:51Z

Hi all,

I have a problem with nmon CPU stats on a P795.

The PhysicalCPU value in the LPAR tab shows wrong values compared to sar and lpar2rrd.

For example, on one LPAR with VP=4, sar and lpar2rrd show physc ~ 3.8, whereas nmon and lparstat show physc ~ 2.9.

These metrics are correct: %usr, %sys, %wio, %idle.

This issue occurs on all AIX versions (5.3, 6.1 and 7.1).

However, on P770 servers there is no problem.

Any idea?

Thank you for your help

 

 

  • puvichakravarthy
    55 Posts

    Re: NMON CPU not correct on P795

    ‏2013-05-17T18:23:09Z  in response to fbenfredj

    Could you kindly validate with lparstat output?

  • nagger
    1595 Posts

    Re: NMON CPU not correct on P795

    ‏2013-05-20T08:46:22Z  in response to fbenfredj

    My first reaction is that this is very unlikely, and that you should check your service pack levels.

    Use oslevel -s   NOTE THE -s

    This will give you the date of the service pack like:

    $ oslevel -s
    6100-07-03-1207
    

    The 12 means 2012 and the 07 means week 7, i.e. around mid-February.  If your service pack level is more than 18 months out of date, you need to update, as you might be looking at a bug that was fixed a long, long time ago.
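The level string can also be decoded mechanically. Below is a minimal Python sketch, assuming the four dash-separated fields are release, technology level, service pack and a YYWW build date as described above. The helper names `parse_oslevel` and `months_between` are made up for illustration, and the week-to-date conversion is only an approximation (week N is taken to start (N - 1) * 7 days after 1 January).

```python
from datetime import date, timedelta


def parse_oslevel(level: str) -> dict:
    """Split an `oslevel -s` string such as '6100-07-03-1207' into its fields."""
    release, tl, sp, yyww = level.strip().split("-")
    year = 2000 + int(yyww[:2])   # '12' -> 2012
    week = int(yyww[2:])          # '07' -> week 7
    # Approximation (assumption): week N starts (N - 1) * 7 days after 1 January.
    approx = date(year, 1, 1) + timedelta(weeks=week - 1)
    return {"release": release, "tl": int(tl), "sp": int(sp),
            "year": year, "week": week, "approx_date": approx}


def months_between(older: date, newer: date) -> int:
    """Whole calendar months from `older` to `newer` (day of month ignored)."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)


info = parse_oslevel("6100-07-03-1207")
age = months_between(info["approx_date"], date.today())
print(info["approx_date"], "is about", age, "months old")
if age > 18:
    print("Service pack is more than 18 months old: update before chasing bugs")
```

The 18-month threshold is simply the rule of thumb given in this post, not an official support limit.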

    Second reaction: are you running the nmon supplied with your AIX version, and not nmon classic? I am totally gobsmacked by people running nmon versions from 5 years ago on new hardware. The last nmon classic version was from the POWER5 days. Expecting that to work on POWER6 and POWER7 is insane.

    Third point: raise a PMR if you have the two above right. Why are you asking a forum?

    Fourth, how did you conclude nmon was wrong? Sar is an ancient UNIX tool and rarely used; as far as I know it is not SMT aware, and LPAR2RRD only gets its data from the HMC. You also don't say whether this is a shared CPU LPAR or whether it is uncapped, which would have been useful.

     

    cheers, Nigel Griffiths

    • fbenfredj
      39 Posts

      Re: NMON CPU not correct on P795

      ‏2013-05-21T13:44:49Z  in response to nagger

      Hi Nigel,

      I am sorry for the delay. We had a three-day weekend in France; Monday was a holiday.

      Here are the answers to your points:

      1/ I have the issue on many LPARs of my P795, with different AIX versions and different TLs and SPs.

      For example, I have an LPAR with the latest TL and SP of AIX 7.1: 7100-02-02-1316.

      2/ Of course, I am using the nmon supplied with AIX.

      3/ My customer's AIX support contract with IBM is being renewed. I will raise a PMR as soon as it is renewed.

      In the meantime, I wanted to save time by asking whether this is a known bug.

      4/ I noticed that the nmon values are different from the stats of all the other tools (vmstat, sar, lpar2rrd, ...) and of our official perf tool: Omnivision.

      That's why I concluded that nmon is wrong.

       

      Here are some examples of stats :

      SAR :

      05/21/2013 15:16:57    %usr    %sys    %wio   %idle   physc   %entc
      05/21/2013 15:17:27      60       3       0      37    1.08   537.6
      05/21/2013 15:17:57      61       2       0      37    1.04   521.6
      05/21/2013 15:18:27      59       4       0      37    1.08   540.8
      05/21/2013 15:18:57      61       2       0      37    1.04   520.6
      05/21/2013 15:19:27      61       3       0      37    1.07   537.0
      05/21/2013 15:19:57      61       1       0      37    1.04   519.7
      05/21/2013 15:20:27      60       3       0      37    1.09   545.7
      05/21/2013 15:20:57      61       1       0      37    1.04   519.3
      05/21/2013 15:21:27      61       2       0      37    1.05   525.4
      05/21/2013 15:21:57      59       4       0      37    1.08   540.7
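The physc and %entc columns together reveal the partition's entitlement, since %entc is physc expressed as a percentage of entitled capacity. Here is a small Python sketch using a few of the sar rows above (the variable names are illustrative only):

```python
# (physc, %entc) pairs copied from the sar output above.
sar_rows = [(1.08, 537.6), (1.04, 521.6), (1.08, 540.8), (1.04, 520.6), (1.07, 537.0)]

# %entc = 100 * physc / entitlement  =>  entitlement = physc / (%entc / 100)
entitlements = [physc / (entc / 100.0) for physc, entc in sar_rows]
entitlement = sum(entitlements) / len(entitlements)

print(f"implied entitlement: {entitlement:.2f} physical CPUs")
print("consuming more than entitlement:", all(entc > 100 for _, entc in sar_rows))
```

An entitlement of about 0.2 combined with %entc well above 100 suggests an uncapped shared-processor LPAR, since a capped partition cannot normally exceed its entitlement.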

      LPARSTAT :

      %user  %sys  %wait  %idle physc %entc  lbusy   app  vcsw phint  %nsp  Time
      ----- ----- ------ ------ ----- ----- ------   --- ----- ----- ----- --------
      60.5   2.5    0.0   37.0  1.07 534.3   13.2 17.14   591    93    77 15:17:15
      61.2   1.6    0.0   37.2  1.05 523.1   12.9 16.75   507   104    77 15:17:45
      60.1   2.8    0.0   37.1  1.07 532.8   13.2 16.22   542   149    77 15:18:15
      60.4   2.5    0.0   37.1  1.06 529.9   13.2 16.85   548   137    77 15:18:45
      60.7   2.6    0.0   36.7  1.07 534.4   13.3 18.02   528   109    77 15:19:15
      61.3   1.6    0.0   37.1  1.05 523.7   13.0 17.83   499    96    77 15:19:45
      60.2   2.8    0.0   37.0  1.08 542.3   13.6 17.84   684   112    77 15:20:15
      61.3   1.5    0.0   37.2  1.05 522.9   12.9 16.93   551    97    77 15:20:45
      61.5   1.9    0.0   36.6  1.04 522.5   13.1 17.32   551   108    77 15:21:15
      61.0   1.9    0.0   37.1  1.06 527.7   13.0 17.15   537   111    77 15:21:45

      VMSTAT :

      kthr    memory              page              faults              cpu
      ----- ----------- ------------------------ ------------ -----------------------
       r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa    pc    ec
       2  0 1409651  4902   0   0   0   0    0   0   6 2711 976 60  3 38  0  1.09 547.5
       2  0 1409651  4902   0   0   0   0    0   0  15 1580 907 62  2 37  0  1.02 512.5
       2  0 1409651  4902   0   0   0   0    0   0  15  771 858 61  2 38  0  1.05 525.7
       2  0 1409651  4902   0   0   0   0    0   0   5 2004 829 61  2 37  0  1.05 522.8

      NMON :

      LPAR    Logical Partition XXXXXXX      PhysicalCPU

      LPAR    T1864   0.865
      LPAR    T1865   0.790
      LPAR    T1866   0.815
      LPAR    T1867   0.801
      LPAR    T1868   0.842
      LPAR    T1869   0.794
      LPAR    T1870   0.825
      LPAR    T1871   0.792
      LPAR    T1872   0.817
      LPAR    T1873   0.813
      LPAR    T1874   0.823
      LPAR    T1875   0.793
      LPAR    T1876   0.802
      LPAR    T1877   0.797
      LPAR    T1878   0.845
      LPAR    T1879   0.793
      LPAR    T1880   0.814
      LPAR    T1881   0.794
      LPAR    T1882   0.804
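To put a number on the discrepancy, the physc columns of the four outputs above can be averaged and compared. This is a rough Python sketch under the assumption that all samples cover roughly the same period (the nmon snapshot times are not shown, so that is not guaranteed):

```python
from statistics import mean

# physc / pc values copied from the outputs above.
sar = [1.08, 1.04, 1.08, 1.04, 1.07, 1.04, 1.09, 1.04, 1.05, 1.08]
lparstat = [1.07, 1.05, 1.07, 1.06, 1.07, 1.05, 1.08, 1.05, 1.04, 1.06]
vmstat = [1.09, 1.02, 1.05, 1.05]
nmon = [0.865, 0.790, 0.815, 0.801, 0.842, 0.794, 0.825, 0.792, 0.817, 0.813,
        0.823, 0.793, 0.802, 0.797, 0.845, 0.793, 0.814, 0.794, 0.804]

# Treat sar, lparstat and vmstat together as the reference.
reference = mean(sar + lparstat + vmstat)
shortfall = 1 - mean(nmon) / reference

for name, values in [("sar", sar), ("lparstat", lparstat),
                     ("vmstat", vmstat), ("nmon", nmon)]:
    print(f"{name:8s} mean physc = {mean(values):.3f}")
print(f"nmon is {shortfall:.1%} below the other tools")
```

The three other tools agree with each other to within about 1%, while nmon sits more than 20% lower on every sample.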
       

      Updated on 2013-05-21T13:53:15Z by fbenfredj
      • nagger
        1595 Posts

        Re: NMON CPU not correct on P795

        ‏2013-05-21T15:14:32Z  in response to fbenfredj

        OK, thanks for the info, you clearly have covered the basics well - this needs a PMR to get it fixed.

        A 2% difference might be explained by the timing of the stats output, but being consistently 20% under makes no sense.

        Hope you can let us know what happens!

        Cheers, Nigel Griffiths

        • fbenfredj
          39 Posts

          Re: NMON CPU not correct on P795

          ‏2013-12-06T04:57:51Z  in response to nagger

          I opened a PMR, but my contract with my customer ended at the end of June, before I got a solution from IBM.

          However, I got news from my colleague, who continued following the PMR.

          The issue was due to the power saving mechanism.

          When my colleague deactivated power saving on the P795s, the CPU usage became correct.

          Hope this helps!

          Cheers,

          Faouzi BEN FREDJ

          P.S.: Sorry, I changed my display name from aixadmin_fbf to fbenfredj

      • Steve_ATS
        40 Posts

        Re: NMON CPU not correct on P795

        ‏2013-05-22T22:23:09Z  in response to fbenfredj

        Hmm, %nsp is at 77, so the CPUs are throttled.  physc x %nsp looks pretty close to the nmon figure...

        I've seen some weird output from lparstat -E at later levels, but I have not seen it affect physc. I wasn't trying to correlate it with nmon at the time, though.
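The physc x %nsp observation can be checked against the figures posted earlier in the thread. The Python sketch below scales the lparstat physc values by %nsp (the processor speed as a percentage of nominal speed, 77 in every lparstat row) and compares the result with the nmon PhysicalCPU values. That nmon actually applies such a scaling is only a hypothesis here, not something confirmed in the thread.

```python
from statistics import mean

# Values copied from the lparstat and nmon outputs earlier in the thread.
lparstat_physc = [1.07, 1.05, 1.07, 1.06, 1.07, 1.05, 1.08, 1.05, 1.04, 1.06]
nsp_percent = 77  # %nsp column, identical in every lparstat row
nmon_physc = [0.865, 0.790, 0.815, 0.801, 0.842, 0.794, 0.825, 0.792, 0.817, 0.813,
              0.823, 0.793, 0.802, 0.797, 0.845, 0.793, 0.814, 0.794, 0.804]

# Hypothesis: nmon value ~= physc scaled by the current/nominal speed ratio.
scaled = mean(lparstat_physc) * nsp_percent / 100
gap = abs(scaled - mean(nmon_physc)) / mean(nmon_physc)

print(f"lparstat physc x %nsp = {scaled:.3f}")
print(f"nmon mean PhysicalCPU = {mean(nmon_physc):.3f}")
print(f"relative gap          = {gap:.1%}")
```

The two averages land within about 1% of each other, which is consistent with the power saving explanation reported for the PMR.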