  • Latest Post - 2013-07-17T04:55:03Z by MrBhatt
Govind-Paciolan

Page fault and context switching

2013-07-16T20:33:42Z

Hello,

We have two P7 770s, each carved into close to 12 LPARs. CPU is allocated dynamically, with entitled capacity set to 2 and virtual processors typically set to 8. On occasion we increase the entitled capacity to 6 and the virtual processors to 14, so that if the load needs more than 6 physical CPUs, the LPAR can use up to 14 physical CPUs when they are available.
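For reference, the arithmetic behind these two settings can be sketched as follows. This is a back-of-the-envelope illustration, not PowerVM code: the SharedLpar class is hypothetical, and it assumes the LPARs are uncapped (which the ability to exceed entitlement implies). An uncapped shared-processor LPAR can run on at most as many physical processors as it has virtual processors, subject to what is free in the shared pool, while a capped one is limited to its entitlement.

```python
from dataclasses import dataclass


@dataclass
class SharedLpar:
    """Hypothetical model of a shared-processor LPAR's CPU settings."""
    entitled: float        # entitled capacity, in processing units
    virtual_procs: int     # number of virtual processors
    uncapped: bool = True  # assumption: uncapped shared-processor mode

    def entitlement_per_vp(self) -> float:
        """Guaranteed processing units backing each virtual processor."""
        return self.entitled / self.virtual_procs

    def max_physical(self) -> float:
        """Ceiling on physical CPUs usable at once (pool availability permitting)."""
        return float(self.virtual_procs) if self.uncapped else float(self.entitled)


if __name__ == "__main__":
    # The two configurations described in the post.
    for name, lpar in [("normal", SharedLpar(2, 8)), ("peak", SharedLpar(6, 14))]:
        print(f"{name}: {lpar.entitlement_per_vp():.2f} PU per VP, "
              f"ceiling {lpar.max_physical():.0f} physical CPUs")
```

With entitlement 2 and 8 virtual processors, each virtual processor is guaranteed only 0.25 processing units; anything beyond that has to be borrowed from the shared pool.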

We use the U2 UniVerse database, which does not work well with shared memory, so we allocate dedicated memory to each LPAR. Each frame has 256 GB of memory, so each LPAR gets roughly 20 GB of RAM.

We have had performance problems with our systems since we moved to AIX 7.1 about eight months ago. We have tried every combination we could think of to get optimal performance, without much luck.

Our issue is that when heavy load hits the LPAR, the system runs perfectly for around 5 to 7 minutes and then hits a wall. Lately we have observed the following pattern: around the five-minute mark, the context-switch rate reported by vmstat starts rising from around 10,000/s to 100,000/s and keeps climbing. During this time the system becomes very sluggish, and every command at the shell prompt is very slow. Later the context-switch rate comes back down (as shown below) and performance improves. In addition, topas_nmon in verbose mode always shows the page-fault rate in the danger zone (above 6000/s).

Any help to identify the root cause and resolve the issue would be much appreciated.

Context switches per second (cswch/s), sampled about every 5 seconds:

08:34:11 cswch/s
08:34:11   10332
08:34:16   10748
08:34:21   10126
08:34:26   10978
08:34:31   12635
08:34:36   14022
08:34:41   11323
08:34:46   11251
08:34:51   11334
08:34:56   11106
08:35:01   21873
08:35:06   53078
08:35:11   47981
08:35:16   66985
08:35:21   70085
08:35:26   98738
08:35:31  126427
08:35:37  135490
08:35:42  121562
08:35:47  123373
08:35:52  120545
08:35:57  122405
08:36:02  106024
08:36:07  111116
08:36:12  126232
08:36:17  127205
08:36:22  126834
08:36:27  122402
08:36:32  127339
08:36:37  132542
08:36:42  133216
08:36:47  136304
08:36:52  136131
08:36:57  133653

---

08:52:12 cswch/s
08:52:12  138880
08:52:17  142634
08:52:22  145086
08:52:27  139646
08:52:32  141603
08:52:37  139668
08:52:42  150483
08:52:47  145098
08:52:52  153237
08:52:57  153677
08:53:02  117990
08:53:07  127526
08:53:12  139365
08:53:17  126767
08:53:22  133493
08:53:27  144945
08:53:32  149401
08:53:37  152669
08:53:42  153789
08:53:47   49023
08:53:52   11245
08:53:57   10776
08:54:02   10926
08:54:07   10257
08:54:12   10156
08:54:17   10209
08:54:22    8360
08:54:27    8894
08:54:32    8726
08:54:37   11563
08:54:42    9351
08:54:47   10743
08:54:52   10722
08:54:57    9418
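One rough way to pin down when an episode starts and ends in samples like these is to flag every interval above some multiple of the normal rate. The Python sketch below is hypothetical: BASELINE (10,000/s, the "normal" figure quoted in the post) and FACTOR (2x) are assumptions, and the two excerpts are copied from the samples above.

```python
BASELINE = 10_000  # assumed "normal" context switches per second
FACTOR = 2.0       # assumed: anything above 2x baseline counts as elevated

# Excerpts copied from the two sample runs above.
FIRST = """\
08:34:46   11251
08:34:51   11334
08:34:56   11106
08:35:01   21873
08:35:06   53078
08:35:11   47981
"""

SECOND = """\
08:53:37  152669
08:53:42  153789
08:53:47   49023
08:53:52   11245
08:53:57   10776
08:54:02   10926
"""


def parse_samples(text):
    """Parse 'HH:MM:SS value' lines into (timestamp, int) pairs, skipping headers."""
    samples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            samples.append((parts[0], int(parts[1])))
    return samples


def elevated_episodes(samples, baseline=BASELINE, factor=FACTOR):
    """Return (first, last) timestamps of each run of samples above factor * baseline."""
    threshold = baseline * factor
    episodes = []
    start = last = None
    for ts, value in samples:
        if value > threshold:
            if start is None:
                start = ts  # first elevated sample of a new run
            last = ts
        elif start is not None:
            episodes.append((start, last))  # run ended on the previous sample
            start = None
    if start is not None:  # series ended while still elevated
        episodes.append((start, last))
    return episodes


if __name__ == "__main__":
    print(elevated_episodes(parse_samples(FIRST)))   # [('08:35:01', '08:35:11')]
    print(elevated_episodes(parse_samples(SECOND)))  # [('08:53:37', '08:53:47')]
```

Run over the full output, the same check would show how long each episode lasts, which helps when lining the spikes up against vmstat or nmon data for the same window.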

  • MrBhatt

    Re: Page fault and context switching

    2013-07-17T04:55:03Z

    Please share the full vmstat output (preferably from vmstat -Iwt) and the nmon file, if you are generating one on this problem partition.