We had a memory fault on our P720, so IBM replaced the system board and memory module. This seemed to work, in that our single VIOS and three AIX 7 LPARs operated normally. However, we noticed that when lsvg was run on the VIOS against the client LPAR disks, the response took several seconds instead of the expected tenths of a second. An lslv command on the VIOS took 50 seconds when it should complete in under 1 second, although the client LPARs seemed to respond in a normal timeframe. After a few months of operation we got a few "general xmalloc debug error" messages on the VIOS, and soon afterwards there was no memory left (couldn't fork, etc.). After rebooting the VIOS and LPARs we got several "undetermined error" messages on the VIOS and hundreds of "disk operation error" messages on the client LPARs.
We then realised that the Multi-Core Scaling (MCS) value was set to the default of 4, whereas the P720 had been configured and deployed with a value of 1. Setting the MCS value back to 1 resolved the problem.
It's unfortunate that a system with a mismatched MCS value is allowed to boot, because there is no obvious indication that a base configuration error is the cause of all manner of failure symptoms.
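The slowdown described above (lslv taking 50 seconds instead of under 1 second) is the kind of symptom that can be caught by timing commands routinely. A minimal sketch of such a check follows. It assumes Python is available in a root shell on the system, which the post does not state, and the helper name `timed_run` and the 1-second default threshold are illustrative only:

```python
import subprocess
import time


def timed_run(cmd, slow_after_s=1.0):
    """Run cmd (a list of argv strings) and report how long it took.

    slow_after_s is an illustrative threshold taken from the post's
    "under 1 second" expectation, not a documented limit.
    """
    start = time.monotonic()
    result = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return {
        "cmd": " ".join(cmd),
        "returncode": result.returncode,
        "elapsed_s": elapsed,
        "slow": elapsed > slow_after_s,  # True when the command exceeded the threshold
    }
```

For example, `timed_run(["lsvg", "rootvg"])` would time an lsvg call, where `rootvg` stands in for whichever volume group is being checked. A result with `slow` set to True only shows that something is wrong, not what the cause is.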
2 replies Latest Post - 2013-01-17T21:16:39Z by SystemAdmin
Replaced P720 system board reset MCS giving poor performance & disk errors
Updated on 2013-01-17T21:16:39Z by SystemAdmin
jeffrey27 (270002K1W6), 5 Posts, ACCEPTED ANSWER
Re: Replaced P720 system board reset MCS giving poor performance - RESOLVED
2013-01-17T03:43:37Z, in response to jeffrey27
The actual problem was that this machine had 1GB memory allocated to its VIOS. Increasing that to 2GB resolved the problem. The MCS value was a red herring.
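Since the root cause turned out to be the VIOS memory allocation, a quick check of the configured memory may be worth doing before chasing other settings. The sketch below parses the output of `lsattr -El sys0 -a realmem`, which on AIX reports usable physical memory in KB. The function names are hypothetical, and the 2 GB default is only the figure that worked in this thread, not an official sizing rule:

```python
def parse_realmem_kb(lsattr_output):
    """Extract the realmem value (in KB) from `lsattr -El sys0 -a realmem` output."""
    for line in lsattr_output.splitlines():
        fields = line.split()
        # Expected line shape: "realmem <kbytes> <description...> <user-settable>"
        if len(fields) >= 2 and fields[0] == "realmem":
            return int(fields[1])
    raise ValueError("no realmem attribute found in output")


def below_threshold(realmem_kb, min_gb=2.0):
    """Return True if memory is under min_gb (assumed threshold from this thread)."""
    return realmem_kb < min_gb * 1024 * 1024


# Sample line in the format lsattr prints; the value here is made up (1 GB).
sample = "realmem 1048576 Amount of usable physical memory in Kbytes False"
print(below_threshold(parse_realmem_kb(sample)))
```

With the made-up 1 GB sample the check flags the partition as under the assumed threshold. On a real VIOS the lsattr command would be run from a root shell and its output passed in.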
SystemAdmin (110000D4XK), 6902 Posts, ACCEPTED ANSWER
Re: Replaced P720 system board reset MCS giving poor performance & disk errors
2013-01-17T21:16:39Z, in response to jeffrey27
The VIOS Performance Analyzer often can help in this kind of situation. It's a free download from IBM: http://www.ibm.com/developerworks/wikis/display/wikiptype/other+performance+tools#OtherPerformanceTools-VIOSPA