Local, Near & Far Memory part 2 - Virtual Machine CPU & Memory Lay Out
nagger
So you know about Power7 Local, Near and Far memory for your actual machine, but what is your Virtual Machine (LPAR) actually using? Three key commands will show you (lssrad, mpstat and topas), and we will look at some example output from each.
First, we need to define an SRAD, or Scheduler Resource Affinity Domain. If you have used Resource Sets with AIX WLM or WPARs then you have a good idea what these are like. An SRAD is a group of resources, in our case CPU/cores and the memory that is directly attached to them. As an example to help make this real, an SRAD might have logical CPUs 4, 5, 6 and 7 (i.e. the logical CPUs of the second physical CPU/core) and 28 GB of memory. For a process running in this SRAD, any memory it is allocated at start-up time or later on with malloc() will be assigned from the 28 GB of memory in the SRAD. Well, that happens if there is memory available; in the worst case AIX may be forced to allocate memory further away. In fact, the process ID (PID) includes the SRAD number (it makes up a few bits, 8 to 11, in the middle), which helps explain the large PID values AIX uses, but you don't really need to know that! Of course, an SRAD could contain many logical CPUs, as each Power7 chip has up to 8 cores.
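To make the numbering concrete, here is a minimal Python sketch of the two bits of arithmetic in that paragraph. It assumes SMT=4 with logical CPUs numbered contiguously per core, and it assumes "bits 8 to 11" are counted from the least significant bit (bit 0); both are my assumptions for illustration, and the function names and the sample PID are made up.

```python
from typing import List

SMT_THREADS = 4  # assumption: SMT=4, so four logical CPUs per core


def core_of(logical_cpu: int, smt: int = SMT_THREADS) -> int:
    """Zero-based core index owning a logical CPU (contiguous numbering assumed)."""
    return logical_cpu // smt


def logical_cpus_of(core: int, smt: int = SMT_THREADS) -> List[int]:
    """Logical CPU numbers belonging to a zero-based core index."""
    return list(range(core * smt, (core + 1) * smt))


def pid_bits(pid: int, low: int = 8, high: int = 11) -> int:
    """Extract bits low..high inclusive, counting bit 0 as least significant.

    Assumption: this is the bit numbering meant by "bits 8 to 11".
    """
    width = high - low + 1
    return (pid >> low) & ((1 << width) - 1)


if __name__ == "__main__":
    print(logical_cpus_of(1))  # the second core (index 1)
    print(core_of(6))          # logical CPU 6 lives on the second core
    print(pid_bits(0x0350))    # made-up PID, not taken from a real system
```

With these assumptions the second core (index 1) owns logical CPUs 4 to 7, matching the example above. Treat the PID field as a curiosity rather than something to rely on.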
lssrad -av
Fortunately, the lssrad command gives you a clear view of the logical layout of your Virtual Machine (LPAR). Here is a simple example from my Power 770:
There is no need to explain the lssrad options as all the others are IMHO pointless :-)
So let me explain what we have here
So we can see:
We can also assume that when the VM was started the first Power7 chip already had one core allocated to a different VM, and so this one was spread across two Power7 chips. Processes started on the first Power7 chip in SRAD0 will (hopefully) get memory within that same SRAD's 28 GB of memory for faster access. Likewise for processes started on the second Power7 chip (SRAD1) and its 2.5 GB.
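If you want to summarise this layout in a script, the following Python sketch parses lssrad-style text into one record per SRAD. It assumes a four-column layout (REF1, SRAD, MEM, CPU) with memory in MB and a REF1 value on a line of its own. The sample text is invented to match the 28 GB and 2.5 GB figures above, including the CPU ranges, and is not captured from a real machine.

```python
from typing import List

# Invented sample in the assumed lssrad layout; the CPU ranges are hypothetical.
SAMPLE = """\
REF1   SRAD        MEM      CPU
0
          0   28672.00      0-27
1
          1    2560.00     28-31
"""


def expand_cpus(spec: str) -> List[int]:
    """Expand a CPU list such as '0-3 8 10-11' into individual CPU numbers."""
    cpus: List[int] = []
    for part in spec.replace(",", " ").split():
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def parse_lssrad(text: str) -> List[dict]:
    """Return one dict per SRAD with its REF1, memory in MB and logical CPUs."""
    srads: List[dict] = []
    ref1 = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip blank lines and the header
        if len(fields) == 1:
            ref1 = int(fields[0])  # a lone number starts a new REF1 group
        else:
            srads.append({
                "ref1": ref1,
                "srad": int(fields[0]),
                "mem_mb": float(fields[1]),
                "cpus": expand_cpus(" ".join(fields[2:])),
            })
    return srads


if __name__ == "__main__":
    for s in parse_lssrad(SAMPLE):
        print(f"REF1 {s['ref1']} SRAD {s['srad']}: "
              f"{s['mem_mb'] / 1024:.1f} GB, {len(s['cpus'])} logical CPUs")
```

On a live system you would feed it the real command output instead of SAMPLE, and adjust the parser if your lssrad output is laid out differently.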
Can we determine which Power770 CEC drawer or which processors?
mpstat -d 1 99999
This gives you the dynamic picture of how often the CPU/cores are accessing Local, Near and Far memory at the moment. Below is a sample of the output but I have removed 15 columns of output to focus on Affinity.
So let me explain what we have here:
Yes, those column headings are as clear as mud! But it shows that the majority of memory access is Local, with some (20%) Far access from CPU/core 17 and a few lower percentages to Near memory in the other CPU/cores. Of course, the very high bandwidth between Power7 chips and between nodes (in the hundreds of GB/s) means Near and Far memory references are not a problem, provided they are not the majority of memory accesses.
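The "not the majority" rule of thumb is easy to turn into a check. This Python sketch takes per-CPU Local, Near and Far percentages and lists the logical CPUs where Near plus Far exceeds half of the accesses. The sample figures are hypothetical (only the 20% Far on CPU 17 echoes the text), and pulling the three percentages out of real mpstat output is left out, since that depends on the column layout on your system.

```python
from typing import Dict, List, Tuple

# Hypothetical (local %, near %, far %) per logical CPU; not real mpstat output.
SAMPLE: Dict[int, Tuple[float, float, float]] = {
    16: (100.0, 0.0, 0.0),
    17: (80.0, 0.0, 20.0),
    18: (95.0, 5.0, 0.0),
    19: (30.0, 10.0, 60.0),  # made-up bad case to show a CPU being flagged
}


def mostly_remote(samples: Dict[int, Tuple[float, float, float]],
                  threshold: float = 50.0) -> List[int]:
    """Logical CPUs whose Near + Far share is above threshold percent."""
    return [
        cpu
        for cpu, (_local, near, far) in sorted(samples.items())
        if near + far > threshold
    ]


if __name__ == "__main__":
    print(mostly_remote(SAMPLE))
```

With the sample above only the made-up CPU 19 is flagged; CPU 17 with its 20% Far access is well within the "not a problem" range.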
topas -M
If you have lots of CPU/cores in a Virtual Machine and four times that many logical CPUs (due to SMT=4), you are going to go nuts watching the data stream off the top of the screen. So here the good old topas command comes to the rescue: it shows the lssrad and mpstat-like data, and more, on one screen. Either start it with topas -M, or start topas and hit capital M to switch modes.
So let me explain what we have here (note: there are 60 other logical CPUs removed from the above screen capture):
But that is another story for another blog in this series!
I hope you find this interesting. Thanks, Nigel Griffiths