Local, Near & Far Memory part 1 - Large Power7 boxes more local memory
nagger
On Power6, the largest machine was the Power 595 with 64 physical CPUs (cores) across eight CPU books, each CPU book having four Power6 chips and so 8 CPUs (Power6 is a dual-core chip design). With Power7 that has stepped up to 256 CPUs across the same eight CPU books, again with four chips per book but with 8 CPUs on each chip, so that is 32 CPUs per book.
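The CPU counts above are just books × chips per book × cores per chip. Here is a small Python sketch of that arithmetic. It uses only the figures quoted in this post, and the dictionary and function names are illustrative, not from any IBM tool.

```python
# Figures quoted above: (CPU books, chips per book, cores per chip)
MACHINES = {
    "Power6 595": (8, 4, 2),  # Power6 is a dual-core chip
    "Power7 795": (8, 4, 8),  # Power7 has 8 cores per chip
}


def cores_per_book(chips_per_book: int, cores_per_chip: int) -> int:
    """CPUs (cores) available inside a single CPU book."""
    return chips_per_book * cores_per_chip


def total_cores(books: int, chips_per_book: int, cores_per_chip: int) -> int:
    """CPUs (cores) in a fully populated machine."""
    return books * cores_per_book(chips_per_book, cores_per_chip)


for name, (books, chips, cores) in MACHINES.items():
    print(f"{name}: {cores_per_book(chips, cores)} CPUs per book, "
          f"{total_cores(books, chips, cores)} CPUs in total")
```

This prints 8 CPUs per book and 64 in total for the Power6 595, and 32 CPUs per book and 256 in total for the Power7 795.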
On the Power6 595, a 32-way virtual machine (LPAR) would need four CPU books, so much of the time it has to access memory across CPU books (book to book). On the Power7 795, however, all 32 CPUs can be in a single CPU book, so the memory is closer and a bit faster. And that is before counting the fact that Power7 memory is faster as well.
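The book count for a given LPAR size is a ceiling division. This is a minimal sketch. It assumes the hypervisor packs the LPAR into as few CPU books as possible, so the result is a best case rather than a guaranteed placement. The function name `books_needed` is hypothetical.

```python
import math


def books_needed(lpar_cpus: int, cpus_per_book: int) -> int:
    """Minimum number of CPU books an LPAR must span, assuming
    (best case) it is packed into as few books as possible."""
    if lpar_cpus <= 0 or cpus_per_book <= 0:
        raise ValueError("CPU counts must be positive")
    return math.ceil(lpar_cpus / cpus_per_book)


print(books_needed(32, 8))   # Power6 595, 8 CPUs per book  -> 4
print(books_needed(32, 32))  # Power7 795, 32 CPUs per book -> 1
```

Growing the Power7 LPAR to 33 CPUs would push it into a second book, which is where book-to-book memory access comes back in.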
This introduces the idea of different distances to memory. Before I go further, I had better state that these distances are not massive. We are not talking "the good, the bad and the ugly" of memory access here, but more like "the good, the excellent and the blisteringly fast". Each generation, in addition to its jump in CPU performance, has had to add a similar jump in memory performance to keep up.
Definitions of memory access:
Note: the above chart was updated 3rd October 2011
The Power 710 to 740 models use either a single chip (which by definition uses only Local memory) or two Power7 processor chips, using Local and "unknown" memory. See my other blog entry, which asks for help to work this out.
On the Power 750, if you open the top of the machine it looks as if the CPUs are on separate mini "CPU books", because this model lets you insert one to four CPU and memory "cards". In fact they are joined together by simple, fast copper tracks, hence the large number of connectors. So the memory access is Local or Near.
From the above table you can see that only the Power 770, Power 780 and Power 795 make use of Far memory access, across the inter-node (Central Electronic Complex (CEC)) communications. Assuming the regular 8 CPUs (cores) per Power7 chip, this means that on the Power 770/780 memory access is Local and Near if your virtual machine has up to 16 CPUs and is placed in a single CEC, and on the Power 795 memory access can be Local or Near for up to 32 CPUs. Let me remind you that Far memory is not an issue: it is designed into the architecture to be fast, and it lets these large machines give excellent performance for very large virtual machines, while Local/Near access can give a speed boost to smaller virtual machines that don't span nodes. Plus the 8-core design of the Power7 processor lets you avoid Far access even more than on Power6.
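The rule of thumb above can be written as a small classifier. This is a sketch under stated assumptions. It assumes 8 cores per Power7 chip and that the LPAR is placed in as few chips and nodes as possible, which is a best case. "Local", "Near" and "Far" are taken to mean the same chip, another chip in the same node, and a chip in another node. The node sizes come from the text: 16 CPUs for a Power 770/780 CEC and 32 for a Power 795 book. The function name is hypothetical.

```python
import math

CORES_PER_CHIP = 8  # the "regular" Power7 chip assumed in the text


def memory_access_types(lpar_cpus: int, cores_per_node: int) -> list[str]:
    """Memory access types an LPAR can expect when packed into the
    fewest chips and nodes (best-case placement, an assumption)."""
    if lpar_cpus <= 0:
        raise ValueError("lpar_cpus must be positive")
    chips = math.ceil(lpar_cpus / CORES_PER_CHIP)
    nodes = math.ceil(lpar_cpus / cores_per_node)
    types = ["Local"]
    if chips > 1:
        types.append("Near")  # another chip in the same node
    if nodes > 1:
        types.append("Far")   # a chip in another node (CEC or book)
    return types


print(memory_access_types(8, 16))   # fits one chip
print(memory_access_types(16, 16))  # Power 770/780, one CEC
print(memory_access_types(24, 16))  # Power 770/780, spans two CECs
print(memory_access_types(32, 32))  # Power 795, one book
```

These calls return `['Local']`, `['Local', 'Near']`, `['Local', 'Near', 'Far']` and `['Local', 'Near']`. In practice the hypervisor may not achieve the packed placement this assumes.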
So how does this affect me?
Local, Near and Far in benchmark ratings
On Power6-based machines, the rPerf values are based on a cocktail of benchmarks (some public and many internal) run at various numbers of CPUs, up to the whole machine. On Power7, however, it was decided that people running a single 256-CPU virtual machine were going to be rare, certainly in the early days. So, for example, on the Power 795 the benchmarks are based on four 64-CPU virtual machines in a single box. The result is that the rPerf number is closer to what you should find in practice. It also means that, on both Power6 and Power7 and for published industry benchmarks, all types of memory access are involved and perform well.
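A quick calculation shows why the Power 795 benchmark layout still exercises Far memory. This sketch assumes each 64-CPU virtual machine is packed into whole 32-CPU books. That is an assumption about placement, not something stated for the benchmark runs.

```python
import math

TOTAL_CPUS = 256     # fully populated Power 795 (from the text)
CPUS_PER_BOOK = 32   # 4 chips x 8 cores per book (from the text)
LPAR_CPUS = 64       # size of each benchmark virtual machine

lpars = TOTAL_CPUS // LPAR_CPUS
books_per_lpar = math.ceil(LPAR_CPUS / CPUS_PER_BOOK)

# Spanning more than one book means book-to-book (Far) access is in play.
print(f"{lpars} LPARs of {LPAR_CPUS} CPUs, each spanning {books_per_lpar} books")
```

Each of the four virtual machines spans two books, so Local, Near and Far access all contribute to the result.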