The PowerPC 970MP is the latest member of the PowerPC family. It is a dual-core 64-bit PowerPC processor, based on the 64-bit POWER4™ processor. It improves over the performance of previous PowerPC processors with a high-speed processor bus (to meet the bandwidth demands of a highly superscalar and SIMD-enhanced core) and a larger L2 cache. The SIMD functionality continues the PowerPC processor line's support of multimedia, graphics, and data movement. The 970MP offers a wide range of power and performance options with frequency tuning, voltage scaling, and individual core control.
The POWER4, POWER4+™, and POWER5™ processors have been dual-core, but the 970MP is the first 64-bit dual-core PowerPC processor (see Resources for information on dual-core 32-bit PowerPC chips from Freescale). Historically, the POWER™ line has been targeted primarily at servers, while the PowerPC line has been targeted at servers, general-purpose computing, and embedded systems. The two processor lines are binary-compatible, making it practical for them to share substantial pieces of design.
The PowerPC 970MP is based on the POWER4 architecture. The POWER4 design goals were a balanced system design, SMP optimization, native support for both the 32-bit and 64-bit PowerPC architectures, and a high processor frequency.
The PowerPC 970xx implementations have generally adopted features from the POWER4 CPU. The PowerPC chips add an AltiVec SIMD unit and have a large L2 cache. Thanks to enhanced process technology, they have smaller die sizes and lower power requirements, and they generally run at a higher frequency.
One notable difference between the POWER4 and the PowerPC 970MP is that each 970MP core has its own L2 cache rather than sharing one with the other core. The 970MP does not provide any L3 cache. On the other hand, the two cores share the I/O bus and the PLL.
The following block diagram illustrates the design of the 970MP processor cores:
Figure 1. The topology of a 970MP core
The 970MP core has hardware branch prediction to guide it in prefetching likely instructions to reduce the cost of branches. Up to eight instructions per cycle can be fetched into the instruction buffer, decoded, and dispatched.
There are four queues for instructions: one queue is split between the LSUs and the FXUs, one handles the FPUs, one handles the AltiVec units (VALU and VPerm), and one handles the BRU and CRU. The LSUs can access up to 16 bytes per cycle from the L1 cache.
Main memory access is slower, at four bytes per cycle, but the large L2 cache mitigates this. The separate instruction and data caches increase usable bandwidth.
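To put those per-cycle figures in perspective, the following sketch converts them into peak bytes per second. It rests on two assumptions that the text above does not state: that "cycle" means a core clock cycle, and that the core runs at the 2.5GHz clock used as an example later in this article. The function name is illustrative, and sustained bandwidth on real hardware will be lower than these peaks.

```python
# Peak-bandwidth arithmetic for the per-cycle figures quoted above.
# Assumption: "cycle" is a core clock cycle, and the core runs at 2.5 GHz.

CORE_CLOCK_HZ = 2.5e9
L1_BYTES_PER_CYCLE = 16      # LSU access to the L1 cache
MEMORY_BYTES_PER_CYCLE = 4   # main memory access


def peak_bandwidth_gb_per_s(bytes_per_cycle, clock_hz=CORE_CLOCK_HZ):
    """Return peak bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    return bytes_per_cycle * clock_hz / 1e9


l1_gb_s = peak_bandwidth_gb_per_s(L1_BYTES_PER_CYCLE)
memory_gb_s = peak_bandwidth_gb_per_s(MEMORY_BYTES_PER_CYCLE)
print(f"L1 peak: {l1_gb_s:.0f} GB/s, memory peak: {memory_gb_s:.0f} GB/s")
```

Under these assumptions the L1 path peaks at four times the main memory path, which is the gap the L2 cache is there to hide.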
The 970MP cores share a single input/output bus. The input bus feeds data and commands to both cores. The output bus is also shared, with an arbiter controlling a multiplexer (mux) unit. The arbiter uses a simple round-robin technique giving each core equal access; in the case of simultaneous access, the core that submitted the least recent previous request is given access first. The shared I/O lets a single encoder and decoder handle the load for the full processor.
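The arbitration policy described above can be sketched in a few lines of Python. This is a toy model, not IBM's implementation: the class name `OutputBusArbiter` and its `grant` method are hypothetical, and the model assumes that the core with the "least recent previous request" is simply the core that was granted the bus least recently.

```python
class OutputBusArbiter:
    """Toy round-robin arbiter for a shared output bus (hypothetical model)."""

    def __init__(self, num_cores=2):
        # Cores ordered from least recently granted to most recently granted.
        self._lru_order = list(range(num_cores))

    def grant(self, requesting_cores):
        """Return the core that wins the bus, or None if no core requests it."""
        for core in self._lru_order:
            if core in requesting_cores:
                # The winner moves to the back: it is now most recently granted.
                self._lru_order.remove(core)
                self._lru_order.append(core)
                return core
        return None


arbiter = OutputBusArbiter()
grants = [
    arbiter.grant({0, 1}),  # both request: core 0 wins the initial tie
    arbiter.grant({0, 1}),  # both request again: core 1 waited longer
    arbiter.grant({1}),     # only core 1 requests
    arbiter.grant({0, 1}),  # both request: core 0 was served least recently
]
print(grants)
```

When both cores request the bus on every cycle, the grants simply alternate, which gives each core equal access.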
The dual-core implementation of the 970MP is designed to offer a great deal of flexibility at runtime. Each core has dedicated resets and interrupts. The two cores, C0 and C1, have separate voltage planes, so that C1 can be powered down completely without affecting C0. Each core also enters doze mode independently, allowing either core to doze while the other works. Memory coherence is maintained through the North Bridge.
Some modes are shared; the nap and deep nap power modes are entered by both cores at once. Both cores, and the I/O bus, run at the same frequency, which can be full speed, F/2, or F/4.
On a 2.5GHz processor, the theoretical maximum power consumption is roughly 100W, with both cores active at high voltage and full frequency. At F/2 operation, power consumption is about 60W, and at F/4 operation with only one core active, consumption is about 6W. The low voltage mode is not available at full frequency, only at F/2 or slower.
Figure 2. Power consumption of a 2.5GHz 970MP processor
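The frequency and voltage rules above can be expressed as a small sketch. It assumes the 2.5GHz part used in the example; the function names `clock_ghz` and `is_valid_setting` are hypothetical and do not correspond to any real 970MP register or firmware interface. The sketch deliberately does not try to predict power consumption, since only three data points are quoted here.

```python
# Illustrative model of the frequency and voltage options described above.
# Assumption: a 2.5 GHz part. All names are hypothetical, not a real interface.

BASE_CLOCK_GHZ = 2.5
DIVIDERS = (1, 2, 4)  # full speed, F/2, F/4


def clock_ghz(divider):
    """Return the shared core and I/O bus clock for a frequency divider."""
    if divider not in DIVIDERS:
        raise ValueError(f"unsupported divider: {divider}")
    return BASE_CLOCK_GHZ / divider


def is_valid_setting(divider, low_voltage):
    """Low-voltage mode is available only at F/2 or slower."""
    if divider not in DIVIDERS:
        return False
    return not (low_voltage and divider == 1)


for d in DIVIDERS:
    print(f"F/{d}: {clock_ghz(d)} GHz, "
          f"low voltage allowed: {is_valid_setting(d, True)}")
```

A check like `is_valid_setting` is the kind of guard a power-management policy would need before requesting low voltage, because that mode cannot be combined with full frequency.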
Parametrics and conclusions
Figure 3. Specifications of the 970MP
The PowerPC 970MP design uses two 64-bit cores per chip, based on the POWER4 core with a SIMD/vector engine, and 1MB of L2 cache per core, to achieve high performance on computation-intensive and bandwidth-intensive operations over a wide power and performance range.
This article was adapted by Peter Seebach, working from the original presentation "Introducing the IBM PowerPC 970MP: A new, low-power, high-performance dual-core processor," presented at MPR Fall Processor Forum 2005 by Norman Rohrer of IBM. Peter would like to thank Tim Kelly and Norman Rohrer for technical and editorial review during the writing process.
- This paper is based on a presentation given at Fall Processor Forum 2005: The Road to Multicore. See the rest in this series.
- The 970MP is the first dual-core 64-bit PowerPC chip. See also this dual-core 32-bit PowerPC chip (from Freescale).
- IBM: PowerPC G5 to Go Mobile (eWeek) discusses some of the 970MP's unique power-saving abilities.
- The IBM PowerPC 970FX RISC Microprocessor User's Manual is an invaluable resource. You will find it and many other pieces of Power Architecture technical documentation, including specifications, manuals, and much more, posted to the IBM Semiconductor solutions Technical library.
- Keep abreast of all the Power Architecture news: subscribe to the Power Architecture Community Newsletter.
Get products and technologies
- Get Custom: Contact IBM E&TS about Engineering & Technology Services and consulting.
- Find Power Architecture-related downloads at the developerWorks Power Architecture zone.
- Take part in the IBM developerWorks Power Architecture discussion forums.