Just like being there: Papers from the Fall Processor Forum 2005: Introducing the IBM PowerPC 970MP

A new, low-power, high-performance dual-core processor

This Fall Processor Forum paper explores the PowerPC® 970MP, a 90nm-process, dual-core version of the PowerPC 970FX with remarkable, dynamic power-saving features. It's like no 64-bit dual-core PowerPC processor you've ever met before. Read why.

Norman Rohrer (rohrern@us.ibm.com), Distinguished Engineer, IBM

Norman Rohrer is a Distinguished Engineer in the PowerPC Microprocessor Group within the System and Technology Group of IBM located in Essex Junction, VT. Norman received his Bachelor’s Degree in physics and mathematics from Manchester College, North Manchester, IN in 1987. He received his Master’s Degree and Doctor of Philosophy degree in electrical engineering from The Ohio State University, Columbus, OH in 1990 and 1992, respectively. Norman has been a lead designer on PowerPC 750 and 970 products for Apple’s G3 and G5 chips and Nintendo’s GameCube. His interests lie in the area of high-speed circuit optimization for future technologies. Norman holds 18 patents and is a co-author of two books, High Speed CMOS Circuit Design Styles and SOI Circuit Design Concepts. He has been a Senior Member of IEEE since 2003.



14 December 2005

The PowerPC 970MP is the latest member of the PowerPC family. It is a dual-core 64-bit PowerPC processor, based on the 64-bit POWER4™ processor. It improves on the performance of previous PowerPC processors with a high-speed processor bus (to meet the bandwidth demands of a highly superscalar and SIMD-enhanced core) and a larger L2 cache. The SIMD functionality continues the PowerPC processor line's support of multimedia, graphics, and data movement. The 970MP offers a wide range of power and performance options through frequency tuning, voltage scaling, and individual core control.

The POWER4, POWER4+™, and POWER5™ processors have been dual-core, but the 970MP is the first 64-bit dual-core PowerPC processor (see Resources for information on dual-core 32-bit PowerPC chips from Freescale). Historically, the POWER™ line has been targeted primarily at servers, while the PowerPC line has been targeted at servers, general-purpose computing, and embedded systems. The two processor lines are binary-compatible, making it practical for them to share substantial pieces of design.

Architectural features

The PowerPC 970MP is based on the POWER4 architecture. The POWER4 design goals were a balanced system design, SMP optimization, native 32-bit and 64-bit PowerPC architecture, and a high processor frequency.

The PowerPC 970xx implementations have generally tried to adopt features from the POWER4 CPU. The PowerPC chips add support for the AltiVec SIMD calculation unit, and have a large L2 cache. They have smaller die sizes and lower power requirements, due to enhanced process technology, and they generally run at a higher frequency.

One notable difference between the POWER4 and the PowerPC 970MP is that the 970MP gives each core its own L2 cache rather than a shared L2 cache. The 970MP does not provide any L3 cache. On the other hand, the two cores share the I/O bus and the PLL.

The following block diagram illustrates the design of the 970MP processor cores:

Figure 1. The topology of a 970MP core

Acronym alley

The space requirements of a diagram such as Figure 1 make it necessary to use acronyms that some readers may not be familiar with. The following acronyms (or other short names) are in use:

BIU - Bus Interface Unit
BRU - Branch Unit
CR - Condition Register
CRU - Condition Register Unit
CTR - Count Register
ERAT - Effective/Real Address Translation
FPR - Floating Point Registers
FPU - Floating Point Unit
FXU - Fixed Point Unit
GPR - General Purpose Registers
LR - Link Register
LSU - Load/Store Unit
SLB - Segment Lookaside Buffer
TLB - Translation Lookaside Buffer
VALU - Vector Arithmetic Logic Unit
VPerm - Vector Permutation unit
VRF - Vector Register File

The 970MP core has hardware branch prediction to guide it in prefetching likely instructions to reduce the cost of branches. Up to eight instructions per cycle can be fetched into the instruction buffer, decoded, and dispatched.

There are four queues for instructions: one queue is split between the LSUs and the FXUs, one handles the FPUs, one handles the AltiVec units (VALU and VPerm), and one handles the BRU and CRU. The LSUs can access up to 16 bytes per cycle from the L1 cache.

Main memory access is slower, at four bytes per cycle, but the large L2 cache mitigates this. The separate instruction and data caches increase usable bandwidth.
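To put the per-cycle figures above in more familiar units, the following minimal sketch converts bytes per cycle into peak gigabytes per second. It assumes that "cycle" means a core clock cycle and uses the 2.5GHz clock rate mentioned later in this paper; both are assumptions made for illustration, and the function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(bytes_per_cycle, clock_ghz):
    """Peak bandwidth in GB/s, assuming one transfer every clock cycle.

    Assumption: bytes_per_cycle is counted per core clock cycle.
    """
    # bytes/cycle * (10^9 cycles/s) = 10^9 bytes/s = GB/s
    return bytes_per_cycle * clock_ghz


CLOCK_GHZ = 2.5  # assumed clock rate, taken from the 2.5GHz part discussed later

l1_peak = peak_bandwidth_gb_per_s(16, CLOCK_GHZ)     # L1 cache, 16 bytes per cycle
memory_peak = peak_bandwidth_gb_per_s(4, CLOCK_GHZ)  # main memory, 4 bytes per cycle

print(f"L1 peak:     {l1_peak} GB/s")
print(f"Memory peak: {memory_peak} GB/s")
```

Under these assumptions the L1 path has four times the peak bandwidth of main memory, which is the gap the L2 cache helps to hide. These are upper bounds, not measured throughput.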

Shared I/O

The 970MP cores share a single input/output bus. The input bus feeds data and commands to both cores. The output bus is also shared, with an arbiter controlling a multiplexer (mux) unit. The arbiter uses a simple round-robin technique giving each core equal access; in the case of simultaneous access, the core that submitted the least recent previous request is given access first. The shared I/O lets a single encoder and decoder handle the load for the full processor.
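The arbitration policy described above can be modeled in a few lines. The sketch below is a software illustration only, not the hardware design: the class and method names are hypothetical, and it approximates "least recent previous request" by tracking which core was granted access least recently.

```python
class RoundRobinArbiter:
    """Toy model of a two-core output-bus arbiter (illustrative only)."""

    def __init__(self, num_cores=2):
        # Core ids ordered from least recently granted to most recently granted.
        self.order = list(range(num_cores))

    def grant(self, requests):
        """Return the core id granted this cycle, or None if nobody requested.

        requests: iterable of core ids that want the output bus this cycle.
        """
        requesting = set(requests)
        for core in self.order:
            if core in requesting:
                # The winner moves to the back of the line, so the other
                # core wins the next simultaneous request.
                self.order.remove(core)
                self.order.append(core)
                return core
        return None


arbiter = RoundRobinArbiter()
print(arbiter.grant([0, 1]))  # both request: core 0 was served least recently
print(arbiter.grant([0, 1]))  # both request again: now core 1 goes first
print(arbiter.grant([1]))     # a lone requester is always granted
```

Because the winner always moves to the back of the order, neither core can starve the other under continuous simultaneous requests.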

Dual-core implementation

The dual-core implementation of the 970MP is designed to offer a great deal of flexibility at runtime. Each core has dedicated resets and interrupts. The two cores, C0 and C1, have separate voltage planes, so that C1 can be powered down completely without affecting C0. Doze mode is also entered independently, allowing either core to doze while the other works. Memory coherence is maintained through the North Bridge.

Some modes are shared; the nap and deep nap power modes are entered by both cores at once. Both cores, and the I/O bus, run at the same frequency, which can be full speed, F/2, or F/4.

On a 2.5GHz processor, the theoretical maximum power consumption is roughly 100W, with both cores active at high voltage and full frequency. At F/2 operation, power consumption is about 60W, and at F/4 operation with only one core active, consumption is about 6W. The low voltage mode is not available at full frequency, only at F/2 or slower.
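The frequency and voltage rules above can be summarized as a small lookup. This is a minimal sketch for illustration; the mode labels and function names are hypothetical and do not correspond to any real 970MP register or software interface.

```python
FULL_SPEED_GHZ = 2.5  # the example part discussed in the text

# Clock dividers for the three shared frequency settings.
DIVIDERS = {"F": 1, "F/2": 2, "F/4": 4}


def clock_ghz(mode, full_speed_ghz=FULL_SPEED_GHZ):
    """Clock rate shared by both cores and the I/O bus in the given mode."""
    return full_speed_ghz / DIVIDERS[mode]


def low_voltage_allowed(mode):
    """Low-voltage operation is available only at F/2 or slower."""
    return DIVIDERS[mode] >= 2


for mode in DIVIDERS:
    print(mode, clock_ghz(mode), "GHz, low voltage allowed:", low_voltage_allowed(mode))
```

For the 2.5GHz part, this gives 1.25GHz at F/2 and 0.625GHz at F/4, with the low-voltage setting permitted only in those two reduced-frequency modes.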

Figure 2. Power consumption of a 2.5GHz 970MP processor

Parametrics and conclusions

Figure 3. Specifications of the 970MP

The PowerPC 970MP design uses dual 64-bit cores per chip, based on the POWER4 core with a SIMD/vector engine, and 1MB of L2 cache per core, to achieve high performance on computation and bandwidth-intensive operations over a wide power and performance range.


Acknowledgments

This article was adapted by Peter Seebach, working from the original presentation "Introducing the IBM PowerPC 970MP: A new, low-power, high-performance dual-core processor," presented at MPR Fall Processor Forum 2005 by Norman Rohrer of IBM. Peter would like to thank Tim Kelly and Norman Rohrer for technical and editorial review during the writing process.
