Emulation and cross-development for PowerPC

Avoid the cost of new hardware

This article introduces PowerPC emulation and cross-compiling for developers without access to real hardware. It is intended for developers familiar with computer architecture who own an x86-based workstation but are interested in experimenting with PowerPC.


Hollis Blanchard (hollisb@us.ibm.com), Software Developer, IBM

Hollis Blanchard started learning about the PowerPC architecture and the Linux kernel in 1998. He works in the IBM Linux Technology Center, where he's developed for embedded PowerPC, pSeries servers, and x86 systems. He's also one of the core contributors to penguinppc.org.



18 January 2005

Some developers may not have access to a PowerPC® Linux™ system to play with (although you can buy one for less than US$200 at the time of this writing). For the curious x86 Linux user, emulation is a convenient and inexpensive alternative. There are at least three open source PowerPC emulators available, two of which are quite new.

Accuracy

Some emulators, particularly those used by processor developers, are cycle-accurate, meaning that a particular instruction in a given context will take exactly as many cycles to run as it would on real hardware. These emulators emulate not just the instruction set, but also the internal pipelines and caches of the processor. They are particularly useful during development before real silicon exists, and they can also yield more insight into performance bottlenecks than can be gleaned from hardware performance counters.

However, these emulators have some severe limitations. Because they document so much intellectual property and hardware tricks, their internals are almost never free for examination or modification. Instead, the processor designer will make binaries available, sometimes for no cost, often for a very restricted range of hosts. Another problem for higher-level software developers is that because they emulate large amounts of processor internals, they are very slow.

Finally, they may not be as accurate as real hardware. For reasons of speed or complexity, even a cycle-accurate emulator can omit cache or IO emulation, yielding skewed results. They're probably pretty close for most situations, but the fact remains that an emulator is only emulating the hardware, and its behavior can diverge.

None of the emulators discussed here are cycle-accurate. In fact, they probably aren't even fully behavior-accurate. (When that happens, it's called a bug, and will usually end up being squashed... eventually).


Emulating user mode

One very convenient feature for the casual developer is user-mode emulation. If an emulator emulates only the processor and IO (such as a network device), a Linux kernel would need to be booted (and emulated) first, then the emulated application on top of that. That's certainly important for more serious work, but it's much more convenient for simple experimentation to avoid dealing with kernels entirely. If the emulator can emulate not just the processor but also the operating system kernel, that makes it much easier to run little programs that don't depend on many kernel services, such as those that only need to use the write and exit system calls.

Ordinarily, when an emulator encounters a PowerPC system call instruction, it emulates the exception by storing the address of the following instruction in the SRR0 register, setting some architecture-defined bits in SRR1, and transferring control to physical address 0xC00. (Some PowerPC variants allow more control over this behavior, but this is the traditional PowerPC model.) The emulated kernel has its system call exception handler at 0xC00, just like on hardware, and so the kernel takes control of the processor.

When an emulator supporting user-mode emulation encounters a system call instruction, on the other hand, it does not transfer control to the emulated exception handler; instead it interprets the system call itself. The easiest examples are system calls like read and write: these can be almost directly converted into real system calls made by the emulator. The glue layer to translate between emulated system calls made by the emulated application and real system calls made by the emulator may have other functionality, such as logging all system calls made by the emulated application.

In addition to bypassing the complexity of building a kernel to emulate and a file system image to boot into, and configuring a virtual network device for IO, this shortcut also speeds up emulation, as the reams of kernel instructions that would have run to handle the system call -- from the exception handler through the VFS and the device driver -- are bypassed. However, it should be clear that not running the kernel inside the emulator means the overall behavior could be quite different indeed. In the worst case, a bug in the emulator's system call glue could make it seem as though the emulated application is buggy, even though it would run perfectly on a real kernel. This worst case is pretty rare, though.

Just In Time

Just In Time (JIT) compilation is a method by which interpreted bytecode (for example, an emulated instruction stream) is translated into native instructions on the fly. Rather than simply interpreting and emulating each instruction in turn, whole sequences of instructions are converted to their native equivalents and cached so that the translation need not occur for subsequent executions of the sequence. Accordingly, tight CPU-bound loops of interpreted code should execute at near-native speeds, since the native code is kept in the cache. On the other hand, code with few loops would not see much speed improvement. JIT compilers are extremely common for Java™ virtual machines, and they can be used to great effect in emulated virtual machines as well.

Qemu

Qemu, which is relatively new, uses dynamic translation like a Java Just In Time (JIT) compiler to achieve good performance; in this case, good performance is about 4x to 10x slower than native hardware, depending on the benchmark. It supports a few different hosts and targets, but all we'll worry about is x86 host and PowerPC target, which fortunately is one of the supported configurations. Qemu also supports a remote GDB (GNU Debugger) connection, which is very valuable for debugging. Unfortunately, qemu does not support GDB connections in user-mode emulation, only in full-system mode. Qemu does not support AltiVec™ vector-processing instructions.

PearPC

PearPC is another new emulator that can use JIT dynamic translation, but only on an x86 host with a PowerPC target -- however, that environment is the goal of this article. Its performance isn't as good as qemu's, being roughly 15x slower than the host system. Unfortunately, PearPC does not support a user environment, so a kernel and basic file system would be needed as well (Linux, Darwin, and Mac OS X are currently supported). PearPC does not support a GDB connection, and it does not yet support AltiVec vector-processing instructions (although the developers plan to add them in a future release).

PSIM

PSIM (PowerPC simulator) is the granddaddy of PowerPC emulation: it was written in 1994 and assisted in some of the initial ports of Linux and NetBSD to the then-new PowerPC architecture. PSIM was integrated with the GDB sources, and amazingly, although it hasn't seen development since 1996, it still builds and works. Being integrated with GDB, PSIM also supports GDB connections, including in user mode. Because it predates AltiVec, PSIM does not support AltiVec vector-processing instructions.


Choosing an emulator

For the reasons discussed above, this article uses qemu; the same basic issues apply with the others, but qemu is the simplest to build for the purposes of this article. Download and extract the latest qemu tarball (see Resources), then:

Listing 1: Building qemu
$ ./configure --target-list=ppc-user
$ make

This will produce ./ppc-user/qemu-ppc, which will be used later to execute PowerPC binaries.


Cross-compiling

The second key ingredient in cross-development is a cross-compiler. A cross-compiler is a compiler that runs on one architecture but produces binary code for another. This is very convenient if the deployment system is significantly underpowered relative to the development system, as is usually the case in embedded system development. A cross-compiler does not overwrite the system's native compiler or interact with it in any way.

Crosstool

Building a GNU cross-compiler can be pretty easy depending on the architectures involved, but sometimes build breaks do happen. It can also require several stages of builds to get all the right components built for each other in the right way. To remove the guesswork and automate the process, Dan Kegel has developed a very useful build script called crosstool.

Download and extract the latest version of crosstool (see Resources). Then:

Listing 2: Building crosstool
$ sudo mkdir /opt/crosstool
$ sudo chown $USER /opt/crosstool
$ sh demo-ppc750.sh

That will run for a while, and when it finishes, binutils, GCC, and glibc will be installed for cross-compiling in /opt/crosstool. Have a look at the directory structure there, and consider adding it to the PATH environment variable to save typing later.


Hello, world

Now that an emulator and cross-compiler have been built, it is time to put them together and test the new environment. Put the following source into hello.c:

Listing 3: A strangely familiar program
#include <stdio.h>

int main(int argc, char *argv[])
{
	printf("Hello, world.\n");
	return 0;
}

For now, use static linking to avoid worrying about how to install PowerPC shared libraries on the x86 host system. To produce a 32-bit PowerPC ELF executable named "hello", run the following:

Listing 4: Cross-compiling with GCC
$ powerpc-750-linux-gnu-gcc -static hello.c -o hello

To verify that it is the expected format, you can use this command:

Listing 5: Checking file type
$ file hello
hello: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV),
for GNU/Linux 2.4.3, statically linked, not stripped

And finally, run the executable under qemu:

Listing 6: Running an executable under qemu
$ ./ppc-user/qemu-ppc hello

"Hello, world." should be output to the terminal.


What now?

Now you know you can build C code into PowerPC executables and run them. You can also experiment with the simple assembly example given in the "Introduction to PowerPC Assembly" article, which is listed in Resources. (Note that although you could use the cross-assembler directly, it's a lot easier to continue to use the compiler instead.) Once you're satisfied with that, you can move on to bigger and more interesting examples, perhaps including shared libraries (read the qemu documentation -- which is also listed in Resources -- for help with that).

64-bit PowerPC

Although crosstool can produce ppc64 toolchains just as easily, there is unfortunately no open source emulator for 64-bit PowerPC, so you would need real hardware to experiment. Of course, ppc32 executables run just as well on ppc64 hardware (but the reverse is not true).


Conclusion

An emulator will never be as fast as native hardware; the biggest reason functionality is implemented in hardware is speed. An emulator will also never be as accurate as real hardware, especially when the hardware itself could contain errata that can be triggered by subtle timing interactions of internal components. However, an emulator can be very valuable for development and even general-purpose computing. Virtual PC, a commercial emulator, is used by a large number of Macintosh® owners to run Windows® applications. It may not be as fast as hardware, but it's cheaper and easier to maintain. When developing low-level operating system code, an emulator can provide that needed glimpse into the system's state to reveal a hardware-crippling bug. In fact, during hardware development, an emulator might be the only development platform available!

The emulators above have been and are being used for operating system development, which proves some measure of robustness. But don't let that stop you from trying them out just to experience having 32 general-purpose registers, or from going out of your way to try to support a PowerPC user of software you've written. With an unbeatable price tag and convenient environment, what do you have to lose?

Resources
