Meet the experts: Arnd Bergmann on the Cell BE processor

The Linux kernel maintainer for the Cell BE processor talks about programming for this much-anticipated new architecture

This question-and-answer article features Arnd Bergmann of IBM: a kernel hacker with the IBM Linux Technology Center, the kernel maintainer for Linux® on the Cell Broadband Engine™ (Cell BE) processor, and the author of the spufs file system.


The Cell BE processor has been hailed as revolutionary. It is often described as "many processors in one without being multicore." This is more or less accurate. It really is just one PowerPC® core, surrounded (for now) by eight parallel "Synergistic" Processing Elements (SPEs, also commonly referred to as Synergistic Processing Units, or SPUs, though the SPU is actually a part of an SPE, and the terms aren't really interchangeable). The fundamental difference between the Cell BE processor and dual-core or multi-core processors is that on the Cell BE processor, only the PowerPC core runs the operating system; the SPEs run (usually highly vectorized) application code.

developerWorks caught up with Arnd Bergmann, the IBM® Linux™ on the Cell BE processor kernel maintainer, to talk about the port, about the Cell BE processor and Cell BE-based "workstations" (which aren't workstations at all), and about programming for the Cell BE, among other things.

developerWorks: Can we start with a little bit about you? Can you describe the team that you're working with, and how you came to be working on this Cell port?

Arnd Bergmann: I'm in a team that is part of the LTC [Linux Technology Center] organization in IBM. The LTC does most of the Linux-related work within IBM. Most of the team that's working on Cell now started out working on the zSeries® Linux port. We have some other people joining us that had been on some other software projects in IBM and we founded this little group that's doing Linux on Cell work. And then we also work together with some other groups here in the lab in Böblingen and Mainz -- they are doing the firmware and hardware development.

dW: So you more or less self-organized? And volunteered yourselves to work on Cell mainly because you wanted to?

Bergmann: Well, I was offered the chance to start working on Cell, and I thought it sounded like a great idea.

dW: And before that, you had been working on zSeries -- is there any similarity between Cell and the mainframes?

Bergmann: No, it's quite different. I mean, Linux kernel work is basically the same on all architectures, be it zSeries, or Cell, or others. But each of these two has features that make it very different from all of the other architectures. They don't really have much in common.

The Cell processor

dW: Could you talk about some of the things that do make Cell so different? Many of our readers will have been keeping up with Cell news, but I think there are a lot of people who haven't been, necessarily. How would you describe Cell to somebody who hasn't been following along?

Bergmann: Well, on the surface the Cell processor is just a regular PowerPC processor like others that we have seen. But one thing makes it very special: it actually contains multiple small processors on one processor die. We now have eight of those so-called synergistic processing elements [SPEs], which are very high-performance, optimized all-arithmetic elements that act as stand-alone processors and can talk to each other very fast. They are all orchestrated by one core processor, which is comparable to a PowerPC running a normal operating system like Linux. The eight SPEs do not run Linux kernels, just application code.

dW: So the SPEs are the main thing that make it so revolutionary.

Bergmann: Yes.

dW: And they also are what make it so good for game programming, and for scientific applications?

Bergmann: Yes. In games, it's mostly about computational work doing vector operations, such as floating-point operations, and the instruction set used on the SPUs is optimized for vector operations. Every single instruction executed there behaves like a vector operation and always works on a 128-bit word in one cycle, or in one instruction. In those 128 bits you could, for example, have four single-precision floating-point values, so one processor instruction performs four floating-point operations at the same time.
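As a rough illustration of a 128-bit word holding four single-precision floats, here is a minimal sketch. It assumes GCC or Clang and uses their generic `vector_size` extension rather than the SPU's own vector types and intrinsics, which are not shown; the function names are hypothetical. One vector expression performs four multiply-adds at once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 16 bytes = 128 bits = four single-precision floats. This is the
   GCC/Clang generic vector extension, standing in for native SPU types. */
typedef float v4sf __attribute__((vector_size(16)));

/* One vector expression: four multiply-adds, lane by lane. */
static v4sf madd4(v4sf a, v4sf b, v4sf c)
{
    return a * b + c;
}

/* out[i] = in[i] * scale + offset, four elements per step.
   For brevity, n must be a multiple of 4. */
static void scale_offset(float *out, const float *in, size_t n,
                         float scale, float offset)
{
    assert(n % 4 == 0);
    const v4sf s = {scale, scale, scale, scale};
    const v4sf o = {offset, offset, offset, offset};
    for (size_t i = 0; i < n; i += 4) {
        v4sf x;
        memcpy(&x, in + i, sizeof x);   /* load four floats */
        v4sf y = madd4(x, s, o);
        memcpy(out + i, &y, sizeof y);  /* store four floats */
    }
}
```

Whether a given compiler turns `madd4` into a single fused instruction depends on the target; the point is only that the source expresses four lanes of work in one operation.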

dW: How close to VMX instructions are the SPU instructions?

Bergmann: Some of the instructions are very similar, and the SPUs have some additional instructions that go far beyond what you can do with VMX. There are also a small number of VMX instructions that have no counterpart on the SPU. The biggest difference is that the SPUs don't actually use PowerPC instructions: the instruction format itself is different, even though the assembler or the C intrinsics might be very similar.

dW: So, are VMX opcodes and SPU opcodes the same? Will VMX optimized code have to be rewritten to take advantage of SPU?

Bergmann: Unlike VMX, which extends the PowerPC instruction set, the SPU has a complete instruction set of its own for executing self-contained binaries. The opcodes are very different at the assembly level, but they can be used in similar ways from high-level languages.

dW: So then the compilers have to be rewritten to include the opcodes for the SPUs, is that correct?

Bergmann: Yes. We have a special GCC port, and you can have other compilers as well that are targeted at the SPU, instead of targeted at the host instruction set. So if you want to compile the program for the SPU, you need to use this cross-compiler.
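Because the SPU has its own instruction set, the same C source produces different code depending on which compiler builds it. The minimal sketch below assumes that the SPU-targeted GCC predefines the `__SPU__` macro and that a PowerPC host compiler predefines `__powerpc__` or `__powerpc64__`; it reports the target chosen at compile time. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Which instruction set was this file compiled for? The preprocessor
   decides, so each compiler bakes in its own answer.
   Assumption: the SPU cross-compiler defines __SPU__. */
static const char *compiled_target(void)
{
#if defined(__SPU__)
    return "spu";
#elif defined(__powerpc__) || defined(__powerpc64__)
    return "ppc";
#else
    return "other";
#endif
}

/* Convenience check: nonzero only in an object built for the SPU. */
static int built_for_spu(void)
{
    const char *t = compiled_target();
    assert(t != NULL);
    return strcmp(t, "spu") == 0;
}
```

Building this file once with the host compiler and once with the SPU cross-compiler yields two objects that answer differently; the second one targets the SPE, not the PowerPC core.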

The Cell "workstation"

dW: It has been said that Cell is supposed to scale to everything from very tiny devices up to huge supercomputers. What about right in the middle? Is there any point at all in having a desktop Cell machine?

Bergmann: For the regular desktop user, I think there will not be much difference, because you get all the benefits only if you write your applications specifically for the Cell processor. That's usually done either when you have only a very limited number of applications on the machine, as on embedded systems where you want, for instance, MPEG encoding and decoding right there, or in the supercomputer world, where you write your applications specifically for the one computer they run on.

dW: So all of the people who are hoping for a Cell computer at home are probably going to be disappointed.

Bergmann: Well, if they are programmers, they could of course write their own applications. And we might see some applications that are optimized for Cell. For example, you could have video editing software that's optimized for Cell, or some image manipulation. If you just want to run standard applications, it probably will not help you a lot.

dW: Well, but it wouldn't harm you either, would it? To run workaday applications like e-mail or Web browsing -- there wouldn't be any performance penalty, would there be?

Bergmann: No -- you'd just spend much more money for the processor itself, I guess.


dW: So for ideal use, you would really want it for stuff like fluid dynamics or plasma dynamics, where you're doing millions of (no pun intended) particle-in-cell calculations and want to highly vectorize everything. Is that correct?

Bergmann: Yes.

dW: And one news report I saw said that you would be premiering a Cell-based workstation at LinuxTag. Can you describe what it is people will be seeing at LinuxTag?

Bergmann: I guess there was some misunderstanding. It isn't so much a "workstation" as a Cell-based Blade server -- a prototype board. The same, in fact, as the one that was shown at the E3 conference.

dW: Yes. I saw pictures of that. So, other than being Cell-based and running Linux, it's not really related to the workstation that's coming out later this year.

Bergmann: I don't know if there are any plans about the workstation itself -- that might be the same misunderstanding.

dW: Oh. Well, I know the early articles about it said that there was going to be a workstation for game developers -- and I know that in the case of games, sometimes a board is called a workstation... So it sort of sounds like that might be it. Do you know anything about plans of IBM offering Cell-based machines other than the Blade?

Bergmann: No, I don't know anything about other product plans, sorry.

dW: But we will be having a Cell-based Blade, if nothing else?

Bergmann: Well, this is the one thing that I'm working on, and I don't know about the availability for customers at all.

What we are working on now is really a technology study. And people who are interested in that can ask the IBM Engineering and Technology Services [ET&S] about how to evaluate this technology for their purposes. But as far as I know, there will not be a Cell-based IBM server product.

dW: Really? Do you have any idea why that is?

Bergmann: No.


dW: Would you say -- in your opinion, do you think we should?

Bergmann: I think there is a huge market and huge demand for those, yes.

The Linux port so far

dW: Okay... So, well -- on to the kernel. Can you tell us a bit about your work porting the Linux kernel to Cell?

Bergmann: Yes. The work that we have done consists mainly of three parts.

The first part is getting Linux to run at all on the Cell processor, and on the board that we have -- which means we had to add some parts to the PowerPC architecture code, for example, for the CPU ID and interrupt controller to work with the basic architecture support.

Most of the groundwork for that was done by the Sony/Toshiba/IBM Design Center (STIDC) in Austin, Texas. I've started to make this work more compatible with the standard kernel, so that it can be integrated into current kernels such as 2.6.13.

The second part is the device drivers for our actual prototype port, so we have an Ethernet controller and some other hardware that needs device drivers. That work was mostly done by people here on my team, but not by myself.

The third part is the exploitation of the SPEs themselves. That's where the SPU file system that has been described already (see Resources) comes in, and that work is still ongoing. There were some earlier attempts at a model, but it looks like the SPU file system is the final model now.

dW: Can you describe some of the alternatives that you went through and rejected?

Bergmann: Okay. The first approach that I know of was to have a device driver interface, where you have character devices and access each of the physical SPEs through one device node. But that's a rather clumsy interface: it's impractical to virtualize that (so that you can have more applications using the SPEs, than you have physical SPEs available in the system). And performance is likely to suffer.

Another attempt was to test a low-level system call interface. For that, it turned out that you need a large number of different system calls to get all the functionality that's required to have full application support.

The problem with that interface was that system calls are always hard to get into the kernel -- because it's very hard to get them right the first time. You can never change them if you get them wrong, and there were just so many system calls that we needed.
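The virtualization argument against one device node per physical SPE can be made concrete with a toy model. This hypothetical sketch time-slices any number of software SPE contexts onto a fixed number of physical SPEs in round-robin order, so more programs can use SPEs than there are SPEs. It illustrates the idea only and is not how the kernel actually schedules SPEs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical round-robin model: n_contexts logical SPE contexts
   share n_spes physical SPEs, one scheduling tick at a time. */
typedef struct {
    size_t n_spes;      /* physical SPEs in the machine, e.g. 8 */
    size_t n_contexts;  /* contexts created by applications */
    size_t next;        /* first context to run on the next tick */
} spe_sched;

/* Fill assignment[i] with the context that runs on physical SPE i for
   this tick and return how many SPEs are in use. Contexts that do not
   fit run on later ticks, so every context makes progress even when
   n_contexts > n_spes. */
static size_t sched_tick(spe_sched *s, size_t *assignment)
{
    assert(s->n_spes > 0);
    size_t used = s->n_contexts < s->n_spes ? s->n_contexts : s->n_spes;
    for (size_t i = 0; i < used; i++)
        assignment[i] = (s->next + i) % s->n_contexts;
    if (used > 0)
        s->next = (s->next + used) % s->n_contexts;
    return used;
}
```

With 10 contexts and 8 SPEs, the second tick runs contexts 8, 9 and 0 through 5 while 6 and 7 wait. A real implementation would also have to save and restore each context's SPE state on every switch, which a fixed device-node-per-SPE interface never has to model.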

dW: So how many system calls are you ultimately adding?

Bergmann: With the SPU file system [spufs], I'm adding one new call -- and we're still discussing whether that would actually be a system call, or an ioctl, or some other method using simply read and write interfaces.

The original approach with the system call interface would have added dozens of system calls, though we don't have an exact number because it was just experimental.
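To show what "simply read and write interfaces" can mean in practice, here is a minimal sketch assuming that each SPE context appears as a directory and that its mailboxes appear as files inside it that accept 4-byte reads and writes. The file names (for example `wbox`) and the helper names are assumptions for illustration, not a description of the final spufs interface; any ordinary directory can stand in for a context when trying the helpers out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Write one 32-bit word to file `name` in an open context directory
   (assumed layout, e.g. a mailbox file called "wbox").
   Returns 0 on success, -1 with errno set on failure. */
static int ctx_write_u32(int ctx_dirfd, const char *name, uint32_t value)
{
    int fd = openat(ctx_dirfd, name, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, &value, sizeof value);
    int saved = errno;
    close(fd);
    if (n != (ssize_t)sizeof value) {
        errno = (n < 0) ? saved : EIO;  /* short write counts as failure */
        return -1;
    }
    return 0;
}

/* Read one 32-bit word from file `name` in the same directory. */
static int ctx_read_u32(int ctx_dirfd, const char *name, uint32_t *value)
{
    assert(value != NULL);
    int fd = openat(ctx_dirfd, name, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, value, sizeof *value);
    int saved = errno;
    close(fd);
    if (n != (ssize_t)sizeof *value) {
        errno = (n < 0) ? saved : EIO;  /* short read counts as failure */
        return -1;
    }
    return 0;
}
```

The appeal of this design is that ordinary tools and system calls (`open`, `read`, `write`, `close`) do most of the work, so only the call that creates a context needs to be new.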

dW: I do have one follow-up question on the discarding of the device driver model. Was that a hardware limitation, or -- might something along the lines of a descriptor-based DMA (DBDMA) have helped to alleviate some of the issues involved?

Bergmann: I think the biggest point against the device driver was that this is not exactly a device that you can drive, but something more substantial, more integrated into the CPU itself. So a device driver would not be the correct abstraction.

It's common in Linux to add new functionality via a virtual file system.

dW: So what you're saying is that it's more a matter of Linux philosophy, and other operating systems that take a different approach might actually implement a different model.

Bergmann: Yes. If you were writing a kernel specifically just for the Cell processor, it might be a good choice to use a system call interface.

dW: Have you yet had a chance to do any benchmarks on it?

Bergmann: I haven't done any benchmarks myself, no. And I don't know of anyone else who has, so I can't tell you anything about that.

dW: But you do have it running so that it takes full advantage of the SPEs right now?

Bergmann: We have the file system running and we can run applications on it, yes. And we have done some demos, which were mostly based on one of the earlier models.

dW: So does it "feel" really fast? What does it feel like to work on it, and how do you like it?

Bergmann: We always use it over the serial console, which is very slow when you type, but when you actually run an application that uses the SPEs, it suddenly feels really fast.


dW: But you are self-hosting it? Can you self-build?

Bergmann: We can, but we don't usually do that, because we have the development environment on our regular workstations, and then we access the Cell machine when we need to run or debug something.

dW: Is self-hosting a planned milestone?

Bergmann: It would work right now -- we've tried it, and it works fine. It's just that we don't have so many machines, and we have to share them.

dW: Are the host machines PPC?

Bergmann: [laughs] Yes, we use JS20 Blades and other PPC970-based workstations.

The Linux on Cell team

dW: You mentioned earlier working with Sony and Toshiba. Your group there in Böblingen is working on the Linux port, and also on GDB? And OProfile? Is that right? How is the work split up? Can you talk a bit about who is working on the different pieces right now, and how you work together?

Bergmann: Sony is working on the tool chain itself, like GCC and binutils, and we have someone here who's also doing some work on those. Most work on the kernel and GDB has been done by IBM in my team and the STIDC. Then we have some people doing library support here.

dW: Do all these groups keep in touch by e-mail and other regular means, or do you have any special way of working together?

Bergmann: It's usually by e-mail and, inside of the Linux team, we use IRC on the public channels.

dW: Are there any members of the open source community outside of the three companies that are involved at this point?

Bergmann: At this point that's very hard, but we want to get there very soon, once everything that is needed is available. For instance, the GCC patch and documentation need to be published first, because those are really the pieces that people outside of the companies need to even get started.

Open source Cell, GPL, and other licensing

dW: Speaking of the open source community, IBM very recently announced that it is going to open-source the Cell specifications. There haven't been a lot of details about that. Have you been involved with that, or do you know anything more about it?

Bergmann: No, not really.

dW: Okay. But for the software that you're working on -- obviously, most of it is GPL'd, so it will be released as open source.

Bergmann: Yes.

dW: Do you know if there are any plans to dual-license that code, so that BSD (and other) developers can also take advantage of it? Or to choose a license like the one that was attached to the recent SLOF download on developerWorks?

Bergmann: As far as the kernel work goes, that will obviously all be GPL -- we don't plan to make that available under any other license. But the library parts that are used by userspace applications will need to have a more permissive license like a "BSD-like" license or the LGPL, so people can use our libraries without open-sourcing their own code.

dW: I know that the kernel does obviously have to be licensed as GPL, but is there a reason we can't dual-license it? I would imagine that some of the things that are in there would be of use to BSD developers as well?

Bergmann: Most of the code that I've done is very Linux-specific and wouldn't be useful in other architectures. And for some parts, of course, I hooked into some other code that's already licensed under the GPL, so it's hard to split those parts up from the parts that I purely wrote myself.

dW: So the spufs itself is going to be GPL'd?

Bergmann: ... GPL, yes.

dW: And that's because it is linked with already GPL'd code.

Bergmann: Yes.

Programming the Cell: ISA, SPUs, and APIs

dW: I know that the Cell is based on one of IBM's PowerPC cores, but I don't know which one. Is it the 970, or is it a secret to say which one it is?

Bergmann: As far as I know, the PowerPC core in the Cell was designed specifically for this processor. It has not been used before that in anything else.

dW: Cool. But the LinuxTag paper does say that it's compatible with the 970 instruction set.

Bergmann: Yes. It uses the POWER4™ instruction set, with VMX extensions, like in the 970. There are small differences on the kernel side, but those aren't really interesting to most users.

dW: You'd be surprised how much detail readers are interested in. Do you mind going into them?

Bergmann: Not at all. Mainly, the differences are the numbers of special-purpose registers and the layout of some of the memory mapped registers. The interrupt controller is also different. And another important difference is that Cell uses SMT to run two threads on one core.

dW: How similar to the 970 is the Cell, as far as the General Purpose Registers and the Special Purpose Registers go?

Bergmann: I don't really know much about the 970, but as far as I know, it's mostly compatible.

dW: So is it still 32 General Purpose Registers?

Bergmann: Oh, yes -- sure. Those are obviously all the same.

dW: Because of the basic PowerPC spec, is that right?

Bergmann: Yes.

dW: So then, if you were to program just for the PPE part of it, you're pretty compatible across your entire PowerPC line. Is that correct?

Bergmann: Yes. Of course, you could use GCC extensions to get code optimized for one CPU or the other.

dW: Right. Or for the SPEs, or Cells?

Bergmann: For the SPE, you need a separate compiler back end. For the PPE, you just need the optimizations if you want them.

dW: Do you know of any plans for a "virtual machine"-like environment or an emulator for Cell, for those who are waiting with fingers crossed to get a Cell down the line, so that they could get some development done now?

Bergmann: I don't think I can make any official statement about that.


dW: All right. So, as far as programming for the SPEs goes: IBM and, I think, Sony have both said publicly that the Cell is going to be very easy to program for. Does that mean that the SPE-specific details can remain hidden from the programmer and be optimized by the compiler without the programmer ever having to think about them?

Bergmann: No, not really. There's some possibility that you can have libraries that hide the interfaces. For example, you could have a library doing Fourier transformations or doing some MPEG encoding/decoding that's using either the PowerPC code or the SPU code, and the user would just call the library interfaces. But someone has to write those libraries as well, and those people have to think about how to map the code onto the SPE.
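The library approach Bergmann describes can be sketched as one public function with two private back ends. In this hypothetical example a dot product stands in for something like a Fourier transform; `dot_spe` is ordinary C that works in four-element chunks and merely stands in for code that would really run on the SPEs, and how a real library detects usable SPEs is left open (the flag is passed in so the sketch stays testable).

```c
#include <assert.h>
#include <stddef.h>

typedef double (*dot_impl)(const float *a, const float *b, size_t n);

/* Portable back end: a plain loop on the PowerPC core. */
static double dot_ppe(const float *a, const float *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)a[i] * b[i];
    return sum;
}

/* Stand-in for an SPE back end: four lanes per step, SIMD-style. */
static double dot_spe(const float *a, const float *b, size_t n)
{
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += (double)a[i + k] * b[i + k];
    double sum = lane[0] + lane[1] + lane[2] + lane[3];
    for (; i < n; i++)              /* leftover elements */
        sum += (double)a[i] * b[i];
    return sum;
}

static dot_impl chosen = dot_ppe;

/* Called once at start-up to pick a back end. */
void lib_init(int spes_available)
{
    chosen = spes_available ? dot_spe : dot_ppe;
}

/* Public API: callers never see which back end ran. */
double lib_dot(const float *a, const float *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    return chosen(a, b, n);
}
```

Application code only ever calls `lib_init` and `lib_dot`; the work of mapping the computation onto the SPEs stays inside the library, which is exactly the part someone still has to write.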

dW: Is there a chance that the API for those library codes might be something like the Message Passing Interface (MPI), or something that is already an industry standard?

Bergmann: Yes. We are interested in making it a standard interface -- we always try to stick to already well-established interfaces when we can. For instance, the API they are using to create threads on one SPE is very similar to the pthreads API. And yes, there has been some discussion on having MPI ported to the SPEs themselves.
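For comparison only, here is the familiar pthreads pattern that the SPE thread API is described as resembling: start several workers, each with its own argument, then join them and combine the results. The SPE API's actual function names are not shown, and the worker count and names below are hypothetical; each worker sums one slice of an array, much as an SPE might process its own chunk of data.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NWORKERS 4   /* stands in for the number of SPEs in use */

typedef struct {
    const int *data;
    size_t begin, end;   /* half-open slice [begin, end) */
    long sum;
} slice;

/* Thread body: sum one slice of the data. */
static void *sum_slice(void *arg)
{
    slice *s = arg;
    s->sum = 0;
    for (size_t i = s->begin; i < s->end; i++)
        s->sum += s->data[i];
    return NULL;
}

/* Start one thread per slice, join them all, combine partial sums. */
static long parallel_sum(const int *data, size_t n)
{
    assert(n == 0 || data != NULL);
    pthread_t tid[NWORKERS];
    int started[NWORKERS];
    slice parts[NWORKERS];

    for (size_t w = 0; w < NWORKERS; w++) {
        parts[w] = (slice){ data, n * w / NWORKERS, n * (w + 1) / NWORKERS, 0 };
        started[w] = pthread_create(&tid[w], NULL, sum_slice, &parts[w]) == 0;
        if (!started[w])
            sum_slice(&parts[w]);   /* fall back to the calling thread */
    }

    long total = 0;
    for (size_t w = 0; w < NWORKERS; w++) {
        if (started[w])
            pthread_join(tid[w], NULL);
        total += parts[w].sum;
    }
    return total;
}
```

The appeal of staying close to this model is that programmers already know its create, run, and join life cycle; only what runs inside each thread changes.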

More on Linux: 32-bit kernels, 32-bit apps, and kernel binaries

dW: Now, going back again to the LinuxTag paper, it said that there won't be support for 32-bit Linux -- but what about 32-bit applications? Will it support those in the same way as the 970 does?

Bergmann: You can always have applications in both 32- and 64-bit. The only restriction is that you cannot have a 32-bit Linux kernel running on the Cell processor. The kernel is always 64-bit, but you can use all the 32-bit applications.
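Whether an application is 32-bit or 64-bit is fixed when it is compiled, not by the kernel. A small sketch, assuming GCC, whose PowerPC options `-m32` and `-m64` select the two ABIs: the same source reports its own pointer width, and either build can run on the 64-bit kernel. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Width of a pointer in bits: 32 for a 32-bit build, 64 for a 64-bit one. */
static unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* A 32-bit build is limited to a 4 GiB address space; a 64-bit one is not. */
static int is_64bit_app(void)
{
    unsigned bits = pointer_bits();
    assert(bits == 32 || bits == 64);
    return bits == 64;
}
```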

dW: And eventually -- this work that you're doing will eventually merge back into the main kernel, right?

Bergmann: Yes. I'm hoping to merge this into the next kernel release, 2.6.13, because 2.6.12 is already out now.

dW: So that's really soon, actually.

Bergmann: Yes, but -- this includes only the architectural support for running Linux on our boards. It doesn't include the SPU file system, because there needs to be some more discussion about that.

dW: All right. And ultimately, once everything is merged in, in a few months or a year -- do these kinds of architecture-specific code patches and contributions end up having any impact on the kernel as a whole, or on the Linux ... project as a whole?

Bergmann: I think the only impact will be the size. We add some code, which, of course, contributes to the size of the source tree. And you can enable the code in your kernel binary: the code is designed to let you have a single kernel binary that runs on pSeries®, Power Macintosh, and Cell machines. So enabling one more architecture increases the size of the kernel binary and eats up some small fraction of system memory.

dW: So in an embedded application, you are unlikely to run a universal kernel binary like that, correct?

Bergmann: Exactly. You would only enable the Cell part in that.

dW: And disable the other parts, and build specifically for the Cell in a memory-constrained environment.

Bergmann: Yes. But all the distributions can just enable the Cell part for the kernel, and then it will run on all PowerPC 64-bit machines.
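The trade-off between one universal kernel binary and a Cell-only embedded build can be modeled as a table of platform descriptors, each with a probe function, where the first compiled-in platform whose probe matches is used at boot. This is a simplified, hypothetical sketch: the `ENABLE_*` switches, machine strings, and probe functions are invented for illustration and are much simpler than the real kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical build switches; set one to 0 to leave that platform out. */
#ifndef ENABLE_PSERIES
#define ENABLE_PSERIES 1
#endif
#ifndef ENABLE_PMAC
#define ENABLE_PMAC 1
#endif
#ifndef ENABLE_CELL
#define ENABLE_CELL 1
#endif

struct platform {
    const char *name;
    int (*probe)(const char *machine);  /* nonzero if this is our machine */
};

#if ENABLE_PSERIES
static int probe_pseries(const char *m) { return strcmp(m, "pseries") == 0; }
#endif
#if ENABLE_PMAC
static int probe_pmac(const char *m) { return strcmp(m, "powermac") == 0; }
#endif
#if ENABLE_CELL
static int probe_cell(const char *m) { return strcmp(m, "cell") == 0; }
#endif

/* Only compiled-in platforms appear; the sentinel keeps the array
   non-empty even if every platform is disabled. */
static const struct platform platforms[] = {
#if ENABLE_PSERIES
    { "pSeries", probe_pseries },
#endif
#if ENABLE_PMAC
    { "Power Macintosh", probe_pmac },
#endif
#if ENABLE_CELL
    { "Cell", probe_cell },
#endif
    { NULL, NULL }
};

/* Pick the first compiled-in platform whose probe matches the machine.
   NULL means the image was built without support for it. */
static const char *select_platform(const char *machine)
{
    assert(machine != NULL);
    for (const struct platform *p = platforms; p->name != NULL; p++)
        if (p->probe(machine))
            return p->name;
    return NULL;
}
```

Compiling with `-DENABLE_PSERIES=0 -DENABLE_PMAC=0` leaves only the Cell entry and its probe code in the binary, which corresponds to the memory-constrained embedded case; leaving all three enabled gives one image that boots on any of them, at the cost of carrying code a given machine never uses.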

dW: All right. Do you have any favorite articles or resources about Cell that you'd recommend to readers?

Bergmann: Not a specific one. There was some pretty good media coverage about the ISSCC conference and E3 conference, and I think those are usually pretty well-informed compared to what else is there on the Internet.

dW: Thank you so much for taking the time to join us today -- we really appreciate it.

Bergmann: Thank you.

Next time, Meet the experts will talk with Segher Boessenkool, the author of SLOF (Slimline Open Firmware). Please send any questions you have for Segher to the developerWorks Power Architecture editors, and we'll include them in the next interview. If you know of someone you'd like to see profiled, or if you have questions on another Power Architecture-related topic, please feel free to send those as well, and we will try to line up the right person or people to answer them in a future Meet the experts.


