The Cell BE processor has been hailed as revolutionary. It is often described as "many processors in one without being multicore." This is more or less accurate. It really is just one PowerPC® core, surrounded (for now) by eight parallel "Synergistic" Processing Elements (SPEs, also commonly referred to as Synergistic Processing Units, or SPUs, though the SPU is actually a part of an SPE, and the terms aren't really interchangeable). The fundamental difference between the Cell BE processor and dual or multi-core is that on the Cell BE processor, only the PowerPC core runs the operating system: the SPEs run (usually highly vectorized) application code.
developerWorks caught up with Arnd Bergmann, the IBM® Linux™ on the Cell BE processor kernel maintainer, to talk about the port, about the Cell BE processor and Cell BE-based "workstations" (which aren't workstations at all) and about programming to Cell BE -- among other things.
developerWorks: Can we start with a little bit about you? Can you describe the team that you're working with, and how you came to be working on this Cell port?
Arnd Bergmann: I'm in a team that is part of the LTC [Linux Technology Center] organization in IBM. The LTC does most of the Linux-related work within IBM. Most of the team that's working on Cell now started out working on the zSeries® Linux port. We have some other people joining us that had been on some other software projects in IBM and we founded this little group that's doing Linux on Cell work. And then we also work together with some other groups here in the lab in Böblingen and Mainz -- they are doing the firmware and hardware development.
dW: So you more or less self-organized? And volunteered yourselves to work on Cell mainly because you wanted to?
Bergmann: Well, I was offered the chance to start working on Cell, and I thought it sounded like a great idea.
dW: And before that, you had been working on zSeries -- is there any similarity between Cell and the mainframes?
Bergmann: No, it's quite different. I mean, Linux kernel work is basically the same on all architectures, be it zSeries, or Cell, or others. But each of these two has things that make them very different from all of the other architectures. They don't really have much in common.
dW: Could you talk about some of the things that do make Cell so different? Many of our readers will have been keeping up with Cell news, but I think there are a lot of people who haven't been, necessarily. How would you describe Cell to somebody who hasn't been following along?
Bergmann: Well, the Cell processor on the surface is just a regular PowerPC processor like others that we have seen. But there is one thing that makes it very special, which is the fact that it actually contains multiple small processors on one processor die -- we now have eight of those so-called synergistic processing elements [SPEs], which are very high-performance, optimized all-arithmetic elements -- that act as stand-alone processors and can talk to each other very fast. They are all orchestrated by one core processor, which is actually comparable to a PowerPC running a normal operating system like Linux. The eight SPEs are not running Linux kernels, just application code.
dW: So the SPEs are the main thing that make it so revolutionary.
dW: And they also are what make it so good for game programming, and for scientific applications?
Bergmann: Yes. In games, it's mostly about computational work doing vector operations -- floating point operations -- and the instruction set that is used on the SPUs is optimized for vector operations. So every single instruction that is executed there actually behaves like a vector operation and can always work on a 128-bit word in one cycle, or in one instruction. In the 128 bits, for example, you could have four single-precision floating point operations at the same time that are done by one processor instruction.
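To make the 128-bit arithmetic concrete, here is a minimal Python sketch of the idea Bergmann describes: four 32-bit single-precision floats fill one 128-bit word, and one vector operation touches all four lanes. It only models the data layout; it is not SPU code, and the helper names (pack_vector, vector_add) are made up for illustration.

```python
import struct

LANES = 4            # four single-precision floats per vector
FORMAT = ">4f"       # big-endian, 4 x 32-bit floats = 16 bytes = 128 bits


def pack_vector(values):
    """Pack exactly four floats into one 16-byte (128-bit) word."""
    if len(values) != LANES:
        raise ValueError("a 128-bit word holds exactly four single-precision floats")
    return struct.pack(FORMAT, *values)


def vector_add(a, b):
    """Model a single vector add: every lane is added in the same step."""
    lanes_a = struct.unpack(FORMAT, a)
    lanes_b = struct.unpack(FORMAT, b)
    return struct.pack(FORMAT, *(x + y for x, y in zip(lanes_a, lanes_b)))


a = pack_vector([1.0, 2.0, 3.0, 4.0])
b = pack_vector([0.5, 0.5, 0.5, 0.5])
result = struct.unpack(FORMAT, vector_add(a, b))
```

On real hardware the four additions are carried out by one instruction; the loop inside vector_add is only a stand-in for that.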
dW: How close to VMX instructions are the SPU instructions?
Bergmann: Some of the instructions are very similar; and the SPUs have some additional instructions that go far beyond what you can do with VMX. And there are a small number of instructions in VMX that don't have a counterpart on Cell. The biggest difference is that the SPUs don't actually use PowerPC instructions. The instruction format itself is different, even though the assembler or the C intrinsics might be very similar.
dW: So, are VMX opcodes and SPU opcodes the same? Will VMX optimized code have to be rewritten to take advantage of SPU?
Bergmann: Unlike VMX, the SPU has a complete instruction set to execute self-contained binaries. The opcodes that are used are very different on the assembly level but can be used in similar ways from high-level languages.
dW: So then the compilers have to be rewritten to include the opcodes for the SPUs, is that correct?
Bergmann: Yes. We have a special GCC port, and you can have other compilers as well that are targeted at the SPU, instead of targeted at the host instruction set. So if you want to compile the program for the SPU, you need to use this cross-compiler.
dW: It has been said that Cell is supposed to scale to -- everything from very tiny devices, up to huge supercomputers. What about right in the middle? Is there any point at all in having a desktop Cell machine?
Bergmann: For the regular desktop user, I think there will not be much difference, because you get all the benefits only if you write your applications specifically for the Cell processor. That's usually done either if you have just a very limited number of applications on the machine, like you have on embedded systems, [where you want] for instance to have MPEG encoding/decoding right there -- or in the supercomputer world, where you write your applications to run on the computer once.
dW: So all of the people who are hoping for a Cell computer at home are probably going to be disappointed.
Bergmann: Well, if they are programmers, they could of course write their own applications. And we might see some applications that are optimized for Cell. For example, you could have video editing software that's optimized for Cell, or some image manipulation. If you just want to run standard applications, it probably will not help you a lot.
dW: Well, but it wouldn't harm you either, would it? To run workaday applications like e-mail or Web browsing -- there wouldn't be any performance penalty, would there be?
Bergmann: No -- you'd just spend much more money for the processor itself, I guess.
dW: So for ideal use, you would really want it for stuff like fluid dynamics or plasma dynamics, where you're doing millions of -- no pun intended -- particle in cell calculations, where you're wanting to highly vectorize everything. Is that correct?
dW: And one news report I saw said that you would be premiering a Cell-based workstation at LinuxTag. Can you describe what it is people will be seeing at LinuxTag?
Bergmann: I guess there was some misunderstanding. It isn't so much a "workstation" as a Cell-based Blade server -- a prototype board. The same, in fact, as the one that was shown at the E3 conference.
dW: Yes. I saw pictures of that. So it's not really -- other than being Cell-based and running Linux, it's not really related to the workstation that's coming out later this year.
Bergmann: I don't know if there are any plans about the workstation itself -- that might be the same misunderstanding.
dW: Oh. Well, I know the early articles about it said that there was going to be a workstation for game developers -- and I know that in the case of games, sometimes a board is called a workstation... So it sort of sounds like that might be it. Do you know anything about plans of IBM offering Cell-based machines other than the Blade?
Bergmann: No I don't know anything about other product plans, sorry.
dW: But we will be having a Cell-based Blade, if nothing else?
Bergmann: Well, this is the one thing that I'm working on, and I don't know about the availability for customers at all.
What we are working on now is really a technology study. And people who are interested in that can ask the IBM Engineering and Technology Services [ET&S] about how to evaluate this technology for their purposes. But as far as I know, there will not be a Cell-based IBM server product.
dW: Really? Do you have any idea why that is?
dW: Would you say -- in your opinion, do you think we should?
Bergmann: I think there is a huge market and huge demand for those, yes.
dW: Okay... So, well -- on to the kernel. Can you tell us a bit about your work porting the Linux kernel to Cell?
Bergmann: Yes. The work that we have done is comprised mainly of three parts.
The first part is getting Linux to run at all on the Cell processor, and on the board that we have -- which means we had to add some parts to the PowerPC architecture code, for example, for the CPU ID and interrupt controller to work with the basic architecture support.
Most of the groundwork for that was done by the Sony/Toshiba/IBM Design Center (STIDC) in Austin, Texas. I've started to make this work more compatible with the standard kernel, so that it fits in a way that gets integrated into current 2.6.13 or what have you.
The second part is the device drivers for our actual prototype port, so we have an Ethernet controller and some other hardware that needs device drivers. That work was mostly done by people here on my team, but not by myself.
The third part is the exploitation of the SPEs themselves. That's where the SPU file system that has been described already (see Resources) comes in, and that work is still ongoing. There were some earlier attempts at a model, but it looks like the SPU file system is the final model now.
dW: Can you describe some of the alternatives that you went through and rejected?
Bergmann: Okay. The first approach that I know of was to have a device driver interface, where you have character devices and access each of the physical SPEs through one device node. But that's a rather clumsy interface: it's impractical to virtualize that (so that you can have more applications using the SPEs, than you have physical SPEs available in the system). And performance is likely to suffer.
Another attempt was to test a low-level system call interface. For that, it turned out that you need a large number of different system calls to get all the functionality that's required to have full application support.
The problem with that interface was that system calls are always hard to get into the kernel -- because it's very hard to get them right the first time. You can never change them if you get them wrong, and there were just so many system calls that we needed.
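The virtualization requirement mentioned above (more applications using SPEs than there are physical SPEs) can be pictured with a toy model. The Python sketch below is an illustration only, not how spufs or any of the earlier interfaces actually schedule work; the class name SpePool, its methods, and the first-come-first-served policy are all assumptions made for the example.

```python
from collections import deque


class SpePool:
    """Toy model: a fixed set of physical SPEs shared by any number of contexts."""

    def __init__(self, physical_spes=8):
        self.free = deque(range(physical_spes))  # idle physical SPE numbers
        self.running = {}                        # context name -> physical SPE number
        self.waiting = deque()                   # contexts that have no SPE yet

    def submit(self, context):
        """Give the context a physical SPE if one is idle, else queue it."""
        if self.free:
            self.running[context] = self.free.popleft()
        else:
            self.waiting.append(context)

    def finish(self, context):
        """Release the context's SPE and hand it to the next waiting context."""
        spe = self.running.pop(context)
        if self.waiting:
            self.running[self.waiting.popleft()] = spe
        else:
            self.free.append(spe)


pool = SpePool(physical_spes=8)
for i in range(10):          # ten logical contexts, only eight physical SPEs
    pool.submit(f"ctx{i}")
```

With one device node per physical SPE, an application is tied to a specific piece of hardware; an abstraction based on logical contexts leaves the kernel free to decide which physical SPE, if any, a context occupies at a given moment.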
dW: So how many system calls are you ultimately adding?
Bergmann: With the SPU file system [spufs], I'm adding one new call -- and we're still discussing whether that would actually be a system call, or an ioctl, or some other method using simply read and write interfaces.
The original approach with the system call interface would have added dozens of system calls, though we don't have an exact number because it was just experimental.
dW: I do have one follow-up question on the discarding of the device driver model. Was that a hardware limitation, or -- might something along the lines of a descriptor-based DMA (DBDMA) have helped to alleviate some of the issues involved?
Bergmann: I think the biggest point against the device driver was that this is not exactly a device that you can drive, but something more substantial, more integrated into the CPU itself. So a device driver would not be the correct abstraction.
It's common in Linux to add new functionality via a virtual file system.
dW: So that's more of a Linux philosophy is what you're saying, but other operating systems which have a different approach might actually implement a different model.
Bergmann: Yes. If you were writing a kernel specifically just for the Cell processor, it might be a good choice to use a system call interface.
dW: Have you yet had a chance to do any benchmarks on it?
Bergmann: I haven't done any benchmarks myself, no. And I don't know of anyone else who has, so I can't tell you anything about that.
dW: But when you have -- you do have it running so that it's taking full advantage of the SPEs right now?
Bergmann: We have the file system running and we can run applications on it, yes. And we have done some demos, which were mostly based on one of the earlier models.
dW: So does it "feel" really fast? What does it feel like to work on it, and how do you like it?
Bergmann: I don't have -- we always use it over the serial console, which is -- very slow if you type, but [if you] actually try to run an application using the SPEs then suddenly, it feels really fast.
dW: But you are self-hosting it? Can you self-build?
Bergmann: We can, but we don't usually do that, because we have the development environment on our regular workstations, and then we access the Cell machine when we need to run or debug something.
dW: Is that a planned milestone? Self-hosting?
Bergmann: It would work right now -- we've tried it, and it works fine. It's just that we don't have so many machines, and we have to share them.
dW: Are the host machines PPC?
Bergmann: [laughs] Yes, we use JS20 Blades and other PPC970-based workstations.
dW: You mentioned earlier, working with Sony and Toshiba. Your group there in Böblingen -- you are working on the Linux port, and also -- GDB? And OProfile? Is that right? How is the work split up, can you talk a bit about who is working on the different pieces right now, and how you work together?
Bergmann: Sony is working on the tool chain itself, like GCC and binutils, and we have someone here who's also doing some work on those. Most work on the kernel and GDB has been done by IBM in my team and the STIDC. Then we have some people doing library support here.
dW: And do all these keep in touch with e-mail and just using regular means -- or do you have any special way that you work together?
Bergmann: It's usually by e-mail and, inside of the Linux team, we use IRC on the public channels.
dW: Are there any members of the open source community outside of the three companies that are involved at this point?
Bergmann: At this point, it's very hard, but we want to go there very soon, once all of the stuff that is needed is available. For instance, the GCC patch and documentation need to be published first, because those will really be the pieces that people outside of the company need to even get started.
dW: Speaking of the open source community, IBM very recently made an announcement that they're going to open-source the Cell specifications. There haven't been a lot of details about that. Have you been involved with that, or do you know anything more about it?
Bergmann: No, not really.
dW: Okay. But for the software that you're working on -- obviously, most of it is GPL'd, so it will be released as open source.
dW: Do you know if there are any plans to dual-license that code, so that BSD (and other) developers can also take advantage of it? Or choosing a license like the one that was attached to the recent SLOF download on developerWorks?
Bergmann: As far as the kernel work goes, that will obviously all be GPL -- we don't plan to make that available under any other license. But the library parts that are used by userspace applications will need to have a more permissive license like a "BSD-like" license or the LGPL, so people can use our libraries without open-sourcing their own code.
dW: I know that the kernel does obviously have to be licensed as GPL, but is there a reason we can't dual-license it? I would imagine that some of the things that are in there would be of use to BSD developers as well?
Bergmann: Most of the code that I've done is very Linux-specific and wouldn't be useful in other architectures. And for some parts, of course, I hooked into some other code that's already licensed under the GPL, so it's hard to split those parts up from the parts that I purely wrote myself.
dW: So the spufs itself is going to be GPL'd?
Bergmann: ... GPL, yes.
dW: And that's because it is linked with already GPL'd code.
dW: I know that the Cell is based on one of IBM's PowerPC cores, but I don't know which one. Is it the 970, or is it a secret to say which one it is?
Bergmann: As far as I know, the PowerPC core in the Cell was designed specifically for this processor. It has not been used before that in anything else.
dW: Cool. But the LinuxTag paper does say that it's compatible with the 970 instruction set.
Bergmann: Yes. It uses the POWER4™ instruction set, with VMX extensions, like in the 970. There are small differences on the kernel side, but those aren't really interesting to most users.
dW: You'd be surprised how much detail readers are interested in. Do you mind going into them?
Bergmann: Not at all. Mainly, the differences are the numbers of special-purpose registers and the layout of some of the memory mapped registers. The interrupt controller is also different. And another important difference is that Cell uses SMT to run two threads on one core.
dW: How similar to the 970 is the Cell, as far as like the General Purpose Registers and the Special Purpose Registers?
Bergmann: I don't really know much about the 970, but as far as I know, it's mostly compatible.
dW: So is it still 32 General Purpose Registers?
Bergmann: Oh, yes -- sure. Those are obviously all the same.
dW: Because of the basic PowerPC spec, is that right?
dW: So then, if you were to program just for the PPE part of it, you're pretty compatible across your entire PowerPC line. Is that correct?
Bergmann: Yes. Of course, you could have GCC extensions to have code optimized for one CPU or the other.
dW: Right. Or for the SPEs, or Cells?
Bergmann: For the SPE, you need a separate back end for the compiler for it. With PPE, you just need the optimizations if you want them.
dW: Do you know of any plans for a "virtual machine"-like environment or an emulator for Cell, for those who are waiting with fingers crossed to get a Cell down the line, so that they could be getting some development done now?
Bergmann: I don't think I can make any official statement about that.
dW: All right. So as far as programming for the SPEs -- IBM and I think Sony have both said publicly that the Cell is going to be very easy to program to. Does that mean that those kinds of things can remain hidden from the programmer and be optimized by the compiler without the programmer ever even having to think about it?
Bergmann: No, not really. There's some possibility that you can have libraries that hide the interfaces. For example, you could have a library doing Fourier transformations or doing some MPEG encoding/decoding that's using either the PowerPC code or the SPU code, and the user would just call the library interfaces. But someone has to write those libraries as well, and those people have to think about how to map the code onto the SPE.
dW: Is there a chance that the API for those library codes might be something like the Message Passing Interface (MPI), or something that is already an industry standard?
Bergmann: Yes. We are interested in making it a standard interface -- we always try to stick to already well-established interfaces when we can. For instance, the API they are using to create threads on one SPE is very similar to the pthreads API. And yes, there has been some discussion on having MPI ported to the SPEs themselves.
dW: Now, going back again to the LinuxTag paper, it said that there won't be support for 32-bit Linux -- but what about 32-bit applications? Will it support those in the same way as the 970 does?
Bergmann: You can always have applications in both 32- and 64-bit. The only restriction is that you cannot have a 32-bit Linux kernel running on the Cell processor. The kernel is always 64-bit, but you can use all the 32-bit applications.
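Whether a given application binary is 32-bit or 64-bit is recorded in its ELF header: the fifth byte of the file (EI_CLASS) is 1 for 32-bit and 2 for 64-bit images. The Python sketch below reads that byte. The function names are invented for this example, and the two sample headers are synthetic five-byte prefixes rather than real binaries.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4                                   # index of the class byte in e_ident
ELF_CLASS_NAMES = {1: "32-bit", 2: "64-bit"}   # ELFCLASS32, ELFCLASS64


def elf_class(ident):
    """Return '32-bit' or '64-bit' for the leading bytes of an ELF image."""
    if len(ident) <= EI_CLASS or ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    value = ident[EI_CLASS]
    if value not in ELF_CLASS_NAMES:
        raise ValueError(f"unknown EI_CLASS value {value}")
    return ELF_CLASS_NAMES[value]


def elf_class_of_file(path):
    """Convenience wrapper: classify a binary on disk by its first five bytes."""
    with open(path, "rb") as f:
        return elf_class(f.read(EI_CLASS + 1))


app_32 = elf_class(b"\x7fELF\x01")   # synthetic 32-bit header prefix
app_64 = elf_class(b"\x7fELF\x02")   # synthetic 64-bit header prefix
```

On a 64-bit kernel that supports both kinds of userspace, as described above, applications of either class can be loaded; only the kernel image itself has to be 64-bit.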
dW: And eventually -- this work that you're doing will eventually merge back into the main kernel, right?
Bergmann: Yes. I'm hoping for merging this into the next kernel release -- the 2.6.13 kernel release, because 2.6.12 is already out now.
dW: So that's really soon, actually.
Bergmann: Yes, but -- this includes only the architectural support for running Linux on our boards. It doesn't include the SPU file system, because there needs to be some more discussion about that.
dW: All right. And ultimately, once everything is merged in, in a few months or a year -- do these kinds of architecture-specific code patches and contributions end up having any impact on the kernel as a whole, or on the Linux ... project as a whole?
Bergmann: I think the only impact will be the size. We add some code, which, of course, contributes to the size of the software tree. And if you enable the code into ... your kernel binary, for example -- the code is designed to let you have a single kernel binary that can run on a pSeries® and on Power Macintosh and Cell processor. So enabling one more architecture will increase the size of the kernel binary and eat up some small fraction of the system memory.
dW: So in an embedded application, you are unlikely to run a universal kernel binary like that, correct?
Bergmann: Exactly. You would only enable the Cell part in that.
dW: And disable the other parts, and build specifically for the Cell in a memory-constrained environment.
Bergmann: Yes. But all the distributions can just enable the Cell part for the kernel, and then it will run on all PowerPC 64-bit machines.
dW: All right. Do you have any favorite articles or resources about Cell that you'd recommend to readers?
Bergmann: Not a specific one. There was some pretty good media coverage about the ISSCC conference and E3 conference, and I think those are usually pretty well-informed compared to what else is there on the Internet.
dW: Thank you so much for taking the time to join us today -- we really appreciate it.
Bergmann: Thank you.
Next time, Meet the Experts will talk with Segher Boessenkool, the author of SLOF (Slimline Open Firmware). Please send questions you have for Segher to the developerWorks Power Architecture editors. We'll include those in the next interview. If you know of someone you'd like to see profiled, or if you have questions on another Power Architecture-related topic, please feel free to send those as well, and we will try to line up the right person or people to answer them in a future Meet the Experts.
The LTC doesn't currently have a home page on the IBM Web site, but you can learn more about the group that does most of the Linux-related work within IBM in IBM: LTC core to Linux development (LinuxWorld, 2004).
Arnd has written a paper for presentation at this week's LinuxTag 2005 that describes the spufs SPU file system and the Linux for Cell programming model in detail. A slightly expanded version of that paper has also been posted here on developerWorks.
Arnd says that some of the best coverage on Cell was published just after the ISSCC and E3 conferences: articles like ISSCC 2005: The CELL Microprocessor and Jon "Hannibal" Stokes' two-part Introducing the Cell Processor series at Ars Technica. The ISSCC papers themselves, as well as two Microprocessor Report studies, have been publicly posted to the Cell section of the IBM Microelectronics Technical Library.
The 2005 E3 conference saw the launch or media debut of three new game consoles from Sony, Nintendo, and Microsoft® -- all of which are powered by Power Architecture™ processors (the Sony PlayStation 3 by Cell itself). Find E3 Power-related coverage and related stories in the June 8 issue of the developerWorks Power Architecture Community (PAC) Newsletter.
See a picture of the prototype Cell board -- affectionately known in some circles as "frisbee" -- as demonstrated at the 2005 E3 conference.
Find more good links on Cell -- and a detailed block diagram of the Cell processor -- at the IBM Research CELL Architecture pages.
The 2005 IEEE paper, Power Efficient Processor Architecture and The Cell Processor, authored by H. Peter Hofstee of the IBM Server & Technology Group (and one of Cell's architects), is available in PDF.
entry on the Cell processor says that the Cell resembles a modern
desktop computer on a single chip. Read the history, an explanation of
SPEs and SPUs, and find more good
Sony this year published a paper on Programming
CELL. An outstanding resource, it also presents some alternative
models for programming Cell. In particular, slide 24 offers a good
overview of Cell programming models that are explored in subsequent
The next Meet the Experts will speak with Segher Boessenkool, the author of SLOF (Slimline Open Firmware).
- Have experience you'd be willing to share with Power Architecture zone readers? Article submissions on all aspects of Power Architecture technology from authors inside and outside IBM are welcomed. Check out the Power Architecture author FAQ to learn more.
- Have a question or comment on this story, or on Power Architecture technology in general? Post it in the Power Architecture technical forum or send in a letter to the editors.
- The Power Architecture Community Newsletter includes full-length articles as well as recent news about members of the Power Architecture community and upcoming events of interest. Learn more about the Power Architecture Community Newsletter and how to contribute to it. Subscription is free.
- All things Power-related are chronicled in the developerWorks Power Architecture editors' blog, which is just one of many developerWorks blogs.
- Find more articles and resources on Power Architecture technology and all things related in the developerWorks Power Architecture technology zone.
- Download an IBM PowerPC 405 Evaluation Kit to demo a SoC in a simulated environment, or just to explore the fully licensed version of Power Architecture technology. This and other fine Power Architecture-related downloads are listed in the developerWorks Power Architecture technology zone's downloads section.
The developerWorks Power Architecture editors welcome your comments on this article. E-mail them at firstname.lastname@example.org.