Standards and specs: Chip interconnects: When 133 MBps is too slow

Without high-speed interconnects, moving data between components can be quite a chore

Peter Seebach, Freelance author, Plethora.net
Peter Seebach
Peter Seebach is still using a heavily asymmetrical interconnect protocol based on downstream VGA and upstream PS/2 packets. Please contribute to his Infiniband keyboard fund.

Summary:  New interconnect protocols and standards offer a variety of options in price and performance. Peter Seebach looks at four new protocols and ponders why we have more than one.

Date:  24 Apr 2006
Level:  Introductory

It's all very nice to have incredibly fast processors, ludicrous amounts of disk space, and so much RAM that a comparable amount of magnetic core would outweigh the planet. However, if you can't move data from the CPU to memory or from memory to your disk controller, a blown-out system doesn't do you a lot of good.

In the world of chip interconnects, some degree of standardization is absolutely essential -- you obviously can't function without a way to connect devices together. And as I've demonstrated in previous columns, usually a few standardized methods are better for the developer, consumer, and the manufacturer than an uncountable host of competing, proprietary protocols.

As computer speeds and bandwidth needs have doubled, tripled, and more, the need for faster data transfer has led to a great deal of work in this field. Unfortunately for nearly everybody, a huge amount of that work has been proprietary. Still, some of the designs are becoming standardized now, and this is leading to a dramatic reduction in costs to developers and users alike. Five years ago, many hardware developers felt the need to craft custom interconnects. Now that seems like a silly task; there are several off-the-shelf options to choose from.

The current four big contenders are (in alphabetical order):

  • HyperTransport (HT)
  • Infiniband (IBA)
  • PCI Express (PCIe)
  • RapidIO

This article examines the ways in which these standards are competing, cooperating, and interoperating. Note that all of these are trade association standards first and foremost; they are based on the needs and technical input of companies which have invested in membership in industry consortiums. It is particularly interesting to note that some companies are involved with more than one of these standards.

What are we talking about, anyway?

All of these standards provide architectures for moving data from one chip to another -- for instance, from memory to CPU and back, or from a CPU to a video card. In essence, these standards solve the same kinds of problems that PCI solved (see Resources for the Standards and specs column on PCI that provides some history on that).

These specifications can cover both software and hardware layers. HyperTransport's FAQ (which, I must admit, struck me as perhaps slightly influenced by marketing decisions) points out that HyperTransport is software-compatible with PCI, even though its other technical specifications are, to put it lightly, a bit different.

All of these interconnect specifications offer much larger bandwidth than traditional PCI (and PCI Extended, or PCI-X). The canonical bandwidth of PCI is 133MBps (a 33MHz clock and 32-bit wide transfers give you four bytes per cycle, for about 133MBps). Bandwidth numbers are hard to calculate, harder to measure, and still harder to get any useful information out of. Someone (it is hotly debated -- I think it was Mark Twain) once said that there are "lies, damn lies, and statistics"; benchmarks have generally been considered a class of their own, one that makes conventional statistics seem reliable and informative. In practice, you may reasonably assume that all four are "much faster than PCI."
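The PCI arithmetic above can be written down as a small sketch. The function name and the 33.33MHz figure (the nominal PCI clock, usually rounded to 33MHz) are choices made for this illustration, and the result is a theoretical peak, not something you would measure on real hardware.

```python
def parallel_bus_peak_mbps(clock_mhz: float, width_bits: int) -> float:
    """Theoretical peak of a simple parallel bus, in megabytes per second.

    Assumes one transfer per clock cycle and ignores all protocol overhead.
    """
    bytes_per_transfer = width_bits / 8
    return clock_mhz * bytes_per_transfer


# Classic PCI: 32 bits wide, nominal 33.33MHz clock (assumed value).
pci_peak = parallel_bus_peak_mbps(33.33, 32)
print(f"PCI peak: about {pci_peak:.0f} MBps")
```

The same function covers wider or faster variants of a shared parallel bus; only the two inputs change.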

The PCI specification was a parallel specification; everyone shared a single 32-bit wide bus. HyperTransport is also a parallel interface, running between two and 32 bits wide at clock speeds between 200 and 1400MHz. Data can move on both edges of the clock, allowing up to 2.8 billion transfers per second at the top clock speed.
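As a rough sketch of where the 2.8 billion figure comes from, the following computes transfers per second and the per-direction peak for a HyperTransport-style link. The function names are made up for this example, and the peak ignores command and address traffic entirely.

```python
def ht_transfers_per_second(clock_mhz: float) -> float:
    """Data moves on both clock edges, so two transfers per clock cycle."""
    return clock_mhz * 1e6 * 2


def ht_peak_gbytes_per_second(clock_mhz: float, width_bits: int) -> float:
    """Theoretical peak in one direction, in gigabytes per second."""
    return ht_transfers_per_second(clock_mhz) * width_bits / 8 / 1e9


print(f"{ht_transfers_per_second(1400):.1e} transfers per second at 1400MHz")
for width in (2, 8, 32):
    peak = ht_peak_gbytes_per_second(1400, width)
    print(f"{width:2d}-bit link: {peak:.1f} GBps per direction")
```

Note that these are per-direction numbers; a headline figure that counts both directions of a link at once would be twice as large.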

By contrast, Infiniband, RapidIO, and PCI Express are serial interfaces, although some implementations might use multiple serial channels. You might think that a six-bit parallel interface and half a dozen single-bit serial interfaces are pretty much interchangeable, but there are key differences. One is that with the serial interface each signal contains its own clock. It doesn't matter if two adjacent serial channels have slightly different speeds or even wildly different speeds -- they are not required to be in sync with each other.

In all of these specs, commands and data are somewhat mixed together, so the theoretical bandwidth of the bus is higher than the practical available bandwidth for moving data. Although a HyperTransport bus can, in principle, move 22GBps, some significant portion of that traffic will be metadata like commands or addresses rather than the actual bits being moved. PCI Express and Infiniband use 8b/10b encoding (every 8 bits of data travel as a 10-bit symbol) to keep the embedded clock recoverable, but this has the side effect of imposing some overhead; out of 10GB of raw bits transmitted over a PCI Express bus, 2GB will be encoding overhead. RapidIO also uses 8b/10b encoding for its serial implementation.
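The encoding overhead is easy to check with a few lines. This is only a sketch of the 8-data-bits-in-10-line-bits ratio; the function name is invented for the example, and it deliberately leaves out packet headers and other protocol traffic, which cost extra.

```python
def split_raw_traffic(raw_amount: float) -> tuple[float, float]:
    """Split raw line traffic into (payload, encoding overhead).

    With this encoding, 8 of every 10 transmitted bits carry data.
    The unit of raw_amount (bits, GB, Gbps) carries through unchanged.
    """
    payload = raw_amount * 8 / 10
    return payload, raw_amount - payload


payload_gb, overhead_gb = split_raw_traffic(10.0)
print(f"10GB raw: {payload_gb:.0f}GB payload, {overhead_gb:.0f}GB overhead")
```

Seen from the raw side, the overhead is 20 percent of what goes over the wire; seen from the payload side, you send 25 percent more bits than the data you actually wanted to move.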

Nominal top performance for Infiniband is 120Gbps at 12 lanes and quad data rate. Most systems today are probably using 4X links at single or double data rate, giving a more sedate 10-20Gbps of raw bandwidth or 8-16Gbps of real data transfer. A PCI Express link is built from single-bit bidirectional "lanes," each of which uses two wires each way; a link can have anywhere from 1 to 32 lanes. So, in principle, an x16 slot (the kind used for most video cards) supports 80Gbps of raw data transfer: 40Gbps each way, which at about 10 bits of raw data per byte of real data works out to about 4GBps each way. Future versions of PCIe might support signaling rates of 5 or 10GHz.
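Here is a small sketch of how lane counts turn into the numbers above. The per-lane signaling rates (2.5Gbps for a first-generation PCI Express lane and for a single-data-rate Infiniband lane, 10Gbps for quad data rate) are assumptions inferred from the totals in the text, and the function name is made up for the example.

```python
def serial_link_rates(lanes: int, lane_gbps: float) -> tuple[float, float, float]:
    """Per-direction rates for a multi-lane serial link.

    Returns (raw Gbps, data Gbps, data GBps), assuming 10 line bits
    are sent for every 8 data bits.
    """
    raw_gbps = lanes * lane_gbps
    data_gbps = raw_gbps * 8 / 10
    return raw_gbps, data_gbps, data_gbps / 8


examples = {
    "PCIe x16": (16, 2.5),          # assumed 2.5Gbps per lane
    "Infiniband 4X SDR": (4, 2.5),  # assumed 2.5Gbps per lane
    "Infiniband 12X QDR": (12, 10.0),
}
for name, (lanes, rate) in examples.items():
    raw, data, data_bytes = serial_link_rates(lanes, rate)
    print(f"{name}: {raw:.0f}Gbps raw, {data:.0f}Gbps data, {data_bytes:.1f}GBps")
```

All figures are for one direction; double them if you want to count traffic both ways at once.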

The other specification that matters is latency. Latency is, of course, even harder to reduce to a single concrete number. The latency of fetching a single byte might be much lower than the latency of fetching a full 64-bit word, or they might be about the same. This is one area where HyperTransport seems to have an edge with its greater capability for parallelism.

In all cases, data has to be broken into chunks. No matter what width of HyperTransport bus you're using, packets come in 32-bit chunks; that takes eight transfers over a 4-bit bus, but only one transfer on a 32-bit bus. With serial interfaces, data must be aggregated over a handful of cycles. On the other hand, serial interfaces might be able to manage higher clock speeds.
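The chunking arithmetic is simple enough to sketch directly. The constant and function names below are invented for the illustration; the only inputs taken from the text are the 32-bit chunk size and the range of link widths.

```python
HT_CHUNK_BITS = 32  # packets are built from 32-bit chunks


def transfers_per_chunk(link_width_bits: int) -> int:
    """Number of bus transfers needed to move one 32-bit chunk."""
    # Ceiling division, so a width that does not divide 32 still works.
    return -(-HT_CHUNK_BITS // link_width_bits)


for width in (2, 4, 8, 16, 32):
    print(f"{width:2d}-bit link: {transfers_per_chunk(width)} transfers per chunk")
```

A narrow link is not slower per transfer; it simply needs more transfers to deliver the same chunk, which is where the extra latency comes from.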


What are these used for?

Interconnects get used all over the place. For instance, the connection between the northbridge chip (which handles CPU, memory, and high-speed bus interfaces, such as AGP or PCIe) and the southbridge chip (which handles devices such as a regular PCI bus and legacy system devices such as non-volatile RAM, real-time clock, and so forth) might be built using HyperTransport, even if the rest of the system doesn't use it. Bridges from one interface to another are not uncommon; in general, they have little effect on potential bandwidth, but a significant effect on latency. However, latency can have a huge impact on effective bandwidth if a protocol requires a great deal of back-and-forth communication.
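To see why latency can eat into effective bandwidth, consider a deliberately crude model: total time is the time spent on the wire plus one latency delay for every round trip the protocol insists on. The function name and all of the numbers below are hypothetical, picked only to make the effect visible.

```python
def effective_bandwidth(payload_bytes: float, peak_bytes_per_s: float,
                        latency_s: float, round_trips: int) -> float:
    """Bytes per second actually achieved under a simple wait-per-round-trip model."""
    transfer_time = payload_bytes / peak_bytes_per_s
    waiting_time = round_trips * latency_s
    return payload_bytes / (transfer_time + waiting_time)


PAYLOAD = 1_000_000   # 1MB to move (hypothetical)
PEAK = 1e9            # 1GBps link (hypothetical)
LATENCY = 1e-6        # 1 microsecond per round trip (hypothetical)

streaming = effective_bandwidth(PAYLOAD, PEAK, LATENCY, round_trips=1)
chatty = effective_bandwidth(PAYLOAD, PEAK, LATENCY, round_trips=PAYLOAD // 64)
print(f"streaming: {streaming / 1e6:.0f} MBps")
print(f"one round trip per 64 bytes: {chatty / 1e6:.0f} MBps")
```

In this model a bridge that adds a little latency barely matters for a streaming transfer, but a protocol that waits for an acknowledgment after every small chunk pays that latency thousands of times over.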

Not all systems need a northbridge chip anymore; some AMD64 chips have an integrated memory controller, dramatically reducing latency of memory communications; and PCI Express systems have a lot less reason for a dedicated AGP controller providing "fast" access to the CPU and main memory.

Infiniband has some focus on longer connections. While HyperTransport and PCIe connections are typically found within a motherboard, Infiniband is used for connecting external peripherals as well. For instance, multiple machines could share access to a disk server device using Infiniband (curiously reminiscent of college computer labs connecting several computers to a single SCSI disk, back when it was too expensive to have multiple hard disks!). It's also useful for improving the performance of clusters, giving machines a communications protocol with better bandwidth and latency than Ethernet. HyperTransport has been used for CPU-to-CPU communications, especially on multiprocessor AMD systems. By contrast, PCI Express and RapidIO are more focused on CPU-to-I/O device connections.


Protocol and physical layer

As is so often the case, protocols have layers. A system using HyperTransport might well use it to connect a PCI bus to a CPU. A system using PCI Express is very, very likely to have a regular PCI bus available for compatibility with old hardware. The overwhelming majority of "PCI Express" products you can buy in stores today are video cards targeted at systems that have a single PCI Express slot, so many systems offer only a single PCIe slot, but several old-fashioned PCI slots. (Annoyingly, they are often just 33MHz/32 bits, not even offering the enhancements of higher clock speeds or data widths.)

Protocol can matter a lot. The difference between performance in streaming gigabytes of writes to a memory or disk device, and performance with a more elaborate protocol, such as USB, can be stunning. It is often hard to explain why we need such huge bus bandwidth to get the full performance of devices with much lower-rated raw performance.

Tunneling -- using one protocol to move packets for another protocol -- likewise increases overhead. But the benefits can be significant since tunneling allows multiple interconnects to be used together on a system. In general, these interconnects are intended to provide transparent access to existing protocols. For instance, if you have a PCI bridge attached to a newer interconnect architecture, existing PCI driver code should work on it without any modifications.


Who's winning

I can hardly talk about multiple specifications without putting in a section on this, but really, the answer is that developers are the clear winners. These architectures coexist nicely. A machine might use HyperTransport to connect CPU and memory and also have a PCI Express video card. Protocols can be tunneled; in fact there's even a protocol (HyperTunnel) for tunneling HyperTransport packets over Infiniband hardware.

People who want to build or use hardware are winning. Multiple competing specs exist for moving gigabytes of data around. What would have been a design challenge for a team of engineers with a large budget a few years back is now a question of deciding which off-the-shelf product to use.

As much as I hate to say it, it might be for the best that there are multiple standards in this field. While bridges between interconnects have some performance impact, it's quite clear from the continued investment in each of these standards that they are meeting real needs.

HyperTransport's low latency for very short distances serves a need, but so does Infiniband's ability to run a cable more than a foot long. Interoperability between these standards seems to be unusually good. You might speculate on two likely causes of this:

  • First, these specifications are necessarily being worked on by people with an interest in interconnection and interoperability.
  • Second, many of the same players are active on two or more of these standards.

Doing research for this turned up various other articles talking about who was "winning." I don't think any two of these articles gave the same list. This provides a basis for an alternative strategy when you are trying to decide which of two competing standards to go with: Get involved with both, make sure they can talk to each other, and lower your risks.

As Andrew Tanenbaum once said, "The nice thing about standards is that you have so many to choose from."

