Level: Introductory
Sam Siewert (Sam.Siewert@Colorado.edu), Adjunct Professor, University of Colorado
05 May 2006
Since its emergence about a decade ago, the SoC (system-on-a-chip) architecture has become the underlying architecture for many embedded systems and scalable supercomputers and is starting to find its way into general-purpose computing as well. The SoC embodies what many believe to be the ultimate level of integration: an entire system on one chip. Moore's law and higher levels of integration made the SoC inevitable, but can this continue? And what's next? This article takes a step back to gain perspective on the SoC and to see where it is going in the future. Perhaps the more important question is: where should the highest level of integration be, and what will it enable 25 years from today?
SoCs (systems-on-chips) as configurable application-specific integrated circuits (ASICs) and reconfigurable computing platforms have become common in less than a decade. The definition of an SoC is simply a chip where an entire system is designed into a single ASIC. The emergence of the SoC clearly follows from
Moore's law, where larger and larger scales of integration continue to drive printed circuit boards to become chipsets and, finally, individual chips.
In essence, the SoC is the realization of the ultimate level of integration: everything on one chip.
Likewise, because the SoC is the ultimate level of integration, it is also the ultimate
embedded system -- or is it? About twenty-five years after Robert Noyce and Jack Kilby invented the integrated circuit, Gene Amdahl envisioned whole systems on a wafer (called wafer-scale integration), which
suffered from fabrication complexities, and never came to fruition. The SoC eventually realized Amdahl's goal, but at an even greater density. Now, twenty-five years after Amdahl envisioned a system
on a wafer, the SoC has found its way into everything from cell phones to supercomputers.
This article is the seventh in the SoC drawer series. The series aims to help architects better understand SoCs, design methods, and -- in this article -- to consider future directions for SoCs.
The first article in this series introduced the concept of a resource cube. The resource cube is important because
it defines the design trade-off space between hardware and software function allocation and defines more fundamentally the scale of integration that an SoC embodies.
From a theoretical viewpoint, if one SoC features more processing throughput, more I/O bandwidth, and more volatile and non-volatile memory than another, that indicates larger-scale integration and more efficient use of integrated circuit
resources. Looking forward, emergent architectures will no longer boast about raw resources such as gates, flops, and cells as they once did; rather, their selling points will be presented in terms of function, resources, performance, and overall efficiency. The right levels of resources, scalability, and reconfigurability are being recognized as more important than sheer numbers. Hardware designs that simply
pack in more processor, memory, and I/O resources for software, but with less efficiency, less determinism, and poor balance are hitting limits of diminishing utility. This observation has prompted
research into technologies like PIM (processor in memory) and the IBM Cell Broadband Engine™ (Cell BE) architecture.
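The comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming the resource cube's axes are the three named in this paragraph (processing throughput, I/O bandwidth, and memory); the class name, field names, units, and sample values are all hypothetical and are not taken from the first article in the series.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocResources:
    """Hypothetical resource-cube coordinates for one SoC (illustrative units)."""
    processing_mips: float   # processing throughput
    io_mbytes_per_s: float   # I/O bandwidth
    memory_mbytes: float     # volatile plus non-volatile memory

    def axes(self):
        return (self.processing_mips, self.io_mbytes_per_s, self.memory_mbytes)


def dominates(a: SocResources, b: SocResources) -> bool:
    """True if a is at least as large as b on every axis and larger on at least one."""
    pairs = list(zip(a.axes(), b.axes()))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


# Made-up sample points, for illustration only.
soc_a = SocResources(processing_mips=800, io_mbytes_per_s=400, memory_mbytes=8)
soc_b = SocResources(processing_mips=400, io_mbytes_per_s=200, memory_mbytes=4)
soc_c = SocResources(processing_mips=1200, io_mbytes_per_s=100, memory_mbytes=2)

print(dominates(soc_a, soc_b))  # soc_a is larger on all three axes
print(dominates(soc_a, soc_c))  # mixed case: neither design dominates the other
```

The mixed case is the interesting one: when neither design dominates, raw resource counts cannot rank them, and the question becomes one of balance and efficiency for the intended workload.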
Going forward, hardware and software are necessarily becoming convergent and fully concurrent endeavors, to the extent that
a casual observer of a configurable or reconfigurable SoC project might not be able to tell the software and hardware engineers apart. Hardware engineers now spend countless hours with simulators, designing and
verifying designs for field-programmable gate array (FPGA) bitstream and ASIC logic synthesis. Software engineers (who are really more of the firmware bent) are often quite involved with post-silicon hardware verification, system testing, and early integration with
hardware emulators. For an SoC, there's far less to probe and white-wire on a board, anyway; all verification and debug must be done in simulation with software test benches and on-chip built-in test devices.
Recognizing
the ways in which the SoC has already significantly altered engineering process, this article looks forward to extensions of SoC architecture, emergent new device technologies, and new architectures that threaten
to break the Von Neumann bottleneck. It presents a survey of what appear today to be promising and important technologies, along with some thought-provoking and hopefully entertaining ideas for how breakthroughs
might enable fantastic new killer applications.
Hardware/software convergence
Early hardware and software process pioneers advocated concurrent design, virtual machines, emulators, and simulators for years; but, for the most part,
the development of systems incorporating hardware, firmware, and software was stratified and serialized. The reasons for this were related to the tools available, engineering culture, and the logistics of overlapping design,
implementation, and test activities. Concurrent design was rarely realized early on in completely novel
projects, though it was often put into practice as a product family was revised. Figure 1 depicts the typical scenario for system development, especially for embedded systems, that persisted in many (though not all) organizations into the 1980s.
Figure 1. Serial and stratified hardware, firmware, and software process
In the 1990s, emphasis shifted to hardware/software codesign and, eventually, codevelopment, simulation, and concurrent testing and verification.
This was greatly facilitated by the advent of electronic design automation (EDA) and the use of emulators and simulators to support early firmware and software
development before physical hardware was available for testing. Figure 2 shows the overlap and speed-up made possible by EDA and concurrent engineering practices, which let
software development begin long before hardware becomes available.
Figure 2. Hardware and software concurrent engineering
The Rosetta Stone for software and hardware languages
SystemC, Impulse C, and SRC Carte C all provide the ability to compile and synthesize traditional procedural
programming languages like C, C++, and Fortran into bitstreams suitable for download and execution on a reconfigurable computer that
includes FPGA fabric and hard or soft processor cores. Essentially, these tools compile and synthesize the code, extracting inherent parallelism in non-timed sequential code and providing automated allocation
of functionality to concurrent hardware resources or traditional Von Neumann processing, or both. These compilers for FPGA-reconfigurable
computing platforms take the concept of hardware/software codesign to a new level of iteration and optimization in function allocation.
Further development of these tools and integrated languages will blur the line between hardware and software development.
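To make "extracting inherent parallelism in non-timed sequential code" concrete, here is a minimal Python sketch of one textbook idea: schedule each statement as early as its data dependencies allow. It is not how SystemC, Impulse C, or SRC Carte C work internally. The function name and the statement format are hypothetical, and the sketch assumes every variable is assigned only once, so only read-after-write dependencies matter.

```python
def schedule_levels(statements):
    """Return the earliest step at which each statement could execute.

    statements: list of (target, sources) tuples in program order.
    Assumes single assignment: each target name is written exactly once,
    so a statement only has to wait for the statements that produce its inputs.
    """
    produced_at = {}  # variable name -> step at which its value becomes available
    levels = []
    for target, sources in statements:
        # Inputs never written by an earlier statement are available at step 0.
        level = 1 + max((produced_at.get(s, 0) for s in sources), default=0)
        levels.append(level)
        produced_at[target] = level
    return levels


# Sequential code for y = (a * b + c * d) + (e * f), one operation per statement.
program = [
    ("t1", ("a", "b")),    # t1 = a * b
    ("t2", ("c", "d")),    # t2 = c * d
    ("t3", ("t1", "t2")),  # t3 = t1 + t2
    ("t4", ("e", "f")),    # t4 = e * f
    ("y", ("t3", "t4")),   # y  = t3 + t4
]

levels = schedule_levels(program)
print(levels)       # [1, 1, 2, 1, 3]
print(max(levels))  # 3 steps instead of 5 sequential statements
```

The three multiplications have no dependencies on one another, so concurrent hardware could evaluate them in the same step, while a single processor core would execute all five statements in order.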
With the emergence of SoC architectures, the need to coordinate hardware and software engineering has grown along with market pressures to increase the degree of concurrency
so as to get products to market quickly; projects now often go from concept to a beta version in a year. The idea of firmware and software taking time to be brought up on new hardware was also squeezed; currently, firmware and software are expected to be brought up on the same day that first silicon arrives, with full post-silicon verification complete only a few months after.
This requires more overlap during design, earlier
system-level testing, and better coordination between hardware and software engineers. The pressure has created hardware and software convergence, such that today, many hardware and software engineers
are sharing tools, especially the C language, for testing and even for specification. You can now mix procedural and logic design in the extended C language, which can be directly
synthesized and downloaded as FPGA bitstreams. Concurrency and convergence have led to unprecedented efficiencies, such as automation of function allocation between procedural code run on a processor core
and functions synthesized as combinational logic and state machines; in the end, a single image may be downloaded to a reconfigurable computer.
Configurable SoCs have enjoyed a lesser degree of convergence. For example, test benches and cosimulation allow software and hardware engineers to verify the hardware/software interface with cycle accuracy
and simulated logic analyzer traces -- setting up the software and hardware engineers for very well defined post-silicon verification. Figure 3 illustrates
the concepts of concurrency and convergence. The use of common languages such as C, and of hardware description languages that can interface to C, has provided significant leverage, because hardware and software
engineers can now assist each other and speak a common language.
Figure 3. Convergent and concurrent hardware/software engineering
The impact of process automation and the level of concurrency and convergence in SoC design should not be understated, and this type of process is
now required for competitive product development. At the same time, rapid advances have been made in processor core, interconnection I/O, storage capacity, non-volatile memory,
and depth of device integration. This advance has required re-thinking of the Von Neumann architecture (see the article "Effect of increasing chip density on the evolution of computer architectures" linked to in Resources for more on this).
Processor clock rates and memory access times have become so unbalanced that not even large three-level caches can maintain good processor pipeline efficiency. When the cache
miss penalty is hundreds of core cycles, the impact of even a very occasional miss is significant. While this is simple math, it's often not stressed enough. For example, if the miss
penalty is 200 core cycles, and a core misses cache just once out of 100 accesses, the clocks per instruction (CPI) will increase from a baseline of 1 up to 2.99: the efficiency of a core is reduced by
a factor of three for a very occasional cache miss. Many high-speed processors have average CPIs that are not only greater than 1, but in the double digits, so that only a small percentage of processor capability is
being used. This huge mismatch has prompted researchers to look into PIM architectures, where multiple lower-speed processors are designed into cells with on-chip, tightly coupled memories.
The IBM Cell BE architecture and Blue Gene®/L and Blue Gene/C architectures have taken this approach and extended it.
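The cache-miss arithmetic above can be checked with a short Python sketch. It assumes a deliberately simple model that reproduces the article's figure: every instruction makes one memory access, a hit costs 1 core cycle, and a miss costs the full 200-cycle penalty. The function name and parameters are illustrative.

```python
def effective_cpi(miss_rate, hit_cycles=1.0, miss_cycles=200.0):
    """Average clocks per instruction under a one-access-per-instruction model.

    Assumption: a hit costs hit_cycles in total and a miss costs miss_cycles
    in total, so the average is a weighted sum of the two cases.
    """
    return (1.0 - miss_rate) * hit_cycles + miss_rate * miss_cycles


cpi = effective_cpi(0.01)                # 1 miss per 100 accesses
print(round(cpi, 2))                     # 2.99
print(round(1.0 / cpi, 3))               # 0.334: about a third of peak efficiency
print(round(effective_cpi(0.05), 2))     # 10.95: a 5% miss rate reaches double digits
```

Under this model, efficiency relative to the baseline CPI of 1 is simply `1 / cpi`, which is why a 1% miss rate already costs roughly two-thirds of the core's capability.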
Emergent processing architectures
Overall, the ability to complete new SoCs and to build products using configurable and reconfigurable architecture has allowed for a wide range of new
architectures to emerge that are addressing the Von Neumann bottleneck. Configurable and reconfigurable SoCs have enabled quick architectural experiments with multiple cores, core interconnection, on-chip memory, and more radically different approaches, such as PIM and the Cell BE architecture.
With these architectures, the SoC architect can make quick turns on system designs to experiment with and evaluate acceleration and automation of functions with FPGAs or with configurable simulation.
Acceleration most often involves simple low-level operations that might even be implemented as extensions to the instruction set, much like the AltiVec instruction set extensions. Hardware automation
involves the offloading of significant functions from software implementation to hardware state machines. The re-allocation of functions, along with PIM and Cell BE architectures, has led to better
resource balance, so that the efficiency losses for grossly mismatched cores can be reclaimed in SoCs using multiple cores, most often providing asymmetric multiprocessing.
The Blue Gene/L and Cell BE architectures have shown significant performance improvements over predecessor architectures; in some cases, processor cores based on these architectures can outperform older processors running at higher clock rates.
In general, SoCs have played a role in a number of emergent architectures, including:
Configurable ASIC design (open IP cores/cells): Cell BE and core re-use, along with the open core and open IP concepts, have been central to SoCs (see the article "Free chips for all" linked to in Resources for more information).
Reconfigurable SoC platforms: Platforms such as Virtex II and IV have been central to concurrent and convergent design as well as hardware acceleration and automation offload.
Hardware acceleration and automation: Configurable and reconfigurable SoCs include synthesizable cores that provide tighter integration for offloading state machines and the ability to extend instruction sets.
Processor in memory or intelligent RAM: The IBM Cell BE architecture has extended concepts pioneered at UC Berkeley, the University of Notre Dame, and Caltech to address the issue of grossly mismatched processor core clock rates relative to memory access. The POWER™ architecture provides reusable cores that are simple to match to memory speed and to integrate on an SoC with significant on-chip memory.
Cellular architectures: These continue to evolve with Blue Gene/C, formerly known as Cyclops, which is extending PIM concepts employed in Blue Gene/L to provide highly efficient processing that is less strangled by the Von Neumann bottleneck and Dance Hall architecture for multiprocessor systems.
Neural and analog computing: The SoC provides levels of integration that will allow for future experimentation with even more novel architectures, which might include neural or mixed-signal analog front-ends on chip. These radically different architectures might be able to be used more easily when integrated with an SoC rather than standing on their own.
Quantum computing: Ultimately, the only way to achieve computability that exceeds the capabilities of the Von Neumann architecture is to make a radical departure from traditional architecture, a huge leap in integration, and use inherently different device physics such as that provided by spintronics and quantum computing.
Emergent memory devices
Spintronics
The basic idea behind spintronics is to store 1s and 0s based upon electron spin rather than charge. The spin is detected as a weak magnetic energy state and can be used
for memory devices and in quantum computers. Basic research is required to put this into practice, and it will take significant time before this new information physics can be harnessed. In the meantime, though, projections show
that Moore's law is in no danger of failing through the end of this decade, as 50-nanometer technology is seen as achievable by 2011.
The emergent PIM and Cell BE architectures have shown that the ability to integrate memory on chip has a huge advantage. Furthermore, the need for external non-volatile memory for booting SoCs violates the SoC design
goal of a single-chip design. Several new memory technologies look promising for SoC application. They aim to provide better execution efficiency with significant on-chip, well-matched
memory, and to provide potentially sizable on-chip non-volatile memory stores.
Ferroelectric RAM (FRAM): A non-volatile memory, erasable based upon electric field orientation, with nearly unlimited erase cycles and physical similarity to current DRAMs with an added ferroelectric layer. It is expected that FRAM will ultimately be able to provide a higher density than current flash technology.
Magnetoresistive RAM (MRAM): A non-volatile memory, believed to be the ultimate memory in terms of density and non-volatility once it is perfected based upon research in spintronics (see the sidebar for more information).
Microelectromechanical or nanomechanical memory: The integration of microelectromechanical systems, whereby mechanical devices are manufactured on chip into SoCs, may provide novel non-volatile memories. The IBM Millipede device (see Resources) has demonstrated this possibility.
How these emergent memory technologies will play out is not clear, but once again it appears that performance and density will continue to increase as expected through the end of this
decade with further refinement of on-chip SRAM, DRAM, and flash memory. The advancement with PIM and Cell BE architectures has shown that, while density is important for bringing memory on chip, access latency is
also important, and both will help distinguish the emergent technologies that will be most promising for future SoCs.
Interconnection network advances
Advancement in interconnection networks for SoCs has come from on-chip buses such as CoreConnect™ as well as SoC-to-SoC interconnection networks, such as the torus used in Blue Gene/L.
One of the big questions is whether on-chip optical interconnection will provide the next huge advancement in interconnection networks. Presently, high-speed differential serial and fiber optic links have provided gigabit
interconnection at the board and network level (gigabit Ethernet, 10G Ethernet, and PCI-Express 1.0 and 2.0 at 2.5 and 5 gigabits per second per lane, respectively), but on-chip optical technology has not yet proven to be worthwhile. Presently, copper wiring is meeting
performance needs, especially given the trend of integrating more memory and less random logic on chip.
At the same time, optical-electronic-optical transceivers for on-chip integration still have hurdles to overcome at the device level before they
become practical. (To learn more, see "The future of interconnection technology" linked in Resources.) It is more likely that optical interconnection will be used to increase speeds on board-level interconnections like PCI-Express before optical technology is used on chip.
However, encoded differential serial, such as PCI-Express 1.0, has proven to work well over copper at gigabit rates. 10G Ethernet makes use of fiber, but new differential serial DC-balance encoding methods and signal processing continue to enable more bandwidth to be
squeezed out of copper.
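DC-balanced encoding carries a bandwidth cost that is easy to quantify. The Python sketch below assumes the 8b/10b code used by PCI-Express 1.0 and 2.0, in which every 8 data bits travel as a 10-bit symbol, and applies it to the raw per-lane rates quoted earlier. The function name is illustrative, and packet and protocol overheads are ignored.

```python
def payload_gbits_per_s(line_rate_gbits, data_bits, coded_bits):
    """Usable data rate of a serial link after line-code overhead.

    line_rate_gbits: raw signaling rate on the wire, in gigabits per second.
    data_bits / coded_bits: code efficiency, for example 8 / 10 for 8b/10b.
    """
    return line_rate_gbits * data_bits / coded_bits


# Raw per-lane rates from the text; 8b/10b encoding is the stated assumption.
pcie1 = payload_gbits_per_s(2.5, 8, 10)
pcie2 = payload_gbits_per_s(5.0, 8, 10)

print(pcie1, pcie1 * 1000 / 8)  # 2.0 Gbit/s, 250.0 Mbytes/s per lane per direction
print(pcie2, pcie2 * 1000 / 8)  # 4.0 Gbit/s, 500.0 Mbytes/s per lane per direction
```

A code with less overhead recovers more of the raw rate for data, which is one way newer encoding methods squeeze additional bandwidth out of the same copper.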
Killer applications that SoCs of the future might enable
Predictions for processing, memory, and I/O indicate that Moore's law will continue to apply through this decade, and emergent technologies rooted in quantum devices and optics
will offer huge jumps ahead once current methods begin to hit physical limits. Much of the SoC work of the future will likely continue to perfect the engineering process and to refine architecture for better resource balance.
When emergent device physics are fully ready, it very well may be the case that SoC designers will have unprecedented resources that, for today's applications, will almost seem limitless. History, however, has shown that
software designers have an amazing propensity to consume newly available resources as Moore's law advances. The following list is a prognostic and imaginative list of killer applications that just might be able to soak up
future resources:
Supercomputing. Petaflop computing would be capable of tackling problems like carbon cycle modeling, protein folding, and cryptanalysis.
Real-time wearable artificial intelligence applications. This might include real-time natural language translation (yes, I mean a Babel fish), real-time automatic sign language recognition, or high-performance computer vision.
Passing the Turing test. A convincing dialogue with a computer that exudes intelligence might be possible.
Super Turing computability. Unprecedented amounts of resources could produce bounded solutions for current NP-hard problems, such as the traveling salesman problem that applies to place and route in synthesis.
Fully immersive virtual realities. We could see full surround and direct neural or vision interfaces (described in The Age of Spiritual Machines; see Resources).
Significant physical/mental augmentation and life extension. Artificial vision, human memory augmentation, and embedded therapeutic devices may all be possible.
Very skilled prognosticators like Ray Kurzweil have imagined the emergence of fantastic killer applications such as those listed above. The emergence of quantum
and optical computing devices suggests that computing capabilities rivaling or exceeding human intelligence will perhaps be possible. Kurzweil believes that in fact this will happen, and that
not only will Moore's law continue to hold, but that the rate of processor improvement will even increase. Most notable on the other side of this debate is Roger Penrose, a mathematical physicist who simply believes that even with massive resource
advancements, the silicon mind will never match wit with the carbon.
Conclusion
Vernor Vinge and Ray Kurzweil promote the concept of the singularity, a moment that will arrive when computers or machines created by humans outsmart us. Reviewing the rapid pace of SoC
advancement requires pause and some wonder -- not only have device physics advanced in line with or ahead of Moore's law, but the ability of SoC designers has likewise kept pace with new advances in
hardware/software concurrent design and the convergence in hardware/software tools and practices. Clearly, the final outcome of this first decade of the new millennium promises some exciting new SoC applications, ranging
from embedded devices to supercomputing. These advancements can significantly improve the quality of human life if they can be harnessed in products that are useable, reliable, and cost effective. This is the challenge that
faces the SoC designer going forward: how to use the plethora of resources effectively.
Resources
Learn
- SoC development is aided by the reuse of design available as open IP cores described in the IBM developerWorks article "Free chips for all," Jamil Khatib (August 2000). This movement has followed in the footsteps of open source software to a large degree.
- The IBM POWER™ architecture has been an integral part of
SoC configurable and reconfigurable development, as summarized in "POWER to the people," Nora Mikes (developerWorks, April 2004). The ability to concurrently develop software and hardware, and to rapidly
turn designs using reconfigurable hardware and simulation software, has allowed developers to keep pace with rapid advancements in device physics,
fabrication, and massive scales of integration.
- Where the ever-increasing chip densities will take us, how long Moore's law will prevail, and what sort of applications will be made
possible is the topic of "Effect of increasing chip density on the evolution of computer architectures," R. Nair (IBM Journal of Research and Development, 2002).
- Many researchers and architects have asked: when will Moore's law cease to hold true? With current technology it seems it must, but with new device physics such as spintronics, perhaps
increases will continue. For more background, see "Spintronics: A retrospective and perspective," S. A. Wolf,
A. Y. Chtchelkanova, and D. M. Treger (IBM Journal of Research and Development, 2006).
- Development of basic logic circuits using spintronics
is key to the adoption of spintronic computing. Advances in room temperature spintronic logic devices have been made at the University of Notre Dame, described in "A logical leap" (Economist, January 2006, reprinted on Notre Dame's Web site) and more fully in Science, Volume 311, page 205 (not available online).
- Increases in computing and memory performance must go hand in hand with higher-performance interconnection networks. "The future of interconnection technology," T. N. Theis (IBM Journal of Research and Development, 2000) provides
insights on this topic.
- Storage can become a bottleneck as capacity increases and the ability to access non-volatile storage still lags
RAM significantly. New technologies such as Millipede may help significantly.
- The ability to fuse the process of hardware and software development into more integrated approaches,
as described in Practical FPGA Programming in C, David Pellerin and Scott Thibault (Prentice Hall, 2005), can
help speed development by enabling a greater degree of concurrent development.
- Ray Kurzweil has pioneered numerous killer applications taking advantage of Moore's Law, including artificial intelligence (KurzweilAI.net). He's also started
numerous ventures, including Kurzweil Companies, Kurzweil Education, and Kurzweil Music Systems.
- Considering the promise of breakthrough technologies like spintronics, quantum computing, optical interconnection, and the ever increasing chip density seen so far, Kurzweil not only believes Moore's Law will continue to hold, but that
it will accelerate. For more on Ray's view of the future, Wikipedia provides great background; he predicts that eventually silicon processing will surpass the cognitive abilities of the human mind. In contrast, Roger Penrose fundamentally disagrees with Kurzweil and provides insight into why the human
mind has computational abilities that are impossible to replicate in silicon.
Find a brief overview of the technological singularity, and of the debate between Roger Penrose and Ray Kurzweil, on Wikipedia as well. (Although Wikipedia is often criticized for containing much that is
apocryphal and/or wildly inaccurate, it is even more often "a good
starting place" for learning more on a given subject.)
- Penrose's The Emperor's New Mind (Penguin, 1991) provides a compelling summary of his arguments that is technically accessible and very well written. Likewise, Kurzweil's The Age of Spiritual Machines (Penguin, 2000)
provides the counter view that the technological singularity is inevitable and a positive event to welcome.
- Find out more about the IBM Blue Gene projects.
About the author
Dr. Sam Siewert is an embedded system design and firmware engineer
who has worked in the aerospace, telecommunications, and storage
industries. He also teaches at the University of Colorado at Boulder
part-time in the Embedded Systems Certification Program, which he
co-founded. His research interests include autonomic computing,
firmware/hardware co-design, microprocessor/SoC architecture, and embedded
real-time systems.