GCC is the cornerstone of development in both the open source and closed source worlds. It's the enabler of architectures and operating systems. When a new processor appears, its success depends on a version of GCC that will support it (a back end that can generate code for it). GCC is also the enabler of Linux®. Linux as an operating system is widely successful because it is run on so many different architectures. Once again, a port of GCC to the target environment enables Linux to be ported and run on it. Without trying to put too fine a point on it, GCC paves the way for Linux and embedded development.
But GCC can't just sit still. New processor architectures continue to appear, and new research finds better ways to optimize and generate code. So GCC moves forward and has now matured into its fourth major release. This article explores the fundamental changes in GCC version 4 to show you why—if you haven't switched yet—the time has come to use the compiler standard.
GCC originally stood for GNU C Compiler when it was first released by Richard Stallman in 1987. (A historical timeline of GCC is shown in Figure 1.) Richard started the project in 1984 with the desire to build a free compiler that could be used, modified, and evolved. GCC originally ran on early Sun and DEC VAX systems.
As an open compiler (that is, the source was freely available), others began to provide fixes and—more importantly—updates for new languages and target architectures. Not long after, its acronym was changed to mean GNU Compiler Collection, as it supported numerous languages targeted to the most popular (and esoteric) architectures.
Figure 1. A modern history of GCC releases
Today, GCC is the most popular compiler toolchain available. The same source base can be used to build compilers for Ada, Fortran, the Java™ language, and variants of C (such as C++ and Objective-C), and it covers the largest number of target processor architectures of any compiler (30 supported processor families). The source base is also completely portable and runs on more than 60 platforms. The compilers are highly tunable, with a large number of options for tweaking the generated code. GCC, in short, is the Swiss Army knife of compilers and redefines the meaning of flexibility. It is also one of the most complex open source systems in existence: Today, GCC is made up of almost 1.5 million lines of source code.
Wow! With all of that, you'd think I was truly enamored with GCC. Let's just say that when I'm developing software with GCC and my wife walks into the room, I feel a little uncomfortable.
Compilers are constructed in a pipeline architecture made up of several stages that communicate different forms of data (see Figure 2). The front end of a compiler is language specific and includes a parser for the given language, producing parsed trees and an intermediate representation (the Register Transfer Language, or RTL). The back end is then responsible for taking this language-independent representation and producing instructions for the particular target architecture. To do this, the optimizer uses the RTL to create faster or more compact code (or both, when possible). The optimized RTL is then fed to the code generator, which produces target code.
Figure 2. Simplified view of the compiler stages
GCC 4 brings many changes to the standard compiler suite, the biggest of which is a new optimization infrastructure built around the tree Static Single Assignment (SSA) form. In general, the compiler is faster in some optimization modes and provides many new enhancements, including new target support. GCC 4 is also much more thorough when it comes to warnings and errors (in fact, certain warnings may now show up as errors with GCC 4). One drawback to GCC 4 is that it is not binary-compatible with objects built with the GCC 3 compilers, which means that source must be recompiled with GCC 4. That's unfortunate, but it's the price to be paid to move forward.
Let's look at some of the key advancements introduced with the new GCC 4.
The 4.0 release (4.0.4 being the last in the series) is the first step into GCC 4. As such, it was not recommended for production development until a stabilization process could be completed. This release included a large number of changes—two in particular being the introduction of a new optimization framework (Tree SSA) and support for autovectorization.
Prior to GCC 4, the intermediate representation used was called Register Transfer Language (RTL). RTL is a low-level representation very close to assembly language (inspired by LISP S-expressions). The problem with RTL is that the optimizations it enables are those close to the target. Optimizations that require higher-level information about the program may not be possible, because their expression is lost in RTL. Tree SSA is designed to be both language independent and target independent while supporting improved analysis and richer optimizations.
Tree SSA introduces two new intermediate representations. The first is called GENERIC, which is a generic tree representation that's formed from the language front-end trees. The GENERIC trees are converted into GIMPLE form and a subsequent control flow graph to support SSA-based optimizations. Finally, the SSA trees are converted into RTL, which the back end uses for target code generation. An overly simplified description, but the result is a new intermediate form better suited for high- and low-level optimizations. (See Resources for more details on this process.)
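You can inspect the GIMPLE form yourself with GCC's `-fdump-tree-gimple` option, which writes the gimplified intermediate representation to a dump file alongside the object file. As a minimal sketch (the function and its name are just an illustration):

```c
/* Compile with:
 *   gcc -c -fdump-tree-gimple demo.c
 * and GCC writes a demo.c.*.gimple dump file. In GIMPLE, the compound
 * expression below is broken into simple three-address statements
 * with compiler-generated temporaries, roughly:
 *   t1 = a * b;  return t1 + c;
 */
int madd(int a, int b, int c)
{
    return a * b + c;
}
```

Each statement in the dump has at most one operation, which is what makes the subsequent SSA-based analyses tractable.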
As the changes really represent a new framework, it's possible to define new optimizations. Several new optimizations have been implemented to date, but more work is obviously ahead to ensure that GCC generates the most compact and efficient code possible.
Another interesting change for GCC 4 is the addition of a loop vectorizer (based on the Tree SSA framework). Autovectorization is a feature that allows the compiler to identify scalar processing loops within code that can benefit from vector instructions available in the target processor. The result is tighter and more efficient target code. Another loop-based optimization is Swing Modulo Scheduling (SMS), which is used to construct instruction pipelines with the goal of minimizing cycle counts by exploiting instruction-level parallelism. More information on each of these new approaches is available in Resources.
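The kind of loop the vectorizer looks for is a simple counted loop over independent elements. A minimal sketch (the function name is illustrative; `-ftree-vectorize` enables the pass explicitly, and later GCC 4 releases turn it on at `-O3`):

```c
#include <stddef.h>

/* A scalar loop with no cross-iteration dependences: the shape the
 * autovectorizer can map onto SIMD instructions (SSE, AltiVec, and
 * so on). Compile with, for example:
 *   gcc -O2 -ftree-vectorize -c vec.c
 * The restrict qualifiers tell GCC the arrays don't overlap, which
 * helps the dependence analysis prove vectorization is safe.
 */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Whether or not the loop is vectorized, the results are identical; vectorization only changes the instructions used per iteration.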
Finally, the 4.0 series also introduced (in addition to many
C++ changes) a
new Fortran front end that supports Fortran 90 and 95 (rather than the
older Fortran 77, which was supported in GCC 3). New Ada 2005 features can
also be found as well as support for Ada features on many more target
With the new optimization framework in place, the 4.1 release series introduced a larger number of optimizations, such as improved profiling support and more accurate branch probability estimation. Two of the more useful optimizations are better inline support and the ability to exploit the instruction cache locality. When functions are to be inlined, the compiler no longer inlines functions that are not executed frequently. Instead, hot call sites are more likely to be inlined to keep the code size small while still getting inline function benefits. GCC can also help to partition functions into hot and cold sections. Keeping hot functions together (that is, those functions that are used more often) results in better instruction cache use compared to polluting the cache with cold functions.
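When profile data isn't available, later GCC 4 releases also let you state hot/cold placement explicitly through function attributes. A small sketch (the function names are hypothetical):

```c
/* GCC's hot and cold function attributes hint the optimizer about
 * call frequency: cold functions are optimized for size and placed
 * in a separate text section (.text.unlikely) so they don't pollute
 * the instruction cache alongside the hot path.
 */
__attribute__((hot)) int fast_path(int x)
{
    return x + 1;       /* expected to execute frequently */
}

__attribute__((cold)) int error_path(int code)
{
    return -code;       /* rarely executed error handling */
}
```

The attributes change code placement and optimization choices, not behavior, so annotated code runs identically.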
The front end saw a number of updates, including improved C++ support. There were also a very large number of updates for the Java core library (libgcj). The back end saw the introduction of support for the IBM® System z™ 9-109 processor, including 128-bit Institute of Electrical and Electronics Engineers (IEEE) floating-point numbers and built-in atomic memory access. If that weren't enough, the back end can now emit code to protect against stack-smashing attacks (that is, buffer overflow detection and reordering to protect against pointer corruption). Some built-in functions have also been updated to protect against buffer overruns with a minimal amount of overhead.
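To see what this protection means in practice, here is a small sketch (the function is illustrative). Compiling with `-fstack-protector` makes GCC place a canary value between local buffers and the saved return address; an overflow that smashes the canary aborts the program instead of hijacking control flow. Compiling with `-D_FORTIFY_SOURCE=2 -O2` swaps in checked variants of functions like strcpy that catch overruns of known-size buffers:

```c
#include <stdio.h>
#include <string.h>

/* Compile with, for example:
 *   gcc -O2 -fstack-protector -D_FORTIFY_SOURCE=2 -c safe.c
 * The bounded copy below never overruns dst; an unbounded strcpy
 * here is exactly what the canary and fortified builtins defend
 * against at run time.
 */
void copy_name(char *dst, size_t dstlen, const char *src)
{
    snprintf(dst, dstlen, "%s", src);   /* truncates instead of overflowing */
}
```

The protections cost little because the canary check is just a compare-and-branch in the function epilogue.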
The 4.2 release series continued with new optimizations and enhancements that covered both languages and processor architectures. The back end was updated to include support for Sun's UltraSPARC T1 processor (codenamed Niagara) as well as Broadcom's SB-1A MIPS core.
The front end also saw changes in version 4.2 with the overhauling of C++ visibility handling and support for Fortran 2003 streaming input/output (I/O) extensions. But one of the most interesting changes in the 4.2 release was the addition of OpenMP support for the C, C++, and Fortran compilers. OpenMP is a multi-threading implementation that allows the compiler to generate code for task and data parallelism.
Using one aspect of OpenMP, code is annotated with preprocessor directives marking the regions in which parallelism should occur. The code is converted into a multi-threaded program for the duration of the block, then joined back together as each thread within the block finishes.
Figure 3 provides a look at how this process works in practice. OpenMP provides not only a set of pragmas (that is, preprocessor directives) but also functions for C, C++, and Fortran. In Figure 3, you see a simple program that directs elements of code into multiple threads (parallelizing the for block). The effect is shown graphically in Figure 3: A traditional program would execute the loop sequentially, whereas the OpenMP implementation creates threads to parallelize the for block. You can learn more about OpenMP in Resources.
Figure 3. Simple example of OpenMP support
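The pattern in Figure 3 can be sketched as follows (the function is a representative example, not the figure's exact code). Compiled with `-fopenmp`, GCC splits the iterations across threads; without the flag, the pragma is ignored and the loop runs sequentially with identical results:

```c
/* An OpenMP-parallelized loop, as supported in GCC starting with 4.2.
 * Compile with:
 *   gcc -fopenmp -c scale.c
 * The pragma tells GCC to divide the loop iterations among a team of
 * threads, which join again when the loop completes. The iterations
 * are independent, so any schedule produces the same result.
 */
void scale(double *v, double factor, int n)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        v[i] *= factor;
}
```

Because a conforming OpenMP program must also be correct when run serially, this style degrades gracefully on compilers without OpenMP support.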
The current release series in GCC 4 is 4.3. This release series shows an acceleration of features and supported architectures (as well as unsupported architectures, as many obsolete architectures and ports have been removed). New language support was added for Fortran 2003 as well as a host of general optimizer improvements.
New processors supported in this release include several in the Coldfire processor family, the IBM System z9 EC/BC processor, the Cell broadband engine architecture's Synergistic Processor Unit (SPU), support for SmartMIPS, and numerous others. You'll also find compiler and library support for Thumb2 (compressed ARM instructions) and for the ARMv7 architecture as well as tuning support for Core2 processors and the Geode processor family.
In the front end of the compiler, the internal representation for GIMPLE was redefined, meaning that the compiler consumes less memory.
Work has already begun on the 4.4 release series, and it's moving toward a general release. In version 4.4, you'll find numerous bug fixes and more general optimizer improvements. Version 3.0 of the OpenMP specification has also been integrated for C, C++, and Fortran.
The compiler will also now allow you to define an optimization level at the
function level (instead of at the file level, which was the previous
default). This functionality is provided by the
optimize attribute, which also allows you to
specify the individual options for the optimizer.
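As a minimal sketch of the attribute (the function names are illustrative), you can pin one function to a given optimization level regardless of the level the rest of the file is built with:

```c
/* The optimize function attribute (introduced in the GCC 4.4
 * timeframe) overrides the command-line optimization level for a
 * single function. It accepts a level such as "O3" or individual
 * optimizer option strings.
 */
__attribute__((optimize("O3"))) int dot(const int *a, const int *b, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)      /* hot inner loop: optimize aggressively */
        sum += a[i] * b[i];
    return sum;
}

/* Keep a function easy to step through in a debugger even when the
 * file is built with a higher level: */
__attribute__((optimize("O0"))) int debug_me(int x)
{
    return x * 2;
}
```

The attribute changes only how the function is compiled; its observable behavior is unchanged.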
Finally, processor support was added for the Picochip, which is a 16-bit multi-core processor. What's interesting about the Picochip is that each core can be programmed independently, with communication provided in a mesh.
The future is obviously bright for GCC. The toolchain continues to
evolve—both architecturally and incrementally—to support
the latest in processor architectures. You'll also find that the language
landscape is well covered by GCC. Under development is support for a number of different languages, such as Mercury, GHDL (a GCC front end for VHDL), and Unified Parallel C (UPC).
In addition to GCC's bright future, its continued improvement means benefits for all types of software (from Linux and Berkeley Software Distribution [BSD] to Apache and everything in between). Software compiled with GCC 4 will be generally more compact and faster, meaning software industry goodness all around.
The GNU Compiler Collection home page is the source for all things GCC. You'll find news and the latest source for GCC (including historical and leading-edge distributions). You can also find the detailed release histories for each of the GCC 4 release series.
The paper "Tree SSA: A New Optimization Infrastructure for GCC" (PDF, 14 pages) introduces Tree SSA, the new optimization framework that is one of the most important new features of GCC 4.
The IBM Research paper "Autovectorization in GCC" (PDF, 14 pages) describes an optimization technique that identifies parallel operations in a program and applies Single Instruction, Multiple Data (SIMD) instructions to make them more efficient.
The IBM Research paper "Swing
Modulo Scheduling for GCC" (PDF, 10 pages) describes Swing Modulo Scheduling,
a software pipelining technique that
improves the scheduling of instructions in loops by overlapping
instructions and reducing memory pressure.
Read Wikipedia's introduction to OpenMP, a great new addition to GCC that provides improved performance through multi-threading.
M. Tim Jones is an embedded software engineer and the author of GNU/Linux Application Programming, AI Application Programming (now in its second edition), and BSD Sockets Programming from a Multilanguage Perspective. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocols development. Tim is a Consultant Engineer for Emulex Corp. in Longmont, Colorado.