Way back when, before ISO® and ANSI® got involved, C was defined, for most practical purposes, by a book called The C Programming Language, written by Brian Kernighan and Dennis Ritchie (see Resources). Most often referred to simply as "K&R," this book was as much of a standard as there was for the C programming language from its publication in 1978 until its replacement with a second edition in 1988. (The very early 2nd edition K&R books, inconveniently, ended up not being quite aligned with the final standard; this was corrected in 1989, when the final standard was released.)
By the time standardization work began, multiple C compilers were available, produced by several vendors. New features were being added, gradually, but support for a new feature might be unreliable, or erratic, from one system to another. The C standard, released by ANSI in 1989 (and again, by ISO, in 1990) offered vendors a fixed target to implement and developers a stable platform to develop for.
Yes, that's right; the standard was released twice. An ANSI committee did the original work, and the involvement with ISO took a while to straighten out. Since then, the ANSI and ISO committees have met at the same time, and in the same place, to try to prevent any similar problems in the future.
This article offers a brief overview of the C standard as it exists today and some insights into the technical decisions, and compromises, that made the standard possible. Rather than focusing on a broad overview of C, however, this article looks at some of the nitty-gritty that makes a standard work.
The road to good standards is paved with compromises
A recurring theme in standardization is compromises. Users want the fastest possible performance, and they also want the most predictable behavior. Vendors want the latitude they need to provide both of these, and they want reasonable development costs. In many cases, the interests of different groups may not align. For instance, vendors might need to target a CPU on which integer division has certain properties, while users might want predictable properties across a variety of systems. Providing the predictable properties users want could impose a substantial performance cost on systems whose CPUs don't work that way by default. Neither users nor vendors will be happy with reduced performance. The C standard is a great place to study the art of compromise in a standard.
The C language is intended to be portable to a very broad range of systems and to allow for reasonable performance on those systems. Some languages specify a lot more about the order of operations, and the exact outcomes of operations, than C does. C's comparatively loose specification comes from a desire to, as often as possible, allow operations to be implemented on a broad variety of hardware, using native instructions.
For instance, in C, the order in which function arguments are evaluated is not specified; indeed, up to a point, they might be evaluated simultaneously. A simple line of code such as:
printf("%d, %d\n", i++, i++);
produces what is called "undefined behavior." That means not only that the language doesn't guarantee which argument is evaluated first -- it doesn't guarantee anything at all. The resulting code could, for instance, crash. It might only increment the variable i once. It might increment i twice, then restore it to its previous value. There are no promises made about what happens.
That's often regarded as inconvenient. On the other hand, consider a similar line of code:
printf("%f, %f\n", f * f + g, g * g + f);
In this case, the behavior is perfectly well-defined, and the user might well appreciate having a machine with a modern floating point unit performing all of the floating point operations at once, or coalescing them into a pair of fused multiply add instructions, the second of which might start before the first is complete.
A rule which strictly enforced the order of evaluation of function arguments could make some optimizations like this harder to perform consistently. The result is a compromise: a position which leaves users the option of achieving specific results (by spelling them out more specifically), and implementors leeway to optimize aggressively. Some languages, such as Java™, have gone the route of specifying operations more completely, at some possible cost in performance.
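As a minimal sketch of "spelling it out," the increments from the earlier printf example can each be given a statement of their own. The names struct pair and take_two are hypothetical, invented for this illustration; the point is only that the sequence point at the end of each statement fixes the order.

```c
/* Hypothetical helper type: holds two successive values of a counter. */
struct pair {
    int first;
    int second;
};

/* Takes two successive values from *i, in a guaranteed order. */
struct pair take_two(int *i)
{
    struct pair p;

    p.first = (*i)++;   /* this increment completes before the next statement */
    p.second = (*i)++;
    return p;
}
```

A caller can then pass p.first and p.second to printf without any undefined behavior, because the arguments no longer modify anything.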
Compromises like this often evolve during the development of new features. The C standard library is a collection of useful and widely portable routines, which are sufficiently widely used to merit formal inclusion in the C standard. Most of the library evolved as a compromise between the difficulty of porting some functionality to a variety of systems, and the importance of that functionality to average users.
One of the recurring themes in C standardization is that some issues are left as "quality of implementation" issues. This basically means that the marketplace is expected to be able to select better implementations over worse ones, without the standard strictly requiring something that could be impractical for some vendors.
Freestanding and hosted environments
In some cases, the compromises needed to make everyone happy simply aren't possible. Users developing for desktop systems, workstations, and servers generally want a great deal of functionality that can be readily implemented across a broad variety of systems. Unfortunately, this functionality may be dependent on features that, while common to "computers" as most people think of them, are hardly available on embedded systems.
Since C is also used for embedded development, the trade-off between size and functionality becomes an issue. Furthermore, embedded systems often don't have the framework or resources needed to provide many of the standard library functions. People developing garage door openers with a kilobyte or two of memory don't necessarily want to have the code for formatting floating point numbers in ASCII loaded in their very limited memory footprint. Similarly, many of the standard library functions are of limited applicability in some environments: the garage door opener probably doesn't have any kind of disk, so most of the file manipulation functions are moot.
Building a standard that addresses the needs of such diverse markets is tricky. Users want real guarantees from a standard, and removing most of the library would make the language much less useful for desktop users. On the other hand, requiring the whole library is unacceptable for many embedded users.
The compromise reached was to distinguish between "hosted" and "freestanding" environments. A hosted environment is, roughly speaking, an environment in which programs are being supported by some kind of operating system that provides the usual selection of services. A freestanding environment is likely one in which the program lives on its own and controls the hardware directly. These generalizations don't always apply, but they provide a good basis for understanding the distinctions between these types of environments.
For instance, in a freestanding environment, it is not required that
the printf function be available.
Freestanding environments may not have any recognizable input or output
devices, so demanding that they provide a way to format output might not
make much sense.
The features considered to be part of the C language (as opposed to the standard library) are required in both hosted and freestanding environments. By contrast, only small portions of the standard library are required. In C99, the features required in a freestanding environment are those specified in these header files:
<float.h>, <iso646.h>, <limits.h>, <stdarg.h>,
<stdbool.h>, <stddef.h>, <stdint.h>
These headers provide only support for use of the basic C types
(including functions with variable argument lists). For instance, they
define macros related to the core integer and floating point data types,
as well as the new (in C99) bool type.
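To give a feel for what remains available, here is a small sketch that relies only on headers from the freestanding list. The function names add_would_overflow and sum_bytes are hypothetical, chosen for this example, and nothing here depends on an operating system or on I/O.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reports whether a + b would overflow int, without performing the addition. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}

/* Sums the bytes of a buffer into a type guaranteed to have at least 32 bits. */
uint_least32_t sum_bytes(const unsigned char *buf, size_t len)
{
    uint_least32_t total = 0;
    size_t i;

    for (i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```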
There are a few surprises for many people in looking at real examples
of hosted and freestanding environments. For instance, the kernel of a
Unix-like system is probably considered to be compiled in a freestanding
environment. On NetBSD, the compilation flags for building the kernel
include -ffreestanding, the flag that tells
the GNU C compiler to turn off its usual built-in support for the
standard library.
In freestanding mode, the compiler doesn't provide built-in support for
common library functions; normally, GCC is aware of some library
functions even when corresponding headers haven't been included, and
might replace them with inline instructions or otherwise optimize them.
Also, in freestanding mode, the compiler is less fussy about the
declaration of main.
There's another, more subtle, distinction. In a hosted environment,
execution of a C program begins at a function named main, returning int and
taking either zero or two arguments. Other signatures may be allowed by
specific implementations, but they are pure extensions, and an
implementation is not required to accept any other declarations of main. In a freestanding environment, there is no
standardized place for code to begin. For instance, the previously
mentioned NetBSD kernel starts execution in a routine named start. While the kernel does have a main routine, that routine is called by the initial
start routine (after roughly 400 lines of
assembly code to identify the CPU type, build initial data structures, and
otherwise prepare the machine for execution). The main routine in the kernel, unlike one in a hosted
environment, has a return type of void, because
there is no host environment to which a value could be returned.
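C99 also gives code a way to ask which kind of environment it is being compiled for: the predefined macro __STDC_HOSTED__ expands to 1 in a hosted implementation and to 0 otherwise. The sketch below wraps that test in a function; the name is_hosted is hypothetical, and the defined() check is only a guard for pre-C99 compilers that may not provide the macro.

```c
/* Returns 1 if compiled for a hosted implementation, 0 otherwise. */
int is_hosted(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__
    return 1;
#else
    /* Freestanding, or a compiler that predates the macro. */
    return 0;
#endif
}
```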
Most desktop compilers are, or at least can be, a hosted environment, and the hosted environment is what most programmers who aren't doing embedded work think of when they talk about "C." In many cases, especially on modern systems, developers are very likely to prefer a hosted environment. The promise of the functionality of the standard library (standardized I/O, memory management, the math library) and other features (such as complex arithmetic) is worth a great deal to a developer.
Unfortunately, it's also potentially very expensive to an implementor. A simple development kit might leave the question of providing such an environment to the developer. In some cases, the work necessary to get a Unix-like kernel running on a board may be less than the work it would take to implement even the subset of kernel functionality directly needed for an application. As an obvious example, many simple home routers and gateways are, under the hood, running some Unix-like operating system. A more famous example is the TiVo®, which was one of the first Linux-based appliances on the market, and certainly one of the most visible.
In short, in many cases, if all you have is a freestanding environment, the logical step is to load a hosted environment on it. Hosted environments are, in most cases, much more comfortable development environments, in which a lot more code can be written portably.
A particularly famous example of a freestanding implementation -- and
one which caught many developers by surprise -- is Microsoft® Windows® (both
win16 and win32). Programs starting at a function called WinMain, which do not start with input or output
streams to manipulate, are clearly not traditional hosted programs. What
surprised a lot of people, though, was that sprintf wasn't provided either; instead, you had to
use a special function called wsprintf. So,
because there's no textual console for most Windows applications, the
function most programmers use to construct strings with data in them (say,
printing numbers in the middle of a message) is simply unavailable.
Modern development systems are more likely to provide a more complete
library; this is considered a quality of implementation issue.
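For readers who have never had to do without it, this is the kind of string construction in question. The sketch uses snprintf, the bounded C99 relative of sprintf; the function name format_count_message and the message text are made up for this illustration.

```c
#include <stdio.h>

/*
 * Writes a message containing a number into buf, using at most size bytes
 * (including the terminating null). Returns the length the full message
 * needs, as snprintf does.
 */
int format_count_message(char *buf, size_t size, int count)
{
    return snprintf(buf, size, "Found %d matching files.", count);
}
```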
This highlights a limitation of the simple two-way split. While most people would expect a desktop computer operating system to provide a hosted environment, a graphical OS might not adapt well to the simple textual input and output model the standard presumes. Treating such a system as freestanding, however, leaves a vendor free to omit major pieces of support that programmers reasonably expect to be available in such an environment. Of course, adding more categories of implementation would make life harder for vendors, test suites, and ultimately developers. The current compromise is a good one.
Conveniently, complete source for implementations of the standard C library is available from a variety of sources (see Resources); this generally makes it possible to add the specific functions needed to an environment which doesn't provide them.
One of the features that was standardized in C89 was limit macros. For
instance, there's a macro, INT_MAX, which
defines the maximum value an object of type int
can hold. These macros can be used to test whether a given operation can
be done using the (typically optimal for speed) plain int type, or whether it needs the larger guaranteed
range of the long type. Sometimes, a macro is
defined only if a given feature is available; this can be called a
"feature test macro."
Note that limit macros existed before C89, but they weren't predictable.
The nature of the problem was that you couldn't necessarily guess the
range of int without help from the compiler.
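A minimal sketch of that kind of test follows. The names MAX_RECORDS, record_index, and last_record_index are hypothetical; the idea is simply to use plain int when INT_MAX shows it is wide enough, and to fall back to long otherwise.

```c
#include <limits.h>

/* Hypothetical requirement: indices must reach this value. */
#define MAX_RECORDS 100000L

#if INT_MAX >= MAX_RECORDS
typedef int record_index;    /* plain int is wide enough on this system */
#else
typedef long record_index;   /* long is guaranteed to reach at least 2147483647 */
#endif

/* Returns the largest index in use, stored in whichever type was chosen. */
long last_record_index(void)
{
    record_index last = (record_index)(MAX_RECORDS - 1);

    return last;
}
```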
Moving ahead to C99, the current iteration of the standard
provides feature test macros for a number of optional
features. For instance, an implementation providing an integer type
exactly 32 bits wide should define the corresponding macro, INT32_MAX. If the type is unavailable, the macro
should not be defined. Macros are provided for testing the availability
of a number of floating point features as well.
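As a hedged sketch of how an application might use this, the code below picks int32_t when INT32_MAX is defined and otherwise falls back to int_least32_t, which C99 always requires. The names sample_t, SAMPLE_IS_EXACT, sample_is_exact_32, and sample_bits are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

#ifdef INT32_MAX
typedef int32_t sample_t;        /* exactly 32 bits, no padding */
#define SAMPLE_IS_EXACT 1
#else
typedef int_least32_t sample_t;  /* at least 32 bits, always available */
#define SAMPLE_IS_EXACT 0
#endif

/* Reports whether the exact-width type was available at compile time. */
int sample_is_exact_32(void)
{
    return SAMPLE_IS_EXACT;
}

/* Storage size of the chosen type, in bits. */
unsigned sample_bits(void)
{
    return (unsigned)(sizeof(sample_t) * CHAR_BIT);
}
```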
Along a similar line, some systems developed standard ways to request,
or signal the availability of, common extensions. For instance, a program
running on a UNIX® system might define the _POSIX_C_SOURCE macro before including headers, to
indicate that POSIX extensions to the standard headers were desired.
Note that the C99 usage is subtly different from the Open Group's use of
macros to request that features be provided. In the C99 standard, the
implementation defines macros, and the application tests for them.
However, the use of macros to request or enable features is not unheard
of. The most obvious example is the NDEBUG
macro, which disables the assert()
function-like macro, if it has already been defined when <assert.h> is included.
Of course, both types of macros are called "feature test macros," so the terminology can be a little ambiguous.
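The NDEBUG case can be sketched as follows. The function names are hypothetical; the behavior shown relies on two things the standard specifies: with NDEBUG defined, assert() expands to an expression that does not evaluate its argument, and <assert.h> redefines assert() according to the current state of NDEBUG each time it is included.

```c
#define NDEBUG              /* the program requests that assert() be disabled */
#include <assert.h>

/* With NDEBUG defined, the argument of assert() is never evaluated. */
int count_with_asserts_disabled(void)
{
    int calls = 0;

    assert(++calls > 0);    /* expands to ((void)0); calls stays 0 */
    return calls;
}

#undef NDEBUG
#include <assert.h>         /* assert() is redefined for the new NDEBUG state */

/* Without NDEBUG, the argument is evaluated and checked. */
int count_with_asserts_enabled(void)
{
    int calls = 0;

    assert(++calls > 0);    /* evaluated; calls becomes 1 */
    return calls;
}
```

The side effect inside assert() is there only to make the difference visible; in real code it is exactly the kind of thing to avoid, for the reason this example shows.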
Informative and normative annexes
There are some parts of the C standard that are attached to the end of the main body of the text. These come in two varieties: a normative annex has the same force as the rest of the standard, while an informative annex merely offers polite suggestions. The same distinction applies to the text in the body of the standard: the standard's text is normative, but the footnotes and examples in the standard, and the associated Rationale document, are informative. Informative material is offered to help vendors do the best possible job, without imposing unduly burdensome requirements.
A normative annex may seem a little counterintuitive; why not just put it in the main body of the standard? One reason would be an entire feature which is optional, but which, if it is provided, must comply with specific requirements (this is another application for feature test macros). One example of how this can be useful is the normative annex for floating point mathematics.
There is a standard, IEC 60559, for floating point arithmetic. This standard makes a number of promises about the precision available, the handling of overflow, rounding modes, and other things which floating point users are likely to care about a great deal. (This standard is also sometimes called IEEE 754.) However, not every processor supports IEC 60559 floating point.
Requiring all implementations to provide complete support for a fairly large standard would therefore be unduly burdensome. At the same time, it's very hard on developers to have no way of checking for the availability of reliable or predictable floating point. How can both concerns be addressed?
A normative annex nicely solves this problem: in this case, Annex F.
An implementation which conforms to the additional requirements described
in this annex can predefine a preprocessor macro, __STDC_IEC_559__, which can be used in preprocessor
code to test for conformance to these additional requirements. This is
normative text, so any implementation defining the preprocessor macro
must actually provide conforming arithmetic. Once again, a
compromise between requiring everything of everyone, and requiring nothing
of anyone.
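In code, the test looks like any other feature test. The sketch below wraps it in a function whose name, has_iec_559, is hypothetical; a real program would more likely use the #ifdef directly to choose between two implementations of a numeric routine.

```c
/* Returns 1 if the implementation claims conformance to Annex F, 0 otherwise. */
int has_iec_559(void)
{
#ifdef __STDC_IEC_559__
    return 1;   /* IEC 60559 arithmetic is promised */
#else
    return 0;   /* no promise; floating point behavior may differ */
#endif
}
```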
An example of an informative annex is the list of undefined and implementation-defined behavior. This is provided because it is useful, not because the information couldn't be extracted from the standard through diligent effort. These are to be contrasted with documents such as a Technical Corrigendum or a Normative Addendum, which are expected to be incorporated into future revisions of the standard.
Note that when new features are added, they don't necessarily get adopted right away. Some of the new features in C99 aren't as widely available as one might hope, and many users are wary of using features that haven't yet been implemented everywhere. However, many of the new features are widely enough available to make it practical for developers of new code to start using them, with the reasonable expectation that there is support on enough of the key target platforms to keep the code running. The GNU C compiler's aggressive work on supporting C99 features has done a lot to make this possible.
Another way the standard changes is through reported "bugs" in the standard. A Defect Report, or DR, is (as the name suggests) a report about a defect in the standard. DRs may cover errors or omissions in the standard; responses to them are bundled up and released. The first batch of fixes, Technical Corrigendum 1, was released in 2001.
Market forces, C, and standardization
Some languages, such as Java (TM) (R) (C) (Ph.D.), use trademark law or other similar laws to enforce standardization. By contrast, conformance to the C standard is enforced mostly by market forces. The main legal constraint is that selling a compiler as conforming when it is not could expose the vendor to false advertising claims. In practice, though, most compilers are used in a mildly non-conforming mode, full of extensions and special features. Still, standards conformance matters to many customers, and test suite vendors make a living being able to check compilers out reliably.
One major factor in C's success has been the competitive market. The early C compiler market was fairly competitive, with a lot of very good C compilers. The landscape shifted, fairly dramatically, with the development of the GNU C compiler. When GCC first came out in 1987, it wasn't serious competition for the commercial compilers. Times have changed, and GCC is now the compiler of choice for some vendors, including Apple®.
The competitive landscape has been substantially altered by GCC. It's not just that it's a free compiler, coming into a field where professional products still sell for hundreds (or thousands) of dollars. It's also that GCC has consistently pushed hard at the boundaries of quality of implementation. I've seen a compiler whose sole diagnostic was "Error in source code." I've had compilers that couldn't reliably identify where an error was. By contrast, GCC's diagnostics are excellent. Likewise, GCC has pushed for new features where other compilers haven't. Some of the new C99 features probably exist largely because of prior art in GCC.
Furthermore, GCC has provided a sandbox everyone can benefit from playing in. After some initial conflicts with the NeXT people over the need to contribute new code back to the project, GCC has provided a strong incentive for vendors to open up, make changes and related documentation available, and generally improve the state of the art for all GCC users on all platforms.
It is instructive to compare the success and history of GCC with the effect that free compilers (such as GNAT, for Ada) have had on other languages. It's clear that GCC has made C more accessible to the masses.
Other extensions and ongoing development
The C standard is still being developed; in fact, I wrote the first draft of this column from the hotel after the October 2004 meeting of the committee. One major component of C standardization is the effort to retain as much compatibility as possible with the C++ language standard. For instance, the preprocessors of both languages are supposed to remain as compatible as possible. Much of this work has to do with processing Defect Reports.
Other new areas of work include a technical report on a "secure library." The name is something of a misnomer, but the intent is to create a library of functions which replaces functions from the standard C library, and provides better support for bounds-checking. This is a direct response to the observed tendency of security problems in C programs to come from buffer overflows.
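To show the general idea of a bounds-checked replacement, here is a minimal sketch. It is not the interface from the technical report: the function copy_bounded, its signature, and its return convention are all hypothetical, and it only illustrates the principle that the caller states the destination size and the function refuses to write past it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical bounds-checked string copy.
 * Copies src into dst only if it fits, including the terminating null.
 * Returns 0 on success. On failure, returns -1 and leaves dst holding an
 * empty string (when dst is usable at all).
 */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    if (dst == NULL || dstsize == 0)
        return -1;
    if (src == NULL) {
        dst[0] = '\0';
        return -1;
    }
    len = strlen(src);
    if (len >= dstsize) {
        dst[0] = '\0';      /* refuse to truncate silently or overflow */
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}
```

Unlike strcpy, the function cannot be called without saying how large the destination is, which is the habit this kind of library is meant to encourage.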
The C committee meets twice a year, immediately following the meetings
of the ISO C++ committee. People in the U.S. who want to attend meetings
need to become members of the ANSI C committee. Pay dues, get on the
mailing list, and start attending meetings. It's fun! Experienced C
programmers will enjoy finding a room full of people who laugh when you
suggest that a new feature should use the static keyword.
Resources
- Way back when, before ISO and ANSI got involved, C was defined, for
most practical purposes, by a book called The C Programming
Language, by Brian Kernighan and Dennis Ritchie
- Dinkumware has a
complete online reference for the C99 standard library. (Click on the
little footprints icon to see it.)
- Many companies (including Apple) have online copies of the
GCC manual section discussing hosted and freestanding environments (as
interpreted by GNU C).
- Microsoft's list
of the C runtime functions that you have to use something else for in
Win32 makes for good bedtime reading.
- Did you know that you can run GNU
GCC on Windows? Free Software packages like Cygwin and MinGW include not only the GNU compiler
collection, but also a Unix-like shell environment (aka command shell or
text console).
- The IBM C and C++
compilers include optimizations for Power Architecture processors and
are built on a common code base allowing for easier porting of your
applications between platforms. At the time of this writing, XL
C/C++ Advanced Edition V7.0 for Linux (Y-HPC) is available as a 60-day
beta download (registration required).
- For an overview of how C for embedded differs from regular old C, see
Erich Styger's PowerPoint presentation ANSI-C
for Embedded and Optimized Compiler Usage (Freescale/Metrowerks,
2004). If your computer doesn't do ppt, see the
Google cache version.
- For more on embedded programming, see also Linux
system development on an embedded device (developerWorks, March 2002).
- For more on C99, see also Peter's previous developerWorks article,
Open Source development using C99 (developerWorks, March 2004).
- David Wheeler's Secure Programmer column
on developerWorks' Linux zone is a valuable resource for C
programmers.
- The home page for
the C committee has links on documentation, membership, and more.
- NCITS handles membership for the ANSI
C committee (which they know as "J11").
- Users interested in C might find the Usenet groups comp.std.c and comp.lang.c.moderated to be of
interest.
- All things Power are chronicled in the developerWorks Power
Architecture editors' blog, which is just one of many developerWorks
blogs.
- Find more articles and resources on Power Architecture
technology and all things
related in the developerWorks Power
Architecture technology content area.
- You'll find more valuable resources for Power Architecture developers in
the Linux on Power Architecture Developer's corner and developerWorks eServer
domain.
- Download an IBM PowerPC 405 Evaluation Kit to demo a SoC in a simulated
environment, or just to explore the fully licensed version of
Power Architecture technology. This and other fine Power Architecture-related downloads are listed in
the developerWorks Power Architecture technology content area's downloads section.