Standards and specs: The nitty-gritty on the C committee

Find out what goes into the making of an ISO standard

Peter Seebach has been using computers for years and is gradually becoming acclimated. He still doesn't know why mice need to be cleaned so often, though.

Summary:  The C standard is a few hundred pages full of specifications and requirements. This month's Standards and specs looks at some of the different components of the C standard, and how they might affect Power Architecture developers and implementors.

Date:  23 Nov 2004
Level:  Introductory

Way back when, before ISO® and ANSI® got involved, C was defined, for most practical purposes, by a book called The C Programming Language, written by Brian Kernighan and Dennis Ritchie (see Resources). Most often referred to simply as "K&R," this book was as much of a standard as there was for the C programming language from its publication in 1978 until its replacement with a second edition in 1988. (The very early 2nd edition K&R books, inconveniently, ended up not being quite aligned with the final standard; this was corrected in 1989, when the final standard was released.)

By the time standardization work began, multiple C compilers were available, produced by several vendors. New features were being added, gradually, but support for a new feature might be unreliable, or erratic, from one system to another. The C standard, released by ANSI in 1989 (and again, by ISO, in 1990) offered vendors a fixed target to implement and developers a stable platform to develop for.

Yes, that's right; the standard was released twice. An ANSI committee did the original work, and the involvement with ISO took a while to straighten out. Since then, the ANSI and ISO committees have met at the same time, and in the same place, to try to prevent any similar problems in the future.

This article offers a brief overview of the C standard as it exists today and some insights into the technical decisions, and compromises, that made the standard possible. Rather than focusing on a broad overview of C, however, this article looks at some of the nitty-gritty that makes a standard work.

The road to good standards is paved with compromises

A recurring theme in standardization is compromises. Users want the fastest possible performance, and they also want the most predictable behavior. Vendors want the latitude they need to provide both of these, and they want reasonable development costs. In many cases, the interests of different groups may not align. For instance, vendors might need to target a CPU on which integer division has certain properties, while users might want predictable properties across a variety of systems. Providing the predictable properties users want could impose a substantial performance cost on systems whose CPUs don't work that way by default. Neither users nor vendors will be happy with reduced performance. The C standard is a great place to study the art of compromise in a standard.
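Integer division makes a handy illustration. C89 left the rounding direction of division with negative operands up to the implementation, while C99 requires truncation toward zero. The sketch below uses a hypothetical helper, floor_div, which is my own name and not anything from the standard, to show how a user who wants one specific behavior can spell it out so that the answer is the same either way:

```c
/*
 * floor_div is a hypothetical helper, not a standard function.
 * It returns the quotient rounded toward negative infinity, whether
 * the underlying '/' truncates toward zero (C99) or rounds down
 * (permitted by C89). The divisor b must not be zero.
 */
int floor_div(int a, int b)
{
    int q = a / b;
    int r = a % b;

    /* A nonzero remainder whose sign differs from the divisor's sign
       means the quotient was rounded toward zero, so step it down. */
    if (r != 0 && ((r < 0) != (b < 0))) {
        q--;
    }
    return q;
}
```

The extra test costs a few instructions, which is exactly the kind of cost the standard declines to impose on everyone.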

The C language is intended to be portable to a very broad range of systems and to allow for reasonable performance on those systems. Some languages specify a lot more about the order of operations, and the exact outcomes of operations, than C does. C's comparatively loose specification comes from a desire to, as often as possible, allow operations to be implemented on a broad variety of hardware, using native instructions.

For instance, in C, the order in which function arguments are evaluated is not specified; indeed, up to a point, they might be evaluated simultaneously. A simple line of code such as:

printf("%d, %d\n", i++, i++);

produces what is called "undefined behavior." That means not only that the language doesn't guarantee which argument is evaluated first -- it doesn't guarantee anything at all. The resulting code could, for instance, crash. It might only increment the variable i once. It might increment i twice, then restore it to its previous value. There are no promises made about what happens.

That's often regarded as inconvenient. On the other hand, consider a similar line of code:

printf("%f, %f\n", f * f + g, g * g + f);

In this case, the behavior is perfectly well-defined, and the user might well appreciate having a machine with a modern floating point unit performing all of the floating point operations at once, or coalescing them into a pair of fused multiply add instructions, the second of which might start before the first is complete.

A rule which strictly enforced the order of evaluation of function arguments could make some optimizations like this harder to perform consistently. The result is a compromise: a position which leaves users the option of achieving specific results (by spelling them out more specifically), and implementors leeway to optimize aggressively. Some languages, such as Java™, have gone the route of specifying operations more completely, at some possible cost in performance.
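As a rough sketch of what "spelling it out" can look like, the two increments from the earlier printf example can be moved into separate statements, each of which ends at a sequence point. The function name and the use of snprintf (a C99 function) are my choices for illustration, not anything the standard prescribes:

```c
#include <stdio.h>

/*
 * Hypothetical helper: formats two successive values of *counter
 * into buf. Each increment is its own full statement, so the order
 * is fully defined, unlike printf("%d, %d\n", i++, i++).
 * Returns whatever snprintf returns.
 */
int format_two_increments(char *buf, size_t size, int *counter)
{
    int first = (*counter)++;   /* completes before the next statement */
    int second = (*counter)++;

    return snprintf(buf, size, "%d, %d\n", first, second);
}
```

Called with a counter holding 5, this writes "5, 6" followed by a newline and leaves the counter at 7, on every conforming implementation.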

Compromises like this often evolve during the development of new features. The C standard library is a collection of useful and widely portable routines, which are sufficiently widely used to merit formal inclusion in the C standard. Most of the library evolved as a compromise between the difficulty of porting some functionality to a variety of systems, and the importance of that functionality to average users.

One of the recurring themes in C standardization is that some issues are left as "quality of implementation" issues. This basically means that the marketplace is expected to be able to select better implementations over worse ones, without the standard strictly requiring something that could be impractical for some vendors.


Freestanding and hosted environments

In some cases, the compromises needed to make everyone happy simply aren't possible. Users developing for desktop systems, workstations, and servers generally want a great deal of functionality that can be readily implemented across a broad variety of systems. Unfortunately, this functionality may be dependent on features that, while common to "computers" as most people think of them, are hardly available on embedded systems.

Since C is also used for embedded development, the trade-off between size and functionality becomes an issue. Furthermore, embedded systems often don't have the framework or resources needed to provide many of the standard library functions. People developing garage door openers with a kilobyte or two of memory don't necessarily want to have the code for formatting floating point numbers in ASCII loaded in their very limited memory footprint. Similarly, many of the standard library functions are of limited applicability in some environments: the garage door opener probably doesn't have any kind of disk, so most of the file manipulation functions are moot.

Building a standard that addresses the needs of such diverse markets is tricky. Users want real guarantees from a standard, and removing most of the library would make the language much less useful for desktop users. On the other hand, requiring the whole library is unacceptable for many embedded users.

The compromise reached was to distinguish between "hosted" and "freestanding" environments. A hosted environment is, roughly speaking, an environment in which programs are being supported by some kind of operating system that provides the usual selection of services. A freestanding environment is likely one in which the program lives on its own and controls the hardware directly. These generalizations don't always apply, but they provide a good basis for understanding the distinctions between these types of environments.

For instance, in a freestanding environment, it is not required that the printf function be available. Freestanding environments may not have any recognizable input or output devices, so demanding that they provide a way to format output might not make much sense.

The features considered to be part of the C language (as opposed to the standard library) are required in both hosted and freestanding environments. By contrast, only small portions of the standard library are required. In C99, the features required in a freestanding environment are those specified in these header files:

<float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>

These headers provide only support for use of the basic C types (including functions with variable argument lists). For instance, they define macros related to the core integer and floating point data types, as well as the new (in C99) bool type.
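To give a feel for what code restricted to these headers looks like, here is a small sketch that calls no library functions at all. The helper names are hypothetical. The __STDC_HOSTED__ macro is not mentioned above; C99 has the implementation predefine it as 1 for a hosted implementation and 0 otherwise, and the sketch assumes a pre-C99 compiler simply leaves it undefined:

```c
#include <stdbool.h>  /* bool, true, false */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint_least32_t */

/* Hypothetical helper: adds up the bytes of a buffer using only
   language features and types from the freestanding headers. */
uint_least32_t byte_sum(const unsigned char *data, size_t len)
{
    uint_least32_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Hypothetical helper: reports whether the compiler claims to be
   a hosted implementation, based on the C99 predefined macro. */
bool compiled_hosted(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__
    return true;
#else
    return false;
#endif
}
```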

There are a few surprises for many people in looking at real examples of hosted and freestanding environments. For instance, the kernel of a Unix-like system is probably considered to be compiled in a freestanding environment. On NetBSD, the compilation flags for building the kernel include -ffreestanding, the flag that tells the GNU C compiler to turn off its usual built-in support for the standard library. Normally, GCC is aware of some library functions even when the corresponding headers haven't been included, and might replace calls to them with inline instructions or otherwise optimize them; in freestanding mode, it doesn't provide this built-in support. Also, in freestanding mode, the compiler is less fussy about the declaration of main.

There's another, more subtle, distinction. In a hosted environment, execution of a C program begins at a function named main, returning int and taking either zero or two arguments. Other signatures may be allowed by specific implementations, but they are pure extensions, and an implementation is not required to accept any other declarations of main. In a freestanding environment, there is no standardized place for code to begin. For instance, the previously mentioned NetBSD kernel starts execution in a routine named start. While the kernel does have a main routine, that routine is called by the initial start routine (after roughly 400 lines of assembly code to identify the CPU type, build initial data structures, and otherwise prepare the machine for execution). The main routine in the kernel, unlike one in a hosted environment, has a return type of void, because there is no host environment to which a value could be returned.

Which one do I want?

Most desktop compilers are, or at least can be, a hosted environment, and the hosted environment is what most programmers who aren't doing embedded work think of when they talk about "C." In many cases, especially on modern systems, developers are very likely to prefer a hosted environment. The promise of the functionality of the standard library (standardized I/O, memory management, the math library) and other features (such as complex arithmetic) is worth a great deal to a developer.

Unfortunately, it's also potentially very expensive to an implementor. A simple development kit might leave the question of providing such an environment to the developer. In some cases, the work necessary to get a Unix-like kernel running on a board may be less than the work it would take to implement even the subset of kernel functionality directly needed for an application. As an obvious example, many simple home routers and gateways are, under the hood, running some Unix-like operating system. A more famous example is the TiVo®, which was one of the first Linux-based appliances on the market, and certainly one of the most visible.

In short, in many cases, if all you have is a freestanding environment, the logical step is to load a hosted environment on it. Hosted environments are, in most cases, much more comfortable development environments, in which a lot more code can be written portably.

A particularly famous example of a freestanding implementation -- and one which caught many developers by surprise -- is Microsoft® Windows® (both win16 and win32). Programs starting at a function called WinMain, which do not start with input or output streams to manipulate, are clearly not traditional hosted programs. What surprised a lot of people, though, was that sprintf wasn't provided either; instead, you had to use a special function called wsprintf. So, because there's no textual console for most Windows applications, the function most programmers use to construct strings with data in them (say, printing numbers in the middle of a message) is simply unavailable. Modern development systems are more likely to provide a more complete library; this is considered a quality of implementation issue.

This highlights a limitation of the simple two-way split. While most people would expect a desktop computer operating system to provide a hosted environment, a graphical OS might not adapt well to the simple textual input and output model the standard presumes. But this still leaves a vendor free to omit major pieces of support that programmers are reasonably expecting to be available in such an environment. Of course, just adding more types of implementations makes life harder for vendors, test suites, and ultimately developers. The current compromise is a good one.

Conveniently, complete source for implementations of the standard C library is available from a variety of sources (see Resources); this generally makes it possible to add the specific functions needed to an environment which doesn't provide them.


Feature test macros

One of the features that was standardized in C89 was limit macros. For instance, there's a macro, INT_MAX, which defines the maximum value an object of type int can hold. These macros can be used to test whether a given operation can be done using the (typically optimal for speed) plain int type, or whether it needs the larger guaranteed range of the long type. Sometimes, a macro is defined only if a given feature is available; this can be called a "feature test macro."
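Here is a minimal sketch of that kind of test. The limit of 100000 and the names MAX_SAMPLE_COUNT, sample_count_t, and clamp_sample_count are all made up for the example; only INT_MAX and <limits.h> come from the standard:

```c
#include <limits.h>

/* Hypothetical application requirement: counts up to 100000. */
#define MAX_SAMPLE_COUNT 100000L

/* Use plain int when its range is large enough; otherwise fall
   back to long, which is guaranteed to reach at least 2147483647. */
#if INT_MAX >= MAX_SAMPLE_COUNT
typedef int sample_count_t;
#else
typedef long sample_count_t;
#endif

/* Hypothetical helper: forces a requested count into the range
   0..MAX_SAMPLE_COUNT, which sample_count_t can always represent. */
sample_count_t clamp_sample_count(long requested)
{
    if (requested < 0) {
        return 0;
    }
    if (requested > MAX_SAMPLE_COUNT) {
        return (sample_count_t)MAX_SAMPLE_COUNT;
    }
    return (sample_count_t)requested;
}
```

On a system with a 16-bit int, the preprocessor picks long; on most current systems it picks int, and the rest of the code doesn't need to care.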

Note that limit macros existed before C89, but they weren't predictable. The nature of the problem was that you couldn't necessarily guess the range of int without help from the compiler.

Moving ahead to C99, the current iteration of the standard provides feature test macros for a number of optional features. For instance, an implementation providing an integer type exactly 32 bits wide should define the corresponding macro, INT32_MAX. If the type is unavailable, the macro should not be defined. Macros are provided for testing the availability of a number of floating point features as well.
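A sketch of how an application might test for the exact-width type follows. The names wire_int, WIRE_INT_IS_EXACT, and the two helper functions are hypothetical; int32_t, int_least32_t, and INT32_MAX are the C99 names from <stdint.h>:

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* int32_t, int_least32_t, INT32_MAX */

/* If the implementation has a type exactly 32 bits wide, use it;
   otherwise settle for the smallest type with at least 32 bits,
   which C99 requires every implementation to provide. */
#ifdef INT32_MAX
typedef int32_t wire_int;
#define WIRE_INT_IS_EXACT 1
#else
typedef int_least32_t wire_int;
#define WIRE_INT_IS_EXACT 0
#endif

/* Hypothetical helper: reports which branch was taken. */
int wire_int_is_exact(void)
{
    return WIRE_INT_IS_EXACT;
}

/* Hypothetical helper: storage size of wire_int in bits. */
int wire_int_bits(void)
{
    return (int)(sizeof(wire_int) * CHAR_BIT);
}
```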

Along a similar line, some systems developed standard ways to request, or signal the availability of, common extensions. For instance, a program running on a UNIX® system might define the _POSIX_C_SOURCE macro before including headers, to indicate that POSIX extensions to the standard headers were desired.
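A minimal sketch of such a request, assuming a POSIX system: fileno is a POSIX addition to <stdio.h>, not part of the C standard, and 200112L is the value that asks for the 2001 edition of POSIX. The function name stdout_descriptor is made up for the example, and none of this is expected to compile on a non-POSIX platform:

```c
/* The request must appear before any header is included. */
#define _POSIX_C_SOURCE 200112L

#include <stdio.h>

/* Hypothetical helper: returns the file descriptor behind stdout.
   fileno() is declared by <stdio.h> only as a POSIX extension. */
int stdout_descriptor(void)
{
    return fileno(stdout);
}
```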

Note that requesting features this way is subtly different from what C99 does. In the C99 standard, the implementation defines macros, and the application tests for them; with the Open Group's macros, the application defines a macro to request that features be provided. Even so, the use of macros to request or enable features is not unheard of in the C standard itself. The most obvious example is the NDEBUG macro, which disables the assert() function-like macro if NDEBUG has already been defined when <assert.h> is included.
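The NDEBUG behavior can be sketched in a single source file, because the standard has <assert.h> redefine assert according to the current state of NDEBUG each time the header is included. The function and variable names are hypothetical, and the sketch assumes NDEBUG is not already defined on the compiler command line:

```c
#include <assert.h>

static int calls = 0;

/* Hypothetical helper with a visible side effect. */
static int count_call(void)
{
    return ++calls;
}

/* assert is active here, so its argument is evaluated once. */
int with_assert_enabled(void)
{
    calls = 0;
    assert(count_call() == 1);
    return calls;
}

/* Define NDEBUG, then include the header again: from here on,
   assert expands to nothing and its argument is never evaluated. */
#define NDEBUG
#include <assert.h>

int with_assert_disabled(void)
{
    calls = 0;
    assert(count_call() == 1);
    return calls;
}

/* Restore the active assert for any code that follows. */
#undef NDEBUG
#include <assert.h>
```

The first function returns 1 and the second returns 0, which is also a reminder not to put anything with side effects inside an assert.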

Of course, both types of macros are called "feature test macros," so the terminology can be a little ambiguous.


Informative and normative annexes

There are some parts of the C standard that are attached to the end of the main body of the text. These come in two varieties: a normative annex has the same force as the rest of the standard, while an informative annex consists merely of polite suggestions. The same terminology is used for the text in the body of the standard: the standard's text is normative, but the footnotes and examples in the standard, and the associated Rationale document, are informative. Informative material is offered to help vendors do the best possible job, without imposing unduly burdensome requirements.

A normative annex may seem a little counterintuitive; why not just put it in the main body of the standard? One reason would be an entire feature which is optional, but which, if it is provided, must comply with specific requirements (this is another application for feature test macros). One example of how this can be useful is the normative annex for floating point mathematics.

There is a standard, IEC 60559, for floating point arithmetic. This standard makes a number of promises about the precision available, the handling of overflow, rounding modes, and other things which floating point users are likely to care about a great deal. (This standard is also sometimes called IEEE 754.) However, not every processor supports IEC 60559 floating point.

Requiring all implementations to provide complete support for a fairly large standard would therefore be unduly burdensome. At the same time, it's very hard on developers to have no way of checking for the availability of reliable or predictable floating point. How can both needs be met?

A normative annex nicely solves this problem: in this case, Annex F. An implementation which conforms to the additional requirements described in this annex can predefine a preprocessor macro, __STDC_IEC_559__, which can be used in preprocessor code to test for conformance to these additional requirements. This is normative text, so any implementation defining the preprocessor macro must actually provide conforming arithmetic. Once again, a compromise between requiring everything of everyone, and requiring nothing of anyone.
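A minimal sketch of such a test follows. The function names are hypothetical. The second function relies on Annex F's requirement that double match the IEC 60559 double format, which I am taking to mean a radix of 2, a 53-bit significand, and a maximum exponent of 1024 as reported by <float.h>:

```c
#include <float.h>

/* Hypothetical helper: 1 if the implementation claims Annex F
   conformance by predefining __STDC_IEC_559__, 0 otherwise. */
int has_iec_559_arithmetic(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if the <float.h> parameters for double
   look like the IEC 60559 double format, 0 otherwise. */
int double_is_binary64(void)
{
    return FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024;
}
```

An implementation that defines the macro should never report 1 from the first function and 0 from the second; one that doesn't define it is free to do as it pleases.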

An example of an informative annex is the list of undefined and implementation-defined behavior. This is provided because it is useful, not because the information couldn't be extracted from the standard through diligent effort. These are to be contrasted with documents such as a Technical Corrigendum or a Normative Addendum, which are expected to be incorporated into future revisions of the standard.

Note that when new features are added, they don't necessarily get adopted right away. Some of the new features in C99 aren't as widely available as one might hope, and many users are wary of using features that haven't yet been implemented everywhere. However, many of the new features are widely enough available to make it practical for developers of new code to start using them, with the reasonable expectation that there is support on enough of the key target platforms to keep the code running. The GNU C compiler's aggressive work on supporting C99 features has done a lot to make this possible.

Another way the standard changes is through reported "bugs" in the standard. A Defect Report, or DR, is (as the name suggests) a report about a defect in the standard. DRs may cover errors or omissions in the standard; responses to them are bundled up and released. The first batch of fixes, Technical Corrigendum 1, was released in 2001.


Market forces, C, and standardization

Some languages, such as Java (TM) (R) (C) (Ph.D.), use trademark law or other similar laws to enforce standardization. By contrast, the conformance to the C standard is enforced mostly by market forces. The major limitation is that a compiler which is sold as a conforming compiler could open a vendor to false advertising claims if it is not conforming. In practice, though, most compilers are used in a mildly non-conforming mode, full of extensions and special features. Still, standards conformance matters to many customers, and test suite vendors make a living being able to check compilers out reliably.

One major factor in C's success has been the competitive market. The early C compiler market was fairly competitive, with a lot of very good C compilers. The landscape shifted, fairly dramatically, with the development of the GNU C compiler. When GCC first came out in 1987, it wasn't serious competition for the commercial compilers. Times have changed, and GCC is now the compiler of choice for some vendors, including Apple®.

The competitive landscape has been substantially altered by GCC. It's not just that it's a free compiler, coming into a field where professional products still sell for hundreds (or thousands) of dollars. It's also that GCC has consistently pushed hard at the boundaries of quality of implementation. I've seen a compiler whose sole diagnostic was "Error in source code." I've had compilers that couldn't reliably identify where an error was. By contrast, GCC's diagnostics are excellent. Likewise, GCC has pushed for new features where other compilers haven't. Some of the new C99 features probably exist largely because of prior art in GCC.

Furthermore, GCC has provided a sandbox everyone can benefit from playing in. After some initial conflicts with the NeXT people over the need to contribute new code back to the project, GCC has provided a strong incentive for vendors to open up, make changes and related documentation available, and generally improve the state of the art for all GCC users on all platforms.

It is instructive to compare the success and history of GCC with the effect that free compilers (such as GNAT, for Ada) have had on other languages. It's clear that GCC has made C more accessible to the masses.


Other extensions and ongoing development

The C standard is still being developed; in fact, I wrote the first draft of this column from the hotel after the October 2004 meeting of the committee. One major component of C standardization is the effort to retain as much compatibility as possible with the C++ language standard. For instance, the preprocessors of both languages are supposed to remain as compatible as possible. Much of the committee's work also has to do with processing Defect Reports and bundling the responses into Technical Corrigenda.

Other new areas of work include a technical report on a "secure library." The name is something of a misnomer, but the intent is to create a library of functions which replaces functions from the standard C library, and provides better support for bounds-checking. This is a direct response to the observed tendency of security problems in C programs to come from buffer overflows.
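To illustrate the general idea, and only the idea, here is a sketch of a bounds-checked string copy. The function bounded_copy is hypothetical: it is my own name and interface, not a function from the technical report or from the standard library. The point is simply that the caller states how big the destination is, and the function refuses to write past it:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any standard or report.
 * Copies src into dst, which has room for dst_size bytes.
 * Returns 0 on success. If the string (plus its terminator) would
 * not fit, or src is null, dst is set to the empty string and -1
 * is returned. A null or zero-sized dst also returns -1.
 */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || dst_size == 0) {
        return -1;
    }
    if (src == NULL) {
        dst[0] = '\0';
        return -1;
    }

    len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';      /* never leave a partial copy behind */
        return -1;
    }

    memcpy(dst, src, len + 1);  /* len + 1 includes the terminator */
    return 0;
}
```

Compare this with strcpy, which has no way of knowing how large the destination is and will cheerfully overflow it.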

The C committee meets twice a year, immediately following the meetings of the ISO C++ committee. People in the U.S. who want to attend meetings need to become members of the ANSI C committee. Pay dues, get on the mailing list, and start attending meetings. It's fun! Experienced C programmers will enjoy finding a room full of people who laugh when you suggest that a new feature should use the static keyword.


Resources
