Open source development using C99

Is your C code up to standard?

What is C99? Who needs it? Is it available yet? Peter Seebach discusses the 1999 revision of the ISO C standard, with a focus on the availability of new features on Linux and BSD systems.

Peter Seebach (developerworks@seebs.plethora.net), Writer, Freelance

Peter Seebach has been a member of the ISO C standards committee since late 1996. He is only a little bitter that strsep() didn't make it into C99, and tends to use it anyway out of spite. He can be reached at developerworks@seebs.plethora.net.



24 March 2004


Not all of the new C99 features are supported in the versions of gcc distributed with open source operating systems. However, a sufficient number are now widely available, so you can start looking seriously at adopting C99 features in new development, especially where they make a substantial difference in efficiency or clarity.

This article reviews the availability of C99 language and library features on recent releases of Linux and BSD. Because many of these features are standard features of gcc, a recent version of gcc will do the same thing on most other platforms. Library support, of course, varies from one distribution to another, or from one operating system to another.

C-ninety-what?

The C99 standard is the most recent revision of the ISO standard for C. A bit of historical background may be helpful. The C language was developed without committees and went through a lot of changes early on. Eventually, most vendors stabilized somewhere near the language described in the first edition (1978) of Kernighan & Ritchie's The C Programming Language, although extensions were commonplace. ANSI began work on a standard based on this book and on existing practice, and a standard became widely available in 1989-1990. This standard is widely referred to as "C89"; some wags refer to the language described in the 1978 edition of K&R as "C78."

Over the next ten years, compiler vendors continued developing new extensions and new features, and in 1999, a revised standard was released, representing a number of years of work on standardizing many of the most useful and widely supported new features. This standard is often referred to as the "C99" standard.

Invoking gcc with a language standard

The GNU C compiler supports a number of different versions of the C programming language. You can select the version of the C standard to use on the command line, using the -std option. The default is not any version of the standard, but rather, the "GNU C" language, which has its own set of extensions. You can select common versions of the C standard with the following options:

  • -std=c89 or -std=iso9899:1990
    The original C89 standard
  • -std=iso9899:199409
    C89, plus the changes in Normative Addendum 1
  • -std=c99 or -std=iso9899:1999
    The C99 revised standard

To enforce full compliance with a version of the standard, use the -pedantic option. This option is primarily useful for making sure your code will survive the transition to other compilers. For instance, if you're sharing a codebase with people who aren't using gcc, you probably want it on all the time. Note that the -pedantic flag will occasionally get some of the details of a given standard wrong. For instance, it might try to enforce a C89 rule on a C99 program, or might fail to enforce an obscure rule. It's still worth having it for testing. If you're trying to write portable code, there's a lot to be said for -std=c99 -pedantic -Wall.
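To see which language mode a given -std option actually selects, a translation unit can inspect the standard __STDC_VERSION__ macro. The following is a minimal sketch; the function names c_standard_name and report_standard are hypothetical, and the strings returned are just labels chosen for this example.

```c
#include <stdio.h>

/* Hypothetical helper: names the language standard the compiler
 * claims to implement for this translation unit.
 * __STDC_VERSION__ is not defined at all by a strict C89 compiler,
 * is 199409L with Normative Addendum 1, and is 199901L for C99. */
const char *c_standard_name(void)
{
#if !defined(__STDC_VERSION__)
    return "C89";
#elif __STDC_VERSION__ >= 199901L
    return "C99 or later";
#elif __STDC_VERSION__ >= 199409L
    return "C89 plus Normative Addendum 1";
#else
    return "unknown";
#endif
}

/* Prints the result, for use from a small test program. */
void report_standard(void)
{
    printf("compiled as: %s\n", c_standard_name());
}
```

Building the same file with -std=c89, -std=iso9899:199409, and -std=c99 should select a different branch each time.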

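The C89 standard introduced a new concept: the distinction between freestanding and hosted environments. A hosted environment is what most people are used to; it provides the full standard library, and execution always starts at main(). A freestanding environment provides only a minimal subset of the library, and the way the program starts up is left to the implementation.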

If you want the slightly different set of warnings and behaviors that are implied for a freestanding environment, use the -ffreestanding option.

The default is to assume a hosted environment. To address a common FAQ, yes, it is intentional that gcc gives a warning for declarations of main() with arguments or a return type other than those listed in the standard. While the C99 standard allows implementations to provide alternative declarations, they're never portable. In particular, the common practice of declaring main() with a return type of void is simply incorrect. (This is why NetBSD's kernels are compiled with the -ffreestanding flag.)
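Code can also ask which kind of environment the compiler assumes. C99 defines the __STDC_HOSTED__ macro for this purpose; the sketch below wraps it in a hypothetical helper named assumed_environment. With gcc, compiling with -ffreestanding should change the result from 1 to 0.

```c
/* Hypothetical helper: reports the environment the compiler assumes.
 * C99 defines __STDC_HOSTED__ as 1 for a hosted implementation and
 * 0 for a freestanding one; a pre-C99 compiler may not define it. */
int assumed_environment(void)
{
#if defined(__STDC_HOSTED__)
    return __STDC_HOSTED__;
#else
    return -1;   /* unknown: the compiler predates the macro */
#endif
}
```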


Language features

There are two parts of the C programming language. These are, confusingly, called the "language" and the "library." Historically, there was a bundle of commonly used utility code that everyone tended to reuse; this was eventually standardized into what's called the Standard C Library. The distinction was pretty easy to understand at first: If the compiler did it, it was the language; if it was in the add-on code, it was the library.

With time, however, the distinction has been blurred. For instance, some compilers will generate calls to an external library for 64-bit arithmetic, and some library functions might be handled magically by the compiler. For the purposes of this article, the division follows the terminology of the standard: features from the "Library" section of the standard are library features and are discussed in the next section of the article. This section looks at everything else.

The C99 language introduces a number of new features that are of potential interest to software developers. Many of these features are similar to features of the GNU C set of extensions to C; unfortunately, in some cases, they are not quite compatible.

A few features popularized by C++ have made it in. In particular, // comments and mixed declarations and code have become standard features of C99. These have been in GNU C forever and should work on every platform. In general, though, C and C++ remain separate languages; indeed, C99 is a little less compatible with C++ than C89 was. As always, trying to write hybrid code is a bad idea. Good C code will be bad C++ code.
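As a small illustration of these two features, the following sketch uses // comments, a declaration that follows a statement, and a loop variable declared in the for statement itself (also new in C99). The function name sum_of_squares is invented for this example.

```c
#include <stddef.h>

// Sums the squares of the first n elements of v.
long sum_of_squares(const int *v, size_t n)
{
    if (v == NULL)
        return 0;

    long total = 0;                    // declared after a statement
    for (size_t i = 0; i < n; i++) {   // i is scoped to the loop
        long x = v[i];
        total += x * x;
    }
    return total;
}
```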

C99 added some support for Unicode characters, both within string literals and in identifiers. In practice, the system support for this probably isn't where it needs to be for most users; don't expect source that uses this to be accessible to other people just yet. In general, the wide-character and Unicode support is mostly there in the compiler, but the text processing tools aren't quite up to par yet.

The new variable-length array (VLA) feature is partially available. Simple VLAs will work. However, this is a pure coincidence; in fact, GNU C has its own variable-length array support. As a result, while simple code using variable-length arrays will work, a lot of code will run into the differences between the older GNU C support for VLAs and the C99 definition. Declare arrays whose length is a local variable, but don't try to go much further.
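The "simple" case that should be safe looks something like the sketch below: a local array whose length comes from a function parameter, used as scratch space. The function reverse_ints is a hypothetical example, not code from any library, and it assumes n is small enough for the array to fit on the stack.

```c
#include <stddef.h>
#include <string.h>

/* Reverses v in place, using a variable-length array as scratch space. */
void reverse_ints(int *v, size_t n)
{
    if (n == 0)
        return;          /* a VLA must have a length greater than zero */

    int tmp[n];          /* length is only known at run time */
    for (size_t i = 0; i < n; i++)
        tmp[i] = v[n - 1 - i];

    memcpy(v, tmp, n * sizeof tmp[0]);
}
```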

Compound literals and designated initializers are a wonderful code maintainability feature. Compare these two code fragments:

Listing 1. Delaying for n microseconds in C89
    /* C89 */
    {
        struct timeval tv = { 0, n };
        select(0, 0, 0, 0, &tv);
    }

Listing 2. Delaying for n microseconds in C99
    // C99
    select(0, 0, 0, 0, & (struct timeval) { .tv_usec = n });

The syntax for a compound literal allows a brace-enclosed series of values to be used to initialize an unnamed object of the appropriate type; inside a function, that object has automatic storage. The object is reinitialized each time the compound literal is evaluated, so it's safe with functions (such as some versions of select) that may modify the corresponding object.

The designated initializer syntax allows you to initialize members by name, without regard to the order in which they are declared. This is especially useful for large and complicated objects with only a few members initialized. As with a normal aggregate initializer, missing values are treated as though they'd been given 0 as an initializer.

Other rules for brace-enclosed lists have changed a bit. For instance, you're now allowed to have a trailing comma after the last member of an enum declaration, to make it just a bit easier to write code generators.
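To make the rules above concrete, here is a minimal sketch that combines a designated array initializer, a designated structure initializer inside a compound literal, and the enum trailing comma. All of the names (color, color_names, options, default_options, color_name) are hypothetical.

```c
#include <stddef.h>

enum color { RED, GREEN, BLUE, };   /* trailing comma is allowed in C99 */

/* Array elements can be designated by index, in any order. GREEN is
 * left out on purpose and is therefore a null pointer. */
static const char *const color_names[] = {
    [BLUE] = "blue",
    [RED]  = "red",
};

struct options {
    int         verbose;
    int         retries;
    const char *logfile;
    double      timeout;
};

/* Returns a structure built from a compound literal; members that are
 * not named (verbose, logfile) are initialized as if to 0. */
struct options default_options(void)
{
    return (struct options) { .timeout = 2.5, .retries = 3 };
}

/* Looks up a name, falling back for unnamed or out-of-range values. */
const char *color_name(enum color c)
{
    size_t count = sizeof color_names / sizeof color_names[0];
    if ((size_t)c >= count || color_names[c] == NULL)
        return "(unnamed)";
    return color_names[c];
}
```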

For years, people have been debating extensions to the C type system, such as long long. C99 introduces a handful of new integer types. The most widely used is long long. Another type introduced by the standards process is intmax_t. Both of these types are available in gcc. However, the integer promotion rules are not always correct for types larger than long. It's probably best to use explicit casts.

There are also a lot of types allowing more specific descriptions of desired qualities. For instance, there are types with names like int_least8_t, which has at least 8 bits, and int32_t, which has exactly 32 bits. The standard guarantees access to types of at least 8, 16, 32, and 64 bits. There is no promise that any exact-width types will be provided. Don't use such types unless you are really, totally sure that you can't accept a larger type. Another optional type is the new intptr_t type, which is an integer large enough to hold a pointer. Not all systems provide such a type (although all current Linux and BSD implementations do).
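The following sketch shows how these types are typically used together with the formatting support in <inttypes.h>. The helper names are invented for this example, and the intptr_t code is guarded because that type is optional.

```c
#include <inttypes.h>   /* also includes <stdint.h> */
#include <stdio.h>

/* Formats the widest signed integer type with the C99 'j' length modifier. */
int format_intmax(char *buf, size_t size, intmax_t value)
{
    return snprintf(buf, size, "%jd", value);
}

/* Formats a value of at least 64 bits using the PRIdLEAST64 macro, so
 * the code does not need to know which underlying type was chosen. */
int format_least64(char *buf, size_t size, int_least64_t value)
{
    return snprintf(buf, size, "%" PRIdLEAST64, value);
}

#ifdef INTPTR_MAX
/* intptr_t is optional, so use it only when <stdint.h> provides it.
 * A pointer to void converted to intptr_t and back compares equal. */
int pointer_round_trips(void *p)
{
    intptr_t as_int = (intptr_t)p;
    return (void *)as_int == p;
}
#endif
```

Note that the at-least-N-bits types are used here rather than the exact-width ones, in keeping with the advice above.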

The C preprocessor has a number of new features. It allows empty arguments, and it supports macros with varying numbers of arguments. There is a _Pragma operator for generating pragmas from macros. In a related vein, there's a predefined identifier, __func__, which always contains the name of the current function; strictly speaking, it behaves like a local string variable rather than a macro. These features are available in current versions of gcc.
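A common use of these features is a tracing macro that takes printf-style arguments and tags each message with the enclosing function's name. The sketch below is one possible shape; TRACE_TO and connect_to_server are hypothetical names, and the macro writes into a caller-supplied buffer only so that the result is easy to inspect.

```c
#include <stdio.h>

/* Hypothetical macro: formats "function-name: message" into buf.
 * __VA_ARGS__ stands for the format string and its arguments. */
#define TRACE_TO(buf, size, ...)                                        \
    do {                                                                \
        int used_ = snprintf((buf), (size), "%s: ", __func__);          \
        if (used_ >= 0 && (size_t)used_ < (size))                       \
            snprintf((buf) + used_, (size) - (size_t)used_,             \
                     __VA_ARGS__);                                      \
    } while (0)

/* Example caller: __func__ expands to this function's own name. */
void connect_to_server(char *buf, size_t size, int port)
{
    TRACE_TO(buf, size, "trying port %d", port);
}
```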

C99 added the inline keyword to suggest function inlining. GNU C also supports this keyword, but with slightly different semantics. If you're using gcc, you should always use the static keyword on inline functions if you want the same behavior as C99 would give for the code. This may be addressed in future revisions; in the meantime, you can use inline as a compiler hint, but don't depend on the exact semantics.
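In practice, the safe pattern looks like the following sketch, where the helper clamp_int and its caller percent are hypothetical examples:

```c
/* static inline: the definition is private to this translation unit,
 * so the differences between GNU C and C99 inline rules do not arise. */
static inline int clamp_int(int value, int low, int high)
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}

/* An ordinary external function that uses the inline helper. */
int percent(int value)
{
    return clamp_int(value, 0, 100);
}
```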

C99 introduced a qualifier, restrict, which can give a compiler optimization hints about pointers. Because a compiler is not required to do anything with this hint, gcc's support is complete in the sense that it accepts the keyword. The degree of optimization done varies. It's safe to use, but don't count on it making a huge difference yet. On a related note, the new type-aliasing rules are fully supported in gcc. This mostly means that you must be more careful about type punning, which is almost always going to invoke undefined behavior, unless the type you're using to access data of the wrong sort is unsigned char.
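As a sketch of how restrict is written, the hypothetical function below promises the compiler that the output array does not overlap either input, which may let it keep values in registers or vectorize the loop. The promise is the caller's responsibility; passing overlapping arrays would be undefined behavior.

```c
#include <stddef.h>

/* Computes out[i] = a[i] + k * b[i]. The restrict qualifiers assert
 * that out does not alias a or b for the duration of the call. */
void scale_and_add(size_t n,
                   double *restrict out,
                   const double *restrict a,
                   const double *restrict b,
                   double k)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + k * b[i];
}
```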

Array declarators as function arguments now have a meaningful difference from pointer declarators: you can put in type qualifiers. Of particular interest is the very odd optimizer hint of giving an array declarator the static type modifier. Given this declaration:

    int foo(int a[static 10]);

it is undefined behavior to call foo() with a pointer that doesn't point to at least 10 objects of type int. This is an optimizer hint. All you're doing is promising the compiler that the argument passed to the function will be at least that large; some machines might use this for loop unrolling. As old hands will be well aware, it's not a new C standard without an entirely new meaning for the static keyword.
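A function using this form might look like the sketch below; sum10 is a hypothetical name. Whether a given compiler does anything with the promise is not guaranteed.

```c
/* The caller promises that a points to at least 10 ints, so the loop
 * bound is known to be safe without a separate length argument. */
int sum10(const int a[static 10])
{
    int total = 0;
    for (int i = 0; i < 10; i++)
        total += a[i];
    return total;
}
```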

One last feature to mention is flexible array members. There is a common problem of wanting to declare a structure that is essentially a header followed by some data bytes. Unfortunately, C89 provided no good way to do this without giving the structure a pointer to a separately allocated region. Two common solutions included declaring a member with exactly one byte of storage, then allocating extra and overrunning the bounds of the array, and declaring a member with more storage than you could possibly need, underallocating, and being careful to use only the storage available. Both of these were problematic for some compilers, so C99 introduced a new syntax for this:

Listing 3. A structure with a flexible array
    struct header {
        size_t len;
        unsigned char data[];
    };

This structure has the useful property that if you allocate space for (sizeof(struct header) + 10) bytes, you can treat data as being an array of 10 bytes. This new syntax is supported in gcc.
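Allocation for such a structure is usually wrapped in a small helper. The sketch below repeats the structure so that it stands alone; header_new is a hypothetical function name, and the caller is assumed to release the result with free().

```c
#include <stdlib.h>
#include <string.h>

struct header {
    size_t len;
    unsigned char data[];   /* flexible array member, must come last */
};

/* Allocates a header followed by len data bytes and copies bytes in.
 * Returns NULL if the allocation fails. */
struct header *header_new(const void *bytes, size_t len)
{
    /* sizeof *h does not include any storage for data[] itself. */
    struct header *h = malloc(sizeof *h + len);
    if (h == NULL)
        return NULL;

    h->len = len;
    if (len > 0)
        memcpy(h->data, bytes, len);
    return h;
}
```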


Library features

That's fine for the compiler. What about the standard library? A lot of the library features added in C99 were based on existing practice, especially practices found in the BSD and Linux communities. So, many of these features are preexisting ones already found in the Linux and BSD standard libraries. Many of these features are simple utility functions; almost all of them could in principle be done in portable code, but many of them would be exceedingly difficult.

Some of the most convenient features added in C99 are in the printf and scanf families of functions. First, the v*scanf functions have become standardized; for every member of the scanf family, there is a corresponding v*scanf function that takes a va_list parameter instead of a variable argument list. These functions serve the same role as the v*printf functions, allowing user-defined functions that take variable argument lists and end up calling a function from the printf or scanf family to do the hard work.
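For example, a user-defined parsing function can add its own policy and then hand the variable arguments to vsscanf. In the sketch below, scan_line is a hypothetical wrapper, and skipping lines that begin with '#' is just an assumed convention for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Parses line according to format, like sscanf, but treats lines that
 * start with '#' as comments and reports zero conversions for them. */
int scan_line(const char *line, const char *format, ...)
{
    if (line[0] == '#')
        return 0;

    va_list args;
    va_start(args, format);
    int converted = vsscanf(line, format, args);   /* does the hard work */
    va_end(args);
    return converted;
}
```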

Secondly, the 4.4BSD snprintf function family has been imported. The snprintf function allows you to print safely into a buffer of fixed size. When told to print no more than n bytes, snprintf guarantees that it creates a string of length no more than n-1, with a null terminator at the end of the string. However, its return value is the number of characters it would have written if n had been large enough. Thus, you can reliably find out how much buffer space you would need to format something completely. This function is available everywhere, and you should use it just about all the time; a lot of security holes have been based on buffer overruns in sprintf, and this can protect against them.
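The return value makes a two-pass approach possible: measure first, then allocate exactly enough space and format again. The following sketch shows one way to do this with vsnprintf and the C99 va_copy macro; format_alloc is a hypothetical helper, and the caller is assumed to free() the result.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a newly allocated string holding the formatted output,
 * or NULL on a formatting or allocation failure. */
char *format_alloc(const char *format, ...)
{
    va_list args, args_copy;
    va_start(args, format);
    va_copy(args_copy, args);   /* a va_list cannot be reused after a pass */

    /* First pass: a size of 0 writes nothing and only measures. */
    int needed = vsnprintf(NULL, 0, format, args);
    va_end(args);

    char *result = NULL;
    if (needed >= 0) {
        result = malloc((size_t)needed + 1);   /* +1 for the terminator */
        if (result != NULL)
            vsnprintf(result, (size_t)needed + 1, format, args_copy);
    }
    va_end(args_copy);
    return result;
}
```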

A number of new math features, including complex math features and special functions designed to help optimizing compilers for specific floating point chips, are in the new standard, but not reliably implemented everywhere. If you need these functions, it is best to check on the exact platform you're targeting. The floating point environment functions are not always supported, and some platforms will not have support for IEEE arithmetic. Don't count on these new features yet.

The strftime() function has been extended in C99 to provide a few more commonly desired formatting characters. These new characters appear to be available on recent Linux and BSD systems; they aren't always widely available on somewhat older systems, though. Check the documentation before using new formats.
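Two of the C99 additions are %F, which is equivalent to %Y-%m-%d, and %T, which is equivalent to %H:%M:%S. The sketch below uses both; iso_timestamp is a hypothetical helper name, and on a system whose library predates C99 the output may differ.

```c
#include <time.h>

/* Formats when as "YYYY-MM-DD HH:MM:SS" using the C99 %F and %T
 * conversions. Returns the number of characters written, or 0 if
 * the buffer is too small. */
size_t iso_timestamp(char *buf, size_t size, const struct tm *when)
{
    return strftime(buf, size, "%F %T", when);
}
```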

As noted, most of the internationalization code is not reliably implemented yet.

Other new library features are typically not universally available; the math functions are likely to be available in supercomputer compilers, and the internationalization functions are likely to be available in compilers developed outside the United States. Compiler vendors implement the features they have a call for.


Looking forward

It is generally best to be conservative in adopting new features. However, many of the C99 features are now sufficiently widespread that new development projects can reasonably take advantage of them. The gcc compiler suite is sufficiently widely available that most projects can reasonably assume that it will be an option on a broad variety of target platforms. If you're primarily targeting Linux or BSD systems, or both, you can count on at least partial support for a great number of the new C99 features. These features were adopted based on perceived need and real-world implementation experience, and they should serve you well.

When deciding which features you're willing to depend on, don't just look at what's available on the computer you're typing on; think about the target system or systems. Do you want to require people to upgrade to a more recent distribution of an operating system? Will your target market mind having to get a new compiler? Test a feature on likely target systems before you commit to using it.
