The previous articles in this series describe the C type system. This article caps the series with a review of how to effectively use the C type system, discussing coding conventions (such as Hungarian notation) and common pitfalls in the type system.
The type of a variable documents several things, both to the compiler and to future readers of your code. The compiler is looking to find out what your technical requirements are. How much storage do you need? What semantics are you expecting? Future readers are also looking at these, but have additional questions in mind. The compiler needs to know the anticipated range of the variable i so it can allocate storage and knows when to perform overflow checking (if it does that sort of thing). The reader needs to know the anticipated range of the variable i because the system stopped working when the 32,768th user signed up, and it's important to get a feel for how much of your code is going to assume that there are never more than 32,767 records in a database.
Articles on types in C can be divided into a couple of categories. Some
will tell you what the Windows® type sizes are, although some will tell
you that int is 16 bits, and others will say it's
32; either way they'll just say it's "C." Others
will tell you how important it is not to make any assumptions. In
practice, both are mostly right.
If you don't need a range of over 16 bits, just use
int. It will almost
always be the most convenient native type, and will generally perform
acceptably. If you want to give the compiler more leeway, use
int_least16_t. If you need more, use long, which is guaranteed to provide at least 32 bits. The
int_leastXX_t types offer you a bit more control, but are only reliably
available in C99 compilers.
Avoid the exact-width types unless you really, really, mean it. None of them are guaranteed to be provided. You might assume that every platform you ever see will offer the standard 8/16/32-bit types, and you might even be right. Still, it is rarely the case that an algorithm depends on those exact sizes, and allowing the compiler more leeway might help a lot. Your code might not ever need to run on a 60-bit machine, but even on a 64-bit system, the compiler might be able to dramatically increase performance by using a 64-bit native type instead of going to a lot of trouble to fake up a 32-bit type.
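As a rough sketch of that advice, the following uses a least-width type for values stored in bulk and a fast type for arithmetic, and never asks for an exact width. The names sample, sample_sum, sum_samples, and sample_bits are made up for this illustration; only the <stdint.h> types come from C99.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical names, chosen only for this illustration. */
typedef int_least16_t sample;     /* stored in bulk: smallest type with >= 16 bits */
typedef int_fast32_t  sample_sum; /* used in arithmetic: fastest type with >= 32 bits */

/* Sum n samples; the wider accumulator leaves room for the total. */
sample_sum sum_samples(const sample *data, int n)
{
    sample_sum total = 0;
    for (int i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Storage bits the compiler actually chose for one sample (at least 16). */
int sample_bits(void)
{
    return (int)(sizeof(sample) * CHAR_BIT);
}
```

On a machine with no cheap 16-bit or 32-bit type, the compiler is free to pick something wider for either typedef, and the code above does not care.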
Always code defensively. Read the C standard, and read your compiler's documentation. Some books and articles advise you to try things out; this can be very dangerous advice. You have no guarantee that a given experiment's results will be the same on the next version of your compiler, let alone on a new computer. Write to the specification, not to what happens to work this week. Even a simple optimization flag might change how the compiler treats your subtly buggy code!
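To make the point concrete, here is one possible way to add two int values while staying inside what the standard promises. Experiments often suggest that signed overflow "wraps around," but the standard leaves it undefined, so this sketch tests against INT_MAX and INT_MIN before adding. The function name add_checked is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *sum and returns true,
 * or returns false (leaving *sum alone) if the result would not fit. */
bool add_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;   /* adding would overflow */
    *sum = a + b;       /* now known to be representable */
    return true;
}
```

The comparisons never compute an out-of-range value themselves, which is the whole trick; a test such as "a + b < a" performs the overflow first and only then asks whether it happened.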
Someone once proposed an elegant and effective way to improve the
readability of code: prefix variables with something indicating their function. For
instance, the prefix
ct might indicate a counter.
The proponent, Charles Simonyi of Microsoft, was originally from Hungary,
and the pattern became known as Hungarian Notation (for more information about Hungarian Notation, see Resources).
Then tragedy struck. Someone who had entirely and catastrophically missed
the point of Hungarian Notation started using it to denote, not the usage
or function of variables -- their "type" in an abstract sense -- but the
specific language type chosen to implement them.
For instance, instead of using prefixes to distinguish between
counters and array indexes, you would use prefixes to distinguish
one storage type from another, such as int from long.
There is no better word for it: This is stupid. In C, the type of a variable is always known to the compiler, and is usually known to the programmer. Encoding the storage type in the name gives you no useful information at all. Worse, it makes it complicated, tedious, and often error-prone to correct a poor choice of type.
The compiler already catches type clashes; this notation isn't helping you do anything the compiler couldn't.
When this article talks about "types," it refers to the things handled automatically by the compiler, which should never be encoded in a variable name. Actual Hungarian Notation, by contrast, is a potentially useful habit.
For example (this example is based in spirit on a wonderful blog article about Hungarian notation; see Resources), consider the following code snippets:
iTotal = iOne + iTwo; // add two integers
cchTotal = cchOne + coTwo; // add count of characters to count of octets
The first example gives you no indication that anything might be amiss. The second, using "cch" for counts of characters, and "co" for counts of octets, shows that the code only works if characters are always eight bits; it will produce meaningless numbers on Unicode data. In the second example, you can see the bug; the first example actually makes it harder to see. If this code occurred in an editor that was supposed to handle Unicode, it would be a potentially serious bug. Tracking "types" of this sort is a useful thing to do. But, as you can see, recording the declared type can, at best, give you the mistaken impression that you've checked for the compatibility of types when you really haven't.
One key aspect of the C type system is its extensibility. You can
declare a new type using the
typedef keyword, and then use it transparently thereafter. Unfortunately,
most compilers do not give you any extra type checking from the
typedef. For instance, if you
define two types which are both equivalent to
int, the compiler probably won't warn you if
you mix them.
typedef int foo;
typedef int bar;
foo *a;
bar b;
a = &b;
This code is semantically incorrect, but it is likely to compile without warnings. Don't count on the compiler's type-checking to catch mistakes like this.
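The sketch below repeats the mix-up inside a function so it can be compiled, then shows one possible workaround that is not part of the original example: wrapping each value in its own structure type, which the compiler does treat as distinct. The names mix_typedefs, char_count, octet_count, and add_chars are hypothetical.

```c
typedef int foo;
typedef int bar;

/* Both names are aliases for int, so this mix-up is accepted without complaint. */
int mix_typedefs(void)
{
    bar b = 7;
    foo *a = &b;
    return *a;
}

/* Hypothetical wrappers: each struct is a distinct type. */
struct char_count  { int n; };
struct octet_count { int n; };

struct char_count add_chars(struct char_count x, struct char_count y)
{
    struct char_count sum = { x.n + y.n };
    return sum;
}

/* Passing a struct octet_count to add_chars() would be rejected by the compiler. */
```

Whether the extra typing is worth it depends on how costly the confusion would be; for most arithmetic values, a plain typedef plus careful naming is the more common compromise.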
Typedefs are most useful with
struct types, union types, or other
aggregate types. They can also be somewhat useful with arithmetic types,
for standardizing a coding
decision, such as the type of object used to represent indexes into a
table. Not all uses of
typedef are good, though. For instance, many
programmers will come up with typedefs with names such as "byte" or
"word," then use those for everything, and change only the typedefs when
going to a new platform. This is an exceptionally bad idea.
It might not seem immediately obvious why this is such a bad idea. Part of the problem is that why it's a bad idea depends on how you're using it. If you're using "word" to refer to "an object that is exactly the native word size of our first platform," then on other platforms, you have a typedef "word" that represents something which isn't a native word. If "word" always refers to the native word size, it has unpredictable characteristics. In fact, in most cases when you see typedefs for "WORD" or "DWORD," the problem is not that the developer chose poorly between these alternatives; it's that the developer never gave the question a moment's thought, and each developer to touch the system since then has, unconsciously, picked one or the other of these interpretations. As a result, the entire code base is now full of typedefs used in painfully inconsistent ways.
Much like constants, typedefs provide a layer of abstraction. It is important to make sure that you have thought carefully about what you are trying to abstract. A poor choice might be just as rigid as an explicit type, but suffer from being obfuscated, or might be so vague as to be useless.
Furthermore, at least on modern compilers,
standard types are available for the sorts of things that "word"
typedefs are typically intended to accomplish.
If you want
int32_t, you know where to find it: <stdint.h>.
Most code ought to be able to work comfortably within that set of guarantees which the C standard provides for all implementations. C has been described as defining an abstract machine for which you can program; this can be a good way to think about C for most development work. You can pursue portability in two ways. One is to isolate unportable constructs so you can replace or redefine them on different platforms. Another is to avoid unportable constructs in the first place.
The second is almost always preferable. You might occasionally need to write something unportable for performance reasons, but this is rare. On most systems, the marginal differences in performance will be tiny. More often, unportable constructs are adopted for developer convenience, or because the developer simply doesn't know the construct is nonportable.
Don't get careless about overflow. While many systems do something convenient or predictable in the case of overflow on signed types, the standard offers no guarantees. Unsigned types are guaranteed safe, but their behavior might not be what you want. On a typical 32-bit system, a small negative number turns into a tad over four billion. A classic example of this going horribly wrong is a delay loop trying to delay for -3 seconds, with an unsigned value somewhere along the way.
Do not treat pointers as integers. While many systems tolerate some amount of abuse along these lines, some do not. Trickery such as storing flags in "unused" bits of a pointer is sheer madness, and often results in a porting nightmare on a new platform.
Even if the code you've written to read and write data from a file is "portable," in that it will compile everywhere, the resulting files might not be portable between separately compiled instances of your program. Unless you're careful, data files written on one system are often unreadable on another, especially when binary data formats are used. If you want data files to be portable, you should generally use plain text. The downside of this, of course, is the comparatively large amount of space this takes up. Plain text is somewhat inefficient, especially for storing numbers; the text "100000" is six bytes for the digits, and at least one more for a space, comma, or other separator; on most systems, typical numbers can easily be stored in four bytes. The corresponding upside is a much better chance of reading data in correctly, and you can always compress the file if you want. File formats such as XML can exacerbate the space cost, moving from relatively inefficient to woefully inefficient. Such formats might improve readability, but it's not an automatic win; a poorly considered XML file is worse than plain text.
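Here is a minimal sketch of the plain-text approach, assuming a made-up record of two long values written one record per line. It formats into a buffer rather than a file so the round trip is easy to see; the same format strings work with fprintf and fscanf. The names write_record and read_record are hypothetical.

```c
#include <stdio.h>

/* Hypothetical record layout: an id and a count, one record per line.
 * Returns 0 on success, -1 if the text did not fit in the buffer. */
int write_record(char *buf, size_t size, long id, long count)
{
    int len = snprintf(buf, size, "%ld %ld\n", id, count);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}

/* Returns 0 if both fields were read, -1 otherwise. */
int read_record(const char *line, long *id, long *count)
{
    return sscanf(line, "%ld %ld", id, count) == 2 ? 0 : -1;
}
```

Nothing in the text depends on the byte order, size, or padding of long on the machine that wrote it, which is what makes the format portable.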
Floating point numbers are particularly non-portable, unless all the
systems involved use a common format. They might not, although a surprising
number of modern systems have adopted the IEEE 754 specification.
If unsure, write things out as text. Remember that, while all binary floating
point values can be represented exactly as decimal numbers, not all will be accurately
represented using printf's default precision; you might wish to explicitly
specify the precision you need.
The hexadecimal floating point
formats (such as %a) introduced in C99 are there to
help reduce the inaccuracy of printing and scanning floating point numbers.
If you want reliable reproduction of floating point numbers, you probably
ought to use the hexadecimal formats in nearly all cases.
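The difference is easy to demonstrate. This sketch prints a double to a buffer and reads it back with strtod, once with printf's default precision and once with %a; the function names are invented for the illustration, and it assumes a C99 library in which strtod accepts hexadecimal input.

```c
#include <stdio.h>
#include <stdlib.h>

/* Print with the default precision of %f (six digits after the point),
 * then read the text back. */
double roundtrip_default(double x)
{
    char buf[512];
    snprintf(buf, sizeof buf, "%f", x);
    return strtod(buf, NULL);
}

/* Print in C99 hexadecimal floating point, then read the text back. */
double roundtrip_hex(double x)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%a", x);
    return strtod(buf, NULL);
}
```

For a value such as 1.0 / 3.0, the %f version comes back as 0.333333, while the %a version comes back as the value you started with. If the file must stay readable by humans, an explicit precision such as "%.17g" is a common alternative for double on IEEE 754 systems.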
You should consider a few possible kinds of efficiency when programming
in C. If you are concerned about data storage requirements,
give careful consideration to the
possible range of values, and use the smallest type that can represent the
range you care about. Don't worry about this for intermediate values and
calculations unless you have insanely huge arrays of very small numbers;
in many cases, using types smaller than
int will make your code larger and slower, eradicating any potential savings you
might have gotten.
If you are concerned about
execution speed, use int whenever possible, and look to the qualifiers and
storage-class specifiers to give the compiler hints about what you're
doing.
The most important kind of efficiency, for most programming projects,
remains developer efficiency. For this, make liberal (but carefully
considered) use of
typedef, use structure types rather than bundles of
variables passed separately, and make sure your compiler has as many
warnings enabled as you can find. You might prefer to use types with larger
ranges than you think you need, as anyone doing COBOL maintenance work in the
late '90s will remember.
This section covers some of the particularly common things that introduce mysterious bugs in C programs; I've mentioned some of them previously, but they're worth stressing again. Most common pitfalls involve undefined behavior, but a few involve cases where the behavior is perfectly well defined, but merely surprising.
Pitfall #1: Overflow and underflow
If the value you are calculating or representing cannot be represented in the type you're using, strange things might happen. The most common overflow problems involve intermediate values calculated during the evaluation of an expression. Consider the expression
x * 3 / 5. It is obvious to even the most casual observer that, if x can be represented as an
int, 3/5 of x cannot overflow. However, a slightly less casual observer might notice that
x * 3 might well overflow, yielding a very different value than expected (or just a crash). You might ask why the programmer doesn't just rewrite it as
x / 5 * 3. However, while this will no longer overflow, it might underflow, because the integer division discards the remainder before the multiplication happens. (4*3/5) is two, but (4/5*3) is zero.
3 / 5 * x is even worse, since it will always produce a nice round zero. It might seem tempting to switch to floating point operations to avoid these problems, but it is easy to do so incorrectly. For instance, one might attempt
(float) (3/5) * x; this tells the compiler to perform the integer calculation 3/5 (producing an integer zero), then convert the result to floating point.
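The variations on "three fifths of x" discussed above can be put side by side. The function names are hypothetical, and the last version is only one possible repair: it widens the intermediate product to long long, which assumes long long is wider than int on your platform (true on most systems, but not promised by the standard).

```c
/* Hypothetical helper names; each computes "three fifths of x" differently. */

/* Divides first: cannot overflow, but discards the remainder too early. */
int scale_divide_first(int x)
{
    return x / 5 * 3;
}

/* Integer 3 / 5 is evaluated before the cast, so this always returns 0. */
int scale_bad_cast(int x)
{
    return (int)((float)(3 / 5) * x);
}

/* Widens the intermediate product; assumes long long is wider than int. */
int scale_widened(int x)
{
    return (int)((long long)x * 3 / 5);
}
```

With x equal to 4, scale_divide_first returns 0 and scale_widened returns 2, matching the arithmetic in the text.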
Pitfall #2: Misunderstanding const
First, the declaration
const char *p declares a pointer, which can be changed, to a character, which can't. The declaration
char * const p declares a pointer, which cannot be changed, to a character, which can. Secondly, while you can pass a pointer without the
const qualifier to a function expecting a pointer with the
const qualifier, you can't do the same with pointers to pointers; the language in the standard which allows the qualifier mismatch to be ignored is not recursive. It is, in fact, safe to convert a pointer to add more
const qualifiers to it through casting; the resulting pointer will never be used for any modifications that were not already permitted. Casting away the
const qualifier is not nearly so safe, and might result in undefined behavior.
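A short sketch of the two declarations may help keep them straight. The function names are made up; the lines a compiler would reject are left as comments so the rest compiles.

```c
/* Accepts both char * and const char * arguments. */
int first_char(const char *s)
{
    return s[0];
}

int const_demo(char *buf)
{
    const char *p = buf;   /* p may change; the characters may not change through p */
    char *const q = buf;   /* q may not change; the characters may */

    p++;                   /* allowed: moves the pointer */
    q[0] = 'J';            /* allowed: modifies the character */
    /* *p = 'x';              rejected: assignment through a pointer to const */
    /* q++;                   rejected: q itself is const */

    /* char **pp = &buf; const char **cpp = pp;   rejected without a cast */
    return first_char(p);
}
```

Called on an array holding "hello", const_demo changes the array to "Jello" and returns 'e', the character p points at after the increment.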
Pitfall #3: Another
const misunderstanding is assuming that since a string literal isn't declared as
const, the contents are modifiable. I don't think I know anyone who programs in C and hasn't been bitten by this one at least once. It's particularly nasty due to its tendency to show up only when something else has already gone wrong. (As a special case, while the contents of the members of
argv are modifiable, and the
argv value itself is modifiable, it is not necessarily the case that the pointers in
argv are modifiable in portable code, nor is it clear what effect modifying them should have.)
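The usual way to stay out of trouble is to copy the literal into an array before changing it. In this sketch, capitalize_first is a hypothetical helper, and the dangerous call is shown only in a comment.

```c
#include <ctype.h>

/* Hypothetical helper: capitalizes the first character in place. */
void capitalize_first(char *s)
{
    if (s[0] != '\0')
        s[0] = (char)toupper((unsigned char)s[0]);
}

/*
 * char text[] = "hello";    array initialized from the literal: modifiable
 * capitalize_first(text);   fine
 *
 * char *lit = "hello";      points at the literal itself
 * capitalize_first(lit);    compiles, but the behavior is undefined
 */
```

Declaring pointers to literals as const char * lets the compiler catch the second case for you.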
Pitfall #4: Portability
You think you've got the C type system figured out, and then you have to move to a new platform. Maybe you're a Mac programmer who's converting from 68k to PowerPC®, or from PowerPC to Intel®. Maybe you're an Xbox programmer converting from Intel to PowerPC. Maybe you need to build a robot submarine on PowerPC. Whatever your program is, if it was worth writing, you will probably have to port it. The worst part about this pitfall is that it's really not that hard to write portable code from the start, and it's so much harder to port the code later. Whenever people say "oh, it would be too hard to port," there's generally a poorly considered assumption that could have been cheaply avoided a year or five earlier.
Pitfall #5: Pointers
The number of things people misunderstand about pointers can easily cause the mind to boggle. Common mistakes involve not realizing that they need to point to something, not realizing that the addresses of automatic variables become invalid when their scope is left, or trying to cast pointer types rather than casting values. (Strictly speaking, some of these are more questions of semantics than of the type system in general.)
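The automatic-variable mistake is worth seeing once. The broken version below appears only as a comment; the replacement, with the invented name format_id, is one common fix in which the caller supplies the storage.

```c
#include <stdio.h>

/*
 * Broken version, shown only as a comment: buf stops existing when the
 * function returns, so the caller receives a dangling pointer.
 *
 *     char *format_id_broken(int id)
 *     {
 *         char buf[32];
 *         snprintf(buf, sizeof buf, "id-%d", id);
 *         return buf;
 *     }
 */

/* Hypothetical fix: the caller owns the storage, so it outlives the call. */
char *format_id(char *buf, size_t size, int id)
{
    snprintf(buf, size, "id-%d", id);
    return buf;
}
```

Other fixes (a static buffer, or memory from malloc that the caller must free) trade this inconvenience for different ones.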
At this point, you should know just about everything you need to know
about C types and have the resources to go look up anything that slipped through
the cracks. As always, there's no substitute for an actual copy of the C
standard, although a good C book can go a long way. The rumors of Usenet's
death are greatly exaggerated; you can still get good information about C
in comp.lang.c, comp.lang.c.moderated, and comp.std.c. But get out there
and write some code. Use an
enum instead of a big list of
#define directives. Come up with convenient
typedefs for pointers to incomplete
struct types for data hiding. Clean up some old code that puts
i prefixes on all the objects of type
int, or that uses
WORD as though it were a
meaningful type name. Have fun!
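If you want somewhere to start, here is a small sketch of the first two suggestions: an enum in place of a list of #define directives, and a typedef for a pointer to an incomplete struct type. Everything is shown in one piece for brevity, and all the names (color, counter_handle, and the counter functions) are hypothetical; in real code the first half would live in a header and the struct definition in a single source file.

```c
#include <stdlib.h>

/* An enum instead of "#define COLOR_RED 0", "#define COLOR_GREEN 1", ... */
enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_COUNT };

/* What a header would expose: a pointer to an incomplete struct type. */
typedef struct counter *counter_handle;   /* hypothetical name */
counter_handle counter_new(void);
int counter_bump(counter_handle c);
void counter_free(counter_handle c);

/* What only the implementation file would see. */
struct counter { int value; };

counter_handle counter_new(void)
{
    counter_handle c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

/* Increments the counter and returns the new value. */
int counter_bump(counter_handle c)
{
    return ++c->value;
}

void counter_free(counter_handle c)
{
    free(c);
}
```

Callers that see only the header can pass a counter_handle around but cannot reach inside it, so the layout of struct counter can change without touching them.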
Steve Summit, author of the comp.lang.c FAQ, contributed many helpful corrections and suggestions for this article; it is not, however, his fault that some of my errors remain.
- Hungarian horntail:
- Hungarian notation is often misunderstood.
- Joel Spolsky talks more about Hungarian notation.
- Simonyi's original paper on Hungarian Notation is pretty good.
- In his important work "How to write unmaintainable code," Roedy Green points out that thanks to this, nothing can kill a maintenance engineer faster than a well planned Hungarian Notation attack.
This article represents everything you ever wanted to know about C
types. For a gentle introduction to C types, see Types by P.J. Plauger and
A much more detailed history of C
was written by dmr, who ought to know.
The comp.lang.c FAQ is full of useful information about C, including the type system,
common pitfalls, and nearly everything else. The book version, which has
additional content, is particularly valuable.
Andrew Koenig's paper, "C Traps and
Pitfalls" (in PDF format), was later expanded into an excellent book of the same name.
Henry Spencer's "Ten commandments
for C programmers" remain topical and relevant today.
Learn more of the ins and outs of C programming in the comp.lang.c FAQ
(Frequently Asked Questions) and the comp.lang.c IAQ
(Infrequently Asked Questions).
See all four articles in this series.
- Take the tutorial series
An introduction to compiling for the Cell Broadband Engine
Peter Seebach joined the ISO C committee as a hobby some years ago. His
favorite type is
int. He has never had any of his own code fail to run
on 64-bit systems. He would love to hear about errors or omissions in
these articles; contact him at email@example.com.