Everything you ever wanted to know about C types, Part 4: Portability and pitfalls

Notation, programmer efficiency, and avoiding common mistakes

Peter Seebach, Freelance author, Plethora.net
Peter Seebach joined the ISO C committee as a hobby some years ago. His favorite type is int. He has never had any of his own code fail to run on 64-bit systems. He would love to hear about errors or omissions in these articles; contact him at developerworks@seebs.plethora.net.

Summary:  Effectively use the C type system, with help from Peter Seebach, as he covers Hungarian notation (the good kind and the bad kind), using typedef, portability issues, and major pitfalls.


Date:  02 May 2006
Level:  Intermediate


The previous articles in this series describe the C type system. This article caps the series with a review of how to effectively use the C type system, discussing coding conventions (such as Hungarian notation) and common pitfalls in the type system.

Using types in C

The type of a variable documents several things, both to the compiler and to future readers of your code. The compiler is looking to find out what your technical requirements are. How much storage do you need? What semantics are you expecting? Future readers are also looking at these, but have additional questions in mind. The compiler needs to know the anticipated range of the variable i so it can allocate storage and knows when to perform overflow checking (if it does that sort of thing). The reader needs to know the anticipated range of the variable i because the system stopped working when the 32,768th user signed up, and it's important to get a feel for how much of your code is going to assume that there are never more than 32,767 records in a database.

Articles on types in C can be divided into a couple of categories. Some will tell you what the Windows® type sizes are, although some will tell you that int is 16 bits, and others will say it's 32; either way they'll just say it's "C." Others will tell you how important it is not to make any assumptions. In practice, both are mostly right.

If you don't need a range of over 16 bits, just use int. It will almost always be the most convenient native type, and will generally perform acceptably. If you want to give the compiler more leeway, use int_least16_t. If you need more, use long, or int_least32_t. The int_leastXX_t types offer you a bit more control, but are only reliably available in C99 compilers.
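As a minimal sketch of this advice, the following keeps per-table counts in a type with at least 16 bits and accumulates them in a type with at least 32. The names record_count and total_records are hypothetical, chosen only for illustration, and the code assumes a C99 compiler that provides <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>   /* C99: int_least16_t, int_least32_t */

/* Hypothetical alias: a total that may need more than 16 bits of range. */
typedef int_least32_t record_count;

/* Sum per-table counts. Each count fits in 16 bits, but the total might
   not, so the accumulator uses a type with at least 32 bits. */
record_count total_records(const int_least16_t *per_table, size_t n)
{
    record_count total = 0;
    for (size_t i = 0; i < n; i++) {
        total += per_table[i];
    }
    return total;
}
```

Three tables of 30,000 records each give a total of 90,000, which would not fit in a 16-bit int but is safe here.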

Avoid the exact-width types unless you really, really, mean it. None of them are guaranteed to be provided. You might assume that every platform you ever see will offer the standard 8/16/32-bit types, and you might even be right. Still, it is rarely the case that an algorithm depends on those exact sizes, and allowing the compiler more leeway might help a lot. Your code might not ever need to run on a 60-bit machine, but even on a 64-bit system, the compiler might be able to dramatically increase performance by using a 64-bit native type instead of going to a lot of trouble to fake up a 32-bit type.

Always code defensively. Read the C standard, and read your compiler's documentation. Some books and articles advise you to try things out; this can be very dangerous advice. You have no guarantee that a given experiment's results will be the same on the next version of your compiler, let alone on a new computer. Write to the specification, not to what happens to work this week. Even a simple optimization flag might change how the compiler treats your subtly buggy code!

Pseudo-Hungarian notation

Someone once proposed an elegant and effective way to improve the quality of code: prefix variables with something indicating their function. For instance, the prefix ct might indicate a counter. The proponent, Charles Simonyi of Microsoft, was originally from Hungary, and the pattern became known as Hungarian Notation (for more information about Hungarian Notation, see Resources).

Then tragedy struck. Someone who had entirely and catastrophically missed the point of Hungarian Notation started using it to denote, not the usage or function of variables -- their "type" in an abstract sense -- but the specific language type chosen to implement them. For instance, instead of using prefixes to distinguish between counters and array indexes, you would use prefixes to distinguish between int and long.

There is no better word for it: This is stupid. In C, the type of a variable is always known to the compiler, and is usually known to the programmer. Encoding the storage type in the name gives you no useful information at all. Worse, it makes it complicated, tedious, and often error-prone to correct a poor choice of type.

The compiler already catches type clashes; this notation isn't helping you do anything the compiler couldn't.

When this article talks about "types," it refers to the things handled automatically by the compiler, which should never be encoded in a variable name. Actual Hungarian Notation, by contrast, is a potentially useful habit.

For example (this example is based in spirit on a wonderful blog article about Hungarian notation; see Resources), consider the following code snippets:

	iTotal = iOne + iTwo; // add two integers
 

and

	cchTotal = cchOne + coTwo; // add count of characters to count of octets
 

The first example gives you no indication that anything might be amiss. The second, using "cch" for counts of characters, and "co" for counts of octets, shows that the code only works if characters are always eight bits; it will produce meaningless numbers on Unicode data. In the second example, you can see the bug; the first example actually makes it harder to see. If this code occurred in an editor that was supposed to handle Unicode, it would be a potentially serious bug. Tracking "types" of this sort is a useful thing to do. But, as you can see, recording the declared type can, at best, give you the mistaken impression that you've checked for the compatibility of types when you really haven't.
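To make the distinction concrete, here is a small sketch in the same spirit. The prefixes follow the "cch" and "co" used above; the helpers co_from_cch and co_total, and the choice of wchar_t as the character type, are assumptions made for illustration. The code also assumes 8-bit bytes, so that a byte is an octet.

```c
#include <stddef.h>
#include <wchar.h>

/* Convert a count of characters to a count of octets.
   Assumes wchar_t characters and 8-bit bytes (CHAR_BIT == 8). */
size_t co_from_cch(size_t cchText)
{
    return cchText * sizeof(wchar_t);
}

/* Size in octets of two pieces of data stored back to back.
   Both operands of the sum carry the "co" prefix, so the units match. */
size_t co_total(size_t cchOne, size_t coTwo)
{
    size_t coOne = co_from_cch(cchOne);
    return coOne + coTwo;
}
```

With this convention, a line such as cchOne + coTwo looks wrong at a glance, while coOne + coTwo does not.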

Typedefs

One key aspect of the C type system is its extensibility. You can declare a new type name using the typedef keyword, and then use it transparently thereafter. Unfortunately, a typedef name is only a synonym for an existing type, not a distinct type, so compilers generally do not give you any extra type checking from the use of typedef. For instance, if you define two types which are both equivalent to int, the compiler won't warn you if you mix them.

	typedef int foo;
	typedef int bar;
	foo *a;
	bar b;
	a = &b;
 

This code mixes two types that were presumably meant to be distinct, but it will compile without warnings, because foo and bar are both just other names for int. Don't count on the compiler's type-checking to catch mistakes like this.
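If you do want the compiler to tell two such types apart, one possible workaround (an addition here, not something the text above prescribes) is to wrap each value in its own struct. Distinct struct types are not interchangeable, so mixing them requires a diagnostic. The names row_index, col_index, and cell_offset are hypothetical.

```c
/* Hypothetical index types. Each is a distinct struct type, even though
   both wrap a single int. */
typedef struct { int value; } row_index;
typedef struct { int value; } col_index;

/* Offset of a cell in a row-major table that is `width` columns wide. */
int cell_offset(row_index row, col_index col, int width)
{
    return row.value * width + col.value;
}
```

Calling cell_offset with the row and column arguments swapped is rejected at compile time, which a pair of typedefs for int would not achieve. The cost is the extra .value at each use.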

Typedefs are most useful with struct and union types, or other aggregate types. They can also be somewhat useful with arithmetic types, for standardizing a coding decision, such as the type of object used to represent indexes into a table. Not all uses of typedef are good, though. For instance, many programmers will come up with typedefs with names such as "byte" or "word," then use those for everything, and change only the typedefs when going to a new platform. This is an exceptionally bad idea.

It might not seem immediately obvious why this is such a bad idea. Part of the problem is that why it's a bad idea depends on how you're using it. If you're using "word" to refer to "an object that is exactly the native word size of our first platform," then on other platforms, you have a typedef "word" that represents something which isn't a native word. If "word" always refers to the native word size, it has unpredictable characteristics. In fact, in most cases when you see typedefs for "WORD" or "DWORD," the problem is not that the developer chose poorly between these alternatives; it's that the developer never gave the question a moment's thought, and each developer to touch the system since then has, unconsciously, picked one or the other of these interpretations. As a result, the entire code base is now full of typedefs used in painfully inconsistent ways.

Much like constants, typedefs provide a layer of abstraction. It is important to make sure that you have thought carefully about what you are trying to abstract. A poor choice might be just as rigid as an explicit type, but suffer from being obfuscated, or might be so vague as to be useless.

Furthermore, at least on modern compilers, standard types are available for the sorts of things that "word" typedefs are typically intended to accomplish. If you want int32_t, you know where to find it: <stdint.h>.

Even more extensions

During discussions over the long long type, many people proposed alternatives, such as very long. Here are a few modifiers that will probably never be implemented:

  • too: The too modifier can only be used on another modifier, such as long or short. A too long int cannot be declared, because the machine does not have enough storage to represent one. A too short int has 0 bits available for data storage, and cannot represent any value other than 0.
  • sufficiently: The sufficiently modifier, available only on magic compilers, figures out what you need. In particular, a sufficiently long int will hold any value you need to represent, and you can allocate as many sufficiently short int objects as you need.
  • very: The very modifier introduces better-named synonyms for existing types. A very long int is the same as a long long, and a very short int is the same as signed char. A very const object is a compile-time constant.

Code portability

Most code ought to be able to work comfortably within that set of guarantees which the C standard provides for all implementations. C has been described as defining an abstract machine for which you can program; this can be a good way to think about C for most development work. You can pursue portability in two ways. One is to isolate unportable constructs so you can replace or redefine them on different platforms. Another is to avoid unportable constructs in the first place.

The second is almost always preferable. You might occasionally need to write something unportable for performance reasons, but this is rare. On most systems, the marginal differences in performance will be tiny. More often, unportable constructs are adopted for developer convenience, or because the developer simply doesn't know the construct is nonportable.

Don't get careless about overflow. While many systems do something convenient or predictable in the case of overflow on signed types, the standard offers no guarantees; signed overflow is undefined behavior. Unsigned types are guaranteed to wrap around rather than overflow, but that behavior might not be what you want. On a typical 32-bit system, a small negative number converted to an unsigned type turns into a tad over four billion. A classic example of this going horribly wrong is a delay loop trying to delay for -3 seconds, with an unsigned value somewhere along the way.
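The following sketch shows both halves of that warning: what happens to a negative delay once it becomes unsigned, and one way to test a signed addition before performing it. All three function names are hypothetical, and clamping negative delays to zero is just one policy a program might choose.

```c
#include <limits.h>

/* Converting a negative int to unsigned is well defined: the value wraps
   modulo UINT_MAX + 1, so -3 becomes UINT_MAX - 2. */
unsigned int delay_as_unsigned(int seconds)
{
    return (unsigned int)seconds;
}

/* Hypothetical guard: treat a negative delay as "no delay". */
unsigned int delay_clamped(int seconds)
{
    return seconds < 0 ? 0u : (unsigned int)seconds;
}

/* Add two ints only if the result is representable. Returns 1 and stores
   the sum on success; returns 0 without adding if it would overflow. */
int add_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return 0;
    }
    *sum = a + b;
    return 1;
}
```

The check in add_checked is done before the addition, because once a signed overflow has happened there is nothing reliable left to test.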

Do not treat pointers as integers. While many systems tolerate some amount of abuse along these lines, some do not. Trickery such as storing flags in "unused" bits of a pointer is sheer madness, and often results in a porting nightmare on a new platform.

Data portability

Even if the code you've written to read and write data from a file is "portable," in that it will compile everywhere, the resulting files might not be portable between separately compiled instances of your program. Unless you're careful, data files written on one system are often unreadable on another, especially when binary data formats are used. If you want data files to be portable, you should generally use plain text. The downside of this, of course, is the comparatively large amount of space this takes up. Plain text is somewhat inefficient, especially for storing numbers; the text "100000" is six bytes for the numbers, and at least one more for a space, comma, or other separator; on most systems, typical numbers can easily be stored in four bytes. The corresponding up side is a much better chance of reading data in correctly, and you can always compress the file if you want. File formats such as XML can exacerbate the space cost, moving from relatively inefficient to woefully inefficient. Such formats might improve readability, but it's not an automatic win; a poorly considered XML file is worse than plain text.
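A minimal sketch of the plain-text approach, assuming a hypothetical record made of two long values separated by a space. The function names and the record layout are illustrations only; a real format would also need to decide how to handle malformed or out-of-range input.

```c
#include <stdio.h>

/* Write two values as one line of decimal text, for example "100000 42\n".
   Returns 1 on success, 0 if the buffer is too small. */
int format_record(char *buf, size_t size, long id, long count)
{
    int n = snprintf(buf, size, "%ld %ld\n", id, count);
    return n >= 0 && (size_t)n < size;
}

/* Parse a line produced by format_record. Returns 1 if both fields
   were read, 0 otherwise. */
int parse_record(const char *line, long *id, long *count)
{
    return sscanf(line, "%ld %ld", id, count) == 2;
}
```

The text form does not depend on the byte order or the width of long on the machine that wrote it, as long as the values fit in the reader's long.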

Floating point numbers are particularly non-portable, unless all the systems involved use a common format. They might not, although a surprising number of modern systems have adopted the IEEE 754 specification. If unsure, write things out as text. Remember that, while every binary floating point value can be represented exactly in decimal, not all will be accurately represented using printf's default precision; you might wish to explicitly specify the precision you need. The hexadecimal floating point formats (mostly %a) introduced in C99 are there to help reduce the inaccuracy of printing and scanning floating point numbers. If you want reliable reproduction of floating point numbers, you probably ought to use the hexadecimal formats in nearly all cases.
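The difference is easy to demonstrate. The sketch below prints a double three ways, reads each result back with strtod, and reports whether the value survived. The function names are hypothetical; the code assumes a C99 library (for %a and hexadecimal input to strtod), and the %.17g variant additionally assumes that double is an IEEE 754 64-bit type.

```c
#include <stdio.h>
#include <stdlib.h>

/* Default precision: six digits after the decimal point. */
int survives_default(double x)
{
    char buf[512];
    snprintf(buf, sizeof buf, "%f", x);
    return strtod(buf, NULL) == x;
}

/* Explicit precision: 17 significant digits, enough for an
   IEEE 754 64-bit double (an assumption about the platform). */
int survives_precise(double x)
{
    char buf[512];
    snprintf(buf, sizeof buf, "%.17g", x);
    return strtod(buf, NULL) == x;
}

/* Hexadecimal format: exact when the floating point radix is a
   power of two. */
int survives_hex(double x)
{
    char buf[512];
    snprintf(buf, sizeof buf, "%a", x);
    return strtod(buf, NULL) == x;
}
```

For a value such as 1.0 / 3.0, the default format writes 0.333333, which reads back as a different number, while the hexadecimal form reads back unchanged.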

Efficiency

You should consider a few possible kinds of efficiency when programming in C. If you are concerned about data storage requirements, give careful consideration to the possible range of values, and use the smallest type that can represent the range you care about. Don't worry about this for intermediate values and calculations unless you have insanely huge arrays of very small numbers; in many cases, using types smaller than int will make your code larger and slower, eradicating any potential savings you might have gotten. If you are concerned about speed, use int whenever possible, and look to the qualifiers and storage-class specifiers to give the compiler hints about what you're doing.
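As a small sketch of that split between storage and computation, assume a hypothetical array of sensor readings that are known to lie between 0 and 200. The array uses unsigned char to save space, while each individual value is handled as an int and the running total uses long.

```c
#include <stddef.h>

/* Sum readings stored compactly as unsigned char. The small type is used
   only for storage; the arithmetic is done in int and long. */
long sum_readings(const unsigned char *readings, size_t n)
{
    long total = 0;                  /* wide enough for the running total */
    for (size_t i = 0; i < n; i++) {
        int reading = readings[i];   /* work with the value as an int */
        total += reading;
    }
    return total;
}
```

The compact type pays off only because the array may be large; for a handful of values, plain int throughout would be simpler and no slower.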

The most important kind of efficiency, for most programming projects, remains developer efficiency. For this, make liberal (but carefully considered) use of typedef, use structure types rather than bundles of variables passed separately, and make sure your compiler has as many warnings enabled as you can find. You might prefer to use types with larger ranges than you think you need, as anyone doing COBOL maintenance work in the late '90s will remember.


Pitfalls

This section covers some of the particularly common things that introduce mysterious bugs in C programs; I've mentioned some of them previously, but they're worth stressing again. Most common pitfalls involve undefined behavior, but a few involve cases where the behavior is perfectly well defined, but merely surprising.

  • Pitfall #1: Overflow and underflow
    If the value you are calculating or representing cannot be represented in the type you're using, strange things might happen. The most common overflow problems involve intermediate values calculated during the evaluation of an expression. Consider the expression x * 3 / 5. It is obvious to even the most casual observer that, if x can be represented as an int, 3/5 of x cannot overflow. However, a slightly less casual observer might notice that x * 3 might well overflow, yielding a very different value than expected (or just a crash). You might ask why the programmer doesn't just rewrite it as x / 5 * 3. However, while this will no longer overflow, it might underflow, losing precision when the division truncates: (4*3/5) is two, but (4/5*3) is zero. 3 / 5 * x is even worse, since it will always produce a nice round zero. It might seem tempting to switch to floating point operations to avoid these problems, but it is easy to do so incorrectly. For instance, one might attempt (float) (3/5) * x; this tells the compiler to perform the integer calculation 3/5 (producing an integer zero), then convert the result to floating point.

  • Pitfall #2: Misunderstanding const
    The declaration const char *p declares a pointer, which can be changed, to a character, which can't. The declaration char * const p declares a pointer, which cannot be changed, to a character, which can. Also, while you can pass a pointer without the const qualifier to a function expecting a pointer with the const qualifier, you can't do the same with pointers to pointers; the language in the standard which allows the qualifier mismatch to be ignored is not recursive. It is, in fact, safe to convert a pointer to add more const qualifiers to it through casting; the resulting pointer will never be used for any modifications that were not already permitted. Casting away the const qualifier is not nearly so safe, and might result in undefined behavior.

  • Pitfall #3: Another const misunderstanding
    Another const misunderstanding is assuming that since a string literal isn't declared as const, the contents are modifiable. I don't think I know anyone who programs in C and hasn't been bitten by this one at least once. It's particularly nasty due to its tendency to show up only when something else has already gone wrong. (As a special case, while the contents of the members of argv are modifiable, and the argv value itself is modifiable, it is not necessarily the case that the pointers in argv are modifiable in portable code, nor is it clear what effect modifying them should have.)

  • Pitfall #4: Portability
    You think you've got the C type system figured out, and then you have to move to a new platform. Maybe you're a Mac programmer who's converting from 68k to PowerPC®, or from PowerPC to Intel®. Maybe you're an Xbox programmer converting from Intel to PowerPC. Maybe you need to build a robot submarine on PowerPC. Whatever your program is, if it was worth writing, you will probably have to port it. The worst part about this pitfall is that it's really not that hard to write portable code from the start, and it's so much harder to port the code later. Whenever people say "oh, it would be too hard to port," there's generally a poorly considered assumption that could have been cheaply avoided a year or five earlier.

  • Pitfall #5: Pointers
    The number of things people misunderstand about pointers can easily cause the mind to boggle. Common mistakes involve not realizing that they need to point to something, not realizing that the addresses of automatic variables become invalid when their scope is left, or trying to cast pointer types rather than casting values. (Strictly speaking, some of these are more questions of semantics than of the type system in general.)
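The first two pitfalls are easy to see in code. The sketch below writes "three fifths of x" in the ways discussed under Pitfall #1, adds one possible repair, and then shows the two const placements from Pitfall #2. All function names are hypothetical, and the repair assumes that long long is wider than int, which holds on common platforms but is not guaranteed.

```c
/* Pitfall #1: the same calculation written several ways. */

/* Multiplies first: fine for small x, but x * 3 overflows (undefined
   behavior) once x is larger than INT_MAX / 3. */
int three_fifths_mul_first(int x) { return x * 3 / 5; }

/* Divides first: cannot overflow, but truncates too early. */
int three_fifths_div_first(int x) { return x / 5 * 3; }

/* 3 / 5 is an integer division, so the result is always zero. */
int three_fifths_const_first(int x) { return 3 / 5 * x; }

/* The cast applies to the already truncated 3 / 5, so this is 0.0. */
double three_fifths_bad_cast(int x) { return (float)(3 / 5) * x; }

/* One possible repair: do the multiplication in a wider type.
   Assumes long long is wider than int. */
int three_fifths_wide(int x) { return (int)((long long)x * 3 / 5); }

/* Pitfall #2: where const sits decides what is read-only. */
char const_demo(char *buf, const char *other)
{
    const char *p = buf;   /* p may change; *p may not be written via p */
    char *const q = buf;   /* q may not change; *q may be written */

    p = other;             /* allowed: the pointer itself is not const */
    *q = 'X';              /* allowed: the character is not const */
    /* *p = 'X';              rejected: p points to const char */
    /* q = buf + 1;           rejected: q itself is const */
    return *p;
}
```

For x equal to 4, the first version gives 2 while the second and third give 0, matching the arithmetic in Pitfall #1.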

Okay, now go write some code

At this point, you should know just about everything you need to know about types and have the resources to go look up anything that slipped through the cracks. As always, there's no substitute for an actual copy of the C standard, although a good C book can go a long way. The rumors of Usenet's death are greatly exaggerated; you can still get good information about C in comp.lang.c, comp.lang.c.moderated, and comp.std.c. But get out there and write some code. Use an enum instead of a big list of #define directives. Come up with convenient typedefs for pointers to incomplete struct types for data hiding. Clean up some old code that puts i prefixes on all the objects of type int, or that uses WORD as though it were a meaningful type name. Have fun!
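As a starting point for two of those suggestions, here is a minimal sketch that uses an enum in place of a list of #define directives and hides a struct behind a typedef for a pointer to an incomplete type. The logger example and every name in it are hypothetical; in a real project the first half would live in a header and the struct definition in a separate source file.

```c
#include <stdlib.h>

/* An enum instead of a list of #define directives. */
enum log_level { LOG_ERROR, LOG_WARNING, LOG_INFO };

/* Public interface: callers see only a pointer to an incomplete type. */
typedef struct logger *logger_handle;

logger_handle logger_create(enum log_level threshold);
int logger_would_print(logger_handle logger, enum log_level level);
void logger_destroy(logger_handle logger);

/* Private implementation: normally kept in its own .c file, so callers
   cannot depend on the layout of the struct. */
struct logger {
    enum log_level threshold;
};

logger_handle logger_create(enum log_level threshold)
{
    logger_handle logger = malloc(sizeof *logger);
    if (logger != NULL) {
        logger->threshold = threshold;
    }
    return logger;
}

/* A message is printed when it is at least as severe as the threshold. */
int logger_would_print(logger_handle logger, enum log_level level)
{
    return level <= logger->threshold;
}

void logger_destroy(logger_handle logger)
{
    free(logger);
}
```

Because callers never see the members of struct logger, its contents can change later without touching any code that uses logger_handle.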

Steve Summit, author of the comp.lang.c FAQ, contributed many helpful corrections and suggestions for this article; it is not, however, his fault that some of my errors remain.


Resources

Learn

Get products and technologies

Discuss
