The types created from other types are collectively called derived types. Array and structure types, which contain values of other types, are called aggregate types. Other derived types are pointers and functions. The rules for creating derived types may be applied recursively; you can have an array of pointers to functions, for instance.
Pointers are much misunderstood. On many systems, a pointer is just a raw numeric value which is directly interpreted as a memory address. However, this is not always the case. All that is necessary is that the system can tell which object a given pointer is directed to; this may be arranged in different ways, both on different systems, and even within different pointers on the same system.
Pointers to different types of objects might be structured differently.
For
instance, on some systems, pointers to larger objects might be able to point
only to aligned regions of memory, and pointers to smaller objects might have
a regular pointer plus an offset. The most famous example, although it's not
standard C, is the near and far pointers used on some DOS and Microsoft® Windows® systems, where
a "near" pointer was smaller and faster, but couldn't necessarily point to
objects from other modules of the program.
Converting pointers to other types and back might lose data. For instance, on
a system where all floating point pointers must be aligned to four-byte
boundaries, converting a pointer to some other type to a pointer to
float and back could change the pointer's value.
A few cases are safe. Most significantly,
pointers to any of the char types, and
pointers to void, must have the same representation, and must be able to
represent a pointer to any other type of object, so converting a pointer to
void * and back is always safe.
You can also obtain pointers to functions, even though functions aren't objects. A function pointer is, in its simplest form, the address at which some code is stored. However, function pointers might contain additional information, such as the function's signature and calling conventions. Because this information might be lost when converting to an object type, converting to an object type and back isn't safe. Function pointers can be converted from one type of function pointer to another and back; however, it is not safe to call a function with the wrong type.
Pointer arithmetic in C does what you want most of the time; if you add
one to a pointer, it is adjusted to point to the next object of the type
it points to, as though the object were simply one member of an array.
This does have the implication that you cannot perform
pointer arithmetic on pointers to incomplete types. You need to know the size
of an object to perform pointer arithmetic on pointers to it. Thus, you can
perform pointer arithmetic on unsigned char * pointers, but
not on void * pointers. Some compilers, as an
extension, allow you to perform arithmetic on pointers to void as though
the void type had a size of 1. This is a very convenient extension, but
it's also very misleading, and can lead to confusion about types.
If you want to perform arithmetic on generic pointers, use pointers to
unsigned char.
A pointer is considered valid when it points to an object; only valid pointers can be dereferenced. There is no portable way to check whether a pointer is valid, however. When an object is freed, pointers to it become invalid. What C provides is a special kind of pointer, called a null pointer, which is guaranteed not to point to any object. It is safe to compare other pointers to a null pointer, and any valid pointer is guaranteed to compare unequal to a null pointer. Syntactically, you can create a null pointer by using the integer constant 0 where a pointer is expected, or by casting 0 to a pointer type.
Pointers can be compared only to other valid pointers (through a compatible
type, such as void *) or to null pointers.
By convention, programmers tend to set pointers to null
after freeing them, to make it possible to check their validity.
Comparisons with invalid pointers are undefined behavior; in most cases,
though, they will compare as non-null, leading careless programmers to think
a given pointer is valid. This can cause horrible crashing bugs, as a
program continues to access memory that might still have data in it, but which
will likely be overwritten later.
Contrary to popular belief, it is not necessarily the case that a null
pointer is actually represented by all bits of the pointer being set to
zero. The conversion from an integer constant zero to a pointer may
produce any value the implementation likes, as long as it matches any
other null pointer (such as the return from a failed malloc()) and is
not the address of any object. A canny implementation could declare a new
object and use its address for null pointers, as long as that object was
not accessible through any strictly conforming code.
An array holds a number of objects of a given type next to each other in memory. The size of an array of N objects of type T is N times the size of an object of type T.
When used in an expression, the name of an array is almost always converted
immediately into a pointer to the first member of the array. So, normal array
operations are simply a special case of pointer arithmetic; the syntax a[i] means the same thing as
*(a+i), whether a is
really a pointer or an array. This leads to the common belief that arrays are
just pointers. This isn't true. One case where the conversion does not happen is the
sizeof operator, which yields the size of the
array rather than the size of a pointer to the array's first member.
In C89, all arrays had to have a constant size at compile time; C99 introduces
two exceptions. The first is variably
modified array types. A variably-modified type produces a variable-length
array. VLAs are new in C99, although many compilers supported them
previously. A VLA is the one exception to the rule that sizeof is a
compile-time constant expression. VLAs have mostly the same semantics
as regular arrays. They cannot be declared at file scope.
Another special case of array declarations is array members with no specified size in a structure, called flexible array members; the next section describes these.
A structure, introduced with the struct
keyword, is an aggregate object
containing other objects. Structure types are used to collate related data
into a single object. Pointers to structures can be copied around quickly
and cheaply, and structures help reduce namespace pollution, as each structure
introduces its own namespace. A C++ class is just a structure with associated
functions, and the option of making some members private. C structures cannot
have real member functions, although they can contain function pointers; these
are not the same as member functions (each instance of the structure contains
its own copy of the pointers), but may actually be more flexible.
Arrays within a structure must have constant size, and cannot be variably modified types. There is one special exception to this; see the sidebar on flexible array members.
Unions are declared in a manner nearly identical to structures. However, instead of allocating space for each member of a union, the compiler provides space for only one member at a time; the storage for all members overlaps. A union can be used to store heterogeneous data selected from a known limited set of types, or can in some cases be used for explicit type-punning, reinterpreting bit patterns from one type of data to another. Unions are not aggregate types, because a union only contains one object at a time.
When a value is stored into a member of a union, any storage used by the union, but not by that member, may change, or may not change; it's unspecified what it holds. You can refer to the union as a whole, but not to members which use the unspecified bits. In fact, in most cases, accessing any member of the union other than the one most recently stored to invokes undefined behavior. (Part 3 will explain more about this.)
One very elaborate exception exists: if a union contains multiple
structure types with a common initial sequence, that common sequence of
any of them can be examined if one of them is the most recently modified.
For instance, if you have a number of structures, each of which starts
with a magic number field of type int, you can
look into a union of them and look at the magic number to find out which
type of structure you've got. In the majority of cases, though, you
should access only the member you most recently stored to.
The type-punning uses, while historically common, are inadvisable. When you
access a union through a type other than the one you stored through, the
results are generally unpredictable and not particularly useful. If you need
to access an object as a series of bytes, take its address and cast it into
unsigned char *.
Aggregate types, such as arrays and structures, are objects which contain other objects. Many C students are surprised to find out that a structure whose members have a total size of, say, 7 might have a size other than 7. The reason for this is that some processors can access certain objects only when their addresses have certain qualities -- for instance, when the address is a multiple of two, four, or even sixteen, or (stated another way) when the lowest bit or two, or even four, of the address is 0. On other systems, this may just have an effect on efficiency; aligned access might be an order of magnitude faster.
Therefore, compilers can insert padding between members of an aggregate object, to ensure that all of the components are efficiently accessible. However, in an array, this is not always enough. It might be necessary to ensure that the second member of the array is aligned, so there has to be space between the members of the array.
C guarantees that the size of an array of N objects is N times the size of the first object in the array. Thus, if padding would be needed for an object to be used efficiently in an array, the object must always have padding. Padding always comes after objects; a pointer to a structure, suitably converted, is a pointer to its first member. If multiple structures have the same initial sequence of object types (including the widths of bit-fields), the padding and layout must be the same.
On some systems, even the primitive types might, in effect, be padded. Nothing requires that all the bits used in an object
are actually used in representing the object's value. On a system
where CHAR_BIT is 8, it's still possible to have
sizeof(int) be 4 (giving 4 * CHAR_BIT, or 32, bits), but
INT_MAX be 8,388,607. (This would be a 24-bit
int, with 8 bits of
padding in it. A DSP processor might have such a beast.) The one exception
is that all of the bits in
unsigned char are used to represent the object's
value.
It's not enough to know that an integer takes up, for example, 8 bytes of storage. Many processors cannot access some data types at arbitrary addresses, but only at even-numbered addresses, or addresses which are a multiple of some larger number of bytes. Generically, this is referred to as alignment. A type that requires 4-byte alignment can be stored or accessed only at addresses that are multiples of 4. On some systems, unaligned access is possible but slow; the slowdown can be anything from a few percent to an order of magnitude. On other systems, unaligned access will cause a program to crash or misbehave. Alignment is the reason for structure padding.
Some vector units, such as the AltiVec/VMX units in some PowerPC® processors, or the SIMD-only synergistic processor elements in the Cell Broadband Engine™ processor, will simply always perform memory access at a given alignment by truncating addresses to match alignment requirements. If you try to load an AltiVec register from an unaligned address, it will round down to the nearest multiple of 16 bytes, and load from there. There's no performance penalty, and there's no error indication, but you won't get the data you thought you were requesting.
Bit-fields are a special type of object, which can exist only within
structure or union types. Bit-fields can hold a specific number of bits,
and need not take up a full byte. There are no pointers to bit-fields.
Adjacent bit-fields in a structure might be
packed together into a single storage unit, and in fact must be packed together if
they fit. Whether a bit-field that does not fit crosses into the next storage unit,
or is placed entirely in a new one, is implementation-defined. In some cases, bit-fields are
used as a way to pack multiple flags into an object, or more clearly denote
the structure of a value. A bit-field is declared as a structure or union
member of type _Bool, signed int, or unsigned int, with
the name followed by a colon and a width in bits.
If the type is _Bool, the bit-field always
holds either true or false; otherwise, it stores values normally, in a pure
binary representation. If the type specified is plain int, it is
implementation-defined whether it is signed or unsigned.
If the name of a bit-field is omitted, the space is allocated but unused; this can be used to represent padding in an externally imposed format. If a bit-field is specified with a width of zero, it prevents following bit-fields from being packed into the previous object.
Although they are not formally derived types, enumerated types are
nonetheless
new types created by the programmer. Enumerated types are a special kind of
integer type representing only specific, named values, declared using the enum
keyword. Many of the purposes for which developers use #define statements
are better suited to enumerated types.
An enumerated type introduces a list of names and corresponding values. If no values are specified, each name has a value one higher than the value of the previous name, or 0 for the first name. For instance, a first-year computer science student might write:
enum integers { ZERO, ONE, TWO, THREE };
The names of the enumerated constants declared in an enumerated type are new identifiers, which are accessible anywhere in a program, not just in references to that type. This can lead to namespace clashes, where multiple enumerated types try to declare the same name for some member. In general, you should use a meaningful prefix on enumeration values.
Each member of an enumerated type becomes available as soon as its declaration is complete, so later values can refer to earlier ones. This can be used when using an enumerated type to give symbolic names to bits:
Listing 1. Enumerated values might refer to previously defined values
enum flags {
    FLAG_ONE = 1,
    FLAG_TWO = FLAG_ONE << 1,
    FLAG_THREE = FLAG_TWO << 1,
    ...
};
Enumerated types offer some advantages over preprocessor macros. Most
debuggers can recognize the symbolic names introduced by an enumerated type.
As a particularly nice example, gcc can warn you if a switch statement with an expression
of an enumerated type does not handle every value enumerated for that type.
The enum, struct, and union keywords share a namespace for their tags.
Thus, if you have declared struct foo, you cannot also declare union foo.
The rules for interactions between objects of the same type are the easy part. When you have multiple objects of different types, the rules get more complicated. In general, C proceeds by converting both operands of an operator into a single common type, on which the operation is performed. These conversions are called the usual arithmetic conversions.
Integer types are subject to the integer promotions, which reflect C's heritage as a language written to address hardware's native capabilities.
Anything smaller than an int promotes to int, if the signed type can
express the full range of the original object, or to unsigned int
otherwise. This rule reflects the typical situation where the int type
represents a native hardware register.
In the case where an expression involves types which differ only in width
-- say, int and long,
or float and double --
the rule is simple
enough. The smaller object is promoted into the larger type. Thus, if
you multiply an int by a long, you really multiply two long values
together. The C89 standard gave a list of specific rules for these promotions.
The C99 standard introduced the concept of integer conversion rank,
which is a formalized description of what people mean when they talk about
one type being larger or wider than another. This is necessary because
C99 allows for implementation-defined types which might not be in the list
of standard types. In the case where an extended integer type (one of the
ones an implementation provides, but which is not described in the
standard) has the same range as a standard type, the standard type has
greater rank.
The usual arithmetic conversions are almost all obvious; values are converted to the largest common type. However, when signed and unsigned types are mixed, the correct course is less obvious. Historical implementations generally chose one of two sets of rules. One is that, if you have used unsigned types, you probably did it on purpose, so everything should convert to unsigned. The other is that values should be preserved whenever possible. The C99 standard settled on slightly more elaborate semantics. If the unsigned type has rank greater than or equal to that of the signed type, the signed operand is converted to the unsigned type. Otherwise, if the signed type can represent all the values of the unsigned type, the unsigned operand is converted to the signed type. Finally, if neither of these holds, both operands are converted to the unsigned type corresponding to the signed type.
Functions are operators too, and their arguments are operands, but they do
not all get converted to the same type; rather, each argument is converted
separately.
When there is no prototype for a function, or the function takes a
variable number of arguments, the function's arguments are subjected to
the default argument promotions, namely, the integer promotions used
in integer expressions, plus the conversion of float to double.
(With variable arguments, only the variable arguments are converted this way.)
This is why the %f format for printf() is used for both float and
double objects. When there is a prototype in scope, each
argument is converted to the type given in the prototype.
In both cases, function argument promotions have no effect on pointer types;
a pointer to float does not become a pointer to
double, and passing one where the other is expected
is a plain error.
Cast operations are conversions of data from one type to another. For
the most part, casting does what you'd expect; if you write (float) 1, you get an object of type float with the value
1.0.
Casting is referred to in a few places in this series, mostly when there's something unusual about it. One general caveat here, though, is that casting a pointer has no effect on the data pointed to. Inexperienced programmers sometimes try to convert an array of objects by accessing it through the wrong type of pointer. It doesn't work.
Note that you can convert
any object pointer to a pointer to unsigned char and go poking about at
the representation of an object, if you wish.
Floating point and integer values
In any combination of floating point and integer values, the integer
values are promoted into floating point types, which typically have
significantly wider ranges. However, while the floating point types have
wider ranges, they lose precision at the extremes of their range. Floating
point values are represented as an exponent and a mantissa; the exponent
is the power of FLT_RADIX (usually 2) by which the mantissa
is multiplied [note that "usually" here means "almost always"; while there
have occasionally been decimal floating point systems, they have been few
and far between].
For small numbers, the exponent is negative. For integers up through the
range of the mantissa, the exponent is zero, and the mantissa represents
the number precisely. However, if a number larger than this must be
stored, it will be stored as a multiple of a power of FLT_RADIX.
On my laptop system, it is impossible to represent 16,778,001 exactly as a
32-bit float value; it gets rounded to
16,778,000. In some cases, this loss of precision might matter.
For instance, if you're trying to use a floating point number as a
counter, it might stop working. The following trivial test program
illustrates the problem. (On some systems, there might not be values in the
range of long which
have this problem. It shows up on most common processors, such as PPC
and x86 architectures, on both of which the magic value is 16,777,216 exactly,
that being 2^24.)
Listing 2. Finding an unrepresentable float
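#include <stdio.h>
#include <limits.h>

int
main(void) {
    unsigned long l;
    float f, g;

    /* Step by 1000 looking for a value whose successor
       converts to the same float. */
    for (l = 0; l < (ULONG_MAX - 1000); l += 1000) {
        f = l;
        g = l + 1;
        if (f == g) {
            /* l is unsigned long, so print it with %lu */
            printf("(float) %lu+1 == (float) %lu\n", l, l);
            break;
        }
    }
    f = l;
    printf("%f++ => ", f);
    f++;    /* rounds back to the same value if l + 1 is not representable */
    printf("%f\n", f);
    return 0;
}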
The philosophical implications of being able to describe a number you can't add one to and get a higher number are staggering. Never show this to a mathematician. (For more on scientific programming in C, see Resources.)
Many programmers have implemented fixed-point arithmetic manually, by simply adopting a consistent scaling factor, and writing code to handle common tasks, such as addition or multiplication, with their fixed-point numbers. Also, some processors have built-in fixed-point types. Such types are not yet supported by standard C. Work is ongoing to standardize these features.
If you need to use hardware support for fixed-point math, you are not going to be able to write very portable code. For now, that means you should isolate the fixed-point math part of your code away from the main body of code as much as possible.
Although the C type system doesn't directly address I/O hardware, many
systems provide access to real hardware through special objects; see the
discussion of the volatile keyword in Part 1. While many programmers never see such things in
their code, kernel developers are pretty much buried in them.
I/O devices create interesting challenges for a compiler writer, because
the programmer needs to know, and control, exactly when and how often
the hardware is accessed. If the volatile keyword
doesn't give you enough control when working with hardware registers, consider
doing all of your work in temporaries of a suitable type, and restricting
operations on the registers to direct assignments.
As with much advice, you can't always do this, but it helps when you can.
There are some efforts to standardize access to such hardware. The NetBSD
kernel has a particularly interesting (and, obviously, exceptionally
portable) set of code and data structures to handle representations of
registers, ports, and other hardware bits in a way compatible with many
platforms and bus architectures. Work in this field reflects the tension
between accommodating the needs of various different architectures, and
the desire to have as little overhead as possible when running on any specific
machine. If you're trying to write data to an I/O controller as fast as
possible, the last thing you want is a three-layer deep nesting of
routines which have to perform elaborate table lookups when the net
result will be equivalent to *port=*src++; on your
primary platform. On the other hand, you want some kind of portability.
The next article in this series looks at more implementation-specific magic, then gets into the details of changes in the type system in the C99 standard.
Steve Summit, author of the comp.lang.c FAQ, contributed many helpful corrections and suggestions for this article; it is not, however, his fault that some of my errors remain.
Learn
- Steve Summit has some excellent material on floating point rounding errors and related issues.
- This article series represents everything you ever wanted to know about C types. For a gentle introduction to C types, see Types by P.J. Plauger and Jim Brodie.
- The comp.lang.c FAQ is full of useful information about C, including the type system, common pitfalls, and nearly everything else. The book version, which has additional content, is particularly valuable.
- See also the comp.lang.c IAQ (Infrequently Asked Questions).
- Andrew Koenig's paper, C Traps and Pitfalls, was later expanded into an excellent book of the same name.
- Henry Spencer's Ten commandments for C programmers remain topical and relevant today.
- See all the parts in this series.
Peter Seebach joined the ISO C committee as a hobby some years ago. His
favorite type is int. He has never had any of his own code fail to run
on 64-bit systems. He would love to hear about errors or omissions in
these articles; contact him at developerworks@seebs.plethora.net.



