Everything you ever wanted to know about C types, Part 2: Floating point and derived types

Interoperability and the single C standard

Peter Seebach, Freelance author, Plethora.net
Peter Seebach
Peter Seebach joined the ISO C committee as a hobby some years ago. His favorite type is int. He has never had any of his own code fail to run on 64-bit systems. He would love to hear about errors or omissions in these articles; contact him at developerworks@seebs.plethora.net.

Summary:  The C type system is often misunderstood or overlooked. This article, the second in a series, discusses the derived types, or types that are built from other types, and some of the interactions that occur when data of multiple types are mixed.


Date:  14 Feb 2006
Level:  Introductory


The types created from other types are collectively called derived types. Array and structure types, which contain values of other types, are called aggregate types. Other derived types are pointers and functions. The rules for creating derived types may be applied recursively; you can have an array of pointers to functions, for instance.

Pointers

Pointers are much misunderstood. On many systems, a pointer is just a raw numeric value which is directly interpreted as a memory address. However, this is not always the case. All that is necessary is that the system can tell which object a given pointer is directed to; this may be arranged in different ways, both on different systems, and even within different pointers on the same system.

Pointers to different types of objects might be structured differently. For instance, on some systems, pointers to larger objects might be able to point only to aligned regions of memory, and pointers to smaller objects might have a regular pointer plus an offset. The most famous example, although it's not standard C, is the near and far pointers used on some DOS and Microsoft® Windows® systems, where a "near" pointer was smaller and faster, but couldn't necessarily point to objects from other modules of the program.

Converting pointers to other types and back might lose data. For instance, on a system where all floating point pointers must be aligned to four-byte boundaries, converting a pointer to some other type to a pointer to float and back could change the pointer's value. A few cases are safe. Most significantly, pointers to any of the char types, and pointers to void, must have the same representation, and must be able to represent a pointer to any other type of object, so converting a pointer to void * and back is always safe.

You can also obtain pointers to functions, even though functions aren't objects. A function pointer is, in its simplest form, the address at which some code is stored. However, function pointers might contain additional information, such as the function's signature and calling conventions. Because this information might be lost when converting to an object type, converting to an object type and back isn't safe. Function pointers can be converted from one type of function pointer to another and back; however, it is not safe to call a function with the wrong type.

Pointer arithmetic in C does what you want most of the time; if you add one to a pointer, it is adjusted to point to the next object of the type it points to, as though the object were simply one member of an array. This does have the implication that you cannot perform pointer arithmetic on pointers to incomplete types. You need to know the size of an object to perform pointer arithmetic on pointers to it. Thus, you can perform pointer arithmetic on unsigned char * pointers, but not on void * pointers. Some compilers, as an extension, allow you to perform arithmetic on pointers to void as though the void type had a size of 1. This is a very convenient extension, but it's also very misleading, and can lead to confusion about types. If you want to perform arithmetic on generic pointers, use pointers to unsigned char.

A pointer is considered valid when it points to an object; only valid pointers can be dereferenced. There is no portable way to check whether a pointer is valid, however. When an object is freed, pointers to it become invalid. What C does provide is a special kind of pointer, called a null pointer, which is guaranteed not to point to any object but which other pointers can safely be compared against. Any valid pointer is guaranteed to compare unequal to a null pointer. Syntactically, you can create a null pointer by using a constant 0 in place of a pointer, or by casting 0 to a pointer type.

Pointers may be compared only to other valid pointers (through a compatible type, such as void *) or to null pointers. By convention, programmers tend to set pointers to null after freeing them, to make it possible to check their validity. Comparisons with invalid pointers are undefined behavior; in most cases, though, they will compare as non-null, leading careless programmers to think a given pointer is valid. This can cause horrible crashing bugs, as a program continues to access memory that might still have data in it, but which will likely be overwritten later.

Contrary to popular belief, it is not necessarily the case that a null pointer is actually represented by all bits of the pointer being set to zero. The conversion from an integer constant zero to a pointer may produce any value the implementation likes, as long as it matches any other null pointer (such as the return from a failed malloc()) and is not the address of any object. A canny implementation could declare a new object and use its address for null pointers, as long as that object was not accessible through any strictly conforming code.

Arrays

An array holds a number of objects of a given type next to each other in memory. The size of an array of N objects of type T is N times the size of an object of type T.

When used in an expression, the name of an array is almost always converted immediately into a pointer to the first member of the array. So, normal array operations are simply a special case of pointer arithmetic; the syntax a[i] means the same thing as *(a+i), whether a is really a pointer or an array. This leads to the common belief that arrays are just pointers, which isn't true. One exception to the conversion is the sizeof operator, which yields the size of the array rather than the size of a pointer to the array's first member.

In C89, all arrays had to have a constant size at compile time; C99 introduces two exceptions. The first is variably modified array types, which produce variable-length arrays (VLAs). VLAs are new in C99, although many compilers supported them previously. A VLA is the one exception to the rule that sizeof is a compile-time constant expression. VLAs have mostly the same semantics as regular arrays. They cannot be declared at file scope.

Another special case of array declarations is array members with no specified size in a structure, called flexible array members; the next section describes these.

Structures

A structure, introduced with the struct keyword, is an aggregate object containing other objects. Structure types are used to collate related data into a single object. Pointers to structures can be copied around quickly and cheaply, and structures help reduce namespace pollution, as each structure introduces its own namespace. A C++ class is just a structure with associated functions, and the option of making some members private. C structures cannot have real member functions, although they can contain function pointers; these are not the same as member functions (each instance of the structure contains its own copy of the pointers), but may actually be more flexible.

Arrays within a structure must have constant size, and cannot be variably modified types. There is one special exception to this; see the sidebar on flexible array members.

Flexible array members

A common desire is to have a structure containing a header and a variable amount of data. The data could be allocated separately, but this adds both space and time overhead. A hack that worked in many compilers was to declare, as the last member of a structure, an array either with size 1 or with some fairly large size, but actually allocate a different amount of space when creating a new structure. For instance, the following declaration might have been used:

	struct hack {
		int length;
		char data[1];
	};

The question of what size to give the array was debated, and no answer was portable or correct.

The C99 standard introduced a solution to this problem, called flexible array members. A flexible array member is a structure member declared with array type but no specified size; it must be the last member of the structure. An object of such a structure type declared directly has no room for any elements of the flexible member, so these structures are normally allocated dynamically and used through pointers. The sizeof operator gives the size of the structure, not including any space provided for the flexible array member; this includes padding. The flexible member then behaves as an array occupying any additional storage allocated; for instance, if the size of a structure with a flexible array member is 40 bytes, and you allocate 80, 40 more bytes are available for the flexible member. (How many objects this will hold depends on the type of the member.)

Unions

Unions are declared in a manner nearly identical to structures. However, instead of allocating space for each member of a union, the compiler provides space for only one member at a time; the storage for all members overlaps. A union can be used to store heterogeneous data selected from a known limited set of types, or can in some cases be used for explicit type-punning, reinterpreting bit patterns from one type of data to another. Unions are not aggregate types, because a union only contains one object at a time.

When a value is stored into a member of a union, any storage used by the union, but not by that member, may change, or may not change; it's unspecified what it holds. You can refer to the union as a whole, but not to members which use the unspecified bits. In fact, in most cases, accessing any member of the union other than the one most recently stored to invokes undefined behavior. (Part 3 will explain more about this.)

One very elaborate exception exists: if a union contains multiple structure types with a common initial sequence, that common sequence of any of them can be examined if one of them is the most recently modified. For instance, if you have a number of structures, each of which starts with a magic number field of type int, you can look into a union of them and look at the magic number to find out which type of structure you've got. In the majority of cases, though, you should access only the member you most recently stored to.

The type-punning uses, while historically common, are inadvisable. When you access a union through a type other than the one you stored through, the results are generally unpredictable and not particularly useful. If you need to access an object as a series of bytes, take its address and cast it into unsigned char *.

Aggregate types and padding

Aggregate types, such as arrays and structures, are objects which contain other objects. Many C students are surprised to find out that a structure whose members have a total size of, say, 7 might have a size other than 7. The reason for this is that some processors can access certain objects only when their addresses have certain qualities -- for instance, when the address is a multiple of two, four, or even sixteen, or (stated another way) when the lowest bit or two, or even four, of the address is 0. On other systems, this may just have an effect on efficiency; aligned access might be an order of magnitude faster.

Therefore, compilers can insert padding between members of an aggregate object, to ensure that all of the components are efficiently accessible. However, in an array, this is not always enough. It might be necessary to ensure that the second member of the array is aligned, so there has to be space between the members of the array.

C guarantees that the size of an array of N objects is N times the size of the first object in the array. Thus, if padding would be needed for an object to be used efficiently in an array, the object must always have padding. Padding always comes after objects; a pointer to a structure, suitably converted, is a pointer to its first member. If multiple structures have the same initial sequence of object types (including the widths of bit-fields), the padding and layout must be the same.

On some systems, even the primitive types might, in effect, be padded. Nothing requires that all the bits used in an object are actually used in representing the object's value. On a system where CHAR_BIT is 8, it's still possible to have sizeof(int) be 4 (giving 4 * CHAR_BIT, or 32, bits), but INT_MAX be 8,388,607. (This would be a 24-bit int, with 8 bits of padding in it. A DSP processor might have such a beast.) The one exception is that all of the bits in unsigned char are used to represent the object's value.

Alignment

It's not enough to know that an integer takes up, for example, 8 bytes of storage. Many processors cannot access some data types at arbitrary addresses, but only at even-numbered addresses, or addresses which are a multiple of some larger number of bytes. Generically, this is referred to as alignment. A type that requires 4-byte alignment can be stored or accessed only at addresses that are multiples of 4. On some systems, unaligned access is possible but slow; the slowdown can be anything from a few percent to an order of magnitude. On other systems, unaligned access will cause a program to crash or misbehave. Alignment is the reason for structure padding.

Some vector units, such as the AltiVec/VMX units in some PowerPC® processors, or the SIMD-only synergistic processor elements in the Cell Broadband Engine™ processor, will simply always perform memory access at a given alignment by truncating addresses to match alignment requirements. If you try to load an AltiVec register from an unaligned address, it will round down to the nearest multiple of 16 bytes, and load from there. There's no performance penalty, and there's no error indication, but you won't get the data you thought you were requesting.

Bit-fields

Bit-fields are a special type of object, which can exist only within structure or union types. Bit-fields can hold a specific number of bits, and need not take up a full byte. There are no pointers to bit-fields. Multiple bit-fields specified near each other in a structure might be packed together into a single byte, and in fact must be packed together if they fit. Whether a bit-field can cross byte boundaries, or will be packed into a new byte, is implementation-defined. In some cases, bit-fields are used as a way to pack multiple flags into an object, or more clearly denote the structure of a value. A bit-field is declared as a structure or union member of type _Bool, signed int, or unsigned int, with the name followed by a colon and a width in bits. If the type is _Bool, the bit-field always holds either true or false; otherwise, it stores values normally, in a pure binary representation. If the type specified is plain int, it is implementation-defined whether it is signed or unsigned.

If the name of a bit-field is omitted, the space is allocated but unused; this can be used to represent padding in an externally imposed format. If a bit-field is specified with a width of zero, it prevents following bit-fields from being packed into the previous object.

Enumerated types

Although they are not formally derived types, enumerated types are nonetheless new types created by the programmer. Enumerated types are a special kind of integer type representing only specific, named values, declared using the enum keyword. Many of the purposes for which developers use #define statements are better suited to enumerated types.

An enumerated type introduces a list of names and corresponding values. If no values are specified, each name has a value one higher than the value of the previous object, or 0 for the first object. For instance, a first-year computer science student might write:

enum integers { ZERO, ONE, TWO, THREE };

The names of the enumerated constants declared in an enumerated type are new identifiers, which are accessible anywhere in a program, not just in references to that type. This can lead to namespace clashes, where multiple enumerated types try to declare the same name for some member. In general, you should use a meaningful prefix on enumeration values.

Each member of an enumerated type becomes available as soon as its declaration is complete, so later values can refer to earlier ones. This can be used when using an enumerated type to give symbolic names to bits:


Listing 1. Enumerated values might refer to previously defined values

	enum flags {
		FLAG_ONE = 1,
		FLAG_TWO = FLAG_ONE << 1,
		FLAG_THREE = FLAG_TWO << 1,
	...
	};

Enumerated types offer some advantages over preprocessor macros. Most debuggers can recognize the symbolic names introduced by an enumerated type. As a particularly nice example, gcc can warn you if a switch statement with an expression of an enumerated type does not handle every value enumerated for that type.

The enum, struct, and union keywords share a namespace for their tags. Thus, if you have declared struct foo, you cannot also declare union foo.


Interactions between types

The rules for interactions between objects of the same type are the easy part. When you have multiple objects of different types, the rules get more complicated. In general, C proceeds by converting both operands of an operator into a single common type, on which the operation is performed. These conversions are called the usual arithmetic conversions.

Integer types are subject to the integer promotions, which reflect C's heritage as a language written to address hardware's native capabilities.

Anything smaller than an int promotes to int, if the signed type can express the full range of the original object, or to unsigned int otherwise. This rule reflects the typical situation where the int type represents a native hardware register.

In the case where an expression involves types which differ only in width -- say, int and long, or float and double -- the rule is simple enough. The smaller object is promoted into the larger type. Thus, if you multiply an int by a long, you really multiply two long values together. The C89 standard gave a list of specific rules for these promotions. The C99 standard introduced the concept of integer conversion rank, which is a formalized description of what people mean when they talk about one type being larger or wider than another. This is necessary because C99 allows for implementation-defined types which might not be in the list of standard types. In the case where an extended integer type (one of the ones an implementation provides, but which is not described in the standard) has the same range as a standard type, the standard type has greater rank.

The usual arithmetic conversions are almost all obvious; values are converted to the largest common type. However, when signed and unsigned types are mixed, the correct course is less obvious. Historical implementations generally chose one of two sets of rules. One is that, if you have used unsigned types, you probably did it on purpose, so everything should convert to unsigned. The other is that values should be preserved whenever possible. The C99 standard settled on slightly more elaborate semantics. If the unsigned type has rank greater than or equal to that of the signed type, the signed operand is converted to the unsigned type. Otherwise, if the signed type can represent all the values of the unsigned type, the unsigned operand is converted to the signed type. Finally, if neither holds (the signed type has greater rank but cannot represent every value of the unsigned type), both operands are converted to the unsigned type corresponding to the signed type.

Functions are operators too, and their arguments are operands, but they do not all get converted to the same type; rather, each argument is converted separately. When there is no prototype for a function, or the function takes a variable number of arguments, the function's arguments are subjected to the default argument promotions, namely, the integer promotions used in integer expressions, plus the conversion of float to double. (With variable arguments, only the variable arguments are converted this way.) This is why the %f format for printf() is used for both float and double objects. When there is a prototype in scope, each argument is converted to the type given in the prototype.

In both cases, function argument promotions have no effect on pointer types; a pointer to float does not become a pointer to double, and passing one where the other is expected is a plain error.

Casting

Cast operations are conversions of data from one type to another. For the most part, casting does what you'd expect; if you write (float) 1, you get an object of type float with the value 1.0.

Casting is referred to in a few places in this series, mostly when there's something unusual about it. One general caveat here, though, is that casting a pointer has no effect on the data pointed to. Inexperienced programmers sometimes try to convert an array of objects by accessing it through the wrong type of pointer. It doesn't work.

Note that you can convert any object pointer to a pointer to unsigned char and go poking about at the representation of an object, if you wish.

Floating point and integer values

In any combination of floating point and integer values, the integer values are promoted into floating point types, which typically have significantly wider ranges. However, while the floating point types have wider ranges, they lose precision at the extremes of their range. Floating point values are represented as an exponent and a mantissa; the exponent is the power of FLT_RADIX (usually 2) by which the mantissa is multiplied [note that "usually" here means "almost always"; while there have occasionally been decimal floating point systems, they have been few and far between].

For small numbers, the exponent is negative. For integers up through the range of the mantissa, the exponent is zero, and the mantissa represents the number precisely. However, if a number larger than this must be stored, it will be stored as a multiple of a power of FLT_RADIX.

On my laptop system, it is impossible to represent 16,778,001 exactly as a 32-bit float value; it gets truncated to 16,778,000. In some cases, this loss of precision might matter. For instance, if you're trying to use a floating point number as a counter, it might stop working. The following trivial test program illustrates the problem. (On some systems, there might not be values in the range of long which have this problem. It shows up on most common processors, such as PPC and x86 architectures, on both of which the magic value is 16,777,216 exactly, that being 2^24.)


Listing 2. Finding an unrepresentable float

	#include <stdio.h>
	#include <limits.h>

	int
	main(void) {
		unsigned long l;
		float f, g;
		for (l = 0; l < (ULONG_MAX - 1000); l += 1000) {
			f = l;
			g = l + 1;
			if (f == g) {
				printf("(float) %lu+1 == (float) %lu\n", l, l);
				break;
			}
		}
		f = l;
		printf("%f++ => ", f);
		f++;
		printf("%f\n", f);
		return 0;
	}

The philosophical implications of being able to describe a number you can't add one to and get a higher number are staggering. Never show this to a mathematician. (For more on scientific programming in C, see Resources.)

Fixed-point math

Many programmers have implemented fixed-point arithmetic manually, by simply adopting a consistent scaling factor, and writing code to handle common tasks, such as addition or multiplication, with their fixed-point numbers. Also, some processors have built-in fixed-point types. Such types are not yet supported by standard C. Work is ongoing to standardize these features.

If you need to use hardware support for fixed-point math, you are not going to be able to write very portable code. For now, that means you should isolate the fixed-point math part of your code away from the main body of code as much as possible.

Saturated math

Although it never shows up in current standard C, one interesting variety of overflow handling is called saturation.

A saturating operation will not generate an exception, or wrap around to the other end of the range; it will simply produce the largest possible value. On a 32-bit value that can represent numbers in the range +/- 2,147,483,647 or so, adding 2,000,000,000 to 1,000,000,000 produces exactly 2,147,483,647. No overflow, no trap, no undefined behavior; you just get the largest available value. This is not always what you want, but sometimes it is; if it is enough to know that the value is "way too large," it's very nice to be able to get that result without spending five times as much code checking for errors as it took to get the initial result.

I/O access

Although the C type system doesn't directly address I/O hardware, many systems provide access to real hardware through special objects; see the discussion of the volatile keyword in Part 1. While many programmers never see such things in their code, kernel developers are pretty much buried in them.

I/O devices create interesting challenges for a compiler writer, because the programmer needs to know, and control, exactly when and how often the hardware is accessed. If the volatile keyword doesn't give you enough control when working with hardware registers, consider doing all of your work in temporaries of a suitable type, and restricting operations on the registers to direct assignments. As with much advice, you can't always do this, but it helps when you can.

There are some efforts to standardize access to such hardware. The NetBSD kernel has a particularly interesting (and, obviously, exceptionally portable) set of code and data structures to handle representations of registers, ports, and other hardware bits in a way compatible with many platforms and bus architectures. Work in this field reflects the tension between accommodating the needs of various different architectures, and the desire to have as little overhead as possible when running on any specific machine. If you're trying to write data to an I/O controller as fast as possible, the last thing you want is a three-layer deep nesting of routines which have to perform elaborate table lookups when the net result will be equivalent to *port=*src++; on your primary platform. On the other hand, you want some kind of portability.


And up next...

The next article in this series looks at more implementation-specific magic, then gets into the details of changes in the type system in the C99 standard.

Steve Summit, author of the comp.lang.c FAQ, contributed many helpful corrections and suggestions for this article; it is not, however, his fault that some of my errors remain.

