Memory errors in C and C++ programs are bad: they're common, and they can have serious consequences. Many of the gravest security notices from the Computer Emergency Response Team (see Resources) and vendors are commentaries on simple memory errors. C programmers have talked about this class of error since the late '70s, but their impact remains large in 2007. Worse, if my impressions are any guide, many of today's C and C++ coders seem to regard memory errors as uncontrollable and mysterious afflictions that one can only recover from, never prevent.
It's not so. This article shows that it's possible to understand all the essentials of good memory-related coding in a single sitting:
- Importance of correct memory management
- Categories of memory errors
- Strategies of memory programming
- Conclusion
Importance of correct memory management
C and C++ programs with memory errors cause problems. If they leak memory, they run progressively slower and eventually halt; if they overwrite memory, they are fragile and likely vulnerable to hijacking by a malicious user. Exploits from the famous Morris worm of 1988 to the latest security alerts on Flash Player and other crucial retail-level programs relied on buffer overflows: "The majority of computer security holes are buffer overruns," wrote Rodney Bates in 2004.
Many other general-purpose languages, such as Java™, Ruby, Haskell, C#, Perl, and Smalltalk, are widely enlisted in situations where C or C++ might instead be used, and each has significant enthusiasts and benefits. Part of the folklore of computing, though, is that most of the usability advantage each has over C or C++ comes strictly from ease of memory management. Memory-related programming is so important, and its correct application so difficult in practice, that it dominates all the other variables and theories about object-oriented, functional, high-level, declarative, and other qualities of programming languages.
Memory errors also can be insidious in a way common to few other classes of errors: They're hard to reproduce, and their symptoms often are difficult to localize in the corresponding source code. A memory leak, for example, might render an application entirely unacceptable while giving no indication of where or when the leak occurs.
For all these reasons, then, memory aspects of C and C++ programming deserve special consideration. Let's see what you can do about them, short of avoiding the languages.
Categories of memory errors
First, don't despair. There are answers to memory challenges. Start with a list of all the possible difficulties:
- Memory leak
- Misassignment, including multiply free()d memory and uninitialized references
- Dangling pointers
- Array bounds violations
That's the whole list. Even moving to C++'s
object-oriented language doesn't change the categories significantly;
the model of memory management and reference in C and C++ is fundamentally the
same, whether the data is the simple types and structs
of C, or C++'s classes. Most of what follows is in "pure
C," with extension to C++ largely left as an exercise.
Memory leaks occur when a resource is allocated, but it's never reclaimed. Here's a model for what can go wrong (see Listing 1):
Listing 1. Simple potential heap memory loss and buffer overwrite
void f1(char *explanation)
{
    char *p1;
    p1 = malloc(100);
    (void) sprintf(p1,
                   "The f1 error occurred because of '%s'.",
                   explanation);
    local_log(p1);
}
Do you see the problem? Unless local_log() takes the
unusual responsibility for free()ing the memory
it's passed, invocation of f1 leaks 100 bytes
each time it's called. This is tiny in a time when megabytes are given
away in memory sticks as promotional items but, over hours of continuous
operation, even such small losses can cripple an application.
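One way to close the leak is to release the block in the same function that allocated it. The following is only a sketch: it assumes that local_log() merely reads the message and keeps no reference to it, and the local_log() shown is a hypothetical stand-in that writes to stderr. The int return value is also an assumption, added so that a failed malloc() can be reported.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for local_log(): it only reads the message
   and does not take ownership of the pointer. */
static void local_log(const char *message)
{
    fprintf(stderr, "%s\n", message);
}

/* Returns 0 on success, -1 if the allocation fails. */
int f1(const char *explanation)
{
    char *p1 = malloc(100);

    if (p1 == NULL)
        return -1;
    /* snprintf() never writes more than the 100 bytes allocated. */
    (void) snprintf(p1, 100,
                    "The f1 error occurred because of '%s'.",
                    explanation);
    local_log(p1);
    free(p1);   /* f1 allocated the block, so f1 releases it */
    return 0;
}
```

If local_log() did keep the pointer, the free() would belong to whoever holds it last; that is exactly the kind of responsibility worth writing down in a comment.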
In practical C and C++ programming, it is not enough to sanitize your use of
malloc() or new. The sentence
at the beginning of this section mentioned "resources" rather
than just "memory" precisely because of examples like this one
(see Listing 2). FILE handles
might not look like memory blocks, but they must be handled with the same care:
Listing 2. Potential heap memory loss from resource mismanagement
int getkey(char *filename)
{
    FILE *fp;
    int key;
    fp = fopen(filename, "r");
    fscanf(fp, "%d", &key);
    return key;
}
The semantics of fopen require a complementary
fclose. While the C standard doesn't specify
what happens without the fclose(), it's likely
to leak memory. Other resources, such as semaphores, network handles, database
connections, and so on, deserve the same consideration.
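A balanced version of getkey() might look like the sketch below. The changed signature is an assumption made for illustration: the key comes back through a pointer so that the return value can report failure, which the original has no way to do.

```c
#include <stdio.h>

/* Reads one integer from the named file into *key.
   Returns 0 on success, -1 if the file cannot be opened or parsed. */
int getkey(const char *filename, int *key)
{
    FILE *fp = fopen(filename, "r");
    int status = -1;

    if (fp == NULL)
        return -1;      /* nothing was opened, so nothing to close */
    if (fscanf(fp, "%d", key) == 1)
        status = 0;
    fclose(fp);         /* every successful fopen() gets its fclose() */
    return status;
}
```

Note that the fclose() sits on the single path that every successful fopen() must take, so an early return cannot skip it.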
Less difficult to manage are misassignments. Here's an example (see Listing 3):
Listing 3. An uninitialized pointer
void f2(int datum)
{
    int *p2;
    /* Uh-oh! No one has initialized p2. */
    *p2 = datum;
    ...
}
The good news about errors such as this is that they tend to have dramatic consequences. Under AIX®, assignment to an uninitialized pointer generally results in an immediate segmentation fault. This is good because any such faults are detected swiftly; these errors are much cheaper than ones that take months to identify and are difficult to reproduce.
There are several variations within this category. Memory can be
free()d more often than
malloc()ed (see Listing 4):
Listing 4. Two erroneous memory de-allocations
/* Allocate once, free twice. */
void f3()
{
    char *p;
    p = malloc(10);
    ...
    free(p);
    ...
    free(p);
}
/* Allocate zero times, free once. */
void f4()
{
    char *p;
    /* Note that p remains uninitialized here. */
    free(p);
}
These errors also are often not grave. Although the C standard doesn't define behavior in these cases, typical implementations ignore the faults, or flag them swiftly and vividly; as above, these are safe situations.
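Because the behavior is undefined, it is cheaper not to depend on the implementation's tolerance. A common defensive idiom, sketched here as an illustration rather than as anything the listings above prescribe, is to reset a pointer to NULL as soon as it is freed; the C standard defines free(NULL) as doing nothing. The helper name release() is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: free the block and clear the caller's pointer. */
void release(char **pp)
{
    free(*pp);      /* free(NULL) is defined to do nothing */
    *pp = NULL;
}

/* Allocate once, release twice, safely.
   Returns 0 on success, -1 if the allocation fails. */
int f3(void)
{
    char *p = malloc(10);

    if (p == NULL)
        return -1;
    p[0] = '\0';
    release(&p);
    release(&p);    /* harmless: p is already NULL */
    return 0;
}
```

The idiom protects only the variable that is cleared. A second pointer to the same block still holds the old address, so it is a supplement to clear ownership rules, not a replacement for them.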
Dangling pointers are more troublesome. A dangling pointer arises when a programmer uses a memory resource after it has been freed (see Listing 5):
Listing 5. Dangling pointers
int f8()
{
    struct x *xp;
    xp = (struct x *) malloc(sizeof (struct x));
    xp->q = 13;
    ...
    free(xp);
    ...
    /* Problem! There's no guarantee that
       the memory block to which xp points
       hasn't been overwritten. */
    return xp->q;
}
Traditional "debugging" has difficulty isolating dangling pointers. They're poorly reproducible for a couple of distinct reasons:
- Even if the code affecting the prematurely-freed memory range is localized, use of the memory might depend on execution elsewhere in the application or, in extreme cases, even in a different process.
- Dangling pointers are likely to arise in code that uses memory in subtle ways. The consequence is that, even if memory is overwritten immediately on freeing and the new pointed value differs from the expected one, the new value might be hard to recognize as erroneous.
Dangling pointers are a constant threat to the health of C or C++ programs.
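For a function shaped like the one in Listing 5, the simplest repair is to take whatever is needed out of the block while it is still valid, and only then free it. This is a minimal sketch; the definition of struct x is a hypothetical one, since the listing does not show it.

```c
#include <stdlib.h>

struct x {
    int q;          /* hypothetical definition; the listing elides it */
};

/* Returns the stored value, or -1 if the allocation fails. */
int f8(void)
{
    struct x *xp = malloc(sizeof *xp);
    int result;

    if (xp == NULL)
        return -1;
    xp->q = 13;
    result = xp->q; /* copy out while the block is still owned */
    free(xp);
    return result;  /* no access through xp after the free() */
}
```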
Not safe at all are array bounds violations, the final major
category of memory mismanagement. Look back at Listing 1;
what happens if the length of explanation exceeds 63, the most that fits in 100 bytes alongside the 36 fixed characters of the format and the terminating null?
Answer: It's hard to predict, but it's probably far from good.
More specifically, C copies a string that doesn't fit into the 100
characters allocated for it. In any common implementation, the
"excess" characters overwrite other data in memory. The layout
of data allocations in memory is complex and subtle to reproduce, so any symptoms
might be hard to connect back to the specific error at the level of source code.
These are among the errors that regularly result in millions of dollars of damage.
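One way to take the guesswork out of the buffer size in Listing 1 is to measure the formatted text before allocating for it. The sketch below assumes a C99-conforming snprintf(), which returns the length the output would need when it is given a zero-sized buffer. The function name format_f1_message() is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define F1_FORMAT "The f1 error occurred because of '%s'."

/* Builds the message in a buffer sized to fit it exactly.
   Returns NULL on failure. The caller owns the result and
   must free() it. */
char *format_f1_message(const char *explanation)
{
    /* First pass: measure only, write nothing. */
    int needed = snprintf(NULL, 0, F1_FORMAT, explanation);
    char *p;

    if (needed < 0)
        return NULL;
    p = malloc((size_t) needed + 1);    /* +1 for the terminating null */
    if (p == NULL)
        return NULL;
    /* Second pass: the buffer is now known to be large enough. */
    (void) snprintf(p, (size_t) needed + 1, F1_FORMAT, explanation);
    return p;
}
```

Where a fixed buffer must be kept, the same call with the real buffer size at least truncates the text instead of writing past the end.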
Strategies of memory programming
Diligence and discipline can reduce the incidence of these errors to near zero. Let's go over several specific steps you can take; my experience with these in a variety of organizations is that they consistently slash memory errors by at least an order of magnitude.
The most important, and the one I have never seen emphasized by any other author, is a coding standard. Functions and methods which impact resources, especially memory, need to explain themselves explicitly. Here are examples of pertinent headers, comments, or names (see Listing 6).
Listing 6. Examples of resource-aware source code
/********
 * ...
 *
 * Note that any function invoking protected_file_read()
 * assumes responsibility eventually to fclose() its
 * return value, UNLESS that value is NULL.
 *
 ********/
FILE *protected_file_read(char *filename)
{
    FILE *fp;
    fp = fopen(filename, "r");
    if (fp) {
        ...
    } else {
        ...
    }
    return fp;
}
/********
 * ...
 *
 * Note that the return value of get_message points to a
 * fixed memory location. Do NOT free() it; remember to
 * make a copy if it must be retained ...
 *
 ********/
char *get_message()
{
    static char this_buffer[400];
    ...
    (void) sprintf(this_buffer, ...);
    return this_buffer;
}
/********
 * ...
 * While this function uses heap memory, and so
 * temporarily might expand the over-all memory
 * footprint, it properly cleans up after itself.
 *
 ********/
int f6(char *item1)
{
    my_class *c1;
    int result;
    ...
    c1 = new my_class(item1);
    ...
    result = c1->x;
    delete c1;
    return result;
}
/********
 * ...
 * Note that f8() is documented to return a value
 * which needs to be returned to heap; as f7 thinly
 * wraps f8, any code which invokes f7() must be
 * careful to free() the return value.
 *
 ********/
int *f7()
{
    int *p;
    p = f8(...);
    ...
    return p;
}
Make these stylistic elements part of your routine. There are all sorts of approaches to memory issues:
- Special-purpose libraries
- Languages
- Software tools
- Hardware checkers
Over this entire domain, the one step I've most consistently found useful and with the biggest return on its investment is thoughtful improvement of source code style. It needn't be expensive or rigidly formal; memory-neutral segments can be left uncommented as always, and memory-impacting definitions surely deserve explicit comment. Put in a few simple words to make memory consequences clear, and your memory programming improves.
I haven't done controlled experiments to validate the effects of this style. If your experience is anything like mine, though, you'll find you don't want to live without a policy of commenting resource impact. Doing so simply pays off too well.
Supplementary to coding standards is inspection. Either helps on its own, but
they're particularly potent in partnership. An alert C or C++
practitioner can scan even unfamiliar source code and detect memory problems at
very low cost. With a little practice and appropriate textual searches, you can
quickly develop an ability to validate source corpora for balanced
*alloc() and free(), or
new and delete. Human source
review of this sort often turns up problems like the one in
Listing 7.
Listing 7. A troublesome memory leak
static char *important_pointer = NULL;
void f9()
{
    if (!important_pointer)
        important_pointer = malloc(IMPORTANT_SIZE);
    ...
    if (condition)
        /* Ooops! We just lost the reference
           important_pointer already held. */
        important_pointer = malloc(DIFFERENT_SIZE);
    ...
}
Superficial use of automatic run-time tools doesn't detect the memory
leak that occurs when condition is true. Careful
source analysis can reason through such conditionals to provably correct
conclusions. I repeat what I wrote about style: While most published descriptions
of memory problems emphasize tools and languages, for me, the greatest gains come
from "soft," developer-centered process changes. Any
improvements you make in style and inspection help you understand the diagnostics
produced by automatic tools.
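Once inspection has found the lost reference in Listing 7, one possible repair is to resize the existing block instead of allocating a second one. The sketch below makes several assumptions: the two sizes are invented values, condition is passed in as a parameter, and the old contents may be kept. If the old contents must be discarded, free() followed by malloc() serves the same purpose.

```c
#include <stdlib.h>

#define IMPORTANT_SIZE 64       /* hypothetical sizes, for illustration */
#define DIFFERENT_SIZE 256

static char *important_pointer = NULL;

/* Returns 0 on success, -1 if an allocation fails. */
int f9(int condition)
{
    if (!important_pointer) {
        important_pointer = malloc(IMPORTANT_SIZE);
        if (!important_pointer)
            return -1;
    }
    if (condition) {
        /* On failure realloc() returns NULL and leaves the old block
           untouched, so keep the old pointer until success is known. */
        char *resized = realloc(important_pointer, DIFFERENT_SIZE);
        if (!resized)
            return -1;
        important_pointer = resized;
    }
    return 0;
}
```

Assigning the result of realloc() straight back to important_pointer would reintroduce a leak on the failure path, which is why the temporary is there.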
Static automatic syntax analysis
Humans aren't the only ones who can read source code, of course. You
should also make static syntax analysis part of your development process.
Static syntax analysis is what lint, strict
compilation, and several commercial products do: Scan a source text and spot
items that a compiler accepts, but that are likely to be symptoms of mistakes.
Expect to make your code lint-free. While lint
is old and limited, the many programmers who don't bother with it (or its
more advanced descendants) make a big mistake. It is possible, in general, to
write good, professional-quality code which passes
lint, and the effort to do so usually turns up
significant errors. Some of these affect memory correctness. Even payment of the
most expensive license fees among the products available in this category loses
its sting when compared to the costs of having a customer be the first to identify
a memory error. Clean your source code. Even if a construct that
lint flags appears to give you the functionality you
want now, it's very, very likely that a cleaner approach exists, one that
satisfies lint and is more robust and portable.
The final two categories of remedy are distinct from the first three. Coding standards, inspection, and static analysis are light-weight; an individual can readily understand and implement them. Memory libraries and tools, on the other hand, generally have higher license fees, and they require more sophistication and judgment on the part of the developer. The programmers who use libraries and tools effectively are those who understand the light-weight, static approaches. The available libraries and tools are impressive: Their quality, as a group, is quite high. Even the best ones can be foiled, though, by a sufficiently willful programmer committed to ignoring basic principles of memory management. From what I've seen, mediocre programmers working in isolation only frustrate themselves when they try to take advantage of memory libraries and tools.
For all these reasons, I urge C and C++ programmers to start by looking at their own source for memory problems. Having done that, it's time to consider libraries.
Several libraries make it possible to write conventional-looking C or C++ code, with the assurance of improved memory management. Jonathan Bartlett described leading candidates in a 2004 review for developerWorks, available through the Resources section below. Libraries address so many different memory issues that it's difficult to compare them directly; common rubrics in the domain include garbage collection, smart pointers, and smart containers. In broad terms, the libraries automate more of memory management so that the programmer makes fewer errors.
I have mixed feelings about memory libraries. They should work, but their success in the projects I've seen has been less than expected, especially on the C side. I don't yet have a good analysis for these disappointing outcomes. Performance, for example, ought to be as good as comparable manual memory management, but this is a gray area -- especially in situations where garbage-collecting libraries seem to slow processing. My most definite conclusion from working in this area is that the C++ culture seems to accept smart pointers better than groups of C-focused coders.
Development teams putting out serious C-based applications need a run-time memory tool as part of their development strategy. The techniques already described are valuable and necessary. The quality and functionality of the memory tools available can be hard for you to appreciate until you've tried them for yourself.
This introduction only focuses on software-based memory tools. Hardware memory debuggers also exist; I regard them as needed only for very special situations -- mostly when working with specialized hosts that don't support other tools.
The marketplace of software memory tools includes both proprietary ones, like IBM Rational® Purify, and open source tools, like Electric Fence. Several of each work well with AIX, among other operating systems.
All the memory tools operate roughly the same: Build a special version of your
executable (much as you might generate a debugging version by using the
-g flag when compiling), exercise the application, and
study reports automatically generated by the tool. Consider a program like that of
Listing 8.
Listing 8. Sample error
int main()
{
    char p[5];
    strcpy(p, "Hello, world.");
    puts(p);
}
In many environments, this program "works": it compiles, executes, and prints "Hello, world.\n" to the screen. Running the same application with a memory tool results in a report of an array-bounds violation on the fourth line. Learning about a software fault this way (that fourteen characters have been copied into a space guaranteed to hold only five) is considerably less expensive than finding out from a customer about a symptom of failure. That's the contribution of memory tools.
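Once the tool has pointed at the copy, the repair is to make the destination size part of the operation. The following sketch is one hedged way to do that; copy_greeting() is a hypothetical helper that refuses to copy when the destination is too small.

```c
#include <string.h>

/* Copies the greeting into dest.
   Returns 0 on success, -1 if dest cannot hold it. */
int copy_greeting(char *dest, size_t dest_size)
{
    static const char greeting[] = "Hello, world.";

    /* sizeof counts the terminating null: 14 bytes in all. */
    if (dest_size < sizeof greeting)
        return -1;
    memcpy(dest, greeting, sizeof greeting);
    return 0;
}
```

Called with the five-byte array of Listing 8 and its sizeof, the function reports failure and leaves memory untouched; with an array of fourteen bytes or more it succeeds, and the result can be handed to puts().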
Conclusion
As a mature C or C++ programmer, you recognize that memory problems deserve serious attention. With a little planning and practice, you can come up with an approach that brings memory hazards under control. Learn correct patterns for memory use, be sensitive to the errors likely to occur, and make the techniques described in this article part of your daily routine. You can begin to eliminate symptoms from your applications that otherwise might take days or weeks to debug.
Resources
Learn
- Computer Emergency Response Team: The Computer
Emergency Response Team is "a federally funded research and
development center" which, along with many other activities, issues
Technical Cyber Security Alerts about
specific software vulnerabilities.
- Why do good programmers follow bad practices?:
This is an article by Assistant Professor Rodney Bates on the culture of C
programming and buffer overruns, written for the ACM Queue.
- "Inside memory management"
(developerWorks, Nov 2004): Get an overview of the memory management techniques
that are available to Linux® programmers, focusing on the C language but
applicable to other languages as well.
- Rational Purify:
Learn more about this leading proprietary memory tool.
- Coverity, Incorporated: This site offers
products and services having to do, among other things, with static source code
analysis of C and C++.
- Memory hygiene in C and C++, Part 2: Commercial tools:
This is an article I wrote in 2004. I also maintain a Web page of
personal notes on memory debuggers.
- AIX and UNIX:
The AIX and UNIX developerWorks zone provides a wealth of information relating to
all aspects of AIX systems administration and expanding your UNIX skills.
- New to AIX
and UNIX?:
Visit the New to AIX and UNIX page to learn more about AIX and UNIX.
- AIX 5L™ Wiki:
A collaborative environment for technical information related to AIX.

Cameron Laird is a long-time developerWorks contributor and former columnist. He often writes about the open source projects that accelerate development of his employer's applications, focused on reliability and security. He first used AIX twenty years ago, when it was still an experimental product. He's been an enthusiastic consumer of and contributor to a variety of memory debugging tools through that time. You can contact him at claird@phaseit.net.




