Level: Intermediate
Cameron Laird (claird@phaseit.net), Consultant, Phaseit, Inc.
04 May 2007
Are you tired of spending countless hours fixing memory faults? Do you
find yourself bogged down in programs that leak memory, violate memory
bounds, use uninitialized data, and devote an excessive amount of run time to memory
management? This article can help you conquer these pesky memory defects.
An earlier article (see Resources) urged a general approach
to developing applications in C, C++, or Java that are
free of memory faults. It outlined
the importance of memory management, the errors most
often made in programming it, and strategies for
preventing and correcting them.
That outline leaves plenty of detail still to explain.
Coding errors that were well understood decades ago
continue to afflict a huge percentage of all programs
in C and Java. I find this frustrating: with new clients, I often have to
clean up memory misallocations and related errors
before I can address any subtleties of thread race conditions, esoteric transaction
semantics, or performance analysis.
Although memory and related errors should be elementary by now, they remain
widely misunderstood in the field. Still, there's
good news--all these problems are correctable. And don't worry: you don't need to
contravene any physical laws to boost program
quality several levels, only shift your habits a bit.
Favorable environment for improvement of memory management
This article focuses attention on specific programming
tools that are very useful. Keep
in mind throughout the explanations that
teams often make the most progress with coding quality when
they balance three important ideas:
- eyeballs on the code
- automation
- expertise
"Eyeballs on the code" means having plenty of peer review--through pair
programming, inspection, or other techniques. The aim
is that all artifacts that contribute to a program--all
the source, all the build procedures, and so on--should
be read. Segments that compile and link successfully
and do not appear to cause any obvious problems often
receive the benefit of the doubt. They shouldn't. Expectations need to be higher:
at the very least, all the source code
needs to be read and understood by someone.
Automation's slogan is to "make it executable". If
your application needs to launch and initialize in four
seconds, make that into an executable test; if there's
been an issue with project builds, make sure that a
reference host generates everything from scratch,
automatically, every night. Generally, the way to
advance in computing systems is to figure out how something
should work, and turn that answer into a tool, library, or
practice that can be automated to supply the conclusion
tirelessly. Any time you feel irritation or boredom that
you are having to repeat something obvious or tedious, it's
time to celebrate that you've identified another opportunity
to automate.
A significant challenge in writing articles on this
subject is to balance attention on the tools that help
automation, with the attitude that supports it.
More than any single tool, what helps make for success is
diligence in pursuit of quality.
Product expectations
Code quality products
This article emphasizes the importance of attitude and process.
Choice of tools is secondary--not because the tools don't matter,
but because the ones available are themselves so valuable.
An advantage of working in the area of code quality,
and especially with memory faults, is that there's real
progress in the marketplace. For other tool domains--editors,
configuration management, estimation--advances through the
years seem to have been only marginal, and it's rational to
choose a toolset based on subjective "fit". Plenty of teams
edit and manage their code the same way they did a decade or
more ago.
The vendors of analysis tools, though, have distinguished
themselves in my experience by the quality of their after-sale
support and enhancements. I've been repeatedly surprised both by the
problems good automatic tools are able to uncover, and the
enthusiasm of the major vendors for working with customers to
solve subtle puzzles. You should expect your tools to improve
through time, along with your own knowledge of how to resolve
the faults you find.
The most delicate of these three elements is expertise.
I can quickly set up systems that prepare daily reports
which detail, for example, that the code in the vicinity
of line #285 of my_source.c might contribute to a memory
leak, and the best organizations recognize that they need
to correct this symptom. What specific changes are necessary
in the source to achieve this?
The answer isn't always obvious. The Samba development
team, for instance, justly prides itself on the quality of
its source. When Coverity, Inc. donated error reports in 2006
back to the development team based on scans with its own
source analysis product, the Samba programmers turned around
198 of the faults within the first week. What is interesting,
though, is that the code behind the other eighteen reports, many of
which the Samba team initially categorized as "false positives",
was eventually (nearly) all rewritten. Some issues of
coding quality are subtle enough that even the best programmers
need a bit of time to analyze and judge them.
Rewriting code to solve memory-management errors is an
area where deep expertise pays off. Make sure your most
experienced programmers are available to advise your team on
this, and take advantage of the experts. My experience with
program-analysis vendors is that their support teams are
unusually responsive; if you're using open-source tools, on
the other hand, you can access lively mailing lists or related
channels to work through thorny questions.
With enough eyeballs, automation, and expertise, you can
reasonably expect to solve all source quality problems.
Realize what a change this is:
much of the culture of C, which C++ and Java inherited,
crystallized over twenty years ago, when lint
was deeply flawed, yet represented the state-of-the-art in
addressing code quality. Source code past a page in length
used to be at least slightly mysterious and out-of-control.
We can do better now. It's
entirely reasonable to set and achieve the goal of
error-free, warning-free, leak-free, thoroughly-inspected,
and well-styled source code. Here are a few tools to
reach this goal:
Targets for program analysis
Start with the compiler you're already using. Most of this article
focuses on C, although the same principles apply with C++ and Java.
Take advantage of
-O -Wall -W -Wshadow -pedantic
or the equivalent
for your compiler, that is, the compiler directive that reports
most usefully about incorrect and questionable syntax. Your goal
should be to have a clean log: when you build your entire project
from scratch, no compiler diagnostics should appear.
This claim may sound alien to you.
Plenty of development teams demonstrate their sincere belief
that diagnostic faults are as inevitable as proverbial death or
taxes. That's not so! Even large codebases,
including hundreds of thousands or even millions of lines of source,
can be systematically cleaned of all statically-diagnosable
faults.
To do so brings two benefits:
- You are likely to turn up a
small percentage of true functional errors. In my
experience, a well-maintained 500,000-line project managed
with indifference to diagnostics will produce at least 5,000 diagnostics.
Of these, at least five will turn out to be substantive errors--tests
of whether unsigned integers are negative, which of course they never are,
declared variables that are unused because of spelling
mistakes, and so on. Scrubbing source of all compiler
diagnostics is an inexpensive way to boost quality
significantly, especially since it's so easy to automate.
- Once the diagnostic noise level has been reduced to zero,
or nearly so, any newly-introduced errors become dramatically
easier to detect and isolate.
And the correct target for warning level is in the range from
-Wall to
-Wall -W -Wshadow -Wredundant-decls ... -pedantic,
not the -W-free compilation
that many teams wrongly take as default. Reasonable experts
can differ over good style in handling, say, diagnostics about
cast alignment; no one, though, should pretend that it's a
real advantage to turn off warnings about blithe confusion of
pointers and integer data or uninitialized variables.
A few examples help demonstrate these arguments. Remember: a first step toward quality, especially in
regard to memory management, is warning-free compilation.
Code examples
It's easy to forget how weak directive-free compilation is.
Consider, for a first example, this blatant, undiagnosed memory error:
Listing 1. Example of blatant, undiagnosed memory error.
/* Compile with "cc -c example1.c" and
   "cc -c -O -Wall example1.c". */
#include <stdio.h>

int main(void)
{
    int j;   /* deliberately left uninitialized */

    printf("%d.\n", j);
    return 0;
}
j is clearly not initialized in any
reliable sense, yet gcc and other
common compilers accept this program source without complaint.
Only when invoked as cc -c -O -Wall example1.c
or stronger does the compiler issue the alert,
... warning: 'j' is ... uninitialized ....
Even disciplined and experienced development teams that practice
good habits with inspections and unit tests occasionally generate
such errors. Automatic checks are essential complements; any
organization that doesn't already check for such diagnostics at
least daily urgently needs to change. I've worked with programming
teams of all sizes and situations for decades, and have yet to
encounter one that didn't profit from at least the
lint or -Wall
level of diagnostic automation.
Recognize that this is not "your father's
lint". While compiler warnings a couple
of decades ago had the reputation of generating a lot of noise--both
false positives and false negatives--they've improved
dramatically. Moreover, several competing proprietary products,
including those from Gimpel (FlexeLint), Coverity, GrammaTech,
Parasoft, and Klocwork, offer even more value.
While all these alternatives continue to improve, they also
reward expertise. Here's an example of source that challenges
both static and run-time analysis:
Listing 2. Example of difficult diagnosis.
struct a {
    int b;
    int c;
};

void f2(), f3(int);

int f1(int thing)
{
    struct a x;

    if (thing < 0)
        x.b = 3;
    f2();
    if (thing < -3)
        f3(x.b);
    return 0;
}
Most analysis tools report that x.b
might be uninitialized. Expert programmers can read this and
conclude that x.b is only evaluated
when thing < -3, which certainly
implies thing < 0. This constitutes
a proof that use of x.b is valid,
but the proof exceeds the default capabilities of most tools.
There are several possible responses; abandonment of analysis is
certainly neither necessary nor desirable:
- Most analysis tools support some sort of directive
to disable analysis for a single line. One might,
for instance, insert
/*NOUNINITIALIZED*/
immediately before f3(x.b).
This preserves almost all the benefits of the tool,
and alerts maintenance programmers that assignment
of x.b is at least
fragile. Experience has taught that cases
like this, where the code "fools" tools, are likely
to represent "hot spots" of code that also present
difficulties to humans. Even though we have proved
that the code currently is valid in the sense that
x.b is initialized,
maintenance is likely to mutate tricky coding into
something that is no longer valid. It's important
to respond deliberately to each diagnostic, rather
than casually deciding to ignore a few on the
grounds that they aren't real. Each diagnostic
is important.
- This specific misdiagnosis is less likely among
run-time analysis tools. While I prefer that
static analysis be used for all source, it's
possible to design an effective strategy that
relies solely on run-time analysis.
- A crude response is to replace
x's
definition with an initialization, that is,
struct a x = {0};
rather than
struct a x;
Although I don't like this approach, I recognize
it's the most comfortable one for some organizations.
There are teams, in fact, that establish such
initializations as a preferred style.
- Refactoring is the tactic I usually favor.
For clarity, I reduced the example at hand to
the minimal elements necessary to illustrate
the false positive. Any practical example is
likely to be more complex; it will be semantically
most meaningful to rewrite the segment to something
like the one in Listing 3.
This rewrite expresses intent to human readers
at least as well as Listing 2 did, at the same
time as it sidesteps warnings from any analysis
tool.
Listing 3. Rewrite of Listing 2.
struct a {
    int b;
    int c;
};

void f2(), f3(int);

void initialize_a(int thing, struct a *xptr)
{
    if (thing < 0)
        xptr->b = 3;
}

int f1(int thing)
{
    struct a x;

    initialize_a(thing, &x);
    f2();
    if (thing < -3)
        f3(x.b);
    return 0;
}
Whichever alternative you choose, the goal for every response
to a diagnostic review should be the same: not just to
silence warnings,
but to do so in a way that improves at least one other
aspect of coherence, readability, or maintainability.
How to choose a toolset
Choice of code analysis tools is complex enough to deserve a whole
series of articles by itself. Don't let uncertainty slow you, though:
a few simple tips will help you choose one tool as a starting point.
Your own experience over a few weeks or months should reveal whether
you need to supplement or replace your initial choice. Moreover,
nearly all the proprietary vendors have options for evaluation, so,
whether you choose an open-source or fee-licensed product, you can
begin testing it against your own programs today.
For the purpose of this article, analysis tools fall in two broad categories: static
and run-time. Static analysis tools work as lint does: they scan
source code and "reason" over the constructs there to report errors
and difficulties. Along with their analytic sophistication, tools
offer more value through their integration and usability. A simple
lint has a fixed collection of errors it reports. A better tool
typically has both graphical user interface (GUI) and command-line
views to shorten the distance between problem detection and resolution.
Also, good tools can be configured to "learn" local judgments: which
code constructs your team allows or discourages.
The best static analysis tools scan all of an application or
suite's source. This gives the tool the opportunity to analyze
non-local conditions--for example, that thirty different Java
source files pass the same type to a particular constructor, but
there's also one instance of a syntactically-allowed but distinct
type in a single case. Heuristic tests of this sort are an
exciting area for current research.
Run-time tools present a distinct pattern of use and functionality.
They execute the program in a special environment, or, even more
commonly, "instrument" the program to report on itself when
executed in a standard manner. The aim is to execute low-level
instructions and simultaneously analyze those instructions against
such rules as:
- Does this array reference lie within the bounds
of the array as originally defined?
- Does this assignment leave no reference to a
particular memory segment in the heap?
- Is the value of data at a particular location
of the type of the reference to that location?
- and so on
Evangelists sometimes argue for the relative potency of
the rule sets of static or dynamic tools: that static
analysis can identify all the faults found by dynamic
analysis, along with several others, or vice-versa.
As a practical matter, I find these arguments unpersuasive.
I like and use several of the static and dynamic
tools. For me, the differences generally have to do with
other aspects of usage: dynamic tools have the potential
to manage third-party libraries for which source isn't
available, dynamic tools can produce illuminating reports
when run by end-users, while protecting the details of
those end-users' runs, and run-time tools isolate complicated
path-dependent memory faults that are difficult to solve otherwise.
On the other hand, run-time tools slow execution speed, sometimes
unacceptably so, and many developers appear to find their
reports harder to understand. Most telling in some circumstances
are licensing terms: specific licensing clauses might preclude
use of a particular product.
As hinted above, analysis tools are available for languages besides
C. While Java and higher-level languages deserve their reputations
as safer than C--that is, likely to hide a lower incidence of
undetected errors--good tools are available for many of them,
including Fortran, Java, Python, and Ruby. Even programs in functional
languages, which are immune to many of the problems of C, can
harbor memory leaks and other faults which traditional debugging
largely misses.
Conclusions
Programs are buggy, and applications coded in C in particular are
subject to a host of memory faults. As common as these and
related errors are, though, they're solvable. Methodical adoption
of well-known techniques of code inspection and code analysis
invariably isolates specific source-level errors. At this point,
the greatest barriers to dramatic improvement in code quality are
cultural rather than technical. Tools and techniques are available
to solve a large portion of all memory faults. The key to success
against memory faults is the conviction that maintaining code at a high
level of quality, with full memory correctness, is indeed possible.
Resources
Learn
- "Techniques for memory debugging" describes
why memory management is important, what sorts of
memory errors are common, and practices developers
can adopt to reduce errors.
- While the Samba project has a good reputation for the
quality of its coding, specific reports from tool vendor
Coverity helped identify
two hundred defects that merited correction.
- Although Andrew Glover's valuable developerWorks column, "In pursuit of
code quality", focuses on Java, his installment on
"Refactoring with code metrics" rests on the rewrite techniques
and tool automation which the current article argues are
essential for C and C++ development.
- "Using IBM Rational PurifyPlus", by Poonam Chitale, is
better characterized by its subtitle,
"Performing runtime analysis with test automation",
for most of its explanations and advice apply to the
range of analysis tools, and are not restricted to users
of Purify.
- "Coding without side effects" is the subtitle Bruce Tate chose
for his 2006 article for developerWorks,
"Explore functional programming with Haskell". From the standpoint
of more conventional programming languages, much of functional
programming's merit lies in what it cannot do, or do only with
difficulty and deliberate intent: leak memory, violate array bounds,
introduce mysterious side effects, corrupt memory, and so on.
I'm a strong advocate for use of high-level languages to
enhance developer productivity and improve reliability. At
the same time, even the highest-level languages, like Haskell,
are subject to specific memory considerations, and benefit from
use of good ancillary tools.
- I frequently write about Expect
precisely because it is a marvelously useful tool
for automating command-line applications that are
otherwise difficult to script. When integrating
analytic tools in your development process, it's
crucial to be able to automate their use fully.
Expect frequently plays a role.
- I collect references to articles on memory safety on
this page.
- "Avoid the Most Common Software Development Goofs"
introduces the basic concepts
of static C source analysis.
About the author
Cameron is a full-time consultant for Phaseit, Inc., who writes and speaks
frequently on open source and other technical topics. You can contact him
at claird@phaseit.net.