Improve your memory programming

Intermediate-level techniques improve security and reliability

Level: Intermediate

Cameron Laird (claird@phaseit.net), Consultant, Phaseit, Inc.

04 May 2007

Are you tired of devoting countless hours to fixing memory faults? Do you find yourself constantly bogged down in programs that leak memory, violate memory bounds, use uninitialized data, and spend an excessive amount of run time on memory management? Use this article to help you conquer these pesky memory defects.

An earlier article (see Resources) urged a general approach to developing applications in C, C++, or Java that are free of memory faults. It outlined the importance of memory management, the errors most often made in its programming, and strategies for preventing and correcting them.

That outline leaves plenty of detail still to explain. Coding errors that were well understood decades ago continue to afflict a huge percentage of all programs in C and Java. This is frustrating to me personally because, when working with new clients, I often must first clean up memory misallocations and related errors before addressing any subtleties of thread race conditions, esoteric transaction semantics, or performance analyses.

Although memory and related errors should be elementary by now, they remain widely misunderstood in the field. Still, there's good news--all these problems are correctable. And don't worry: you don't need to contravene any physical laws to boost program quality several levels, only shift your habits a bit.

Favorable environment for improvement of memory management

This article focuses attention on specific programming tools that are very useful. Keep in mind throughout the explanations that teams often make the most progress with coding quality when they balance three important ideas:

  • eyeballs on the code
  • automation
  • expertise

"Eyeballs on the code" means having plenty of peer review--through pair programming, inspection, or other techniques. The aim is that all artifacts that contribute to a program--all the source, all the build procedures, and so on--should be read. Segments that compile and link successfully and do not appear to cause any obvious problems often receive the benefit of the doubt. They shouldn't. Expectations need to be higher: at the very least, all the source code needs to be read and understood by someone.

Automation's slogan is to "make it executable". If your application needs to launch and initialize in four seconds, make that into an executable test; if there's been an issue with project builds, make sure that a reference host generates everything from scratch, automatically, every night. Generally, the way to advance in computing systems is to figure out how something should work, and turn that answer into a tool, library, or practice that can be automated to supply the conclusion tirelessly. Any time you feel irritation or boredom that you are having to repeat something obvious or tedious, it's time to celebrate that you've identified another opportunity to automate.
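
As one illustration of "make it executable", the four-second launch requirement can be turned into a check that a nightly build runs automatically. The following is a minimal sketch, not a definitive implementation: app_initialize() is a hypothetical stand-in for the application's real start-up routine, initializes_within() is a name invented for this example, and the timing relies on the POSIX clock_gettime() call with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 199309L   /* request the POSIX clock_gettime() declaration */
#include <time.h>

/* Hypothetical stand-in for the application's real start-up routine. */
void app_initialize(void)
{
}

/* Returns 1 if app_initialize() finished within limit_seconds of
   wall-clock time, and 0 otherwise. */
int initializes_within(double limit_seconds)
{
    struct timespec start, end;
    double elapsed;

    clock_gettime(CLOCK_MONOTONIC, &start);
    app_initialize();
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* The nanosecond difference may be negative; the sum is still correct. */
    elapsed = (double)(end.tv_sec - start.tv_sec)
            + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return elapsed <= limit_seconds;
}
```

A test driver would call initializes_within(4.0) and fail the build when the result is 0, so the requirement is checked tirelessly rather than remembered.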

A significant challenge in writing articles on this subject is to balance attention on the tools that help automation, with the attitude that supports it. More than any single tool, what helps make for success is diligence in pursuit of quality.

Product expectations

Code quality products
This article emphasizes the importance of attitude and process. Choice of tools is secondary--not because the tools don't matter, but because the ones available are themselves so valuable.

An advantage of working in the area of code quality, and especially with memory faults, is that there's real progress in the marketplace. For other tool domains--editors, configuration management, estimation--advances through the years seem to have been only marginal, and it's rational to choose a toolset based on subjective "fit". Plenty of teams edit and manage their code the same way they did a decade or more ago.

The vendors of analysis tools, though, have distinguished themselves in my experience by the quality of their after-sale support and enhancements. I've been repeatedly surprised both by the problems good automatic tools are able to uncover, and the enthusiasm of the major vendors for working with customers to solve subtle puzzles. You should expect your tools to improve through time, along with your own knowledge of how to resolve the faults you find.

The most delicate of these three elements is expertise. I can quickly set up systems that prepare daily reports which detail, for example, that the code in the vicinity of line #285 of my_source.c might contribute to a memory leak, and the best organizations recognize that they need to correct this symptom. What specific changes are necessary in the source to achieve this?

The answer isn't always obvious. The Samba development team, for instance, justly prides itself on the quality of its source. When Coverity, Inc. donated error reports in 2006 back to the development team based on scans with its own source analysis product, the Samba programmers turned around 198 of the faults within the first week. What is interesting, though, is that the other eighteen reports, many of which the Samba team initially categorized as "false positives", eventually were (nearly) all rewritten. Some issues of coding quality are subtle enough that even the best programmers need a bit of time to analyze and judge them.

Rewriting code to solve memory-management errors is an area where deep expertise pays off. Make sure your most experienced programmers are available to advise your team on this, and take advantage of the experts. My experience with program-analysis vendors is that their support teams are unusually responsive; if you're using open-source tools, on the other hand, you can access lively mailing lists or related channels to work through thorny questions.

With enough eyeballs, automation, and expertise, you can reasonably expect to solve all source quality problems. Realize what a change this is: much of the culture of C, which C++ and Java inherited, crystallized over twenty years ago, when lint was deeply flawed, yet represented the state-of-the-art in addressing code quality. Source code past a page in length used to be at least slightly mysterious and out-of-control. We can do better now. It's entirely reasonable to set and achieve the goal of error-free, warning-free, leak-free, thoroughly-inspected, and well-styled source code. Here are a few tools to reach this goal:



Targets for program analysis

Start with the compiler you're already using. Most of this article focuses on C, although the same principles apply with C++ and Java. Take advantage of -O -Wall -W -Wshadow -pedantic or the equivalent for your compiler, that is, the compiler directive that reports most usefully about incorrect and questionable syntax. Your goal should be to have a clean log: when you build your entire project from scratch, no compiler diagnostics should appear.

This claim may sound alien to you. Plenty of development teams demonstrate their sincere belief that diagnostic faults are as inevitable as proverbial death or taxes. That's not so! Even large codebases, including hundreds of thousands or even millions of lines of source, can be systematically cleaned of all statically-diagnosable faults.

To do so brings two benefits:

  • You are likely to turn up a small percentage of true functional errors. From my experience, a well-maintained 500,000-line project managed with indifference to diagnostics will have at least 5,000 errors. Of these, at least five will turn out to be substantive--tests of unsigned integers which of course never become negative, declared variables that are unused because of spelling mistakes, and so on. Scrubbing source of all compiler diagnostics is an inexpensive way to boost quality significantly, especially since it's so easy to automate.
  • Once the diagnostic noise level has been reduced to zero, or nearly so, any newly-introduced errors become dramatically easier to detect and isolate.
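
The "tests of unsigned integers" mentioned in the first item can be made concrete with a short sketch. The function name find_last() and its interface are invented for this illustration; the point is the loop condition described in the comment, which gcc typically reports as a comparison that is always true when invoked with -W.

```c
/* Compile with "cc -c -O -Wall -W find_last.c" (file name is illustrative). */
#include <stddef.h>

/* Hypothetical helper: returns the index of the last element equal to
   key, or -1 if there is none. */
long find_last(const int *values, size_t count, int key)
{
    size_t i;

    /* The tempting form "for (i = count - 1; i >= 0; i--)" is a real
       functional error: i is unsigned, so "i >= 0" is always true and
       the loop runs off the front of the array.  Counting from count
       down to 1 and indexing with i - 1 avoids the problem. */
    for (i = count; i > 0; i--) {
        if (values[i - 1] == key)
            return (long)(i - 1);
    }
    return -1;
}
```

Code like the rejected loop often survives casual testing when the key happens to be found before the index wraps, which is why a diagnostic that flags it on every build is so inexpensive by comparison.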

And the correct target for warning level is in the range from -Wall to -Wall -W -Wshadow -Wredundant-decls ... -pedantic, not the -W-free compilation that many teams wrongly take as default. Reasonable experts can differ over good style in handling, say, diagnostics about cast alignment; no one, though, should pretend that it's a real advantage to turn off warnings about blithe confusion of pointers and integer data or uninitialized variables.

A few examples help demonstrate these arguments. Remember: a first step toward quality, especially in regard to memory management, is warning-free compilation.



Code examples

It's easy to forget how weak directive-free compilation is. Consider, for a first example, this blatant, undiagnosed memory error:

Listing 1. Example of blatant, undiagnosed memory error.

      /* Compile with "cc -c example1.c" and
         "cc -c -O -Wall example1.c". */
      #include <stdio.h>

      int main()
      {
          int j;

          printf("%d.\n", j);
          return 0;
      }

j is clearly not initialized in any reliable sense, yet gcc and other common compilers accept this program source without complaint. Only when invoked as cc -c -O -Wall example1.c or stronger does the compiler issue the alert, ... warning: 'j' is ... uninitialized ....

Even disciplined and experienced development teams that practice good habits with inspections and unit tests occasionally generate such errors. Automatic checks are essential complements; any organization that doesn't already check for such diagnostics at least daily urgently needs to change. I've worked with programming teams of all sizes and situations for decades, and have yet to encounter one that didn't profit from at least the lint or -Wall level of diagnostic automation.

Recognize that this is not "your father's lint". While compiler warnings a couple of decades ago had the reputation of generating a lot of noise--both false positives and false negatives--they've improved dramatically. Moreover, several competing proprietary products, including those from FlexeLint, Coverity, Grammatech, Parasoft, and Klocwork, offer even more value.

While all these alternatives continue to improve, they also reward expertise. Here's an example of source that challenges both static and run-time analysis:

Listing 2. Example of difficult diagnosis.

      struct a {
          int b;
          int c;
      };
      void f2(), f3(int);

      int f1(int thing)
      {
          struct a x;

          if (thing < 0)
              x.b = 3;
          f2();
          if (thing < -3)
              f3(x.b);
          return 0;
      }

Most analysis tools report that x.b might be uninitialized. Expert programmers can read this and conclude that x.b is only evaluated when thing < -3, which certainly implies thing < 0. This constitutes a proof that use of x.b is valid, but the proof exceeds the default capabilities of most tools.

There are several possible responses; abandonment of analysis is certainly neither necessary nor desirable:

  • Most analysis tools support some sort of directive to disable analysis for a single line. One might, for instance, insert /*NOUNINITIALIZED*/ immediately before f3(x.b). This preserves almost all the benefits of the tool, and alerts maintenance programmers that assignment of x.b is at least fragile. Experience has taught that cases like this, where the code "fools" tools, are likely to represent "hot spots" of code that also present difficulties to humans. Even though we have proved that the code currently is valid in the sense that x.b is initialized, maintenance is likely to mutate tricky coding into something that is no longer valid. It's important to respond deliberately to each diagnostic, rather than casually deciding to ignore a few on the grounds that they aren't real. Each diagnostic is important.
  • This specific misdiagnosis is less likely among run-time analysis tools. While I prefer that static analysis be used for all source, it's possible to design an effective strategy that relies solely on run-time analysis.
  • A crude response is to replace x's definition with an initialization, that is, struct a x = {0}; rather than struct a x; Although I don't like this approach, I recognize it's the most comfortable one for some organizations. There are teams, in fact, that establish such initializations as a preferred style.
  • Refactoring is the tactic I usually favor. For clarity, I reduced the example at hand to the minimal elements necessary to illustrate the false positive. Any practical example is likely to be more complex; it will be semantically most meaningful to rewrite the segment to something like the one in Listing 3. This rewrite expresses intent to human readers at least as well as Listing 2 did, at the same time as it sidesteps warnings from any analysis tool.

Listing 3. Rewrite of Listing 2.

      struct a {
          int b;
          int c;
      };
      void f2(), f3(int);

      void initialize_a(int thing, struct a *xptr)
      {
          if (thing < 0)
              xptr->b = 3;
      }

      int f1(int thing)
      {
          struct a x;

          initialize_a(thing, &x);
          f2();
          if (thing < -3)
              f3(x.b);
          return 0;
      }

Whichever alternative you choose, the goal for every response to a diagnostic review should be the same: not just to silence warnings, but to do so in a way that improves at least one other aspect of coherence, readability, or maintainability.
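
For comparison, the cruder initialization response from the list above might look like the following sketch. The bodies of f2() and f3() and the variable last_seen are hypothetical stand-ins, added only so that the fragment links and can be exercised on its own; the original listings leave those functions undefined.

```c
struct a {
    int b;
    int c;
};

/* Hypothetical stand-ins for f2() and f3(); last_seen records the
   argument most recently passed to f3() so the sketch can be checked. */
int last_seen = -1;

void f2(void)
{
}

void f3(int value)
{
    last_seen = value;
}

int f1(int thing)
{
    struct a x = {0};   /* every member starts at zero */

    if (thing < 0)
        x.b = 3;
    f2();
    if (thing < -3)
        f3(x.b);
    return 0;
}
```

The initializer gives analysis tools a definite value for x.b on every path, but it says nothing to a human reader about which value is meaningful, which is one reason the refactoring in Listing 3 can be the better choice.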



How to choose a toolset

Choice of code analysis tools is complex enough to deserve a whole series of articles by itself. Don't let uncertainty slow you, though: a few simple tips will help you choose one tool as a starting point. Your own experience over a few weeks or months should reveal whether you need to supplement or replace your initial choice. Moreover, nearly all the proprietary vendors have options for evaluation, so, whether you choose an open-source or fee-licensed product, you can begin testing it against your own programs today.

For the purpose of this article, analysis tools fall in two broad categories: static and run-time. Static analysis tools work as lint does: they scan source code and "reason" over the constructs there to report errors and difficulties. Along with their analytic sophistication, tools offer more value through their integration and usability. A simple lint has a fixed collection of errors it reports. A better tool typically has both graphical user interface (GUI) and command-line views to shorten the distance between problem detection and resolution. Also, good tools can be configured to "learn" local judgments: which code constructs your team allows or discourages.

The best static analysis tools scan all of an application or suite's source. This gives the tool the opportunity to analyze non-local conditions--for example, that thirty different Java source files pass the same type to a particular constructor, but there's also one instance of a syntactically-allowed but distinct type in a single case. Heuristic tests of this sort are an exciting area for current research.

Run-time tools present a distinct pattern of use and functionality. They execute the program in a special environment, or, even more commonly, "instrument" the program to report on itself when executed in a standard manner. The aim is to execute low-level instructions and simultaneously analyze those instructions against such rules as:

  • Does this array reference lie within the bounds of the array as originally defined?
  • Does this assignment leave no reference to a particular memory segment in the heap?
  • Does the data at a particular location have the type that the reference to that location expects?
  • and so on
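
To make the second rule concrete, here is a minimal sketch of the kind of assignment a run-time tool watches for. The function replace_copy() is invented for this illustration and is not taken from any particular tool or library; it uses only the standard malloc(), free(), strlen(), and memcpy() calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replaces *slot with a heap copy of text.
   Returns 0 on success, or -1 if allocation fails, in which case
   *slot is left untouched. */
int replace_copy(char **slot, const char *text)
{
    size_t size;
    char *copy;

    assert(slot != NULL && text != NULL);

    size = strlen(text) + 1;          /* room for the terminating NUL */
    copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, size);         /* stays within the allocated bounds */

    /* Without this free(), the assignment below would overwrite the
       last reference to the old block, leaving it unreachable. */
    free(*slot);
    *slot = copy;
    return 0;
}
```

A caller starts with a null pointer, calls replace_copy() as often as needed, and releases the final block with free(). If the free(*slot) line is removed, every call after the first leaks the previous copy, and that leak depends on the path the program takes at run time.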

Evangelists sometimes argue for the relative potency of the rule sets of static or dynamic tools: that static analysis can identify all the faults found by dynamic analysis, along with several others, or vice-versa.

As a practical matter, I find these arguments unpersuasive. I like and use several of the static and dynamic tools. For me, the differences generally have to do with other aspects of usage: dynamic tools have the potential to manage third-party libraries for which source isn't available, dynamic tools can produce illuminating reports when run by end-users, while protecting the details of those end-users' runs, and run-time tools isolate complicated path-dependent memory faults that are difficult to solve otherwise. On the other hand, run-time tools slow execution speed, sometimes unacceptably so, and many developers appear to find their reports harder to understand. Most telling in some circumstances are licensing terms: specific licensing clauses might preclude use of a particular product.

As hinted above, analysis tools are available for languages besides C. While Java and higher-level languages deserve their reputations as safer than C--that is, likely to hide a lower incidence of undetected errors--good tools are available for many of them, including Fortran, Java, Python, and Ruby. Even functional languages, which are immune to many of the problems of C, can code memory leaks and other faults which traditional debugging largely misses.



Conclusions

Programs are buggy, and applications coded in C in particular are subject to a host of memory faults. As common as these and related errors are, though, they're solvable. Methodical adoption of well-known techniques of code inspection and code analysis invariably isolates specific source-level errors. At this point, the greatest barriers to dramatic improvement in code quality are cultural rather than technical. Tools and techniques are available to solve a large portion of all memory faults. The key to success over memory faults is the attitude that keeping code at a high level of quality, with full memory correctness, is indeed possible.



Resources

Learn
  • "Techniques for memory debugging" describes why memory management is important, what sorts of memory errors are common, and practices developers can adopt to reduce errors.

  • While the Samba project has a good reputation for the quality of its coding, specific reports from tool vendor Coverity helped identify two hundred defects that merited correction.

  • Although Andrew Glover's valuable developerWorks column, "In pursuit of code quality", focuses on Java, his installment on "Refactoring with code metrics", rests on the rewrite techniques and tool automation which the current article argues are essential for C and C++ development. "Using IBM Rational PurifyPlus", by Poonam Chitale, is better characterized by its subtitle, "Performing runtime analysis with test automation", for most of its explanations and advice apply to the range of analysis tools, and are not restricted to users of Purify.

  • "Coding without side effects" is the subtitle Bruce Tate chose for his 2006 article for developerWorks, "Explore functional programming with Haskell". From the standpoint of more conventional programming languages, much of functional programming's merit lies in what it cannot do, or do only with difficulty and deliberate intent: leak memory, violate array bounds, introduce mysterious side effects, corrupt memory, and so on. I'm a strong advocate for use of high-level languages to enhance developer productivity and improve reliability. At the same time, even the highest-level languages, like Haskell, are subject to specific memory considerations, and benefit from use of good ancillary tools.

  • I frequently write about Expect precisely because it is a marvelously useful tool for automating command-line applications that are otherwise difficult to script. When integrating analytic tools in your development process, it's crucial to be able to automate their use fully. Expect frequently plays a role.

  • I collect references to articles on memory safety on this page.

  • "Avoid the Most Common Software Development Goofs" introduces the basic concepts of static C source analysis.


About the author

Cameron is a full-time consultant for Phaseit, Inc., who writes and speaks frequently on open source and other technical topics. You can contact him at claird@phaseit.net.



