Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Self-manage data buffer memory

Deliver code efficiency, simplicity, portability, and security

Xiaoming Zhang (zhang@uk.ibm.com), Staff Software Engineer, IBM UK Lab
Staff Software Engineer Xiaoming Zhang, having spent nine years as an academic carrying out research, gave up his university lectureship to join IBM in 1998. Since then, he spends most of his time designing and developing messaging middleware targeted for small devices. He is also particularly interested in the data security aspects of messaging. Xiaoming, who holds a PhD in computer-speech signal processing from the University of Wales Swansea, has published nearly thirty conference and journal papers covering speech signal processing, parallel numerical computation, application of functional programming, and image processing.

Summary:  The C programming language defines two standard memory management functions: malloc() and free(). C programmers frequently use those functions to allocate buffers at run time to pass data between functions. In many situations, however, you cannot predetermine the actual sizes required for the buffers, which may cause several fundamental problems for constructing complex C programs. In this article, Xiaoming Zhang advocates a self-managing, abstract data buffer. He outlines a pseudo-C implementation of the abstract buffer and details the advantages of adopting this mechanism.

Date:  06 Jan 2004
Level:  Intermediate

Activity:  6598 views
Comments:  

Software's scale and complexity grows all the time, fundamentally affecting how the architecture of applications. In many circumstances it seems impractical to code all functionality into a single piece of software. The importance for independent software pieces to interact, say in the form of a plug-in, becomes increasingly apparent. To achieve such interaction with relative ease, even among pieces written by different vendors, requires a well-defined interface. Writing software that meets such demands in a legacy programming language such as C can be a challenge.

Considering that challenge, this article examines the data buffer interface in the C programming language, with an eye toward improving upon current practices. Although memory management may appear trivial, a properly designed interface can produce code efficiency, simplicity, and portability -- none of which may be achievable otherwise. As such, the next section outlines the various problems programmers currently face when they adopt a conventional data buffer management scheme. That's followed by an abstract data buffer scheme, illustrated by a pseudo implementation, that solves many of those problems, followed by code snippets demonstrating the solution's benefits.

Conventional practices and the problems they present

C programmers frequently use dynamically allocated buffers (by calling malloc()/free() functions) to pass data between functions. Oftentimes, however, you cannot determine the size of such data until run time. Although that approach offers flexibility, it does have some impacts. First, it requires extra management work (allocation and freeing of memory block) wherever a buffer block is required. If the allocation and freeing cannot be done at the same code location, it is important to ensure a piece of memory block is freed once and only once when the block is no longer needed; otherwise, memory leaks or code crashes may result. Second, the buffer's size must be predetermined before a block can be allocated. However, you may not always find it easy to identify the data size. Frequently, developers adopt a conservative estimation of the maximum size, which may result in a significant waste of memory resources.

To avoid possible memory leaks and code crashes due to multiple freeing, good practice dictates you clearly predefine the program's section responsible for allocating buffer memory and the section responsible for freeing. However, defining responsibility causes other difficulties in practice. Because you must specify the size when a buffer is created under the conventional allocation scheme, a data provider, which presumably knows the size of the data it provides, is the best party to perform the buffer allocation operation. On the other hand, the best party for freeing is presumably the data consumer as it knows when the data is no longer needed. Frequently, data providers and data consumers are not the same.

When data providers and data consumers come from different software providers, the interacting parties may adopt or assume different underlying memory management mechanisms. For example, some software providers may choose a self-managed heap space, while others rely on the underlying operating system (OS) for such functionality. Moreover, different OSs may implement memory management differently. For example, the PalmOS provides two distinctive memory resources: heap-based and database-based. Generally speaking, different memory management mechanisms have their advantages and disadvantages, so you may not wish to assume a particular mechanism in advance. Different preferences may even lead to conflicting coding practices.

Three ways to resolve this problem are:

  • One of the interacting parties defines the underlying memory allocation mechanism for data exchange. The other party always uses the published interface to allocate or free buffers to avoid possible inconsistency. This model requires both parties to stick to a programming convention that may not be relevant to the software's basic functionality and, in general, can make the code less reusable.

  • The party driving the computation takes responsibility for management operations -- a relatively problem-free scheme when that party acts as the data provider. However, when that party acts as the data consumer, things become tricky. To avoid discovering data size, the consumer can allocate a buffer of an arbitrary size. Multiple calls to the data provider must be made if the provided data buffer is insufficiently large. This approach, therefore, requires coding an extra loop around the interaction call just in case multiple calls are required.

  • With the third option, the consumer becomes responsible for management operations. Here, however, the consumer must make an advanced call to discover the buffer size if the other party is the data provider -- placing an extra burden on the other party to code the logic for providing the size information, which may require the execution of a time-consuming algorithm. Also, this solution may introduce a serious efficiency problem: Suppose that function a() acquires data from function b(), which in turn, during its execution, acquires data from function c(). Assume that discovering the buffer sizes and providing the actual data require the execution of the same algorithm.

    To obtain a piece of data from b(), a() must make two calls: one for size discovery and one for obtaining the actual data. For each call that a() makes, b() must make two calls to c(). Therefore, by the end of this operation, the algorithm coded in c() may have been executed four times. In principle, the execution should occur only once.

Clearly, all three solutions present limitations and the conventional buffer memory management approach is not a good mechanism for coding large-scale interacting software.

Besides the aforementioned difficulties, security proves an issue here: The conventional buffer management scheme cannot easily prevent a malicious user from deliberately overwriting a data buffer, causing program misbehavior. Considering all that, a properly designed data buffer interface is required!


What is the solution?

In the last section, you saw how the conventional buffer management scheme produces several problems. In contrast, when you make an abstract data buffer, the solution becomes simple.

Conceptually, two operations create a data buffer under the conventional scheme: the creation of a data buffer entity and the allocation of actual memory. In fact, however, you needn't allocate actual memory until the actual data becomes available -- you can separate the two operations.

You can initially create an abstract buffer with an empty linked list of memory block. The abstract data buffer allocates the memory blocks when and only when the actual data becomes available. Freeing the memory becomes the responsibility of the abstract data buffer as well. Considering all that, centralizing memory management and data copying operations produces the following advantages:

  • Either party can construct and/or destroy an abstract data buffer by calling predefined API functions.
  • Memory usage is kept near optimal as buffer memory becomes allocated only when necessary and is freed as soon as possible, thus minimizing memory leaks.
  • No party needs to know the underlying memory management scheme, making the software highly portable with guaranteed compatibility between interacting parties.
  • Because no party needs to manage memory, buffer-size identification becomes unnecessary (hence there's no possibility of the multi-execution problem identified earlier).
  • Buffer overflow proves impossible as data copies only when room for extra data exists.

A simple implementation

To represent an abstract data buffer, declare two structured data types:


Listing 1. Declare two structured data types to represent an abstract data buffer
typedef struct BufferBlockHeader_st BufferBlockHeader;

struct BufferBlockHeader_st {
   BufferBlockHeader * pNextBlock;
};

struct Buffer_st {
   long  int             totalLength;
   BufferBlockHeader *   pFirstBlock;
   short int             startPoint;
   BufferBlockHeader *   pLastBlock;
   short int             endPoint;
};

typedef struct Buffer_st Buffer;

The Buffer holds the information about a created abstract buffer, and it manages a linked list of memory blocks:

  • The totalLoength records the number of bytes currently stored in the buffer.
  • The pFirstBlock points to the first memory block of the linked list.
  • The startPoint records the first byte's offset location into the first memory block.
  • The pLostBlock points to the linked list's last memory block.
  • The endPoint records the first free byte's offset location into the last memory block.

You can introduce an additional parameter into Buffer to specify each memory block's size, and you can set it to a preferred value during the initialization of an abstract buffer. Here, a default block size is assumed.

The pNextBlock in the BufferBlockHeader structure always points to the next memory block in the linked list, if allocated. Each memory block, when allocated, consists of a BufferBlockHeader header followed by a buffer block for storage of actual data.

Figure 1 illustrates an abstract buffer when stored with some data.


Figure 1. Data structure of an abstract buffer
Graphical representation of relations

Let M denote the Buffer's size (which is normally 20 bytes) and B denote the chosen size of a memory block. The overhead averages about (M+B) bytes (ignoring the pointers at the beginning of each memory block). The B in (M+B) results because, on average, only half of the first and half of the last memory block are used. This overhead is almost constant.

You must explicitly create the abstract buffer by calling the following newBuffer() function before data can be buffered:


Listing 2 Create an abstract buffer with the newBuffer() function
Buffer * newBuffer() {
   allocate a Buffer structure;
   initialize the structure;
}

In Listing 2, the function allocates a piece of memory containing a Buffer and initializes its entries to indicate an empty abstract buffer.

Correspondingly, you must destruct an abstract buffer after use by calling the following freeBuffer() function:


Listing 3. Destruct a buffer with the freeBuffer() function
void freeBuffer(Buffer * pBuffer    /* pointer to the buffer to be freed */
	       ) {
   while (there is more memory block in the linked list) {
      free the next memory block;
   }
   free the Buffer structure;
}

The function in Listing 3 frees any memory blocks in the linked list, then frees the Buffer allocated by newBuffer().

To incrementally append data segments to an abstract buffer, use the following function:


Listing 4. Incrementally append data segments to an abstract buffer
long int appendData(Buffer * pBuffer,    /* pointer to the abstract buffer */
             byte *   pInput,    /* pointer to the data source */
             long int offset,    /* offset of the input data */
             long int dataLength    /* number of bytes of the input data */
               ) {
   while (there is more input data) {
      fill the current memory block;
      if (there is more input data) {
         allocate a new memory block and add it into the linked list;
      }
   }      
}

The function in Listing 4 copies bytes stored in pInput[offset..offset+dataLength] into the abstract buffer pointed by pBuffer, inserting new memory blocks into the linked list if necessary, and returns the number of bytes successfully copied into the abstract buffer.

In a similar fashion, you can read data segments off segment by segment from an abstract buffer with the following function:


Listing 5. Read data by segments from an abstract buffer
long int readData(Buffer * pBuffer,    /* pointer to the abstract buffer */
           byte *   pOutput,    /* pointer to the output byte array */
           long int offset,    /* offset of the output byte array */
           long int arrayLength    /* size of available output byte array */
             ) {
   while (there is something more to read and there is room for output) {
      read from the first memory block;
      if (the first memory block is empty) {
         delete the first memory block from the linked list and free its memory;
      }
   }
}

In Listing 5, the function destructively reads off up to arrayLength leading bytes from the abstract buffer pointed by pBuffer, deleting memory blocks from the linked list when they become empty, and returns the number of bytes successfully read.

If required, you can implement a function similar to readData() to allow a nondestructive read.

You may benefit by implementing a function that returns the number of bytes currently stored in an abstract buffer:


Listing 6. Return the number of bytes in an abstract buffer
long int bytesAvailable(Buffer * pBuffer  /* pointer to the abstract buffer */
		       ) {
   return totalLength;
}


Advantages of using the abstract buffer

You saw, earlier, several difficult areas associated with the conventional buffer scheme. As an alternative, by centralizing the memory management and data copying operations, the proposed abstract buffer immediately eliminates the possibilities of inconsistent memory management and buffer overflow. It also leads to simpler coding and avoids the possible multi-execution problem identified earlier. To better understand the solution, let's see an example using pseudo-code, first with conventional approaches, followed by the centralized solution.

Simpler coding than the fixed-size buffer approach

Suppose function a() gets input data from function b() but does not know function b()'s size. You can avoid querying the input size by letting a() allocate a fixed size buffer and iteratively call b() until b() indicates the end of the input data:


Listing 7. Allocate a fixed size buffer and call input data
int b(byte *buf, int bufSize) {
   fill buf;
   return size of output;
}

void a() {

   byte * buf = malloc(BUFFER_SIZE);
   int size;

   if (NULL != buf) {
      while (there is more data from b()) {
         size = b(buf, BUFFER_SIZE);
         process data in buf;
      }
      free(buf);
   }

}

By using the abstract buffer, simplify the code to:


Listing 8. Call input data for an an abstract buffer
void b(Buffer *buf) {
   fill buf;
}

void a() {

   Buffer * buf = newBuffer();
   if (NULL != buf) {
      b(buf);
      process data in buf;
      freeBuffer(buf);
   }

}

No multi-execution compared with the size discovery approach

Again, suppose function a() gets input data from function b() but does not know b()'s size. To allocate a sufficiently large buffer for b(), a() must make an advanced discovery call to b() (presumably, only b() knows about the size). a() would look something like:


Listing 9. Make advanced discovery calls
int b(byte *buf, int bufSize) {
   if (NULL != buf) {
      fill buf;
   }
   return size of output;
}

void a() {

   int size = b(NULL, 0);
   byte * buf = malloc(size);
   if (NULL != buf) {
      b(buf, size);
      process data in buf;
      free(buf);
   }
}

Notice that a() calls b() twice.

By using the abstract buffer, you can write the code as:


Listing 10. Make a single discovery call for an abstract buffer
void b(Buffer *buf) {
   fill buf;
}

void a() {

   Buffer * buf = newBuffer();
   if (NULL != buf) {
      b(buf);
      process data in buf;
      freeBuffer(buf);
   }

}

Which only calls b() once.


In conclusion

This article examined problems produced when two C functions interact using a conventional data buffer management scheme. Such problems may become major issues when coding large-scale interacting software. As an alternative, a self-managing abstract data buffer can combat those problems. Implementing this proposed abstract data buffer should be a relatively trivial task for an average C programmer.

To benefit from the solution, you must clearly define a lasting concrete abstract data buffer interface. Adopting such an interface will simplify future code development. However, treat existing code migration to the interface with caution, using a case-by-case analysis weighing the costs/benefits.


Resources

  • Check out the classic C programming manual: The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie (Prentice Hall PTR, 1988).

  • Find an extensive coverage of C memory management techniques in C Memory Management Techniques by Len Dorfman and Marc J. Neuberger (McGraw-Hill Osborne Media, 1992).

  • Get an explanation of Palm OS memory management in Palm Programming by Glenn Bachmann (Sams Publishing, 1999).

  • developerWorks Web Architecture zone: Expand your site development skills with articles and tutorials that specialize in Web technologies.

About the author

Staff Software Engineer Xiaoming Zhang, having spent nine years as an academic carrying out research, gave up his university lectureship to join IBM in 1998. Since then, he spends most of his time designing and developing messaging middleware targeted for small devices. He is also particularly interested in the data security aspects of messaging. Xiaoming, who holds a PhD in computer-speech signal processing from the University of Wales Swansea, has published nearly thirty conference and journal papers covering speech signal processing, parallel numerical computation, application of functional programming, and image processing.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Web development
ArticleID=11869
ArticleTitle=Self-manage data buffer memory
publish-date=01062004
author1-email=zhang@uk.ibm.com
author1-email-cc=