Trace is fairly important in wireless applications but, unfortunately, is often overlooked until it is too late. Developers might find it can be difficult to get trace right due to the limited specifications of the devices they're using. Additionally, trace is often an easy target for removal when project planning and footprint discussions take place -- particularly because it doesn't add much perceivable benefit to the end user. After all, when was the last time you saw a product specification sheet containing the boast, "Now with 25% more trace"?
Developing a good trace infrastructure for your wireless application presents a unique set of problems, mostly because the devices in question are typically underpowered when compared with desktop/server machines -- particularly in respect to memory/disk space.
In many ways, it can be more important to catch detailed diagnostic information in wireless applications because, by their very nature, problems can be harder to reproduce than in server-type applications.
While the main design criteria for this implementation was for the wireless market, the implementation is just as a valid on the more heavyweight platforms.
Before designing my trace infrastructure, it is worth taking a moment to consider what I want the trace to accomplish.
- General trace requirements:
- Recording function entry
- Recording function exit (with return codes)
- Recording of arbitrary data
- Specific requirements for the wireless environment:
- Small trace file
- Small runtime-footprint
- Small performance/CPU hit, particularly when not tracing
- Nonessential requirements (nice-to-have features):
- Ability to start and stop trace during execution
- Tracing of input/output function parameters
- Recording of timing information
Specific design criteria and problems encountered
Implementation of the first three design requirements (general trace) is relatively straightforward, as you will see in the following section. I have implemented these requirements as three separate function calls. After that, things get interesting when I focus on the next set of requirements (specific requirements). I'll discuss the specific requirements in detail first:
Because the trace will be running on devices with somewhat limited local storage, it is imperative that you minimize the size of the files produced -- ultimately this will allow you to capture more information.
One easy way to implement trace is to write the data you want to capture to the trace files in human-readable form. For example, perhaps capturing the input to a function might be of the entry form (assuming time(t), thread id(n) and function name are captured):
Entry: [tttttt](nnnnn) MyExampleFunction |
While this is easy to implement and use (the end user can simply read the data on the device or copy to a PC for analysis), there are a number of serious flaws with it. The most serious, when considering the current topic in hand (file sizes), is that this is a wasteful way to record the infomation as compared to a binary (non human-readable) implementation.
For instance, consider the ThreadID. On PocketPC, the ThreadID is 4 bytes in length. When converted into decimal this ranges from 0 - 4294967295, or 1 to 10 digits (and, therefore, bytes, assuming simple ASCII encoding). On average, the number of digits will be 5.5, which wastes 1.5 digits for every thread id. This does not take into account that if the field is not a fixed length, you must delimit it from the next field. This takes up at least one extra digit; thereby wasting 2.5 digits per record.
The function names are even more pronounced. By using the symbolic name for the functions, I will be using N characters (where N is the average length of the function name). It is not uncommon for the average length of a function name to be 10 characters or more. By using a non-human-readable approach, you can have a lookup table of function names where the index is a 4- (or even 2 if there are not many functions) byte number. This saves N-4 bytes per entry.
By the time you have added formatting to make the output look nice (brackets, spaces etc.), a binary trace implementation can offer the same amount of data as a human-readable implementation -- using one-half to one-third of the space.
As you can see, I am leaning in the direction of a non-human-readable file format, which will be converted into a human-readable form using an external tool.
For more information on small trace buffers and structure padding, see Structure Padding.
In order to achieve this objective, you should try to keep the code as simple as possible (that is, reduce code size). Also, try to reduce the size of any static buffers that are working (such as, reduce memory requirements).
This is another area in which using a binary file format can help. As I mentioned in the previous section, I would like to use a lookup table to map function IDs to function names. However, this still leaves me with a potentially huge lookup table that will take up valuable disk/memory space. By moving the formatting of the binary trace file to an external tool, I can avoid storing this lookup table on the device at all. I can create an external tool, separate from the main trace functionality, which can then be run on a less constrained machine (a Windows/Linux PC in this example). This provides massive savings on disk space of NxM bytes (where N is the average function name length and M is the number of functions).
Small performance hit (particularly when not tracing)
In order to reduce the performance hit I'll do two things:
- Try to keep the trace code simple.
- Streamline/optimize the code for the not tracing case (as this will be, by far, the more likely route). I would like to keep the trace performance hit down to one or two CPU instructions when I am not tracing. This is vitally important in order to encourage the use of the trace library. If it noticeably slows the application down even when not tracing, users will be less likely to add trace calls to their functions.
What I've discussed so far are the main things I kept in mind when coming up with the design and implementation that I will now explain. However, it should be noted that many of these objectives are mutually exclusive (keeping the code simple versus compressing the generated trace file, for instance), so it is a balancing act from that point of view.
As with most articles such as this one, the way to learn the most from this series is to read and understand the design and implementation decisions and then tailor them to your particular needs. You don't need timestamps? Get rid of them. You don't need more that 216 functions? Reduce the function ID to 2 bytes. You get the idea.
In the supplied implementation, the trace infrastructure is built as a DLL, which is distinct from any application that uses it, although this is not a strict requirement. It could just as easily be embedded within an application (this would obviously have footprint implications if multiple applications use the functionality).
The basic building block of the implementation is the Trace Control Block, which is a structure defined in dwTrace.h called dwTraceControlBlock.
An application wanting to use this trace implementation should, on initialization, create a single dwTraceControlBlock for convenience, usually on the stack in main or as a static global variable. This variable is then passed as the first parameter of any trace calls made.
Once you have a dwTraceControlBlock there are eight API calls that can be made.
dwTrace_initinitializes the dwTraceControlBlock. This call must be made against the trace control block before it can be used in any of the other calls.dwTrace_termuninitializes the dwTraceControlBlock. This call must be the last call made against the trace control block, after which it cannot be used in any of the other calls.dwTrace_checkchecks to see if the tracing status has changed (that is, whether tracing has been turned off or on) and takes the appropriate actions (opening files, etc.). This function should be called periodically within the application to pick up any changes which have been requested.dwTrace_start_Start tracing (open trace file etc.) is not usually called directly. This function is called bydwTrace_checkwhen it determines that the user has requested trace be enabled.dwTrace_stop_Stops tracing (close trace file, etc.). Again, this function not usually called directly. This function is called bydwTrace_checkif the user has asked for tracing to be turned off during its tracing. The function is also called bydwTrace_termat shutdown if we are tracing.dwTrace_entry_should be called on entry to a function to trace that fact to the trace file. The call takes aMethodIDas an argument.dwTrace_exit_is a function that should be called on exit from a function to trace that fact to the trace file. the function takes aMethodIDas an argument. It also takesReturnCodeandReasonCodeto be entered into the trace file .DwTrace_data_is used to trace arbitrary data to the trace file. The function takes aMethodIDas an argument. It also takes a format string (char*) and a variable number of arguments. This format string and arguments are the same as you would pass to(s)printf.
You will notice that the last three functions (entry, exit, and data) have underscores on the end. This is to signify that these functions are not to be called directly. You want to streamline the code for the instance when you are not tracing; it would be wasteful to always call the trace entry, exit, and data functions and then check within those functions to see whether you were tracing. If you were to do this, you would have lots of function calls taking place, with no actual work being done within them. As function calls are quite an expensive operation on most architectures, it would be preferable to determine whether you are tracing outside those functions and then only call them if you are tracing. This will increase the code size slightly (as you will have a few extra instructions called the trace functions), but I believe that the vast improvement in efficiency is worth it. As I mentioned earlier, if this is not true for your implementation (that is, size is much more important than performance), then, by all means, move the checking into the functions and call them directly.
In order to alleviate this checking responsibility from the trace user, I have implemented C Macros, which I called instead of the functions.
In an ideal world, this could be achieved by writing three macros, which would check whether I am tracing and then call the appropriate functions. This is fine for the entry and exit calls -- two macros are supplied for them (DWTRACE_ENTRY and DWTRACE_EXIT).However, the dwTrace_data_ function is a var-args function. This is a convenient way to pass an unknown number of parameters into a function. Unfortunately, there is no standard way to write a C Macro that will accept a variable number of arguments. There are several non-standard compiler extensions to do this and numerous bizarre looking hacks, which give the approximate functionality at the expense of looking slightly strange (such as having extra or missing brackets). For ease of use and understanding, I have taken a simplistic approach and just written a number of different data macros, with each taking a different number of parameters (DWTRACE_DATA_0_PARAM, DWTRACE_DATA_1_PARAM, DWTRACE_DATA_2_PARAM, etc.).
This is slightly unwieldy, but I think it's a reasonable compromise because it does not increase footprint (macros are replaced during pre-processing and, therefore, they all boil down to the same function call, they are only really being used to match the same functionality onto a different number of arguments). In addition, they do add some compile-time checking (actually pre-processing time). That is, if you call DWTRACE_DATA_2_PARAM with only one argument, it will not compile, whereas most compilers will happily compile a var-args function with an incorrect number of arguments (which will more than likely go horribly wrong at run-time).
After reading this article and the associated source files, you should see a basic binary trace infrastructure. This implementation can be used as-is, but in the next two articles I will be adding some much-needed functionality to build this simple implementation up to a production-quality implementation. You'll end up with something that can be deployed in your own wireless applications.
Aggressive techniques to further reduce file sizes.
- Find out more about structure padding in the comp.lang.C FAQ.
- You might also check out Structure Alignment and Padding at ByteSwap.net.
- Here is a case study illustrating how padding was used incorrectly, and how the problem was solved.
- The Apache HTTP server logs and traces; see the user's guide.
Martyn Honeyford graduated from Nottingham University with a BSc in computer science in 1996. He has worked as a software engineer at IBM UK Labs in Hursley, England, ever since. His current role is as a developer in the WebSphere MQ Everyplace development team. When not working, Martyn can usually be found either playing the electric guitar (badly) or playing video games more than most people would consider healthy. You can contact Martyn at martynh@uk.ibm.com.