Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

A significant trace, Part 2

Aggressive techniques for reducing file sizes

Martyn Honeyford, Software Engineer, IBM UK Labs
Martyn Honeyford graduated from Nottingham University with a BSc in computer science in 1996. He has worked as a software engineer at IBM UK Labs in Hursley, England, ever since. His current role is as a developer in the WebSphere MQ Everyplace development team. When not working, Martyn can usually be found either playing the electric guitar (badly) or playing video games more than most people would consider healthy. You can contact Martyn at martynh@uk.ibm.com.

Summary:  Part 1 of this three-part series to design a binary-trace architecture suited for wireless applications provided a simple reference design for a basic trace infrastructure. Expanding on that architecture, Martyn Honeyford shows in this article how to effectively reduce file sizes. Part 3 will complete the implementation by adding even more functionality and measuring how closely the goals have been met.

Date:  06 May 2003
Level:  Intermediate

Activity:  990 views
Comments:  

Application-level tracing can serve as an effective tool for diagnosing software problems. Unfortunately, designing and implementing an efficient trace mechanism can prove difficult. As a result, many applications (particularly in the wireless field) lack such tools. With that in mind, this three-part series strives to do the hard design and implementation work for an application-trace infrastructure that you can use in your projects.

Part 1 of this series established the trace infrastructure's desired features and supplied a basic implementation. At the end of this series you will have developed a full-featured implementation. Before getting there, however, this article explores methods for minimizing the amount of trace captured while maximizing the useful information captured. To accomplish this goal, the article proposes two main approaches. The first approach is algorithms that let the trace-file wrap so you can leave tracing on without filling up the available storage. The second approach uses an algorithm for filtering the trace at the capture point to throw away extraneous information without storing it; thereby leaving more space to store the information in which you are interested.

Moreover, for related information on how to efficiently synchronize lookup tables required by the solutions presented here, see this article's Implementation notes.

Limitations

While the solution in part 1 provided a good base implementation that's usable as-is, it can be improved. As I discussed in part 1, minimizing file size must be a primary concern. The previous solution did so by keeping the trace records small by writing them in a binary format. One big problem remains, however. The trace file will continue to grow until it has used up the device's storage, which unacceptably destabilizes the device or its applications. To prevent that situation, you must track the trace file's size and take appropriate action before it gets too large -- and that's the goal of this article.


Maximum file-size algorithms

To avoid the size problem, let's consider four possible algorithms as solutions.

Algorithm 1: Stop

The first and simplest algorithm, Stop, simply stops tracing when the file size reaches a specified level. While this algorithm proves simple to implement, it severely and unacceptably limits the usefulness of the trace, particularly for long-running applications.

Algorithm 2: Restart

The Restart algorithm improves upon the Stop algorithm. When the file reaches a specified size, it erases the current file and then continues tracing. Although Restart proves somewhat more usable than Stop, it also has a fairly major flaw. The captured file's size will be zero to N bytes (where N is the chosen maximum file size) with the average being N/2 -- a problem because you might catch little useful information.

Algorithm 3: Single Flip

Single Flip ensures that when the current file (F1) reaches N/2 bytes, you stop writing to it and start writing to a second file (F2). You continue writing to F2 until it reaches N/2 bytes, at which point you delete F1 and start writing to it again. The captured information's size will range from N/2 to N bytes with the average being 3N/4. That is quite an improvement over Algorithm 2, at the small expense of slightly more complicated code.

Algorithm 4: Multiple Flip

Multiple Flip extends Single Flip to the general case. Rather than writing to just two different files, it writes to M different files. The captured information's size will range from (M-1) N/M to N, with the average being (M-1) N/M + N/2M. By looking at the formulas, notice that increasing M increases data captured, with the best case being M=N. In reality, however, don't set M too high because:

  • Each file's maximum length will be N/M. You'll need to fit at least one record in each file, so if you set M too high you won't be able to trace large trace records.
  • The overhead associated with deleting and opening files will adversely affect performance.
  • The implementation includes a header at the start of each file. If M remains fairly small, the overhead proves negligible, but as M increases, the overhead becomes increasingly important.

For now, I'll implement the fourth algorithm, Multiple Flip. In Part 3 next month I'll present a fifth, and final, algorithm called In-Place Wrapping.


Filtering

Continuing with the general file-size-reduction theme, you may find a facility that traces only those sections in which you are interested very useful. For example, if you suspect that your application is failing at a specific section such as the portion that accesses a database, then you'd find it helpful to trace only calls in that code section. Likewise, you'll find such a facility essential to encourage developers to fully employ the trace facilities. With a sophisticated filtering scheme in place, the application developer can, for instance, add trace to every memory allocation and deallocation, and feel safe knowing that he controls whether that trace is actually generated. You'll find it much easer to put all the trace calls in place and then decide at run time whether they'll be traced than to leave the calls out for fear of generating too much trace. Once application code exists in the field, it is at best tricky, and at worst impossible to add further trace statements. The first step toward filtering is to let the user specify which functions will generate trace.

To provide the filtering functionality, you must first partition the functions in some way. With that in mind, I have chosen a three-level hierarchical structure. The highest level, component, is the broad area where this function belongs. For example, in your application you might have components in databases, networking, GUIs, and so on. The next level down, package, further refines the functional area. For example, the GUI component might be divided into packages such as MainForm, PreferencesForm, DataInputForm, and so on. Finally, at the lowest hierarchical level, function, each package breaks down into individual functions. For example, the MainForm package might have such functions as DrawButtons, DrawLabels, DrawMenus, and so on.

The partitioning scheme you choose depends on your application's structure and how much control you require over the filtering; you might choose more or fewer levels.

Once you've determined the structure, you next must decide how to represent the information. Along those lines, I have encoded the information as part of the function ID, and I have partitioned off the high-order bits in the function ID and used these to store the component and package. In the supplied implementation, the function ID is a 32-bit number, with the high-order 8 bits forming the component, the next 8 bits forming the package, and the rest (the low-order 16 bits) giving the function ID itself. For efficiency reasons, you could represent the component and package as a bit-mask. If you represent the component and package in this way, you can simply store a single mask that indicates which components and packages to trace and then efficiently test whether a given function should be traced with a single AND instruction.

However, this solution produces a serious problem; a single bit is required for each entry that can be filtered individually. That is, if you use the above representation, you can have only 8 different components, each containing a maximum of 8 different packages, with each containing a maximum of 16 functions. You can always increase the function ID's size, but that will also increase the trace records' sizes (as each record contains the function ID).

With that in mind, instead specify what to trace as a list rather than as a mask. While this solution proves less efficient in run time speed, given that a candidate function ID must be checked against each element in the list, the solution allows many more components (256 in total), packages (256 per component), and functions (65,536 per package). Using the new scheme, simply supply a list of which components or packages to trace. I call this additive filtering because you must add an entry for everything you want to trace.

As an additional feature, you can allow subtractive filtering -- that is, deciding what not to trace rather than what to trace. By combining these two techniques, you can finely control what you trace. For convenience, allow wildcards so that, for instance, all packages for a given component (or all functions for a given package) can be listed as a single entry. Similarly, for convenience, if no entry exists in the add list, you assume to add everything and if the subtract list has no entry, you assume to subtract nothing.

Look at the algorithm that decides whether to trace a particular trace point.

Given a function ID (F), you should trace:

  • Additive phase:
    • If no entries exist in the add list, assume add.
    • For each entry in the add list, while not added:
      • If the component is present with a wildcard, add.
      • If the component plus the package is present with a wildcard, add.
      • If the exact function is present, add.
      • If not added, don't trace.
    • If the function has been added during this phase, continue to the next phase to see if the function will be removed or traced.
  • Subtractive phase:
    • If no entries exist in the subtract list, trace.
    • For each entry in subtract list, while not traced:
      • If the component is present with a wildcard, don't trace.
      • If the component plus the package is present with a wildcard, don't trace.
      • If the exact function is present, don't trace.
      • If not matched, trace.

The algorithm shown above produces a flexible filtering mechanism. For example, suppose you want to trace all GUI and network functions, but you aren't particularly interested in those in the PreferencesForm package. In the add filter, insert entries corresponding to the GUI and network components with wildcards (to ask to trace only methods from the GUI and network components). In the subtract filter, place an entry for the PreferencesForm package. When each trace point is to be traced, all functions from the GUI and network components will make it through the additive phase, the subtractive phase will remove those functions in the PreferencesForm package. That's all there is to it.


Further improvements

By implementing this article's algorithms to reduce the size of the generated trace files, you've improved on the basic binary trace infrastructure described in part 1. Specifically, you now possess much greater control over the generated trace amount and the created files' sizes. While you can use this article's implementation as-is, I will add, in the next part of this series, the finishing touches that transform the implementation into a production-quality tool you can deploy into your own wireless applications. You can expect to see:

  • Full entry/exit parameter tracing
  • The In-Place Wrapping algorithm

Resources

About the author

Martyn Honeyford graduated from Nottingham University with a BSc in computer science in 1996. He has worked as a software engineer at IBM UK Labs in Hursley, England, ever since. His current role is as a developer in the WebSphere MQ Everyplace development team. When not working, Martyn can usually be found either playing the electric guitar (badly) or playing video games more than most people would consider healthy. You can contact Martyn at martynh@uk.ibm.com.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Architecture
ArticleID=12260
ArticleTitle=A significant trace, Part 2
publish-date=05062003
author1-email=martynh@uk.ibm.com
author1-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers