Improve collaborative build times with ccache

How to squeeze more speed out of your compilations

Martin Brown (questions@mcslp.com), Freelance writer and Consultant
Martin C. Brown is a former IT Director with experience in cross-platform integration. A keen developer, he has produced dynamic sites for blue-chip customers including HP and Oracle and is the Technical Director of Foodware.net. Now a freelance writer and consultant, MC, as he is better known, works closely with Microsoft as an SME, is the LAMP Technologies Editor for LinuxWorld magazine, is a core member of the AnswerSquad.com team, and has written a number of books on topics as diverse as Microsoft Certification, iMacs, and open source programming. Despite his best attempts, he remains a regular and voracious programmer on many platforms and numerous environments. You can contact MC at questions@mcslp.com or through his Web site.

Summary:  Sharing the source files and other components of a C/C++ project through CVS works well for a team building with cc or gcc, but rebuilding the application after it has been merged with everybody else's changes can take a significant amount of time. Even if you're not developing a project as part of a group, recompiling an application can take a lot of time. The ccache tool improves build performance by caching the object files that the compiler produces, keyed on the preprocessed source, so a file whose preprocessed content has not changed does not have to be compiled again. In this article, learn how to build and install ccache, how to use it with your existing environment, and how to improve the build times in group development projects. You will also see how to use ccache and distcc together to get the best performance out of your development environment.

Date:  24 Aug 2004
Level:  Introductory


In the standard build process, developing applications under UNIX with C/C++ generally involves using a compiler such as gcc with some kind of build tool, such as make. The problem with make and just about any C compiler is the way in which the C preprocessor and header files work. If you have a look at a typical C source file, it will incorporate a number of #include references to the various header files.

Each time you compile a file, the C preprocessor (cpp) parses and then includes each of these files and any files that are in turn referenced. By parsing the content, cpp can turn what was a fairly basic 1-KB source file into an 8-KB source file and, in the process, incorporate tens or even hundreds of header files. In a typical development project, a number of header files relevant to the project may be included multiple times in different source files, and each header file may itself reference many others.

During a typical build, the make tool simplifies the process considerably by compiling only the files that have changed since the target was last built. It does this by comparing modification times: a target is rebuilt when any of its prerequisites is newer than the target. In the directory in Listing 1, both foo.o and bar.o are newer than the corresponding foo.c and bar.c source files, so on the basis of the source files alone make would recompile nothing. The header file foobar.h, however, was modified after both object files were built, so with a Makefile that lists it as a prerequisite, both objects would be recompiled.

The problem with make is that it uses the timestamp as the only method of determining whether a file needs to be recompiled. Because make also takes header files into account, any change to a header file produces a later timestamp and therefore triggers a recompilation of the source files that depend on it. But because of the way cpp works (for example, when the change sits inside an #ifdef block that is not enabled), the change may not affect the files that include the header. A simple change of modification time can therefore mean that you end up recompiling source files for no reason. For an individual developer, that wasted time can be significant. Across a team, the effects may be even more pronounced, as multiple developers may end up repeating the process many times, maybe even simultaneously, over the course of a typical day.


Listing 1. A sample source environment
total 808
-rw-------  1 mc  mc    5123 24 Jul 14:17 bar.c
-rw-------  1 mc  mc   39474 24 Jul 14:19 bar.o
-rw-------  1 mc  mc    7856 24 Jul 14:17 foo.c
-rw-------  1 mc  mc   28443 24 Jul 14:19 foo.o
-rwx--x--x  1 mc  mc  319742 24 Jul 14:19 foobar*
-rw-------  1 mc  mc    1045 24 Jul 14:21 foobar.h
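
The timestamp-only behavior described above can be reproduced in a scratch directory. The following is a minimal sketch, not part of the original project: the file names simply mirror Listing 1, foo.o is an empty stand-in for an object file, and the date given to it is an arbitrary point in the past. It compares the decision make's rule would reach with the decision a content check would reach after a header is touched but not edited.

```shell
#!/bin/sh
# Hypothetical scratch files; the names mirror Listing 1, the contents are invented.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '#define GREETING "hello"\n' > foobar.h
printf '#include "foobar.h"\nint main(void) { return 0; }\n' > foo.c

# Stand-in for an object file built some time ago (arbitrary past date)
touch -t 200407241419 foo.o

before=$(cksum < foobar.h)

# Update the header's modification time without changing a single byte
touch foobar.h
after=$(cksum < foobar.h)

# make's rule: rebuild when a prerequisite is newer than the target
if [ foobar.h -nt foo.o ]; then make_decision=rebuild; else make_decision=skip; fi

# Content-based rule: rebuild only when the bytes differ
if [ "$before" = "$after" ]; then content_decision=skip; else content_decision=rebuild; fi

echo "timestamp check: $make_decision"
echo "content check:   $content_decision"
```

The timestamp check asks for a rebuild even though the header's checksum is identical before and after, which is exactly the wasted work the article describes.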


Using ccache

The ccache (short for compiler cache) tool caches the object files generated by a compilation. It computes a hash of the output of cpp, which gives a reliable way to determine whether the source that actually reaches the compiler has changed. This method is more reliable than the timestamp system used by make. For example, using make without ccache on the source in Listing 2 would compile the file as normal.


Listing 2. Source file contents
#include "foobar.h"

/* main must return int in standard C */
int main(void)
{
    return 0;
}

If we edit one of the header files referenced by foobar.h, then standard make, given a Makefile that tracks the header dependencies, would recompile the file in Listing 2 irrespective of the actual changes. With ccache, the hash of the preprocessed version of the file is compared with the hash from the previous compilation. If they match, ccache returns the cached object file without running the compiler. If they don't, the file is recompiled. It's as simple as that. On a single file, you are not going to see much advantage, but across 20 or 30 files, and certainly across hundreds, the speed gains are significant.
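
The idea of comparing preprocessed output can be tried by hand. The sketch below is only an illustration of the principle, not how ccache is implemented: it assumes a C compiler is available as cc, uses cksum in place of ccache's own hash, and invents a header with an ENABLE_TRACE guard (a hypothetical macro). The real tool also takes the compiler and its options into account.

```shell
#!/bin/sh
# Illustration only: cksum stands in for ccache's hash, and the
# ENABLE_TRACE guard is an invented example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > foobar.h <<'EOF'
#ifdef ENABLE_TRACE
void trace(const char *msg);
#endif
EOF
printf '#include "foobar.h"\nint main(void) { return 0; }\n' > foo.c

# Checksum of the preprocessed source; -P suppresses line markers
pp_sum() { cc -E -P "$1" | cksum; }

sum_before=$(pp_sum foo.c)

# Edit the header, but only inside a branch that is not compiled in
cat > foobar.h <<'EOF'
#ifdef ENABLE_TRACE
void trace(const char *msg, int level);
#endif
EOF

sum_after=$(pp_sum foo.c)

if [ "$sum_before" = "$sum_after" ]; then
    echo "preprocessed output unchanged: a cached object could be reused"
else
    echo "preprocessed output changed: recompile"
fi
```

The header file is newer and its text differs, so make would rebuild foo.o, yet what the compiler would see is the same as before.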

Installation

Installing and using ccache is not as complicated as you might think. It doesn't replace or in any way affect the way you already use your compiler; instead, it acts as an interface between you and your compiler, so you can choose whether or not to use it according to your needs. To install ccache, download the source directly from the Samba group or a local mirror (see Resources at end of this article). Unpack the contents of the file:

$ bunzip2 -c ccache-2.3.tar.bz2|tar xf -

Change into the directory:

$ cd ccache-2.3

Configure:

$ ./configure

Build:

$ make

And finally install ccache:

$ make install

You're all ready to go!

Deployment

As mentioned above, ccache works by sitting between you and your normal compiler. Instead of calling gcc, you call ccache with gcc as the first argument. For example, to compile a file from the command line, you would normally use:

$ gcc foo.c

To use ccache you would type:

$ ccache gcc foo.c

On a single compilation of a file like this, especially if it is the first time the file has been compiled using ccache, you won't see any benefit, because the compilation information has not been cached yet. Therefore, it is generally more effective to configure ccache as a permanent replacement for your main compiler. To do this, set the value of the CC environment variable:

$ export CC='ccache gcc'

If you want to enable ccache only on a project basis, say when compiling third-party tools such as Perl, then you can either use the environment trick or tell the configure script or make command what C compiler to use.
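
As a sketch of the per-project approach, the compiler can be overridden on the make command line for a single build. The Makefile below is a hypothetical one-rule example written for this illustration. The -n option makes make print the commands it would run without executing them, so the effect of the override is visible even on a machine without ccache installed. How a configure script accepts a compiler setting varies from project to project, so check the project's own documentation for that route.

```shell
#!/bin/sh
# Hypothetical one-file project, created only for this illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'int main(void) { return 0; }\n' > foo.c

# A one-rule Makefile; printf turns \t into the tab the recipe line needs
printf 'foo.o: foo.c\n\t$(CC) -c foo.c -o foo.o\n' > Makefile

# A variable set on the command line overrides the Makefile and the
# environment for this invocation only; -n is a dry run
plan=$(make -n CC='ccache gcc' foo.o)
echo "$plan"
```

Dropping the -n runs the build for real, with ccache in front of gcc for that one invocation and nothing changed in your shell environment.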

Controlling caches

By default, ccache uses a directory within the current user's home directory ($HOME/.ccache) to hold the cache information. In a team environment, you'll want to use a central location for the cache so everybody can use the cache information during builds. Another environment variable, CCACHE_DIR, specifies the location of the cache directory. In a single-machine environment, set this to a directory that is accessible to everybody who needs it. For a greater speed boost, and providing you have the memory to support it, use a directory mounted using tmpfs. You could get an additional speed boost of between 10 and 25 percent.

If you are using ccache across a network of machines, make sure the directory you share is exported over NFS and mounted on each client. Again, you can use a tmpfs filesystem if you want to get an extra speed boost.

Some additional options give you further control over the cache settings:

  • The CCACHE_LOGFILE environment variable defines the location of a log file that will be populated as you use ccache.

  • Use the -s command line option with ccache to get statistics about the cache performance (see Listing 3).

  • Use the -M command line option to set the maximum size of the cache. The default is 1GB. The settings for the cache are written to the cache directory, so you can have different cache sizes in different locations for different users and groups.

  • The -F option sets the maximum number of files for the cache directory. The value is rounded down to the nearest multiple of 16. As with -M, you should only need to use it when you want to change the configuration.

  • The -c option cleans the cache. You shouldn't normally need to use this, as ccache updates the information during execution, but if you reuse a cache directory that has not been used for a while, you might want to try this option.

  • The -C option completely clears the cache.

Listing 3. ccache cache statistics
cache hit                             44
cache miss                           152
called for link                      107
compile failed                        11
no input file                          2
files in cache                       304
cache size                           8.8 MB
max cache size                     976.6 MB
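
The statistics are easier to judge as a hit rate. The following sketch takes the figures from Listing 3 and computes hits as a percentage of all cacheable compilations (hits plus misses). It assumes the output layout shown in Listing 3; other ccache versions may label the lines differently, in which case the patterns need adjusting.

```shell
#!/bin/sh
# Figures copied from Listing 3; in practice, pipe `ccache -s` into awk instead.
stats='cache hit                             44
cache miss                           152
called for link                      107
compile failed                        11
no input file                          2'

# Take the last field of the hit and miss lines and compute the percentage
hit_rate=$(printf '%s\n' "$stats" | LC_ALL=C awk '
    /^cache hit/  { hits = $NF }
    /^cache miss/ { misses = $NF }
    END { if (hits + misses > 0) printf "%.1f", 100 * hits / (hits + misses) }')

echo "hit rate: ${hit_rate}%"
```

For Listing 3 this gives 44 / (44 + 152), or about 22.4 percent, which is typical of a cache that has only recently started to fill.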

Once you've set the initial options and configured the directory and cache size you want, there's no need to change anything. There's also no need to perform any regular maintenance.


Combining ccache and distcc

You may already be aware of distcc, another tool from the Samba group, which allows you to spread the compilation across a number of machines. This effectively increases the number of simultaneous compilations that can occur, just as if you had used the multiple jobs option with make (the -j command line option). The distcc system works by running a daemon on each host that accepts source files in their final preprocessed form, compiles them locally, and returns the resulting object file.

When used properly, you can generally expect to get a slightly less than linear decrease in build times for each new identical node you add, although you'll only see the effects on projects where you have more than one source file, as distcc only distributes whole source files.

Because distcc distributes files in their preprocessed state, you can combine it with ccache: ccache runs the preprocessor locally and serves any file it has already seen from the cache, and distcc performs the actual compilation into object code for everything else. To use distcc and ccache in this way, configure distcc on your hosts and both distcc and ccache on your main development machine.

Now set up your environment variables on the machine on which you want to build the project, as shown in Listing 4.


Listing 4. Environment variables for using ccache and distcc
export DISTCC_HOSTS='localhost atuin nautilus pteppic kernel'
export CCACHE_DIR=/Data/Cache/CCache
export CCACHE_PREFIX=distcc
export CCACHE_LOGFILE=/Data/Cache/CCache.log
export CC='ccache gcc'

The variables are defined as follows:

  • DISTCC_HOSTS specifies the hosts to distribute work to.

  • CCACHE_DIR specifies the location of the ccache directory.

  • CCACHE_PREFIX defines a prefix to use when ccache calls the true compiler to compile the source (after preprocessing).

  • CC sets the name of the C compiler to use in the first instance (ccache).

Now when you run make with the -j option to specify the number of simultaneous compiles to perform, each file is first preprocessed and looked up in the cache by ccache. Only if there is no cached result is the file distributed to one of the distcc hosts for compilation.

Although distcc speeds up the process of compiling, it doesn't alter the basic limitations of the environment. As a rule of thumb, you shouldn't set the number of simultaneous jobs run by make to more than twice the number of available CPUs. For example, if you have four two-way machines (eight CPUs in total), you won't notice much improvement in speed if you set the job value above 16.
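
The rule of thumb is simple arithmetic, sketched here for the four two-way machines in the example. The variable names are illustrative only.

```shell
#!/bin/sh
hosts=4            # machines available to distcc (from the example above)
cpus_per_host=2    # "two-way" machines have two CPUs each

# Upper bound suggested by the rule of thumb: twice the total CPU count
max_jobs=$((hosts * cpus_per_host * 2))

echo "make -j${max_jobs}"
```

With these values the script prints make -j16. Treat the result as a ceiling to experiment below, not a setting that is guaranteed to be the fastest.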


Statistics

Now that everything is set up, it is time to see how much of a difference it makes. I've run a series of tests here building Perl. We need something meaty to compile, because ccache pays off only once it has a populated cache to draw on. The timings cover just the make phase, which occurs after we've run a standard configure (using configure.gnu), and include all the stages, even those not related to compiling code. These non-compiler operations are the same in every run, so they don't change the comparison.

As previously mentioned, the effects of ccache won't be felt on the first compilation. It's the recompilation, where you are reusing the cached results of previous compilations, that makes the difference. The recompile times in Table 1 are based on simply running touch on each of the C source files in the main Perl source directory. I've included the times for building with plain gcc, ccache+gcc, and ccache+distcc+gcc in a four-node network with various values of concurrent distcc jobs.

Table 1. The recompile times

Environment                          Time
gcc (first run)                      8m02.273s
gcc (recompile)                      3m30.051s
ccache+gcc (first run)               8m54.714s
ccache+gcc (recompile)               0m45.455s
ccache+distcc+gcc -j4                4m14.546s
ccache+distcc+gcc -j4 (recompile)    0m38.985s
ccache+distcc+gcc -j8                3m13.020s
ccache+distcc+gcc -j8 (recompile)    0m34.380s

Wow! Using ccache alone saves almost 3 minutes (well, 2 minutes and 45 seconds) on the recompile time of Perl, all because ccache recognizes that the preprocessed source has not changed and returns the cached object files instead of compiling each source file again. Incorporating distcc into the process results in an overall speed increase as well as slightly faster recompile times.
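
The saving quoted above can be checked from the two recompile rows of Table 1. This sketch converts the times to seconds and reports both the difference and the ratio. The to_seconds helper is a hypothetical function written for this example, and it assumes times in the NmS.SSSs form used in the table.

```shell
#!/bin/sh
# Convert a time such as 3m30.051s into seconds (hypothetical helper)
to_seconds() {
    printf '%s\n' "$1" |
        LC_ALL=C awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f", $1 * 60 + $2 }'
}

plain=$(to_seconds 3m30.051s)    # gcc (recompile)
cached=$(to_seconds 0m45.455s)   # ccache+gcc (recompile)

saved=$(LC_ALL=C awk -v a="$plain" -v b="$cached" 'BEGIN { printf "%.1f", a - b }')
speedup=$(LC_ALL=C awk -v a="$plain" -v b="$cached" 'BEGIN { printf "%.1f", a / b }')

echo "saved ${saved}s, recompile is ${speedup}x faster"
```

The difference comes to roughly 164.6 seconds, which is the 2 minutes and 45 seconds mentioned above, and the recompile is about 4.6 times faster with ccache.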


Summary

In this article, you've seen how much of a speed improvement you can obtain from a relatively straightforward and easy-to-use tool such as ccache. You can improve compile times even more when using ccache in combination with distcc. And when using these tools in a team environment, you could save hours every day just in compile times. That means fewer excuses for coffee breaks for your development staff, but also a decrease in development times for your applications.


Resources

