Reducing the build time for C/C++-based systems is one of the major technical challenges any
release or build engineer faces. This article looks into some of the open source tool options available that help
speed up the build process by parallelizing the activity: distributing the build process across
multiple machines in a local area network. The discussion in this article primarily focuses on GNU
make, due to its wide availability.
The -j option in GNU make
By default, make is a sequential utility: it invokes the underlying compiler serially to compile C/C++ sources. Typically, though, C/C++ source files (usually with a .cpp/.cxx extension) can be built without depending on each other, so they can be compiled in parallel. You do so by invoking make with the -j option. Listing 1 shows a typical invocation.
Listing 1. Typical GNU make invocation
make -j10 -f makefile.x86_linux
The argument to -j (10 in this case) is the maximum number of simultaneous compilations that can run once the build process starts. If no argument is provided to -j, make places no limit on the number of jobs it runs simultaneously. The -j option makes particular sense when you're running the build on a multicore system. To make the -j option work for you, you must address several key issues; these are discussed in the next section.
Issues and potential solutions when using the -j option
First, you should check your system configuration. On a low-memory (<512MB RAM) system, too many simultaneous compilations can slow the system due to paging, and the compile time increases. You need to experiment to figure out the optimal value of -j for your system. Another option is to use the --load-average (-l) option of GNU make together with -j, which keeps firing new jobs only if the system load is less than a certain level.
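As a starting point for that experiment, the following sketch derives a job count from the number of online CPUs and uses the same number as the load limit. The variable names and the "one job per CPU" heuristic are assumptions made for illustration, not recommendations from GNU make; the script only prints the command so that you can inspect it first.

```shell
#!/bin/sh
# Ask the OS how many CPUs are online; fall back to 1 if the query fails.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
case "$cpus" in
    ''|*[!0-9]*) cpus=1 ;;   # guard against empty or non-numeric output
esac

njobs=$cpus       # assumed heuristic: one job per CPU
max_load=$cpus    # assumed heuristic: hold back new jobs above this load

# -j caps the number of simultaneous jobs; -l (--load-average) stops make
# from starting new jobs while the load average is above the given value.
make_cmd="make -j$njobs -l$max_load -f makefile.x86_linux"
echo "$make_cmd"
```

On a low-memory machine you would probably lower njobs by hand and time a few builds before settling on a value.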
Second, make sure that independent compilations don't use the same temporary file. Consider the make snippet shown in Listing 2.
Listing 2. Makefile with the same temporary file y.tab.c
my_parser : main.o parser1.o parser2.o
	g++ -o $@ $^

# Both parser rules write to, and then compile, the same y.tab.c
parser1.o : parser1.y
	yacc parser1.y
	g++ -o $@ -c y.tab.c

parser2.o : parser2.y
	yacc parser2.y
	g++ -o $@ -c y.tab.c
Assume that the grammar files parser1.y and parser2.y are located in the same directory. During sequential compilation, yacc generates the file y.tab.c (its default output filename) first for parser1 and then for parser2, and each copy is compiled before the next one is generated. In parallel mode, both rules can write to y.tab.c at the same time, which results in a conflict. You can solve this in a couple of ways: keep the two yacc files in separate folders, or use the -b option to generate two different C outputs, as shown in Listing 3.
Listing 3. Use the -b option of yacc to generate unique filenames
parser1.o : parser1.y
	yacc -b parser1 parser1.y
	g++ -o $@ -c parser1.tab.c
You must examine the makefile closely to find such situations, where a script that works fine in serial mode breaks when it's run in parallel.
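The conflict is not specific to yacc. The sketch below mimics the fix in plain shell: two background jobs stand in for the two parser rules, and a hypothetical fake_yacc function stands in for yacc with the -b option. Because each job writes to a file named after its own prefix, both can run at the same time without overwriting each other.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# fake_yacc PREFIX: hypothetical stand-in for "yacc -b PREFIX PREFIX.y".
# It writes to PREFIX.tab.c instead of a shared y.tab.c.
fake_yacc() {
    echo "/* generated from $1.y */" > "$workdir/$1.tab.c"
}

# Start both generators in parallel, as make -j would, then wait for both.
fake_yacc parser1 &
fake_yacc parser2 &
wait

out1=$(cat "$workdir/parser1.tab.c")
out2=$(cat "$workdir/parser2.tab.c")
echo "$out1"
echo "$out2"
rm -rf "$workdir"
```

If both calls wrote to a single y.tab.c instead, the surviving content would depend on which job finished last, which is the kind of intermittent failure to look for in a makefile.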
Some makefile rules have implicit dependencies. Consider the situation shown in Listing 4, where a script generates a header that is included by other sources.
Listing 4. Makefile with implicit dependencies
my_exe: info.h test1.o test2.o
	g++ -o $@ $^

test1.o: test1.cxx
	g++ -c $<

test2.o: test2.cxx
	g++ -c $<

info.h:
	make_header   # shell script that generates the header file
The info.h header is included by test1.cxx and test2.cxx. In serial build mode,
make works from left to right, and the file info.h is generated first.
However, in parallel build mode,
make is free to process all
dependencies in parallel -- this can potentially result in some compilations failing intermittently
because info.h may not be generated before the compilation of test1.cxx and/or test2.cxx starts. To
fix this problem, it makes sense to remove info.h from the dependency list of my_exe and put it in
the dependency list of test1.o and test2.o. It's also advisable to use another wrapper to ensure that
info.h is generated only once. Listing 5 shows the modified version of the make_header script, and
Listing 6 shows the makefile.
Listing 5. Modified version of make_header script to prevent multiple writes
#!/usr/bin/bash
# Do nothing if the header has already been generated
if [ -f info.h ]
then
    exit
fi
echo "#ifndef __INFO_H" > info.h
echo "#define __INFO_H" >> info.h
echo "#include <iostream>" >> info.h
echo "using namespace std;" >> info.h
echo "int f1(int);" >> info.h
echo "int f2(int);" >> info.h
echo "#endif" >> info.h
Listing 6. Modified version of the makefile from Listing 4
my_exe: test1.o test2.o
	g++ -o $@ $^

# info.h is now an explicit prerequisite of every object that includes it
test1.o: test1.cxx info.h
	g++ -c $<

test2.o: test2.cxx info.h
	g++ -c $<

info.h:
	make_header   # shell script that generates the header file
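Listing 5 builds info.h line by line, so a compiler that starts early could, in principle, read a half-written file. One possible variation, which is not part of the original script, is to write the header to a temporary file and rename it into place once it is complete. The gen_header function and the scratch directory below are hypothetical names used only for this sketch.

```shell
#!/bin/sh
# gen_header DIR: create DIR/info.h once, replacing it in a single step.
gen_header() {
    target="$1/info.h"
    [ -f "$target" ] && return 0      # already generated; nothing to do
    tmp="$target.tmp.$$"              # temporary name unique to this process
    {
        echo "#ifndef __INFO_H"
        echo "#define __INFO_H"
        echo "#include <iostream>"
        echo "using namespace std;"
        echo "int f1(int);"
        echo "int f2(int);"
        echo "#endif"
    } > "$tmp"
    mv "$tmp" "$target"               # rename within the same directory
}

demo_dir=$(mktemp -d)
gen_header "$demo_dir"
gen_header "$demo_dir"                # the second call is a no-op
lines=$(wc -l < "$demo_dir/info.h" | tr -d ' ')
echo "info.h has $lines lines"
rm -rf "$demo_dir"
```

The early return keeps the "generate only once" behavior of Listing 5, while the rename means readers see either no header or the whole header.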
The -j option can extract sufficient parallelism if you create the makefile properly. Try to avoid unnecessary dependencies in the makefile. Note that GNU make can only extract parallelism for a single machine. The next section introduces distcc, a tool that lets you share the build process on multiple machines.
The distcc tool can distribute the builds of C/C++ code across multiple machines. Each of these machines must have distcc installed. Here's a quick installation and configuration reference:
- Download distcc (see the Resources section).
- Build the distcc sources on all machines by executing ./configure; make && make install.
- The build process starts from one machine and is then distributed on all the other machines (servers). On all the servers, start the distccd daemon (you must have root privileges to do this). distccd resides in the /etc/init.d folder. The syntax to start it in root mode is
tcsh-arpan# /etc/init.d/distccd start
And the syntax to start it in user mode is
tcsh-arpan$ sudo /etc/init.d/distccd
You can also run distcc daemon processes in user mode by running distccd --daemon -j N, where N is the number of jobs you want to run on a given machine.
- The local machine needs to know which servers the build processes should be distributed to.
Depending on your shell, issue a modified version of this command:
export DISTCC_HOSTS='localhost tintin asterix pogo'
tintin, asterix, and pogo are other hosts in the network that can host build processes; localhost refers to the local machine.
- Instead of using the export directive, you can also create a file named hosts and put the names of
the servers in that file, separated by spaces. Copy this file to the $HOME/.
Once distcc has been installed, the only thing remaining to be done is to fire the build. Here's the invocation:
make -j4 CC=distcc -f makefile.x86_linux
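The host list and the build command can be kept together in a small wrapper script. The sketch below reuses the host names from the example above and only prints the resulting command; the job count of 4 simply mirrors the invocation shown and is not a tuned value.

```shell
#!/bin/sh
# Hosts that may accept compile jobs; names taken from the example above.
DISTCC_HOSTS='localhost tintin asterix pogo'
export DISTCC_HOSTS

# Count the hosts by letting the shell split the list on whitespace.
set -- $DISTCC_HOSTS
host_count=$#

build_cmd="make -j4 CC=distcc -f makefile.x86_linux"
echo "Distributing across $host_count hosts: $DISTCC_HOSTS"
echo "$build_cmd"
```

Keeping the export and the make call in one script makes it less likely that a build is started from a shell where DISTCC_HOSTS was never set.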
Key things to keep in mind while working with distcc
For distcc to work to your advantage, you must keep several things in mind:
- The machines must have identical configurations. This means the same version of the g++ compiler must be installed on all the machines, along with related build tools like ar, ranlib, libtool, and so on. The type and version of the operating system should also be the same.
- From the client machine, distcc sends the preprocessed code to the server machines. You need to verify that the distccd daemon process is running on each server machine.
- By default, the number of jobs that distcc schedules on a single machine is (no. of CPUs) + 2. For a single-core machine, this number is 3. Keep this in mind while you're firing the processes: a command line like make -j10 CC=distcc, where there are only three single-core hosts, means that only nine compile jobs are fired initially.
- Verify that the underlying machines can access the requisite file systems on which source files are stored. On Network File System (NFS) based systems, some source areas may not be mounted, which will result in compilation failures. You must also carefully monitor network congestion.
- distcc is used to compile the sources over the network. The linking step(s) may not be parallelized.
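The arithmetic behind the job-count item can be sketched as a small helper. The total_slots function is a hypothetical name; it applies the (no. of CPUs) + 2 rule stated above to each host and adds up the results.

```shell
#!/bin/sh
# total_slots CPUS...: sum (CPUs + 2) over all hosts, one argument per host.
total_slots() {
    total=0
    for cpus in "$@"; do
        total=$((total + cpus + 2))
    done
    echo "$total"
}

# Three single-core hosts, as in the example above.
slots=$(total_slots 1 1 1)
echo "job slots available: $slots"

requested=10    # the value passed to make -j
if [ "$requested" -gt "$slots" ]; then
    echo "only $slots of $requested jobs can start right away"
fi
```

For three single-core hosts the helper yields 9, which matches the nine jobs mentioned above for make -j10.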
Monitoring the distcc compilation process
The distcc installation has a console-based monitoring tool called distccmon-text. Prior to starting the build process, it's worthwhile to open a separate terminal window and issue distccmon-text 5. This terminal then displays the compile status at the various nodes in the network, refreshed every five seconds. Listing 7 shows a sample of the monitoring window.
Listing 7. Output from distccmon-text
2167  Compile  memory.c        tintin
2164  Compile  main.cxx        tintin
2192  Compile  ui_tcl.cxx      asterix
2187  Compile  traverse.c      asterix
2177  Compile  reports.cxx     pogo
2184  Compile  messghandler.c  pogo
2181  Compile  trace.cpp       localhost
2189  Compile  remote.c        localhost
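When many hosts are involved, a per-host summary can be easier to read than the raw monitor output. The sketch below counts compile jobs per host using the sample from Listing 7 as input. It assumes that the host name is the fourth whitespace-separated field, as in that listing; the output of a real distccmon-text run may be laid out differently, and jobs_per_host is a hypothetical helper name.

```shell
#!/bin/sh
# Sample monitor output, copied from Listing 7.
sample='2167 Compile memory.c tintin
2164 Compile main.cxx tintin
2192 Compile ui_tcl.cxx asterix
2187 Compile traverse.c asterix
2177 Compile reports.cxx pogo
2184 Compile messghandler.c pogo
2181 Compile trace.cpp localhost
2189 Compile remote.c localhost'

# jobs_per_host: read monitor lines on stdin, print one "host count" pair
# per line. Only lines whose second field is "Compile" are counted.
jobs_per_host() {
    awk '$2 == "Compile" { n[$4]++ } END { for (h in n) print h, n[h] }' | sort
}

summary=$(printf '%s\n' "$sample" | jobs_per_host)
echo "$summary"
```

A host that never appears in such a summary during a long build is worth checking: its distccd daemon may not be running, or it may be missing from DISTCC_HOSTS.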
Use ccache to further speed up compilation
Usually, when header files are modified in a C/C++ development framework, an average
make-based system ends up recompiling all source files. Typically, header-file
changes affect only a subset of the source files, so a time-consuming clean build isn't needed. You can also use ccache, a tool that drastically reduces the time it takes to clean-build a system, often by a factor of 5 to 10.
ccache acts as a cache to the compiler. It works by creating a hash from
the preprocessed sources and the compiler options used to compile the sources. While recompiling, if
ccache detects no changes in the preprocessed source and compiler options,
it retrieves its cached copy of the previously compiled output. This helps speed up the compilation process.
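The idea of a cache keyed by source and options can be illustrated with a toy example. This is only a sketch of the principle, not ccache's implementation: cksum stands in for the real hash function, the raw source text stands in for the preprocessed source, and cache_key is a hypothetical helper.

```shell
#!/bin/sh
# cache_key OPTIONS FILE: checksum over the compiler options plus the file
# contents. cksum is used only because it is widely available; it is not
# the hash that ccache uses.
cache_key() {
    { printf '%s\n' "$1"; cat "$2"; } | cksum | cut -d ' ' -f 1
}

src=$(mktemp)
echo 'int f1(int x) { return x + 1; }' > "$src"

key_a=$(cache_key "-O2 -c" "$src")   # first compilation
key_b=$(cache_key "-O2 -c" "$src")   # same source, same options
key_c=$(cache_key "-O0 -c" "$src")   # same source, different options

[ "$key_a" = "$key_b" ] && echo "same key, cached output can be reused"
[ "$key_a" != "$key_c" ] && echo "different key, compiler must run"
rm -f "$src"
```

The same reasoning explains why changing a compiler flag invalidates cached results even when no source file has changed.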
To download the latest version (2.4) of ccache, see the Resources section. Once in the ccache directory, issue the command ./configure --prefix=/usr followed by make && make install. If ccache isn't installed in /usr/bin, verify that the ccache location is defined as part of the PATH environment variable.
Ccache environment variables
The following are some of the environment variables you can use to customize ccache:
- CCACHE_DIR: Specifies the folder where ccache stores the precompiled outputs. If you don't define this variable, then by default the cached output is stored in $HOME/.ccache.
- CCACHE_TEMPDIR: Specifies the folder where ccache puts temporary files that it generates. If you don't define this variable, then by default $HOME/.ccache is used. It's a good idea to define both this variable and CCACHE_DIR: most organizations have a user quota for specific file-system areas, and if $HOME belongs to such an area, the quota will quickly be exhausted. Explicitly setting the cache area avoids this problem.
- CCACHE_DISABLE: If set, tells ccache to invoke the compiler proper, bypassing the cache. Used for diagnostic purposes.
- CCACHE_RECACHE: If set, tells ccache to ignore the existing entries in the cache and call the compiler; the new results are still cached. Used for diagnostic purposes.
- CCACHE_LOGFILE: If set, tells ccache to record the hit and miss statistics from the cache in this file. Very useful for diagnostics.
- CCACHE_PREFIX: Adds a prefix to the command line that ccache uses to invoke the compiler proper. This is used in particular to interface with distcc, as described in detail in the next section.
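A typical setup moves the cache out of $HOME and turns on logging. The sketch below shows one way to do that; the scratch location under TMPDIR is a hypothetical path chosen so that the example runs anywhere, and you would replace it with an area that has enough quota at your site.

```shell
#!/bin/sh
# Hypothetical scratch area outside $HOME; replace with a suitable path.
cache_root="${TMPDIR:-/tmp}/ccache-demo-$$"

CCACHE_DIR="$cache_root/cache"            # where cached outputs are stored
CCACHE_TEMPDIR="$cache_root/tmp"          # where temporary files are written
CCACHE_LOGFILE="$cache_root/ccache.log"   # hit and miss log for diagnostics
export CCACHE_DIR CCACHE_TEMPDIR CCACHE_LOGFILE

mkdir -p "$CCACHE_DIR" "$CCACHE_TEMPDIR"
echo "cache: $CCACHE_DIR"
echo "temp:  $CCACHE_TEMPDIR"
echo "log:   $CCACHE_LOGFILE"
```

Keeping both directories under one root also keeps them on the same file system, which the ccache documentation asks for so that files can be renamed from the temporary folder into the cache.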
You can use
ccache with or without
distcc. It doesn't depend on the
-j makefile option.
The simplest usage of ccache is to prefix the compiler command with it: ccache g++ -o <executable name> <source file(s)>. When used with a makefile, overriding the CC variable suffices; see Listing 8.
Listing 8. Sample makefile using the CC variable
CC := g++

app1: placer1.o route1.o floorplan1.o
	$(CC) -o $@ $^

placer1.o: placer1.cxx
	$(CC) -o $@ -c $<
…
With the makefile in Listing 8, the syntax to issue make with ccache is make "CC=ccache g++". To use distcc along with ccache, you need to set the CCACHE_PREFIX environment variable to distcc, as follows: export CCACHE_PREFIX=distcc. (This syntax is valid for the bash shell. If you use another shell, modify the syntax accordingly.)
Here's a sample make invocation with ccache and distcc:
export CCACHE_PREFIX=distcc; make "CC=ccache g++" -j4 -f makefile.x86
The actual invocation in the shell prompt during the build process looks like this: ccache distcc -o placer1.o -c placer1.cxx, and so on. Note that ccache only needs to be installed on the local machine. ccache makes the first check to decide whether the copy in the local cache suffices; otherwise, it hands the baton to distcc for distributed compilation.
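If the same build script has to run on machines with and without distcc, the prefix can be set conditionally. The following sketch is one way to do that; the fallback job count is an assumption, and the script only prints the command it would run.

```shell
#!/bin/sh
# Put distcc behind ccache only when a distcc binary is on the PATH.
if command -v distcc >/dev/null 2>&1; then
    CCACHE_PREFIX=distcc
    export CCACHE_PREFIX
    njobs=4      # mirrors the sample invocation above
else
    unset CCACHE_PREFIX
    njobs=1      # assumed fallback for a purely local build
fi

build_cmd="make \"CC=ccache g++\" -j$njobs -f makefile.x86"
echo "$build_cmd"
```

Either way the compiler is invoked through ccache, so local cache hits are served before any distribution is attempted.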
This article has delved into the GNU make, distcc, and ccache tools, which can help you parallelize the build process. These tools come with several other features that can further customize this effort: for example, ccache has an option that restricts the size of the cache, and the distcc installation has a GUI-based monitor called distccmon-gnome that tracks network build activity (it's available only if distcc is built using the corresponding option). The links in the Resources section provide further information.
Resources
- Read comprehensive Spirit framework documentation.
- Read the GNU
- Read the
- Visit the definitive
- Read about how you can use distcc to reduce compilation time.
- The AIX and UNIX developerWorks zone provides a wealth of information relating to all aspects of AIX systems administration.
- developerWorks technical events and webcasts: Stay current with developerWorks technical events and webcasts.
- Podcasts: Tune in and catch up with IBM technical experts.