Reducing the build time for C/C++-based systems is one of the major technical challenges any
release or build engineer faces. This article looks into some of the open source tool options available that help
speed up the build process by parallelizing the activity: distributing the build process across
multiple machines in a local area network. The discussion in this article primarily focuses on GNU
make, due to its wide availability.
By default, make is a sequential utility: it invokes the underlying compiler serially,
one C/C++ source file at a time. Typically, though, C/C++ source files (usually with a .cpp/.cxx
extension) can be built without depending on each other, so they can be compiled in parallel.
You do so by invoking make with the -j option. Listing 1 shows a typical usage.
Listing 1. Typical GNU make invocation
make -j10 -f makefile.x86_linux
The argument to -j -- 10 -- is the maximum number of simultaneous
compilations that make runs once the build process starts. If no argument is provided to -j, make places no limit on the number of jobs and starts every compilation whose prerequisites are ready.
Using the -j option makes particular sense when you're running the build on
a multicore system. To make the -j option work for you, you must address
several key issues; these are discussed in the next section.
Issues and potential solutions when using the -j option
First, you should check your system configuration. On a low-memory (<512MB RAM) system, too
many simultaneous compilations can slow the system due to paging, and the compile time increases
instead of dropping. You need to experiment to figure out the optimal value of -j
for your system. Another option is to use the -l or --load-average option of the GNU make tool
along with -j; make then starts new jobs only while the system load average is below
the level you specify.
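As a starting point for that experiment, the sketch below derives a job count from the number of online CPUs and prints (rather than runs) a make command that pairs -j with -l. The helper names cpu_count and suggest_make_cmd are hypothetical, and one job per CPU is only an assumed starting value to tune from; makefile.x86_linux is the file name used in Listing 1.

```shell
#!/bin/sh
# Sketch: pick a starting value for -j from the CPU count.
# cpu_count and suggest_make_cmd are hypothetical helper names.

cpu_count() {
    # getconf _NPROCESSORS_ONLN reports the online CPUs on Linux and many
    # UNIX systems; fall back to 1 if the query fails or returns garbage.
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

suggest_make_cmd() {
    njobs=$(cpu_count)
    # -l stops make from starting new jobs while the load average
    # is at or above the given value.
    echo "make -j${njobs} -l${njobs} -f ${1:-makefile.x86_linux}"
}

# Print the suggested command line; copy it, then adjust -j by experiment.
suggest_make_cmd "$@"
```

On a memory-constrained machine you would likely lower the -j value printed here and time a few builds before settling on a number.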
Second, watch for independent compilations that write to the same temporary file. Consider the make snippet shown in Listing 2.
Listing 2. Makefile with the same temporary file y.tab.c
# Both parser rules write y.tab.c, the default yacc output file
my_parser : main.o parser1.o parser2.o
	g++ -o $@ $^
parser1.o : parser1.y
	yacc parser1.y
	g++ -o $@ -c y.tab.c
parser2.o : parser2.y
	yacc parser2.y
	g++ -o $@ -c y.tab.c
Assume that the grammar files parser1.y and parser2.y are located in the same directory.
During sequential compilation, yacc generates y.tab.c (its default output filename) first for
parser1 and then for parser2, and each copy is compiled before the next one overwrites it. In
parallel mode, both rules can write y.tab.c at the same time, which results in a conflict. You can
solve this situation a couple of ways: keep the two yacc files in separate folders, or use the -b option to give each generated C file its own prefix, as shown in Listing 3.
Listing 3. Use the -b option of yacc to generate unique filenames
parser1.o : parser1.y
	yacc -b parser1 parser1.y
	g++ -o $@ -c parser1.tab.c
You must take a close look at the makefile to find such situations, where rules that work fine in serial mode interfere with each other when they run in parallel.
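One way to start such a review is a quick text search. The sketch below lists yacc commands in a makefile that do not pass -b and so all write the default y.tab.c. The function name find_shared_yacc_output is hypothetical, and the pattern is a rough heuristic: it assumes yacc is invoked directly by name at the start of a recipe line.

```shell
#!/bin/sh
# Sketch: list yacc commands in a makefile that do not pass -b and therefore
# all write the default y.tab.c. find_shared_yacc_output is a hypothetical name.
find_shared_yacc_output() {
    # Recipe lines start with whitespace. The first grep prints
    # "line-number:line" for each yacc command; the second drops the
    # commands that already use -b.
    grep -nE '^[[:space:]]+yacc([[:space:]]|$)' "$1" | grep -vE '[[:space:]]-b'
}
```

Called as find_shared_yacc_output makefile.x86_linux, it prints one line per suspicious command. Two or more hits in the same directory point to the conflict described above.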
Some makefile rules have implicit dependencies. Consider the situation shown in Listing 4, where a Perl script generates a header that is included by other sources.
Listing 4. Makefile with implicit dependencies
my_exe: info.h test1.o test2.o
	g++ -o $@ $^
test1.o: test1.cxx
	g++ -c $<
test2.o: test2.cxx
	g++ -c $<
info.h:
	make_header #shell script that generates the header file
The info.h header is included by test1.cxx and test2.cxx. In serial build mode, make works from left to right, and the file info.h is generated first.
However, in parallel build mode, make is free to process all
dependencies in parallel -- this can potentially result in some compilations failing intermittently
because info.h may not be generated before the compilation of test1.cxx and/or test2.cxx starts. To
fix this problem, it makes sense to remove info.h from the dependency list of my_exe and put it in
the dependency list of test1.o and test2.o. It's also advisable to use another wrapper to ensure that
info.h is generated only once. Listing 5 shows the modified version of the make_header script, and
Listing 6 shows the makefile.
Listing 5. Modified version of make_header script to prevent multiple writes
#!/usr/bin/bash
# Do nothing if the header has already been generated
if [ -f info.h ]
then
    exit
fi
echo "#ifndef __INFO_H" > info.h
echo "#define __INFO_H" >> info.h
echo "#include <iostream>" >> info.h
echo "using namespace std;" >> info.h
echo "int f1(int);" >> info.h
echo "int f2(int);" >> info.h
echo "#endif" >> info.h
Listing 6. Modified version of the makefile from Listing 4
# info.h is now a prerequisite of the objects that include it, not of my_exe
my_exe: test1.o test2.o
	g++ -o $@ $^
test1.o: test1.cxx info.h
	g++ -c $<
test2.o: test2.cxx info.h
	g++ -c $<
info.h:
	make_header #shell script that generates the header file
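The existence check in Listing 5 does not by itself prevent a second process from seeing a half-written info.h, for example if make_header can be started by more than one build at a time. One possible hardening, offered here as an assumption rather than as part of the original script, is to write the header to a temporary file and rename it into place. The function name generate_info_h and the temporary file name are hypothetical.

```shell
#!/bin/sh
# Sketch: generate info.h through a temporary file and a rename, so readers
# see either no header or a complete one. generate_info_h is a hypothetical name.
generate_info_h() {
    out=${1:-info.h}
    # Already generated: nothing to do.
    [ -f "$out" ] && return 0
    # Per-process temporary name in the same directory as the target,
    # so that mv is a rename and not a copy.
    tmp="$out.tmp.$$"
    {
        echo '#ifndef __INFO_H'
        echo '#define __INFO_H'
        echo '#include <iostream>'
        echo 'using namespace std;'
        echo 'int f1(int);'
        echo 'int f2(int);'
        echo '#endif'
    } > "$tmp" && mv "$tmp" "$out"
}
```

A make_header script built this way would end with a call such as generate_info_h info.h.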
In general, make -j can extract sufficient
parallelism if you create the makefile properly. Try to avoid unnecessary dependencies in the makefile
wherever possible.
Note that GNU make can only extract parallelism for a single machine.
The next section introduces distcc, a tool that lets you share the build process on multiple machines.
The distcc tool can distribute the builds of C/C++ code across multiple
machines. Each of these machines must have distcc installed. Here's a quick
installation and configuration reference:
- Download distcc (see the Resources section).
- Build the distcc sources on all machines by executing ./configure && make && make install.
- The build process starts from one machine and is then distributed on all the other machines
(servers). On all the servers, start the distccd daemon (you must have root privileges to do this).
distccd resides in the /etc/init.d folder. The syntax to start it in root mode is
tcsh-arpan# /etc/init.d/distccd start
And the syntax to start it in user mode is
tcsh-arpan$ sudo /etc/init.d/distccd start
You can also run distccd daemon processes in user mode by running distccd --daemon -j N, where N is the number of jobs you want to run on a given machine.
- The local machine needs to know which servers the build processes should be distributed to.
Depending on your shell, issue a modified version of this command:
export DISTCC_HOSTS='localhost tintin asterix pogo'
tintin, asterix, and pogo are other hosts in the network that can host build processes; localhost refers to the local machine.
- Instead of using the export directive, you can also create a file named hosts and put the names of
the servers in that file, separated by spaces. Copy this file to the $HOME/.distcc folder.
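The hosts-file alternative from the last step can be scripted. This is a minimal sketch; write_distcc_hosts is a hypothetical helper name, and the host names are the ones used in the export example above.

```shell
#!/bin/sh
# Sketch: write a distcc hosts file into the given directory.
# write_distcc_hosts is a hypothetical helper name.
write_distcc_hosts() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    # "$*" joins the remaining arguments with single spaces.
    printf '%s\n' "$*" > "$dir/hosts"
}

# Example call (not executed here):
#   write_distcc_hosts "$HOME/.distcc" localhost tintin asterix pogo
```

Keeping the host list in a file means every shell on the machine sees the same list without repeating the export.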
Now that distcc has been installed, the only thing remaining to be done
is to fire the build. Here's the invocation:
make -j4 CC=distcc -f makefile.x86_linux
Key things to keep in mind while working with distcc
For distcc to work to your advantage, you must keep several things in
mind:
- The machines must have identical configurations. This means the same version of the g++ compiler must be installed on all the machines, along with related build tools like ar, ranlib, libtool, and so on. The type and version of the operating system should also be the same.
- From the client machine, distcc sends the preprocessed code to the server machines. You need to verify that the distccd daemon process is running on each server machine.
- By default, the number of jobs that distcc schedules on a single machine is (no. of CPUs) + 2. For a single-core machine, this number is 3. Keep this in mind while you're firing the processes: a command line like make -j10 CC=distcc, where there are only three single-core hosts, means nine compile jobs are fired initially.
- Verify that the underlying machines can access the requisite file systems on which source files are stored. On Network File System (NFS) based systems, some source areas may not be mounted, which will result in compilation failures. You must also carefully monitor network congestion.
- distcc is used to compile the sources over the network. The linking steps are not distributed; they still run on the local machine.
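The job arithmetic from the list above can be sketched as a small helper that applies the (no. of CPUs) + 2 rule to each host and adds up the results. The function name total_distcc_jobs is hypothetical, and the CPU counts passed to it are assumed inputs that you would look up per host.

```shell
#!/bin/sh
# Sketch: total default job slots across hosts, using (CPUs + 2) per host.
# total_distcc_jobs is a hypothetical helper; pass one CPU count per host.
total_distcc_jobs() {
    total=0
    for cpus in "$@"; do
        total=$((total + cpus + 2))
    done
    echo "$total"
}

# Three single-core hosts, as in the example above: prints 9
total_distcc_jobs 1 1 1
```

A -j value above this total does not buy extra parallelism, because the additional jobs have no slot to run in.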
Monitoring the distcc compilation process
The distcc installation includes a console-based monitoring tool called
distccmon-text. Prior to starting the build process, it's worthwhile to open a separate terminal window and
issue distccmon-text 5. This terminal then refreshes every five seconds with the compile status at the
nodes in the network. Listing 7 shows a sample of the monitoring window.
Listing 7. Output from distccmon-text
2167 Compile memory.c tintin[0]
2164 Compile main.cxx tintin[1]
2192 Compile ui_tcl.cxx asterix[0]
2187 Compile traverse.c asterix[1]
2177 Compile reports.cxx pogo[0]
2184 Compile messghandler.c pogo[1]
2181 Compile trace.cpp localhost[0]
2189 Compile remote.c localhost[1]
Use ccache to further speed up compilation
Usually, when header files are modified in a C/C++ development framework, an average make-based system ends up recompiling all source files. Typically, header-file
changes affect only a subset of the source files, so a time-consuming clean build isn't needed. You can also
use ccache, a tool that drastically reduces the time it takes to clean-build a
system, often by a factor of 5 to 10.
ccache acts as a cache to the compiler. It works by creating a hash from
the preprocessed sources and the compiler options used to compile the sources. While recompiling, if
ccache detects no changes in the preprocessed source and compiler options,
it retrieves its cached copy of the previously compiled output. This helps speed up the compilation process.
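To make the lookup idea concrete, here is a toy model of the cache key, not ccache's own implementation. It hashes the compiler options together with an already preprocessed source file. cksum is only a stand-in for whatever hash ccache uses internally, and cache_key is a hypothetical function name.

```shell
#!/bin/sh
# Toy model of a compiler cache key: hash the compiler options together with
# the preprocessed source. cksum is a stand-in, not the hash ccache uses.
# cache_key is a hypothetical name; $1 is a file that has already been
# run through the preprocessor, and the remaining arguments are the options.
cache_key() {
    preprocessed=$1
    shift
    { printf '%s\n' "$*"; cat "$preprocessed"; } | cksum | cut -d ' ' -f 1
}
```

The same preprocessed text with the same options always yields the same key, which is what allows a cached object file to be reused. Changing either input changes the key and forces a real compilation.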
To download ccache (version 2.4 was the latest when this article was written), see the Resources
section. Once in the ccache directory, issue the command ./configure --prefix=/usr followed by make && make
install; this places the ccache binary in /usr/bin. If ccache isn't installed in /usr/bin, verify that the ccache location is defined as part of the PATH
environment variable.
The following are some of the environment variables you can use to customize the ccache setup:
- CCACHE_DIR -- Specifies the folder where ccache stores the precompiled outputs. If you don't define this variable, then by default the cached output is stored in $HOME/.ccache.
- CCACHE_TEMPDIR -- Specifies the folder where ccache puts temporary files that it generates. If you don't define this variable, then by default $HOME/.ccache is used. It's a good idea to define both this variable and CCACHE_DIR -- most organizations have a user quota for specific file-system areas, and if $HOME belongs to such an area the quota will quickly be exhausted. Explicitly setting the cache area avoids this problem.
- CCACHE_DISABLE -- If set, tells ccache to invoke the compiler proper, bypassing the cache. Used for diagnostic purposes.
- CCACHE_RECACHE -- If set, tells ccache to ignore the existing entries in the cache and calls the compiler; but for new entries, it caches the result. Used for diagnostic purposes.
- CCACHE_LOGFILE -- If set, tells ccache to record the hit and miss statistics from the cache in this file. Very useful for diagnostics.
- CCACHE_PREFIX -- Adds a prefix to the command line that ccache uses to invoke the compiler proper. This is used in particular to interface ccache with distcc, as described in detail in the next section.
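A minimal sketch of such a setup, for a Bourne-style shell, follows. The /tmp-based location is a placeholder and CCACHE_ROOT is a hypothetical variable introduced only for this example; substitute a scratch area that is outside your $HOME quota.

```shell
#!/bin/sh
# Sketch: keep ccache data outside $HOME.
# CCACHE_ROOT is a hypothetical variable and /tmp is only a placeholder.
cache_root="${CCACHE_ROOT:-/tmp/ccache-$(id -un)}"

export CCACHE_DIR="$cache_root/cache"          # cached compiler outputs
export CCACHE_TEMPDIR="$cache_root/tmp"        # temporary files
export CCACHE_LOGFILE="$cache_root/ccache.log" # diagnostics log

# Create the directories up front so the first compilation can use them.
mkdir -p "$CCACHE_DIR" "$CCACHE_TEMPDIR"
```

Placing these lines in a shell startup file or a build wrapper script keeps the settings identical across builds.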
You can use ccache with or without distcc. It doesn't depend on the -j option of make.
The simplest usage of ccache is as follows: ccache
g++ -o <executable name> <source file(s)>. When used with a makefile, overriding the CC variable suffices; see Listing 8.
Listing 8. Sample makefile using the CC variable
CC := g++
app1: placer1.o route1.o floorplan1.o
	$(CC) -o $@ $^
placer1.o: placer1.cxx
	$(CC) -o $@ -c $<
…
With the makefile in Listing 8, the syntax to issue make is make
"CC=ccache g++".
To use distcc with ccache, you need to set
the CCACHE_PREFIX environment variable to distcc, as follows: export
CCACHE_PREFIX=distcc. (This syntax is valid for the bash shell. If you use another shell,
modify the syntax accordingly.)
Here's a sample make invocation with ccache and distcc:
export CCACHE_PREFIX=distcc; make "CC=ccache g++" -j4 -f makefile.x86
During the build, make invokes each compilation as
ccache g++ -o placer1.o -c placer1.cxx, and so on; on a cache miss, ccache prefixes the compiler command with distcc. Note that ccache only needs to be installed on the local machine. ccache makes the first check to decide whether the copy in the local cache suffices;
otherwise, it hands the baton to distcc for distributed compilation.
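If the same build script has to run on machines where ccache or distcc may be missing, the invocation can be assembled conditionally. The sketch below makes several assumptions: pick_cc is a hypothetical helper, g++ is the compiler, makefile.x86 is the file name used above, and the command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: build the make command line from whichever tools are installed.
# pick_cc is a hypothetical helper name.
pick_cc() {
    if command -v ccache >/dev/null 2>&1; then
        echo "ccache g++"
    else
        echo "g++"
    fi
}

# Route cache misses through distcc only when distcc is available.
if command -v distcc >/dev/null 2>&1; then
    export CCACHE_PREFIX=distcc
fi

# Print the command instead of running it, so the sketch has no side effects.
echo "make \"CC=$(pick_cc)\" -j4 -f makefile.x86"
```

When ccache is absent, CCACHE_PREFIX has no effect and the build falls back to a plain local g++ compilation.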
This article has delved into the GNU make, distcc, and ccache tools, which can help you
parallelize the build process. These tools come with several other features that can further customize this
effort -- for example, ccache has a -M
option that restricts the size of the cache; and the distcc installation has a GUI-based
monitor called distccmon-gnome that tracks network build activity (it's
created if distcc is configured with the --with-gtk
option). The links in the Resources section provide further information.
Learn
- Read comprehensive Spirit framework documentation.
- Read the GNU make manual.
- Read the distcc documentation.
- Visit the definitive ccache resource.
- Read about how you can use distcc to reduce compilation time.
Arpan Sen is a lead engineer working on the development of software in the electronic design automation industry. He has worked on several flavors of UNIX, including Solaris, SunOS, HP-UX, and IRIX as well as Linux and Microsoft Windows for several years. He takes a keen interest in software performance-optimization techniques, graph theory, and parallel computing. Arpan holds a post-graduate degree in software systems. You can reach him at arpan@syncad.com.