The average C++ developer has several things to do as part of his
or her daily chores: developing new software, debugging other people's code, creating
a test plan, developing tests per the plan, managing a regression suite, and so on.
Juggling multiple roles can eat away precious time. To help, this article
provides 10 effective methods that can increase your productivity. The examples
in this article use tcsh version 6 as a reference, but the
ideas are portable to all variants of UNIX® shells. This article also refers to
several open source tools available for the UNIX platform.
Running rm -rf * or one of its variants at the shell prompt is probably the most
common way for a UNIX developer to lose hours of work. One safeguard is to alias
rm, cp, and mv to their interactive variants in $HOME/.aliases and then source
this file at shell startup. Depending on the login shell, this means putting
source $HOME/.aliases in .cshrc (for csh/tcsh) or in .profile or .bash_profile
(for Bourne-style shells). Listing 1 uses csh/tcsh alias syntax; in a Bourne-style
shell such as bash, the equivalent is alias rm='rm -i'.
Listing 1. Setting aliases for rm, cp, and mv
alias rm 'rm -i'
alias cp 'cp -i'
alias mv 'mv -i'
Another option, specific to tcsh users, is to add the following line to your startup script:
set rmstar on
If you issue rm * with the rmstar variable set, tcsh asks you to
confirm your decision, as shown in Listing 2.
Listing 2. Using the rmstar shell variable in tcsh
arpan@tintin# pwd
/home/arpan/IBM/documents
arpan@tintin# set rmstar on
arpan@tintin# rm *
Do you really want to delete all files? [n/y] n
However, if you run rm, cp, or mv with the -f option, -f overrides
interactive mode. A more effective approach is to write your own versions of
these UNIX commands and, for deletion, move the data into a predefined folder
such as $HOME/.recycle_bin instead of removing it. Listing 3 shows the outline
of such a script, called saferm, which takes a single file or folder name as
its argument.
Listing 3. Outline of a safe-rm script
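#!/usr/bin/tcsh
# Create the recycle bin on first use
if (! -d ~/.recycle_bin) then
    mkdir ~/.recycle_bin
endif
# Move the argument into the bin, suffixed with today's date (YYYY-MM-DD)
mv $1 ~/.recycle_bin/$1.`date +%F`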
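For readers who prefer a compiled helper, the same idea can be sketched in C++. This is an illustration under stated assumptions, not part of the original script: iso_date, recycle_path, and safe_remove are hypothetical names. It deliberately keeps only the last path component of the argument, a small deviation from the tcsh outline, which fails when its argument contains a slash (for example, saferm src/old.cpp) because ~/.recycle_bin/src does not exist.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <string>
#include <sys/stat.h>

// Formats a calendar date as YYYY-MM-DD, the format `date +%F` produces.
std::string iso_date(const std::tm& t) {
    char buf[16];
    std::strftime(buf, sizeof buf, "%Y-%m-%d", &t);
    return buf;
}

// Destination inside the bin: <bin>/<last path component>.<date>.
// (Hypothetical helper; the tcsh outline uses the whole argument instead.)
std::string recycle_path(const std::string& bin, std::string target,
                         const std::string& date) {
    assert(!target.empty());
    while (target.size() > 1 && target.back() == '/')
        target.pop_back();  // "olddir/" -> "olddir"
    const std::size_t slash = target.rfind('/');
    const std::string base =
        (slash == std::string::npos) ? target : target.substr(slash + 1);
    return bin + "/" + base + "." + date;
}

// Moves target into bin instead of deleting it; returns true on success.
// Unlike mv, std::rename does not copy across file systems, so bin must be
// on the same file system as target.
bool safe_remove(const std::string& bin, const std::string& target,
                 const std::string& date) {
    // Equivalent of the script's "if (! -d ~/.recycle_bin) mkdir" step.
    if (mkdir(bin.c_str(), 0700) != 0 && errno != EEXIST) return false;
    return std::rename(target.c_str(),
                       recycle_path(bin, target, date).c_str()) == 0;
}
```

For example, recycle_path("/home/arpan/.recycle_bin", "src/old.cpp", "2024-03-05") yields /home/arpan/.recycle_bin/old.cpp.2024-03-05.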
Restoring lost data also calls for comprehensive backup policies. Backups might
run nightly or every couple of hours, depending on the requirement. At a minimum,
$HOME and all its subdirectories should be backed up by a cron job and kept in a
designated, well-known file system area. Only the system administrator should
have Write or Execute permission on the backed-up data. The following crontab
entry shows how such a backup can be scheduled:
0 20 * * * /home/tintin/bin/databackup.csh
The script backs up the data every day at 20:00. The data backup script is shown in Listing 4.
Listing 4. The data backup script
cd /home/tintin/database/src
tar cvzf database-src.tgz.`date +%F` database/ main/ sql-processor/
mv database-src.tgz.`date +%F` ~/.backup/
chmod 000 ~/.backup/database-src.tgz.`date +%F`
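The five fields of the crontab entry are minute, hour, day of month, month, and day of week, so 0 20 * * * fires at minute 0 of hour 20 on every day. The following C++ sketch makes that reading concrete; cron_fires is a hypothetical helper that deliberately supports only * and plain numbers, not the ranges, lists, and steps that real cron accepts.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Does a five-field crontab schedule (minute hour day-of-month month
// day-of-week) fire at the given time? Simplified sketch: each field must be
// "*" or a single number. Real cron also accepts ranges (1-5), lists (1,15),
// and steps (*/2), and has a special rule when both day fields are restricted.
bool cron_fires(const std::string& schedule, int minute, int hour,
                int mday, int month, int wday) {
    assert(minute >= 0 && minute < 60 && hour >= 0 && hour < 24);
    std::istringstream in(schedule);
    std::vector<std::string> fields;
    for (std::string f; in >> f;) fields.push_back(f);
    if (fields.size() != 5) return false;
    const int now[5] = {minute, hour, mday, month, wday};
    for (int i = 0; i < 5; ++i)
        if (fields[i] != "*" && std::stoi(fields[i]) != now[i]) return false;
    return true;
}
```

For example, cron_fires("0 20 * * *", 0, 20, 15, 6, 3) returns true, while the same call for 20:30 (minute 30) returns false.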
Another policy is to maintain some file system areas on the network with easy-to-follow names such as /backup_area1, /backup_area2, and so on. Developers who want their data backed up create directories or files in these areas. Such areas must have the sticky bit turned on, as /tmp does, so that every user can create files there but only the owner of a file (or of the directory, or the superuser) can delete or rename it.
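As an illustration, the POSIX stat() and chmod() calls can check for and set the sticky bit on such an area from C++. The function names below are hypothetical; make_shared_area applies mode 1777, the same permissions /tmp normally has (the shell equivalent is chmod 1777 /backup_area1).

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>

// True if path is a directory whose sticky bit (S_ISVTX) is set.
bool is_sticky_dir(const std::string& path) {
    struct stat st {};
    if (stat(path.c_str(), &st) != 0) return false;
    return S_ISDIR(st.st_mode) && (st.st_mode & S_ISVTX) != 0;
}

// Gives a shared area the same permissions as /tmp: mode 1777, that is,
// read/write/execute for everyone plus the sticky bit.
bool make_shared_area(const std::string& path) {
    assert(!path.empty());
    return chmod(path.c_str(), S_ISVTX | S_IRWXU | S_IRWXG | S_IRWXO) == 0;
}
```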
The cscope utility, available for download at no charge (see Resources for a
link), is a great way to discover and browse existing sources. Cscope builds its
own database from a list of files: C or C++ headers and source files, flex and
bison files, inline sources (.inl files), and so on. When the database has been
created, cscope provides a neat interface to the source code listings. Listing 5
shows how to build a cscope database and invoke it.
Listing 5. Using cscope to build a source database and invoke it
arpan@tintin# find . -name "*.[chyli]*" > cscope.files
arpan@tintin# cscope -b -q -k
arpan@tintin# cscope -d
The -b option makes cscope build the cross-reference database and exit without
starting the interface; the -q option makes it also create an inverted index for
faster searches; the -k option ("kernel mode") stops cscope from looking into the
system headers in /usr/include (otherwise, the results would be overwhelming even
for the most trivial of searches). The -d option starts the cscope interface on
the existing database without rebuilding it, as shown in Listing 6.
Listing 6. The cscope interface
Cscope version 15.5                                 Press the ? key for help


Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:
To exit cscope, press Ctrl-D. Use the Tab key to switch between the list of matches that cscope displays and the search options (for example, Find this C symbol, Find this file). Listing 7 shows the screen when you search for a file whose name contains database. To open an individual file from the list, press its number (0, 1, and so on).
Listing 7. Cscope output when searching for a file named database
Cscope version 15.5                                 Press the ? key for help

  File
0 database.cpp
1 database.h
2 databasecomponents.cpp
3 databasecomponents.h

Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:
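The pattern *.[chyli]* in Listing 5 matches any name containing a dot followed by c, h, y, l, or i, which covers .c, .cpp, .h, .hpp, .y, .l, and .inl files. The C++ sketch below applies the same pattern with the POSIX fnmatch() call, which implements the shell pattern rules that find -name uses; cscope_candidates is a hypothetical name. Note that the pattern also accepts names such as notes.html, so cscope.files can pick up a few non-source files.

```cpp
#include <cassert>
#include <fnmatch.h>
#include <string>
#include <vector>

// Keeps the base names that match the pattern passed to find -name in
// Listing 5. fnmatch(3) with no flags lets '*' match a leading dot, as
// GNU find -name does.
std::vector<std::string> cscope_candidates(const std::vector<std::string>& names) {
    std::vector<std::string> kept;
    for (const auto& name : names) {
        assert(name.find('/') == std::string::npos);  // -name sees base names only
        if (fnmatch("*.[chyli]*", name.c_str(), 0) == 0) kept.push_back(name);
    }
    return kept;
}
```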
Learn to debug legacy code with doxygen
To effectively debug code that someone else developed, it pays to understand the overall structure of the existing software: the classes and their hierarchy, the global and static variables, and the public interface routines. The open source utility doxygen (see Resources for a link) is probably the best-in-class tool for extracting class hierarchies from existing sources.
To run doxygen on a project, first run doxygen -g at the shell prompt. This
command generates a configuration file called Doxyfile in the current working
directory, which you must then edit by hand. After editing it, run doxygen again
with Doxyfile as its argument. Listing 8 shows a sample run log.
Listing 8. Running doxygen
arpan@tintin# doxygen -g
arpan@tintin# ls
Doxyfile
…
[after editing Doxyfile]
arpan@tintin# doxygen Doxyfile
Doxyfile has several fields that you need to understand. Some of the more important fields are:
- OUTPUT_DIRECTORY. The generated documentation files are kept in this directory.
- INPUT. This is a space-separated list of all the source files and folders whose documentation must be generated.
- RECURSIVE. Set this field to YES when the sources are organized in a directory hierarchy. Instead of listing every folder in INPUT, you can then specify only the top-level folder in INPUT and let doxygen descend into the subfolders.
- EXTRACT_ALL. This field must be set to YES to indicate to doxygen that documentation should be extracted from those classes and functions that are undocumented.
- EXTRACT_PRIVATE. This field must be set to YES to indicate to doxygen that private data members of classes should be included in the documentation.
- FILE_PATTERNS. If the project uses the usual C or C++ source and header extensions, such as .c, .cpp, .cc, .cxx, .h, or .hpp, you don't need to add anything to this field; otherwise, list the patterns of the files to process here.
Note: Doxyfile has several other fields that you must study depending on the project requirements and the detail of documentation required. Listing 9 shows a sample Doxyfile.
Listing 9. Sample Doxyfile
OUTPUT_DIRECTORY = /home/tintin/database/docs
INPUT            = /home/tintin/project/database
FILE_PATTERNS    =
RECURSIVE        = YES
EXTRACT_ALL      = YES
EXTRACT_PRIVATE  = YES
EXTRACT_STATIC   = YES
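To show the kind of comments doxygen extracts, here is a small, hypothetical C++ class (RecordStore is not from the article's project). The /// comments and the @brief, @param, @pre, and @return commands are standard doxygen markup. With EXTRACT_PRIVATE = YES, the private member records_ also appears in the generated documentation, and EXTRACT_ALL = YES covers the deliberately undocumented count().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

/// @brief Hypothetical in-memory record store, used only to show doxygen markup.
///
/// Doxygen collects "///" comments like these into the generated pages.
class RecordStore {
public:
    /// @brief Appends a record.
    /// @param rec The record text to store.
    /// @pre rec is not empty.
    /// @return The zero-based index of the new record.
    std::size_t add(const std::string& rec) {
        assert(!rec.empty());
        records_.push_back(rec);
        return records_.size() - 1;
    }

    // Ordinary comment only, so doxygen treats count() as undocumented;
    // EXTRACT_ALL = YES includes it in the output anyway.
    std::size_t count() const { return records_.size(); }

private:
    std::vector<std::string> records_;  ///< Stored records, in insertion order.
};
```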
Most sophisticated pieces of software developed in C++ use classes from the C++
Standard Template Library (STL). Unfortunately, debugging code inundated with STL
classes isn't easy: the GNU Debugger (gdb) often complains of missing information,
fails to display the relevant data, or even crashes. To circumvent this problem,
use an advanced feature of gdb: the capability to add user-defined commands. For
example, consider the code snippet in Listing 10, which fills a vector and prints
its contents.
Listing 10. Using STL vector in C++ code
#include <vector>
#include <iostream>
using namespace std;

int main ()
{
    vector<int> V;
    V.push_back(9);
    V.push_back(8);
    for (int i=0; i < V.size(); i++)
        cout << V[i] << "\n";
    return 0;
}
Now, while you're debugging the program, if you want to find out the length of
the vector, you can run print V._M_finish - V._M_start at the gdb prompt, where
_M_finish and _M_start are pointers to the end and beginning of the vector's
data, respectively. (In the libstdc++ shipped with newer GCC releases, these
pointers live inside the _M_impl member, as in V._M_impl._M_finish.) However,
this requires an understanding of STL internals, which may not always be
feasible. As an alternative, I recommend gdb_stl_utils, available for download
at no charge, which defines several user-defined gdb commands such as
p_stl_vector_size, which displays the size of a vector, and p_stl_vector, which
displays the contents of a vector. Listing 11 shows the definition of
p_stl_vector, which iterates over the data demarcated by the _M_start and
_M_finish pointers.
Listing 11. Using p_stl_vector to display the contents of a vector
define p_stl_vector
    set $vec = ($arg0)
    set $vec_size = $vec->_M_finish - $vec->_M_start
    if ($vec_size != 0)
        set $i = 0
        while ($i < $vec_size)
            printf "Vector Element %d: ", $i
            p *($vec->_M_start+$i)
            set $i++
        end
    end
end
For a list of commands defined using gdb_stl_utils, run
help user-defined at the gdb prompt.
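What p_stl_vector computes can be mirrored in ordinary C++ using only the public vector interface. In this sketch (print_vector_elements is a hypothetical name), data() stands in for _M_start and data() + size() for _M_finish; their difference is the element count, exactly the subtraction the gdb command performs.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <vector>

// Prints each element the way p_stl_vector does, using the public API:
// data() plays the role of _M_start and data() + size() that of _M_finish.
void print_vector_elements(const std::vector<int>& v, std::ostream& out) {
    const int* start = v.data();
    const int* finish = v.data() + v.size();
    const std::ptrdiff_t vec_size = finish - start;  // same value as v.size()
    assert(vec_size >= 0);
    for (std::ptrdiff_t i = 0; i < vec_size; ++i)
        out << "Vector Element " << i << ": " << *(start + i) << '\n';
}
```

For the vector in Listing 10, the function prints Vector Element 0: 9 and Vector Element 1: 8; passing a std::ostringstream instead of std::cout captures the text for inspection.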
Making a clean build of the sources for any reasonably complicated piece of
software eats into your productive time. One of the best tools for speeding up
compilation is ccache (see Resources for a link). Ccache acts as a compiler
cache: if a file is compiled again with the same preprocessed source and the
same compiler options, the object file is retrieved from ccache's cache instead
of being regenerated. This pays off handsomely when a user changes a header file
and then, as is typical, invokes make clean; make: every file whose preprocessed
output is unaffected by the change is served from the cache. Because ccache does
not rely on time stamps to decide whether a file needs recompiling, precious
build time is saved. Here's a sample use of ccache:
arpan@tintin# ccache g++ foo.cxx
Internally, ccache generates a hash that, among other things, takes into
consideration pre-processed versions of the source file (obtained using
g++ -E), the command-line options used to invoke
the compiler, and so on. The compiled object file is stored against this hash in
the cache.
Ccache defines several environment variables to allow for customizations:
- CCACHE_DIR. Here, ccache stores the cached files. By default, the files are stored in $HOME/.ccache.
- CCACHE_TEMPDIR. Here, ccache stores temporary files. This folder should be in the same file system as $CCACHE_DIR.
- CCACHE_READONLY. If the ever-increasing size of the cache folder is a problem, setting this environment variable is useful. If you enable this variable, ccache doesn't add any files to the cache during compilation; however, it uses the existing cache to look for object files.
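The hashing scheme described above can be sketched in C++ as follows. This is a simplified model, not ccache's code: it uses 64-bit FNV-1a in place of ccache's own hash, keeps the cache in memory rather than in $CCACHE_DIR, and models CCACHE_READONLY with a hypothetical set_read_only() switch. Changing either the preprocessed source or any compiler option changes the key, which forces a real compilation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// 64-bit FNV-1a, continuing from state h. A simple stand-in for ccache's
// real, stronger hash function.
std::uint64_t fnv1a(const std::string& data, std::uint64_t h) {
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return h;
}

// Cache key from the preprocessed source (g++ -E output) and the compiler
// options; a NUL byte separates fields so that {"-O", "2"} != {"-O2"}.
std::uint64_t cache_key(const std::string& preprocessed,
                        const std::vector<std::string>& options) {
    std::uint64_t h = fnv1a(preprocessed, 14695981039346656037ULL);  // offset basis
    for (const auto& opt : options) {
        h = fnv1a(std::string(1, '\0'), h);
        h = fnv1a(opt, h);
    }
    return h;
}

// Object files stored against their keys. In read-only mode (compare
// CCACHE_READONLY) lookups still work, but nothing new is stored.
class CompilerCache {
public:
    // Returns the cached object file, or nullptr on a cache miss.
    const std::string* lookup(std::uint64_t key) const {
        auto it = store_.find(key);
        return it == store_.end() ? nullptr : &it->second;
    }

    void insert(std::uint64_t key, std::string object_file) {
        assert(!object_file.empty());
        if (read_only_) return;
        store_[key] = std::move(object_file);
    }

    void set_read_only(bool on) { read_only_ = on; }

private:
    bool read_only_ = false;
    std::unordered_map<std::uint64_t, std::string> store_;
};
```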
Use Valgrind and Electric-Fence with gdb to stop memory errors
C++ programming has several pitfalls, most notably memory corruption. Two open
source tools for the UNIX environment, Valgrind and Electric-Fence, work in
tandem with gdb to help close in on memory errors. Here's a brief guide on how
to use these tools.
The easiest way to use Valgrind on a program is to run it at the shell prompt followed by the usual program options. Note that for optimal results, you should run the debug version of the program.
arpan@tintin# valgrind <valgrind options> <program name> <program option1> <program option2> ...
Valgrind reports several common memory errors, such as mismatched allocation and
deallocation (for example, memory allocated with malloc but freed with delete),
use of variables with uninitialized values, and deleting the same pointer twice.
The sample code in Listing 12 has an obvious array overwrite problem: it writes
to p_arr[10], one element past the end of a 10-element array.
Listing 12. Sample C++ memory corruption issue
int main ()
{
    int* p_arr = new int[10];
    p_arr[10] = 5;
    return 0;
}
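Valgrind and gdb work in tandem. With the --db-attach=yes option, Valgrind can
invoke gdb directly while it is running. (Recent Valgrind releases have removed
this option in favor of the built-in gdbserver; see the --vgdb and --vgdb-error
options.) For example, when Valgrind is invoked on the code in Listing 12 with
--db-attach=yes, it offers to attach gdb each time it encounters a memory error,
as shown in Listing 13. The first report originates in the dynamic loader
(ld-2.3.2.so) and is declined with n; the second pinpoints the invalid write at
line 4 of test.cc.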
Listing 13. Attaching gdb during Valgrind execution
==5488== Conditional jump or move depends on uninitialised value(s)
==5488==    at 0x401206C: strlen (in /lib/ld-2.3.2.so)
==5488==    by 0x4004E35: _dl_init_paths (in /lib/ld-2.3.2.so)
==5488==    by 0x400305A: dl_main (in /lib/ld-2.3.2.so)
==5488==    by 0x400F87D: _dl_sysdep_start (in /lib/ld-2.3.2.so)
==5488==    by 0x4001092: _dl_start (in /lib/ld-2.3.2.so)
==5488==    by 0x4000C56: (within /lib/ld-2.3.2.so)
==5488==
==5488== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- n
==5488==
==5488== Invalid write of size 4
==5488==    at 0x8048466: main (test.cc:4)
==5488==  Address 0x4245050 is 0 bytes after a block of size 40 alloc'd
==5488==    at 0x401ADEB: operator new[](unsigned) (m_replacemalloc/vg_replace_malloc.c:197)
==5488==    by 0x8048459: main (test.cc:3)
==5488==
==5488== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ----
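Once Valgrind has pinpointed the write at test.cc:4, the fix is to stay within the 10 elements that were allocated. One option, sketched below in C++ (try_write is a hypothetical helper), is to replace the raw new[] buffer with std::vector and use its bounds-checked at() member, which throws std::out_of_range for index 10 instead of silently corrupting memory. The vector also releases its storage automatically, which removes the leak in Listing 12 (it never calls delete[]).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Bounds-checked version of the write in Listing 12. vector::at() throws
// std::out_of_range for an invalid index instead of overwriting memory.
bool try_write(std::vector<int>& arr, std::size_t index, int value) {
    try {
        arr.at(index) = value;
    } catch (const std::out_of_range&) {
        return false;  // index >= arr.size(): nothing was written
    }
    assert(arr[index] == value);
    return true;
}
```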
Electric-Fence is a set of libraries for detecting buffer overflow or underflow in a gdb-based environment. In the event of erroneous memory access, this tool, in conjunction with gdb, points to the exact instruction in the source code that caused the access. For example, for the code in Listing 12, with Electric-Fence turned on, Listing 14 shows the gdb behaviour.
Listing 14. Electric-Fence, showing the exact area in sources that caused a crash
(gdb) efence on
Enabled Electric Fence
(gdb) run
Starting program: /home/tintin/efence/a.out

  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>

  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>

Program received signal SIGSEGV, Segmentation fault.
0x08048466 in main () at test.cc:4
4         p_arr[10] = 5;
After Electric-Fence installation, add the following lines to the .gdbinit file:
define efence
    set environment EF_PROTECT_BELOW 0
    set environment LD_PRELOAD /usr/lib/libefence.so.0.0
    echo Enabled Electric Fence\n
end
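The mechanism Electric-Fence relies on can be illustrated with a short C++ sketch built on the POSIX mmap() and mprotect() calls. This is a simplified model, not Electric-Fence's actual code, and guarded_alloc_ints and overflow_faults are hypothetical names. The block is placed so that it ends exactly at a page boundary and the following page is made inaccessible (the layout that EF_PROTECT_BELOW set to 0 selects), so the p_arr[10] write from Listing 12 raises SIGSEGV at the faulting instruction, as in Listing 14. The overflow runs in a forked child so that the caller survives to inspect the signal.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Allocates count ints so that the block ends exactly at a page boundary and
// the next page is PROT_NONE. Returns nullptr on failure. The memory is
// never freed in this sketch.
int* guarded_alloc_ints(std::size_t count) {
    const long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    const std::size_t page = static_cast<std::size_t>(page_size);
    const std::size_t bytes = count * sizeof(int);
    const std::size_t data_pages = (bytes + page - 1) / page;
    const std::size_t total = (data_pages + 1) * page;  // data + 1 guard page
    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    char* guard = static_cast<char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) return nullptr;
    return reinterpret_cast<int*>(guard - bytes);  // last int sits just below the guard
}

// Repeats Listing 12's p_arr[10] = 5 in a child process and reports whether
// the child was killed by SIGSEGV, the signal gdb shows in Listing 14.
bool overflow_faults() {
    const pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        volatile int* p_arr = guarded_alloc_ints(10);
        if (p_arr == nullptr) _exit(2);
        p_arr[9] = 5;   // last valid element: fine
        p_arr[10] = 5;  // first int of the guard page: faults here
        _exit(0);       // not reached when the guard page works
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```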
One of the most common programming tasks is improving code performance. To do this, it is important to figure out which sections of the code take the most time to execute. In technical terms, this is known as profiling. The GNU profiler, gprof (see Resources for a link), is easy to use and at the same time packed with useful features.
To collect profile information for a program, the first step is to specify the
-pg option when invoking the compiler:
arpan@tintin# g++ database.cpp -pg
Next, run the program as you would during the normal course. At the end of a
successful run (that is, a run that neither crashes nor calls the
_exit system call), the profile information is written
in a file named gmon.out. After the gmon.out file is generated, you run
gprof on the executable, as shown below. Note that if no executable name is
mentioned, a.out is assumed by default. Likewise, if no profile-data file name is
mentioned, gmon.out is assumed to be present in the current working directory.
arpan@tintin# gprof <options> <executable name> <profile-data-file name> > outfile
By default, gprof writes its report to standard output, so redirect it to a file as shown. The report contains two sets of information: the flat profile and the call graph. The flat profile shows the total amount of time spent in each function, with functions sorted by decreasing time. Self seconds is the time accounted for by the function alone. Cumulative seconds is a running total: the self seconds of the function plus the self seconds of every function listed above it in the table. The time a function spends in the functions it calls is reported in the call graph, not in the flat profile.
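The relationship between the two columns is easy to check by hand. The C++ sketch below orders rows by self time and accumulates the cumulative column; FlatRow and build_flat_profile are hypothetical names, and the input times are made up for illustration. It simplifies gprof's ordering by ignoring its tie-breaking on call counts and function names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct FlatRow {
    std::string name;
    double self_seconds;
    double cumulative_seconds;  // filled in by build_flat_profile
};

// Sorts rows by decreasing self time and fills in the cumulative column as a
// running total of self seconds, as described for gprof's flat profile.
std::vector<FlatRow> build_flat_profile(std::vector<FlatRow> rows) {
    std::stable_sort(rows.begin(), rows.end(),
                     [](const FlatRow& a, const FlatRow& b) {
                         return a.self_seconds > b.self_seconds;
                     });
    double running = 0.0;
    for (auto& row : rows) {
        assert(row.self_seconds >= 0.0);
        running += row.self_seconds;
        row.cumulative_seconds = running;
    }
    return rows;
}
```

With self times of 2.0, 1.5, and 0.5 seconds, the cumulative column reads 2.0, 3.5, and 4.0 seconds.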
Display source listings in gdb
It's quite common to find developers debugging code over a remote connection that
is too slow for a graphical front end to gdb such as the Data Display Debugger
(DDD). In such situations, gdb's text user interface (TUI) proves a lifesaver of
sorts: pressing Ctrl-X followed by A switches gdb into TUI mode, which displays the
source listing above the gdb prompt during the debug session. Pressing Ctrl-X A
again returns to the plain gdb prompt. Yet another option is to invoke gdb with
the -tui option (gdb -tui <program name>), which launches the text mode source
listing directly. Listing 15 shows gdb running in text mode.
Listing 15. Using gdb source listing in text mode
   3    using namespace std;
   4
   5    int main ()
   6    {
B+>7        vector<int> V;
   8        V.push_back(9);
   9        V.push_back(8);
   10       for (int i=0; i < V.size(); i++)
   11           cout << V[i] << "\n";
   12       return 0;
   13   }
   14
--------------------------------------------------------------------------------------
child process 6069 In: main                                   Line: 7    PC: 0x804890e
(gdb) b main
Breakpoint 1 at 0x804890e: file test.cc, line 7.
(gdb) r
Maintain orderly source listings using CVS
Different projects have different coding styles. For example, some advocate the use of tabs in code, while others forbid them. What matters is that all developers on a project adhere to the same set of coding standards, which in the real world is often not the case. A version-control system such as the Concurrent Versions System (CVS) can enforce the standards by checking each file about to be committed against a list of coding guidelines. For this purpose, CVS provides a set of administrative trigger files whose programs run when users perform certain actions. The format of a trigger file entry is simple:
<REGULAR EXPRESSION> <PROGRAM TO RUN>
One of these trigger files is commitinfo, located in the CVSROOT administrative directory of the repository ($CVSROOT/CVSROOT). To check whether a file about to be committed contains tabs, the commitinfo entry could look like this:
ALL /usr/local/bin/validate-code.pl
The ALL keyword in the first column means that every file being committed is
checked; a regular expression in its place restricts the check to matching
directories. CVS runs the associated script, passing it the repository directory
and the names of the files being committed; if the script exits with a non-zero
status, the commit is aborted.
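The article does not show validate-code.pl itself. As a hedged sketch of the core check such a script might perform, here is the equivalent logic in C++; lines_with_tabs and commit_check_status are hypothetical names. A real hook would read each file that CVS passes on its command line and exit with the status returned by commit_check_status, so that a non-zero result rejects the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based numbers of the lines in text that contain a tab.
std::vector<std::size_t> lines_with_tabs(const std::string& text) {
    std::vector<std::size_t> found;
    std::istringstream in(text);
    std::string line;
    std::size_t number = 0;
    while (std::getline(in, line)) {
        ++number;
        if (line.find('\t') != std::string::npos) found.push_back(number);
    }
    return found;
}

// Exit status for a commitinfo hook: 0 accepts the file, non-zero makes CVS
// abort the commit.
int commit_check_status(const std::string& text) {
    return lines_with_tabs(text).empty() ? 0 : 1;
}
```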
This article discussed several freely available tools that can help increase the
productivity of C++ developers, both new and experienced. For further details on
the individual tools, check out the Resources section.
Resources
Learn
- Learn more about integrating STL with gdb.
- Learn more about the cscope project.
- Learn about CVS usage and internals.
- The AIX and UNIX developerWorks zone provides a wealth of information relating to all aspects of AIX systems administration and expanding your UNIX skills.
- New to AIX and UNIX? Visit the New to AIX and UNIX page to learn more.
- Browse the technology bookstore for books on this and other technical topics.
Get products and technologies
- Visit the doxygen site.
- Find information on how to run Valgrind along with tool internals at the Valgrind site.
- Download Electric-Fence.
- Check out the guide to GNU gprof.
- Download ccache.
Discuss
- Check out developerWorks blogs and get involved in the developerWorks community.
- Participate in the AIX and UNIX forums:
  - AIX 5L technical forum
  - AIX for Developers Forum
  - Cluster Systems Management
  - IBM Support Assistant
  - Performance Tools technical forum
  - Virtualization technical forum
  - More AIX and UNIX forums
Arpan Sen is a lead engineer working on the development of software in the electronic design automation industry. He has worked on several flavors of UNIX, including Solaris, SunOS, HP-UX, and IRIX as well as Linux and Microsoft Windows for several years. He takes a keen interest in software performance-optimization techniques, graph theory, and parallel computing. Arpan holds a post-graduate degree in software systems. You can reach him at arpan@syncad.com.





