10 steps to UNIX nirvana

Time-saving tips and tools for C++ developers

Discover several time-saving tips and freely available tools that both new and experienced C++ developers can use.

Arpan Sen, Independent author

Arpan Sen is a lead engineer working on the development of software in the electronic design automation industry. He has worked on several flavors of UNIX, including Solaris, SunOS, HP-UX, and IRIX as well as Linux and Microsoft Windows for several years. He takes a keen interest in software performance-optimization techniques, graph theory, and parallel computing. Arpan holds a post-graduate degree in software systems. You can reach him at arpansen@gmail.com.


03 March 2009

The average C++ developer has several things to do every day: developing new software, debugging other people's code, creating a test plan, developing tests per the plan, managing a regression suite, and so on. Juggling multiple roles can eat away precious time. To help, this article provides 10 effective methods that can increase your productivity. The examples in this article use tcsh version 6 as a reference, but the ideas are portable to all variants of UNIX® shells. This article also refers to several open source tools available for the UNIX platform.

Keep the data safe

Using rm -rf * or its variants at the shell prompt is probably the most common cause of lost work hours for a UNIX developer. One simple safeguard is to alias rm, cp, and mv to their interactive variants in $HOME/.aliases, then source this file during shell startup. Depending on the login shell, this could mean putting source $HOME/.aliases inside .cshrc (for C/tcsh shells) or in .profile or .bash_profile (for Bourne-style shells). Listing 1 shows the aliases in C shell syntax.

Listing 1. Setting aliases for rm, cp, and mv
alias rm 'rm -i'
alias cp 'cp -i'
alias mv 'mv -i'
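
Listing 1 uses C shell syntax, which a Bourne-style shell does not accept. For users who source $HOME/.aliases from .profile or .bash_profile, a rough equivalent might look like the following sketch (it is not one of the original listings):

```shell
# Bourne-style equivalent of Listing 1, meant for $HOME/.aliases
# when that file is sourced from .profile or .bash_profile.
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
```

Typing alias rm at the prompt afterward shows whether the alias is in effect.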

Yet another option specific to users of the tcsh shell is to add the following line to your startup scripts:

set rmstar on

If you ever issue rm * with the rmstar variable set, you would be prompted to confirm your decision, as shown in Listing 2.

Listing 2. Using the rmstar shell variable in tcsh
arpan@tintin# pwd
/home/arpan/IBM/documents
arpan@tintin# set rmstar on
arpan@tintin# rm *
Do you really want to delete all files? [n/y] n

However, if you use the rm, cp, or mv commands with the -f option, interactive mode is overridden. A more effective approach is to write your own wrappers for these UNIX commands and, for deletion, move the data into a predefined folder such as $HOME/.recycle_bin instead of removing it. Listing 3 shows a sample script, called saferm, that accepts only file and folder names.

Listing 3. Outline of a safe-rm script
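#!/usr/bin/tcsh

# Create the recycle bin on first use.
if (! -d ~/.recycle_bin) then
  mkdir ~/.recycle_bin
endif

# Move each named file or folder into the bin, tagged with today's date.
foreach f ($argv:q)
  mv "$f" ~/.recycle_bin/"$f:t".`date +%F`
end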

Automatic backup of data

Restoring lost data requires a comprehensive backup policy. Backups might run nightly or every couple of hours, depending on the requirement. By default, $HOME and all its subdirectories should be backed up using a cron job and kept in a file system area that has been announced to users in advance. Note that only the system administrator should have Write or Execute permission to the backed-up data. The following crontab entry schedules the backup:

0 20 * * * /home/tintin/bin/databackup.csh

The script backs up the data every day at 20:00. The data backup script is shown in Listing 4.

Listing 4. The data backup script
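#!/usr/bin/tcsh
# Compute the date stamp once so every command names the same archive,
# even if the run straddles midnight.
set stamp = `date +%F`
cd /home/tintin/database/src
tar cvzf database-src.tgz.$stamp database/ main/ sql-processor/
mv database-src.tgz.$stamp ~/.backup/
chmod 000 ~/.backup/database-src.tgz.$stamp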

Another policy is to maintain some file system areas on the network with easy-to-follow names like /backup_area1, /backup_area2, and so on. Developers who want their data backed up should create directories or files in these areas. Such areas must have the sticky bit turned on, similar to /tmp, so that users can delete or rename only their own files.
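
As a rough illustration of the sticky-bit requirement, the following sketch creates a world-writable area and checks the bit afterward. It is written in POSIX sh rather than tcsh because it relies on shell functions. The function names make_shared_area and has_sticky_bit are made up for this example, and a real /backup_area1 would be created by the system administrator.

```shell
# Sketch: create a shared drop area with the sticky bit set, like /tmp.
# make_shared_area and has_sticky_bit are hypothetical helper names.
make_shared_area() {
    area=$1
    mkdir -p "$area" || return 1
    # 1777 = sticky bit (1000) + read/write/execute for everyone (0777)
    chmod 1777 "$area"
}

# Succeed if the directory has the sticky bit set.
has_sticky_bit() {
    # In ls -ld output, the tenth character is t or T when the bit is set.
    case $(ls -ld "$1" | cut -c10) in
        t|T) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For example, after make_shared_area /backup_area1, the output of ls -ld /backup_area1 should show permissions ending in t, as ls -ld /tmp does.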


Browse source code

The cscope utility, available for download at no charge (see Resources for a link), is a great way to discover and browse existing sources. Cscope requires a list of files (C or C++ headers, source files, flex and bison files, inline sources in .inl files, and so on) to create its own database. When the database has been created, cscope provides a neat interface to the source code listings. Listing 5 shows how to build a cscope database and invoke it.

Listing 5. Using cscope to build a source database and invoke it
arpan@tintin# find . -name "*.[chyli]*" > cscope.files
arpan@tintin# cscope -b -q -k
arpan@tintin# cscope -d

The -b option of cscope makes it create the internal database; the -q option makes it create an index file for faster searches; the -k option means that cscope does not look into system headers while searching (otherwise, the result would be overwhelming even for the most trivial of searches).
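
The pattern in Listing 5 is deliberately loose, so it also picks up unrelated names such as notes.html or build.log. If that becomes a nuisance, one possible alternative is to list the extensions explicitly. The sketch below is written in POSIX sh rather than tcsh because it uses a shell function; the function name build_cscope_files and the exact set of extensions are assumptions to adapt to your project.

```shell
# Sketch: list C/C++, lex, and yacc sources by explicit extension.
# build_cscope_files is a hypothetical helper; adjust the extensions as needed.
build_cscope_files() {
    find "${1:-.}" -type f \( \
        -name '*.c'   -o -name '*.cc'  -o -name '*.cpp' -o -name '*.cxx' -o \
        -name '*.h'   -o -name '*.hpp' -o -name '*.inl' -o \
        -name '*.l'   -o -name '*.y' \
    \) -print
}
```

Running build_cscope_files . > cscope.files then takes the place of the find command in Listing 5.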

The -d option tells cscope to use the existing database without rebuilding it and brings up the cscope interface, as shown in Listing 6.

Listing 6. The cscope interface
Cscope version 15.5                                               Press the ? key for help



Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:

To exit cscope, press Ctrl-D. Use the Tab key to switch between the results that cscope lists and the cscope options (for example, Find this C symbol, Find this file). Listing 7 shows the screen when you search for a file whose name contains database. To open an individual file, press 0, 1, and so on, accordingly.

Listing 7. Cscope output when searching for a file named database
Cscope version 15.5                                               Press the ? key for help

File
0 database.cpp
1 database.h
2 databasecomponents.cpp
3 databasecomponents.h



Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:

Learn to debug legacy code with doxygen

To effectively debug code that was developed by someone else, it pays to understand the overall structure of the existing software: the classes and their hierarchy, the global and static variables, and the public interface routines. The open source utility doxygen (see Resources for a link) is probably the best-in-class tool for extracting class hierarchies from existing sources.

To run doxygen on a project, first run doxygen -g at the shell prompt. This command generates a file called Doxyfile in the current working directory, which you must then edit manually. After editing it, run doxygen again with Doxyfile as its argument. Listing 8 shows a sample run log.

Listing 8. Running doxygen
arpan@tintin# doxygen -g 
arpan@tintin# ls
Doxyfile
… [after editing Doxyfile]
arpan@tintin# doxygen Doxyfile

Doxyfile has several fields that you need to understand. Some of the more important fields are:

  • OUTPUT_DIRECTORY. The generated documentation files are kept in this directory.
  • INPUT. This is a space-separated list of all the source files and folders whose documentation must be generated.
  • RECURSIVE. Set this field to YES when the source code listing is hierarchical. So, instead of specifying all the folders in INPUT, simply specifying the top-level folder in INPUT and setting this field to YES does the required job.
  • EXTRACT_ALL. Set this field to YES to tell doxygen to extract documentation even from classes and functions that are undocumented.
  • EXTRACT_PRIVATE. This field must be set to YES to indicate to doxygen that private data members of classes should be included in the documentation.
  • FILE_PATTERNS. If the project follows the usual C or C++ source and header extensions, such as .c, .cpp, .cc, .cxx, .h, or .hpp, you don't need to add anything to this field.

Note: Doxyfile has several other fields that you must study depending on the project requirements and the detail of documentation required. Listing 9 shows a sample Doxyfile.

Listing 9. Sample Doxyfile
OUTPUT_DIRECTORY = /home/tintin/database/docs
INPUT = /home/tintin/project/database
FILE_PATTERNS = 
RECURSIVE = YES
EXTRACT_ALL = YES
EXTRACT_PRIVATE = YES
EXTRACT_STATIC = YES

Use STL and gdb

Most sophisticated pieces of software developed using C++ use classes from the C++ Standard Template Library (STL). Unfortunately, debugging code inundated with classes from the STL isn't easy, and the GNU Debugger (gdb) often complains of missing information, fails to display the relevant data, or even crashes. To circumvent this problem, use an advanced feature of gdb: the capability to add user-defined commands. For example, consider the code snippet in Listing 10, which uses a vector and displays the information.

Listing 10. Using STL vector in C++ code
#include <vector>
#include <iostream>
using namespace std;

int main ()
  {
  vector<int> V;
  V.push_back(9);
  V.push_back(8);
  for (int i=0; i < V.size(); i++)
    cout << V[i] << "\n";
  return 0;
  }

Now, while you're debugging the program, if you want to figure out the length of the vector, you can print the value of V._M_finish - V._M_start at the gdb prompt, where _M_start and _M_finish are pointers to the beginning and end of the vector, respectively. (These are internal names in the GNU implementation of the STL and can differ between library versions.) However, this requires that you understand the internals of STL, which may not always be feasible. As an alternative, I recommend gdb_stl_utils, available for download at no charge, which defines several user-defined commands in gdb such as p_stl_vector_size, which displays the size of a vector, and p_stl_vector, which displays the contents of a vector. Listing 11 shows p_stl_vector, which iterates over the data demarcated by the _M_start and _M_finish pointers.

Listing 11. Using p_stl_vector to display the contents of a vector
define p_stl_vector
  set $vec = ($arg0)
  set $vec_size = $vec->_M_finish - $vec->_M_start
  if ($vec_size != 0)
    set $i = 0
    while ($i < $vec_size)
      printf "Vector Element %d:  ", $i
      p *($vec->_M_start+$i)
      set $i++
    end
  end
end

For a list of commands defined using gdb_stl_utils, run help user-defined at the gdb prompt.


Speed up compile time

Making a clean build of the sources for any reasonably complicated piece of software eats into your productive time. One of the best tools for speeding up the compilation process is ccache (see Resources for a link). Ccache acts as a compiler cache: if a source file and its compile options have not changed since an earlier compilation, the object file is retrieved from the tool's cache instead of being rebuilt. This results in a tremendous benefit when a user makes changes in a header file and then invokes make clean; make. Because ccache doesn't rely on just the time stamp to judge whether a file needs recompilation, precious build time is saved. Here's a sample use of ccache:

arpan@tintin# ccache g++ foo.cxx

Internally, ccache generates a hash that, among other things, takes into consideration pre-processed versions of the source file (obtained using g++ -E), command-line options used to invoke the compiler, and so on. The compiled object file is stored against this hash in the cache.
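
To make the idea of the hash concrete, here is a minimal sketch of a ccache-style key. It is not ccache's actual algorithm: cksum stands in for the real hash function, the function name cache_key is made up, and the sketch assumes the preprocessed source has already been saved to a file. It is written in POSIX sh rather than tcsh because it uses a shell function.

```shell
# Sketch of a ccache-style cache key: hash the compiler options together
# with the preprocessed source. cksum is only a stand-in for ccache's hash.
cache_key() {
    options=$1        # command-line options passed to the compiler
    preprocessed=$2   # file holding the preprocessed source (g++ -E output)
    { printf '%s\n' "$options"; cat "$preprocessed"; } | cksum | cut -d' ' -f1
}
```

The same options and the same preprocessed text always give the same key, so the object file can be reused. Changing either one, for instance by editing a header that the source includes, gives a different key and forces a real compilation.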

Ccache defines several environment variables to allow for customizations:

  • CCACHE_DIR. Here, ccache stores the cached files. By default, the files are stored in $HOME/.ccache.
  • CCACHE_TEMPDIR. Here, ccache stores temporary files. This folder should be in the same file system as $CCACHE_DIR.
  • CCACHE_READONLY. If the ever-increasing size of the cache folder is a problem, setting this environment variable is useful. If you enable this variable, ccache doesn't add any files to the cache during compilation; however, it uses the existing cache to look for object files.

Use Valgrind and Electric-Fence with gdb to stop memory errors

C++ programming has several pitfalls—most notably, memory corruption. Two open source tools for use in the UNIX environment—Valgrind and Electric-Fence—work in tandem with gdb to help close in on memory errors. Here's a brief guide on how to use these tools.

Valgrind

The easiest way to use Valgrind on a program is to run it at the shell prompt followed by the usual program options. Note that for optimal results, you should run the debug version of the program.

arpan@tintin# valgrind <valgrind options>
    <program name> <program option1> <program option2> ..

Valgrind reports several common memory errors, like incorrect freeing of memory (allocation using malloc and free using delete), using variables with uninitialized values, and deleting the same pointer twice. The sample code shown in Listing 12 has an obvious array overwrite problem.

Listing 12. Sample C++ memory corruption issue
int main ()
  {
  int* p_arr = new int[10];
  p_arr[10] = 5;
  return 0;
  }

Valgrind and gdb work in tandem. Using the --db-attach=yes option in Valgrind, it's possible to directly invoke gdb while Valgrind is running. For example, when Valgrind is invoked on the code in Listing 12 with the --db-attach=yes option, it offers to start gdb the instant it first encounters a memory issue, as shown in Listing 13. (Valgrind 3.11 and later dropped --db-attach in favor of the built-in gdbserver, controlled by the --vgdb-error option.)

Listing 13. Attaching gdb during Valgrind execution
==5488== Conditional jump or move depends on uninitialised value(s)
==5488==    at 0x401206C: strlen (in /lib/ld-2.3.2.so)
==5488==    by 0x4004E35: _dl_init_paths (in /lib/ld-2.3.2.so)
==5488==    by 0x400305A: dl_main (in /lib/ld-2.3.2.so)
==5488==    by 0x400F87D: _dl_sysdep_start (in /lib/ld-2.3.2.so)
==5488==    by 0x4001092: _dl_start (in /lib/ld-2.3.2.so)
==5488==    by 0x4000C56: (within /lib/ld-2.3.2.so)
==5488== 
==5488== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- n
==5488== 
==5488== Invalid write of size 4
==5488==    at 0x8048466: main (test.cc:4)
==5488==  Address 0x4245050 is 0 bytes after a block of size 40 alloc'd
==5488==    at 0x401ADEB: operator new[](unsigned) 
                    (m_replacemalloc/vg_replace_malloc.c:197)
==5488==    by 0x8048459: main (test.cc:3)
==5488== 
==5488== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ----

Electric-Fence

Electric-Fence is a set of libraries for detecting buffer overflow or underflow in a gdb-based environment. In the event of erroneous memory access, this tool, in conjunction with gdb, points to the exact instruction in the source code that caused the access. For example, for the code in Listing 12, with Electric-Fence turned on, Listing 14 shows the gdb behavior.

Listing 14. Electric-Fence, showing the exact area in sources that caused a crash
(gdb) efence on
Enabled Electric Fence
(gdb) run 
Starting program: /home/tintin/efence/a.out 

  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>

  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>

Program received signal SIGSEGV, Segmentation fault.
0x08048466 in main () at test.cc:4
4         p_arr[10] = 5;

The efence command is not built into gdb. After installing Electric-Fence, define it by adding the following lines to the .gdbinit file:

define efence
        set environment EF_PROTECT_BELOW 0
        set environment LD_PRELOAD /usr/lib/libefence.so.0.0
        echo Enabled Electric Fence\n
end

Use gprof for code profiling

One of the most common programming tasks is improving code performance. To do this, it is important to figure out which sections of the code take the most time to execute. In technical terms, this is known as profiling. The GNU profiler tool, gprof (see Resources for a link), is easy to use and packed with useful features.

To collect profile information for a program, the first step is to specify the -pg option when invoking the compiler:

arpan@tintin# g++ database.cpp -pg

Next, run the program as you normally would. At the end of a successful run (that is, a run with no crash or call to the _exit system call), the profile information is written to a file named gmon.out. After the gmon.out file is generated, you run gprof on the executable, as shown below. Note that if no executable name is mentioned, a.out is assumed by default. Likewise, if no profile-data file name is mentioned, gmon.out is assumed to be present in the current working directory.

arpan@tintin# gprof <options> <executable name> 
    <profile-data-file name> > outfile

By default, gprof writes its output to standard output, so you need to redirect it to a file. Gprof provides two sets of information: the flat profile and the call graph, both of which form part of the output file. The flat profile shows the total amount of time spent in each function. Self seconds indicate the time accounted for by this function alone. Cumulative seconds is a running total: the self seconds of this function plus those of all the functions listed above it in the table. The time spent in the functions that a function calls is reported in the call graph.


Display source listings in gdb

It's quite common to find developers debugging code over a remote connection that is too slow to support a graphical interface like the Data Display Debugger (DDD) for gdb. In such situations, the Ctrl-X-A key combination in gdb proves a lifesaver of sorts, because it displays the source listing during the debug session. Pressing the same key combination again returns you to the plain gdb prompt. Yet another option is to invoke gdb with the -tui option, which directly launches the text mode source listing. Listing 15 shows gdb being invoked in text mode.

Listing 15. Using gdb source listing in text mode
   3       using namespace std;
   4
   5       int main ()
   6         {
B+>7         vector<int> V;
   8         V.push_back(9);
   9         V.push_back(8);
   10        for (int i=0; i < V.size(); i++)
   11          cout << V[i] << "\n";
   12        return 0;
   13        }
   14
   --------------------------------------------------------------------------------------
child process 6069 In: main                                      Line: 7    PC: 0x804890e 
(gdb) b main
Breakpoint 1 at 0x804890e: file test.cc, line 7.
(gdb) r

Maintain orderly source listings using CVS

Different projects have different coding styles. For example, some advocate the use of tabs in code, while others do not. It is important, however, that all developers on a project adhere to the same set of coding standards. Quite often, this is not the case in the real world. With a version-control system like Concurrent Versions System (CVS), the standards can be enforced by checking each file that is about to be committed against a list of coding guidelines. To accomplish this task, CVS comes with a set of predefined trigger files that take effect when certain user actions occur. The format of each line in a trigger file is simple:

<REGULAR EXPRESSION>    <PROGRAM TO RUN>

One of the predefined trigger files is the commitinfo file, located in the CVSROOT directory of the repository ($CVSROOT/CVSROOT). To check whether the files about to be checked in contain tabs, here's how the commitinfo entry would look:

ALL     /usr/local/bin/validate-code.pl

The ALL keyword in commitinfo means that the check runs for every commit, whatever the directory. You can replace it with a regular expression to limit the check to part of the repository. The associated script is run on the files being committed to check them against the source code guidelines.
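
The validate-code.pl script itself is not shown in this article. As a stand-in, the sketch below shows what a tab check could look like in POSIX sh. It assumes that CVS invokes the commitinfo program with the repository directory as the first argument followed by the names of the files being committed, and that a non-zero exit status aborts the commit. The function name validate_no_tabs is made up for this example.

```shell
# Sketch of a commitinfo check: fail if any committed file contains a tab.
# Assumes the first argument is the repository directory and the remaining
# arguments are file names; validate_no_tabs is a hypothetical name.
validate_no_tabs() {
    if [ "$#" -gt 0 ]; then
        shift                   # skip the repository directory argument
    fi
    tab=$(printf '\t')
    status=0
    for file in "$@"; do
        if grep -q "$tab" "$file"; then
            echo "$file: contains tab characters" >&2
            status=1
        fi
    done
    return $status
}
```

Saved as a standalone script that ends with validate_no_tabs "$@", this could be named in commitinfo in place of validate-code.pl.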


Conclusion

This article discussed several freely available tools that can help increase the productivity of C++ developers—both new and experienced. For further details on the individual tools, check out the Resources section.

Resources

Learn

Get products and technologies

Discuss
