gprof Command

Purpose

Displays call graph profile data.

Syntax

/usr/ccs/bin/gprof [ -b ] [ -c [ filename ] ] [ -e Name ] [ -E Name ] [ -f Name ] [-g filename ] [-i filename] [-p filename ] [ -F Name ] [ -L PathName ] [ -s ] [ -x [ filename ] ] [ -z ] [ a.out [ gmon.out ... ] ]

Description

The gprof command produces an execution profile of C, FORTRAN, or COBOL programs. The effect of called routines is incorporated into the profile of each caller. The gprof command is useful for identifying how a program consumes processor resources. To find out which functions (routines) in the program are using the processor, you can profile the program with the gprof command.

The profile data is taken from the call graph profile file (gmon.out by default) created by programs that are compiled with the cc command by using the -pg option. The -pg option also links in versions of library routines that are compiled for profiling. The gprof command reads the symbol table in the named object file (a.out by default) and correlates it with the call graph profile file. If more than one profile file is specified, the gprof command output shows the sum of the profile information in the specified profile files.

The -pg option causes the compiler to insert a call to the mcount subroutine into the object code that is generated for each recompiled function of your program. During program execution, each time a parent calls a child function, the child calls the mcount subroutine to increment a distinct counter for that parent-child pair. Programs that are not recompiled with the -pg option do not have the mcount subroutine, and therefore keep no record of who called them.
Note: Symbols from C++ object files have their names demangled before they are used.
The GPROF environment variable can be used to set different options for profiling. The syntax of this environment variable is defined as follows:
GPROF = profile:<profile-type>,scale:<scaling-factor>,file:<file-type>,filename:<filename>
where:
  • <profile-type> specifies the type of profiling to perform: process or thread. The value process sets the profiling granularity at the process level; the value thread sets it at the thread level.
  • <scaling-factor> specifies how much memory to allocate for the call graph profile. The default scaling factor is 2 for process level profiling and 8 for thread level profiling. A scaling factor of 2 allocates memory equal to half of the process size for every process or thread; a scaling factor of 8 allocates one eighth of the process size for every process or thread. This memory is the buffer area that stores the call graph information.
  • <file-type> specifies the type of gmon.out file that is required. A value of multi requests one gmon.out file per process; a value of multithread requests one gmon.out file per thread. If an application that is profiled with the -pg option forks, specifying the file type as multi generates one gmon.out file for the parent process and another for the child process. The naming convention for the generated gmon.out files is as follows:
    • For multi file type: <prefix>-processname-pid.out
    • For multithread file type: <prefix>-processname-pid-Pthread<threadid>.out
    The <prefix> is gmon by default. You can define your own prefix by using the filename parameter of the GPROF environment variable.
  • <filename> specifies the prefix to use for the generated gmon.out files. By default, the prefix is gmon.
Note: Specifying profile:thread generates a new format gmon.out file that can be read only by the AIX® 5.3 gprof command. If you want an old format gmon.out file and still want to specify profile:thread, you must also specify file:multithread, which generates one old format gmon.out file per thread. For example, if your application has 2 threads, 2 gmon.out files are generated, one per thread, named by using the naming convention above. You cannot enable thread level profiling by compiling an application with the -pg flag in AIX 5.2 or earlier and running it in AIX 5.3. To enable thread level profiling, you must compile the application with the -pg flag in AIX 5.3 or later.
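
The naming convention above can be sketched as a small shell helper that predicts which profile file a run is expected to write. This is an illustration only: gmon_file_name is a hypothetical function, not part of AIX, and the process name, PID, and thread ID used in the usage note are sample values.

```shell
#!/bin/sh
# Sketch only: gmon_file_name is a hypothetical helper, not an AIX command.
# It prints the profile file name expected for a given GPROF file type.
#   $1 = file type: multi, multithread, or empty for the default
#   $2 = process name   $3 = PID   $4 = thread ID (multithread only)
#   $5 = prefix (defaults to gmon, like the filename parameter)
gmon_file_name() {
    ftype=${1:-}
    pname=${2:-}
    pid=${3:-}
    tid=${4:-}
    prefix=${5:-gmon}
    case $ftype in
        multi)
            printf '%s-%s-%s.out\n' "$prefix" "$pname" "$pid" ;;
        multithread)
            printf '%s-%s-%s-Pthread%s.out\n' "$prefix" "$pname" "$pid" "$tid" ;;
        *)
            # No file type set: a single <prefix>.out file.
            printf '%s.out\n' "$prefix" ;;
    esac
}
```

For example, gmon_file_name multithread dhry 2468 215 tgmon prints tgmon-dhry-2468-Pthread215.out, and gmon_file_name multi dhry 2468 "" mygmon prints mygmon-dhry-2468.out.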

The gprof command produces three items:

  1. First, a flat profile is produced similar to the profile that is provided by the prof command. This listing gives total execution times and call counts for each of the functions in the program, sorted by decreasing time. The times are then propagated along the edges of the call graph. Cycles are discovered, and calls into a cycle are made to share the time of the cycle.
  2. A second listing shows the functions that are sorted according to the time they represent, including the time of their call-graph descendants. Below each function entry are its (direct) call-graph children, with an indication of how their times are propagated to this function. A similar display above the function shows how the time of the function and the time of its descendants are propagated to its (direct) call-graph parents.
  3. Cycles are also shown, with an entry for the cycle as a whole and a listing of the members of the cycle and their contributions to the time and call counts of the cycle.
Note: If the input to the gprof command contains thread level profiling data (new format gmon.out file), the gprof command produces these three items for every thread, starting with a cumulative report, followed by per-thread reports (sorted in ascending order of thread IDs).

The gprof command can also be used to analyze the execution profile of a program on a remote machine. To do so, run the gprof command with the -c option on the call graph profile file (gmon.out by default) to generate a file (gprof.remote by default), which can then be processed on a remote machine. If a call graph profile file other than gmon.out is to be used, specify its name after -c Filename and the executable name. Filename must be specified if the file attribute of the GPROF environment variable is set to multi, because multiple gmon.out files are then created, one for each PID, when the running program forks. On the remote machine, use the -x option to process the gprof.remote (by default) file and generate profile reports.

Profiling with the fork and exec subroutines

Profiling by using the gprof command is problematic if your program runs the fork or exec subroutine on multiple, concurrent processes. Profiling is an attribute of the environment of each process, so if you are profiling a process that forks a new process, the child is also profiled. However, both processes write a gmon.out file in the directory from which you run the parent process, so one file overwrites the other. The tprof command is recommended for multiple-process profiling. Alternatively, set file:multi in the GPROF environment variable to keep the gmon.out file of the parent process from being overwritten. With file:multi, the gmon.out files are named by using the AIX naming convention, so the gmon.out file of a child process does not have the same name as that of the parent.
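
After a run with file:multi, each process leaves its own profile file, and the gprof command sums all profile files that are named on its command line. The following sketch gathers those files for one executable and prints the matching gprof command. It is an illustration under assumptions: profile_cmd is a hypothetical helper, not an AIX command, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: profile_cmd is a hypothetical helper, not an AIX command.
# It prints a gprof command line that names every per-process profile
# file found for one executable.
#   $1 = executable   $2 = directory with the profile files (default .)
#   $3 = file name prefix (defaults to gmon)
profile_cmd() {
    exe=$1
    dir=${2:-.}
    prefix=${3:-gmon}
    name=$(basename "$exe")
    set --                      # reuse "$@" as the list of files found
    for f in "$dir/$prefix-$name"-*.out; do
        if [ -f "$f" ]; then
            set -- "$@" "$f"
        fi
    done
    if [ $# -eq 0 ]; then
        echo "no $prefix-$name-*.out files in $dir" >&2
        return 1
    fi
    printf 'gprof %s %s\n' "$exe" "$*"
}
```

For example, profile_cmd ./dhry . prints a gprof command that lists every gmon-dhry-<pid>.out file in the current directory. Running the printed command produces one report that sums the parent and child profiles.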

Profiling without source code

If you do not have source for your program, you can profile by using the gprof command without recompiling. You must, however, be able to relink your program modules with the appropriate compiler command (for example, cc for C). If you do not recompile, you do not get call frequency counts, although the flat profile is still useful without them. As an added benefit, your program runs almost as fast as it usually does. The following explains how to profile:

cc -c dhry.c         # Create dhry.o without call counting code.
cc -pg dhry.o -L/lib -L/usr/lib -o dhryfast
                     # Re-link (and avoid -pg libraries).
dhryfast             # Create gmon.out without call counts.
gprof >dhryfast.out  # You get an error message about no call counts
                     #  -- ignore it.

A result of running without call counts is that some quickly running functions (which you know had to be called) do not appear in the listing. Although nonintuitive, this result is normal for the gprof command. The gprof command lists only functions that were either called at least once, or which registered at least one clock tick. Even though they ran, quickly running functions often receive no clock ticks. Since call-counting was suspended, these small functions are not listed at all. (You can get call counts for the runtime routines by omitting the -L options on the cc -pg command line.)

Using less real memory

Profiling with the gprof command can cause programs to page excessively because the -pg option dedicates pinned real-memory buffer space equal to one-half the size of your program text. Excessive paging does not affect the data that is generated by profiling, because profiled programs do not generate ticks when waiting on I/O but only when using the processor. If the time delay caused by excessive paging is unacceptable, use the tprof command instead.
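
The buffer arithmetic can be made concrete with a short calculation. The sketch below assumes that the buffer is the relevant size divided by the scaling factor, as described for the scale parameter of the GPROF environment variable. profile_buffer_kb is a hypothetical helper, and the sizes in the usage note are made-up sample values.

```shell
#!/bin/sh
# Sketch only: profile_buffer_kb is a hypothetical helper.
# Assumption: buffer size = size / scaling factor (integer division).
#   $1 = size in KB   $2 = scaling factor (defaults to 2)
profile_buffer_kb() {
    size_kb=$1
    scale=${2:-2}
    if [ "$scale" -le 0 ]; then
        echo "scaling factor must be positive" >&2
        return 1
    fi
    echo $(( size_kb / scale ))
}
```

For example, profile_buffer_kb 4096 prints 2048 (the default factor of 2), and profile_buffer_kb 4096 8 prints 512. A larger scaling factor therefore reserves a smaller buffer.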

Flags

Item Description
-b Suppresses the printing of a description of each field in the profile.
-c Filename Creates a file that contains the information that is needed for remote processing of profiling information. Do not use the -c flag in combination with other flags.
-E Name Suppresses the printing of the graph profile entry for routine Name and its descendants, similar to the -e flag, but excludes the time that is spent by routine Name and its descendants from the total and percentage time computations. (-E MonitorCount -E MonitorCleanup is the default.)
-e Name Suppresses the printing of the graph profile entry for routine Name and all its descendants (unless they have other ancestors that are not suppressed). More than one -e flag can be given. Only one routine can be specified with each -e flag.
-F Name Prints the graph profile entry of the routine Name and its descendants similar to the -f flag, but uses only the times of the printed routines in total time and percentage computations. More than one -F flag can be given. Only one routine can be specified with each -F flag. The -F flag overrides the -E flag.
-f Name Prints the graph profile entry of the specified routine Name and its descendants. More than one -f flag can be given. Only one routine can be specified with each -f flag.
-g Filename Writes call graph information to the specified output filename. It also suppresses the profile information unless the -p flag is used.
-i Filename Writes the routine index table to the specified output file. If this flag is not used, the index table goes either at the end of the standard output, or at the bottom of the file that is specified with the -p and -g flags.
-L PathName Uses an alternative path name for locating shared objects.
-p Filename Writes flat profile information to the specified output file name. It also suppresses the call graph information unless the -g flag is used.
-s Produces the gmon.sum profile file, which represents the sum of the profile information in all the specified profile files. This summary profile file might be given to subsequent executions of the gprof command (by using the -s flag) to accumulate profile data across several runs of an a.out file.
-x Filename Retrieves information from Filename (a file that is created with the -c option) to generate profile reports. If Filename is not specified, the gprof command searches for the default gprof.remote file.
-z Displays routines that have zero usage (as indicated by call counts and accumulated time).
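
Because the -p and -g flags each name an output file, they can be combined to write the flat profile and the call graph to separate files in one run. The following sketch wraps that invocation. It is an illustration under assumptions: split_report and the GPROF_CMD variable are hypothetical names introduced here, and the output file names flat.out and callgraph.out are arbitrary.

```shell
#!/bin/sh
# Sketch only: split_report and GPROF_CMD are hypothetical names.
# GPROF_CMD defaults to gprof; set it to echo for a dry run that only
# prints the arguments instead of running the profiler.
#   $1 = executable   $2 = profile file (default gmon.out)
#   $3 = output directory (default .)
split_report() {
    exe=$1
    profile=${2:-gmon.out}
    outdir=${3:-.}
    # -b drops the field descriptions, -p names the flat profile file,
    # -g names the call graph file.
    "${GPROF_CMD:-gprof}" -b \
        -p "$outdir/flat.out" \
        -g "$outdir/callgraph.out" \
        "$exe" "$profile"
}
```

For example, split_report ./dhry gmon.out reports is intended to leave reports/flat.out and reports/callgraph.out. Running it with GPROF_CMD=echo first shows the exact arguments without touching any files.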

Examples

  1. To obtain profiled output, enter the following command:
    gprof
  2. To get profiling output from a command run earlier and possibly moved, enter the following command:
    gprof -L/home/score/lib runfile runfile.gmon
    This example uses the runfile.gmon file for sample data and the runfile file for local symbols, and checks the /home/score/lib directory for loadable objects.
  3. To profile the sample program dhry.c:
    1. Recompile the application program with the cc -pg command, as follows:
      cc -pg dhry.c -o dhry # Re-compile to produce gprof output.
    2. Run the recompiled program. A file named gmon.out is created in the current working directory (not the directory in which the program executable file is located).
      dhry    # Execute program to generate ./gmon.out file.
    3. Run the gprof command in the directory with the gmon.out file to produce the call graph and flat profile reports.
      gprof >gprof.out     # Name the report whatever you like
      vi gprof.out         # Read flat profile first.
    4. To generate thread level profiling data, export the GPROF environment variable as follows, and then run the application:
      export GPROF=profile:thread
      dhry   # Execute program to generate ./gmon.out file which has thread level granularity
    5. To generate a per-process gmon.out file with a prefix of mygmon, enter the following commands:
      export GPROF=file:multi,filename:mygmon
      dhry  # Execute program to generate ./mygmon-dhry-2468.out
    6. To generate per thread gmon.out file, with a scaling factor of 10, with a file name prefixed as tgmon, enter the following command:
      export GPROF=profile:thread,file:multithread,scale:10,filename:tgmon
      dhry # Execute program to generate ./tgmon-dhry-2468-Pthread215.out
    7. To see only the flat profile report from the gmon-dhry-2468.out file, enter the following command:
      gprof -p fprofile.out ./dhry ./gmon-dhry-2468.out
    8. To see only the call graph profile report from the gmon-dhry-2468.out file, enter the following command:
      gprof -g callgraph.out ./dhry ./gmon-dhry-2468.out
  4. To use the remote processing feature of the gprof command:
    1. Recompile the application program with the cc -pg command:
      cc -pg thread.c -o thread  -lpthread
    2. Enable thread level profiling granularity and use a different name for gmon.out:
      export GPROF=profile:thread,filename:mygmon
    3. Run the recompiled program. A file named mygmon.out is created in the current working directory (not the directory in which the program executable file is located):
      thread    # Execute program to generate mygmon.out file.
    4. Use the -c flag to generate the my.remote file, which can then be taken to a remote machine for processing:
      gprof -c my.remote thread mygmon.out
    5. On a remote machine, use the -x flag to extract information from the my.remote file:
      gprof -x my.remote

Throughout this description of the gprof command, most of the examples use the C program dhry.c. However, the discussion and examples apply equally to FORTRAN or COBOL modules by substituting the appropriate compiler name in place of the C compiler, cc, and the word subroutine for the word function. For example, the following commands show how to profile a FORTRAN program named matrix.f:

xlf -pg matrix.f -o matrix # FORTRAN compile of matrix.f program
matrix                    # Execute with gprof profiling,
                          #   generating gmon.out file
gprof > matrix.out        # Generate profile reports in
                          #   matrix.out from gmon.out
vi matrix.out             # Read flat profile first.

Files

Item Description
a.out Name list and text space
gmon.out Dynamic call graph and profile
gmon.sum Summarized dynamic call graph and profile
gprof.remote File for remote profiling
/usr/ucb/gprof Contains the gprof command.
/usr/ccs/bin/gprof Contains the gprof command.