Flat profile

The flat profile is the second part of the cwhet.gprof file.

The following is an example of the flat profile section of the cwhet.gprof file:
granularity: each sample hit covers 4 byte(s) Total time: 62.85 seconds

  %   cumulative   self              self     total
 time   seconds  seconds     calls  ms/call  ms/call  name
 30.9      19.44    19.44        1 19440.00 40620.00  .main [1]
 30.5      38.61    19.17                             .__mcount [3]
 14.1      47.50     8.89  8990000     0.00     0.00  .mod8 [4]
  9.0      53.14     5.64  6160000     0.00     0.00  .mod9 [5]
  2.5      54.72     1.58   930000     0.00     0.00  .exp [6]
  2.4      56.25     1.53  1920000     0.00     0.00  .cos [7]
  2.2      57.62     1.37   930000     0.00     0.00  .log [8]
  2.0      58.88     1.26                             .qincrement [9]
  1.6      59.90     1.02   140000     0.01     0.01  .mod3 [10]
  1.2      60.68     0.78                             .__stack_pointer [11]
  1.0      61.31     0.63   640000     0.00     0.00  .atan [12]
  0.9      61.89     0.58                             .qincrement1 [13]
  0.8      62.41     0.52   640000     0.00     0.00  .sin [14]
  0.7      62.85     0.44                             .sqrt [15]
  0.0      62.85     0.00      180     0.00     0.00  .fwrite [16]
  0.0      62.85     0.00      180     0.00     0.00  .memchr [17]
  0.0      62.85     0.00       90     0.00     0.00  .__flsbuf [18]
  0.0      62.85     0.00       90     0.00     0.00  ._flsbuf [19]

The flat profile is much less complex than the call-graph profile and very similar to the output of the prof command. The primary columns of interest are the self seconds and the calls columns, which show the CPU seconds spent in each function and the number of times each function was called. The next columns to look at are self ms/call (the average CPU time per call used by the body of the function itself) and total ms/call (the average time per call spent in the body of the function plus any descendant functions it calls).
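The per-call columns are rounded to two decimal places, so frequently called functions such as .mod8 show 0.00 ms/call even though they account for seconds of CPU time. The following Python sketch recomputes the self time per call at a finer resolution from the self seconds and calls columns. The helper names (parse_flat_profile, self_us_per_call) are hypothetical and not part of gprof, and the parser assumes the whitespace-separated row layout shown in the sample above.

```python
# Rows copied from the sample flat profile above.
SAMPLE = """\
 30.9      19.44    19.44        1 19440.00 40620.00  .main [1]
 30.5      38.61    19.17                             .__mcount [3]
 14.1      47.50     8.89  8990000     0.00     0.00  .mod8 [4]
  9.0      53.14     5.64  6160000     0.00     0.00  .mod9 [5]
"""


def parse_flat_profile(text):
    """Return one dict per flat-profile row (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 fields, or 5 when no call counts were recorded.
        if len(fields) not in (5, 8):
            continue
        try:
            row = {
                "pct_time": float(fields[0]),
                "cumulative_s": float(fields[1]),
                "self_s": float(fields[2]),
                "calls": int(fields[3]) if len(fields) == 8 else None,
                "name": fields[-2],
            }
        except ValueError:
            # Header lines and other non-data lines are skipped.
            continue
        rows.append(row)
    return rows


def self_us_per_call(row):
    """Self time per call in microseconds, or None if calls are unknown."""
    if not row["calls"]:
        return None
    return row["self_s"] * 1e6 / row["calls"]


if __name__ == "__main__":
    for row in parse_flat_profile(SAMPLE):
        per_call = self_us_per_call(row)
        shown = "n/a" if per_call is None else f"{per_call:.3f} us/call"
        print(f"{row['name']:12s} self={row['self_s']:6.2f} s  {shown}")
```

For .mod8 this works out to roughly one microsecond per call, which is why the ms/call columns display 0.00.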

Normally, the top functions on the list are candidates for optimization, but you should also consider how many calls are made to the function. Sometimes it can be easier to make slight improvements to a frequently called function than to make extensive changes to a piece of code that is called once.
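As a rough illustration of this trade-off, the following Python sketch estimates how much run time would be saved if the body of a function became a given fraction faster. The figures are copied from the sample flat profile. The 10% improvement and the function name estimated_saving are assumptions made for the example, not measurements or part of gprof.

```python
TOTAL_S = 62.85  # "Total time" from the sample profile header

# (name, self seconds, calls) copied from the sample flat profile.
FUNCS = [
    (".main", 19.44, 1),
    (".mod8", 8.89, 8990000),
    (".mod9", 5.64, 6160000),
]


def estimated_saving(self_s, fraction):
    """Seconds saved if the function body runs `fraction` faster (assumed)."""
    return self_s * fraction


if __name__ == "__main__":
    for name, self_s, calls in FUNCS:
        saved = estimated_saving(self_s, 0.10)  # hypothetical 10% speedup
        share = 100 * saved / TOTAL_S
        print(f"{name:6s} calls={calls:>8d}  saves {saved:.2f} s ({share:.1f}% of run)")
```

Under these assumptions, a small change to .mod8, whose body runs almost nine million times, recovers close to half of what the same relative improvement to all of .main would, and a short function is often the easier of the two to change.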

A cross-reference index is the last item produced and looks similar to the following:
Index by function name

  [18] .__flsbuf            [37] .exit                 [5] .mod9
  [34] .__ioctl              [6] .exp                 [43] .moncontrol
  [20] .__mcount            [39] .expand_catname      [44] .monitor
   [3] .__mcount            [32] .free                [22] .myecvt
  [23] .__nl_langinfo_std   [33] .free_y              [28] .nl_langinfo
  [11] .__stack_pointer     [16] .fwrite              [27] .pout
  [24] ._doprnt             [40] .getenv              [29] .printf
  [35] ._findbuf            [41] .ioctl                [9] .qincrement
  [19] ._flsbuf             [42] .isatty              [13] .qincrement1
  [36] ._wrtchk              [8] .log                 [45] .saved_category_nam
  [25] ._xflsbuf             [1] .main                [46] .setlocale
  [26] ._xwrite             [17] .memchr              [14] .sin
  [12] .atan                [21] .mf2x2               [31] .splay
  [38] .catopen             [10] .mod3                [15] .sqrt
   [7] .cos                  [4] .mod8                [30] .write

Note: If the program you want to monitor uses the fork() system call, be aware that, by default, the parent and the child processes each create a file with the same name, gmon.out. To avoid this problem, use the GPROF environment variable. You can also use the GPROF environment variable to profile multi-threaded applications.