Maynard Johnson and the oprofile community have posted a new oprofile release: oprofile 0.9.9.
A new 'ocount' program is introduced for collecting raw event counts on a per-application, per-process, per-cpu, or system-wide basis. Unlike the profiling tools, there is no post-processing required for the collected data, which is displayed directly in the output of ocount. A common use case for event counting tools is for computing the CPI (cycles per instruction) for an application. High CPI implies possible stalls, and many architectures provide events that give detailed information about the different types of stalls. This new feature requires a kernel version of 2.6.31 or greater.
See the release notes for details.
To build this release, first download the latest tarball: oprofile-0.9.9.tar.gz
We also assume you already have the latest Advance Toolchain 6.0 release installed on your system.

    # tar -zxf oprofile-0.9.9.tar.gz
    # cd oprofile-0.9.9
    # export PATH=/usr/local/bin:/opt/at6.0/bin:$PATH
    # ./configure
    <messages clipped>
    # make

You may get this error:

    /usr/lib64/qt-3.3/include/qvaluelist.h:91:13: error: ‘ptrdiff_t’ does not name a type
    /usr/lib64/qt-3.3/include/qvaluelist.h:167:13: error: ‘ptrdiff_t’ does not name a type

If so, re-run configure with the GUI disabled and build again:

    # ./configure --enable-gui=no
    <messages clipped>
    # make
    # make install

Check that the tools are installed:

    # opcontrol --version
    opcontrol: oprofile 0.9.9 compiled on Jul 30 2013 13:34:58
    # ocount --version
    ocount: oprofile 0.9.9 compiled on Jul 30 2013 13:35:22
    # ocount
    You must either pass in the name of a command or app to run or specify a run mode
    usage: ocount [ options ] [ --system-wide | -p <pids> | -r <tids> | -C <cpus> [ command [ args ] ] ]
    See ocount man page for details.

Give ocount a try (this is on an idle system):

    # ocount --system-wide --time-interval 5:1 --events=PM_RUN_CYC,PM_INST_CMPL
    ocount: Press Ctl-c or 'kill -SIGINT 13108' to stop counting
    Current time (seconds since epoch): 1375209465
    Event counts (actual) for the whole system:
        Event                   Count                % time counted
        PM_RUN_CYC_GRP1         54,311,267           100.00
        PM_INST_CMPL_GRP1       50,957,416           100.00
Next, try rebuilding oprofile itself under the ocount tool.
    # make clean
    # ocount --events=PM_RUN_CYC,PM_INST_CMPL time make
    70.93user 2.97system 1:14.43elapsed 99%CPU (0avgtext+0avgdata 10326016maxresident)k
    0inputs+524928outputs (0major+179098minor)pagefaults 0swaps
    Events were actively counted for 1 minute and 14 seconds.
    Event counts (actual) for /usr/bin/time:
        Event                   Count                % time counted
        PM_RUN_CYC_GRP1         263,256,705,078      100.00
        PM_INST_CMPL_GRP1       272,175,044,723      100.00

Therefore, the average CPI (cycles per instruction) for the single-threaded build is about 0.97 (263,256,705,078 cycles / 272,175,044,723 instructions).
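The CPI figure is simply the PM_RUN_CYC count divided by the PM_INST_CMPL count. As a minimal sketch, assuming awk is available, the arithmetic can be done from the shell. The cpi function here is a hypothetical helper for illustration and is not part of oprofile.

```shell
# cpi CYCLES INSTRUCTIONS
# Hypothetical helper (not part of oprofile): prints cycles per
# instruction, rounded to two decimal places.
cpi() {
    awk -v cyc="$1" -v ins="$2" 'BEGIN { printf "%.2f\n", cyc / ins }'
}

# Counts from the single-threaded build, with the commas removed.
cpi 263256705078 272175044723
```

With the counts from the single-threaded build, this prints 0.97.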
Now rebuild with make -j16 (the partition has four processor cores running in SMT4 mode, so 16 hardware threads).
    # make clean
    # ocount --events=PM_RUN_CYC,PM_INST_CMPL time make -j16
    106.78user 4.03system 0:26.06elapsed 425%CPU (0avgtext+0avgdata 10330112maxresident)k
    0inputs+524928outputs (0major+179166minor)pagefaults 0swaps
    Events were actively counted for 26.1 seconds.
    Event counts (actual) for /usr/bin/time:
        Event                   Count                % time counted
        PM_RUN_CYC_GRP1         394,313,103,033      100.00
        PM_INST_CMPL_GRP1       272,209,282,531      100.00
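In this case, the average CPI for the multi-process build is about 1.45 (394,313,103,033 cycles / 272,209,282,531 instructions). The instruction count is nearly identical to the single-threaded build; the extra cycles account for the higher CPI.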
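To avoid copying the counts by hand, the event lines can be pulled out of saved ocount output. This is only a sketch: ocount_cpi is a hypothetical helper, and it assumes the event lines look exactly like the ones shown in this post (event name, comma-grouped count, percentage).

```shell
# ocount_cpi: read ocount output on stdin and print the CPI.
# Hypothetical helper; assumes event lines shaped like:
#   PM_RUN_CYC_GRP1   394,313,103,033   100.00
ocount_cpi() {
    awk '
        # Strip the thousands separators, then force a numeric value.
        $1 ~ /^PM_RUN_CYC/   { gsub(/,/, "", $2); cyc = $2 + 0 }
        $1 ~ /^PM_INST_CMPL/ { gsub(/,/, "", $2); ins = $2 + 0 }
        END {
            if (ins > 0) printf "%.2f\n", cyc / ins
            else exit 1   # no instruction count found
        }
    '
}

# Feed it the event lines from the make -j16 run.
ocount_cpi <<'EOF'
Event counts (actual) for /usr/bin/time:
    Event                   Count                % time counted
    PM_RUN_CYC_GRP1         394,313,103,033      100.00
    PM_INST_CMPL_GRP1       272,209,282,531      100.00
EOF
```

For the make -j16 counts this prints 1.45. Lines that do not start with one of the two event names, such as the header or the output of time, are ignored.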
More examples will be posted in the coming weeks. Checking cache and memory characteristics for the application is another common usage.