The hpmstat command

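The following example shows the output of the hpmstat command for event set 7 (-s 7). The raw event counts are listed first, followed by the metrics that are derived from them: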

# hpmstat -s 7
 Execution time (wall clock time): 1.003946 seconds
 Counting mode: user
  PM_TLB_MISS (TLB misses)                           :          260847
  PM_CYC (Processor cycles)                          :      3013964331
  PM_ST_REF_L1 (L1 D cache store references)         :       161377371
  PM_LD_REF_L1 (L1 D cache load references)          :       255317480
  PM_INST_CMPL (Instructions completed)              :      1027391919
  PM_RUN_CYC (Run cycles)                            :      1495147343
  Derived metric group: default
  Utilization rate                                 :         181.243 %
  Total load and store operations                  :         416.695 M
  Instructions per load/store                      :           2.466
  MIPS                                             :        1023.354
  Instructions per cycle                           :           0.341
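
Most of the derived metrics in this output can be reproduced from the raw counts. The following Python sketch is an illustration only: the formulas are inferred from the numbers in the sample, not taken from the hpmstat documentation, and the variable names are hypothetical. The utilization rate is left out because its formula cannot be recovered from the counters shown.

```python
# Raw values copied from the sample hpmstat output above.
wall_clock_seconds = 1.003946
counts = {
    "PM_CYC": 3013964331,
    "PM_ST_REF_L1": 161377371,
    "PM_LD_REF_L1": 255317480,
    "PM_INST_CMPL": 1027391919,
}

# Assumed formulas, inferred from the sample numbers (not from hpmstat itself).
load_store_ops = counts["PM_LD_REF_L1"] + counts["PM_ST_REF_L1"]
derived = {
    "Total load and store operations (M)": load_store_ops / 1e6,
    "Instructions per load/store": counts["PM_INST_CMPL"] / load_store_ops,
    "MIPS": counts["PM_INST_CMPL"] / wall_clock_seconds / 1e6,
    "Instructions per cycle": counts["PM_INST_CMPL"] / counts["PM_CYC"],
}

# Print each metric to three decimal places, as hpmstat does.
for name, value in derived.items():
    print(f"{name:<40}: {value:12.3f}")
```

Under these assumptions, each computed value agrees with the corresponding line of the sample output to three decimal places.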

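The following example runs the hpmstat command with counter multiplexing over event sets 1 and 2 (-s 1,2). The -d flag adds the detailed per-set counts, which are printed before the combined counts and the derived metrics: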

# hpmstat -s 1,2 -d
Execution time (wall clock time): 2.129755 seconds
Set: 1
Counting duration: 1.065 seconds
  PM_INST_CMPL (Instructions completed)                :          244687
  PM_FPU1_CMPL (FPU1 produced a result)                :               0
  PM_ST_CMPL (Store instruction completed)             :           31295
  PM_LD_CMPL (Loads completed)                         :           67414
  PM_FPU0_CMPL (Floating-point unit produced a result) :              19
  PM_CYC (Processor cycles)                            :          295427
  PM_FPU_FMA (FPU executed multiply-add instruction)   :               0
  PM_TLB_MISS (TLB misses)                             :             788
Set: 2
Counting duration: 1.064 seconds
  PM_TLB_MISS (TLB misses)                           :            379472
  PM_ST_MISS_L1 (L1 D cache store misses)            :             79943
  PM_LD_MISS_L1 (L1 D cache load misses)             :            307338
  PM_INST_CMPL (Instructions completed)              :         848578245
  PM_LSU_IDLE (Cycles LSU is idle)                   :         229922845
  PM_CYC (Processor cycles)                          :         757442686
  PM_ST_DISP (Store instructions dispatched)         :         125440562
  PM_LD_DISP (Load instr dispatched)                 :         258031257
Counting mode: user
  PM_TLB_MISS (TLB misses)                             :          380260
  PM_ST_MISS_L1 (L1 D cache store misses)              :          160017
  PM_LD_MISS_L1 (L1 D cache load misses)               :          615182
  PM_INST_CMPL (Instructions completed)                :       848822932
  PM_LSU_IDLE (Cycles LSU is idle)                     :       460224933
  PM_CYC (Processor cycles)                            :       757738113
  PM_ST_DISP (Store instructions dispatched)           :       251088030
  PM_LD_DISP (Load instr dispatched)                   :       516488120
  PM_FPU1_CMPL (FPU1 produced a result)                :               0
  PM_ST_CMPL (Store instruction completed)             :           62582
  PM_LD_CMPL (Loads completed)                         :          134812
  PM_FPU0_CMPL (Floating-point unit produced a result) :              38
  PM_FPU_FMA (FPU executed multiply-add instruction)   :               0
  Derived metric group: default
  Utilization rate                                 :         189.830 %
  % TLB misses per cycle                           :           0.050 %
  number of loads per TLB miss                     :           0.355
  Total l2 data cache accesses                     :           0.775 M
  % accesses from L2 per cycle                     :           0.102 %
  L2 traffic                                       :          47.276 MBytes
  L2 bandwidth per processor                       :          44.431 MBytes/sec
  Total load and store operations                  :           0.197 M
  Instructions per load/store                      :        4300.145
  number of loads per load miss                    :         839.569
  number of stores per store miss                  :        1569.133
  number of load/stores per D1 miss                :         990.164
  L1 cache hit rate                                :           0.999 %
  % Cycles LSU is idle                             :          30.355 %
  MIPS                                             :         199.113
  Instructions per cycle                           :           1.120
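
The combined counts in this output appear to follow two rules. An event that is counted in both sets (PM_TLB_MISS, PM_INST_CMPL, PM_CYC) is reported as the sum of its per-set counts. An event that is counted in only one set is scaled by the ratio of the total execution time to the counting duration of that set. The following Python sketch applies these rules to a subset of the counts shown. The rules are inferred from the sample numbers, not taken from the hpmstat documentation, and the combine_sets function is hypothetical. Because the counting durations are printed to only three decimal places, the scaled values match the reported totals approximately, not exactly.

```python
total_seconds = 2.129755  # execution time (wall clock time) from the sample

# Per set: (counting duration in seconds, raw counts). Subset of the sample output.
sets = {
    1: (1.065, {"PM_INST_CMPL": 244687, "PM_LD_CMPL": 67414, "PM_TLB_MISS": 788}),
    2: (1.064, {"PM_INST_CMPL": 848578245, "PM_LD_MISS_L1": 307338, "PM_TLB_MISS": 379472}),
}

def combine_sets(sets, total_seconds):
    """Hypothetical helper: estimate whole-run counts from multiplexed sets."""
    # Group the (duration, count) samples by event name.
    samples_by_event = {}
    for duration, counts in sets.values():
        for event, count in counts.items():
            samples_by_event.setdefault(event, []).append((duration, count))

    combined = {}
    for event, samples in samples_by_event.items():
        counted = sum(count for _, count in samples)
        if len(samples) == len(sets):
            # Assumption: the event was counted for the whole run, so sum it.
            combined[event] = counted
        else:
            # Assumption: extrapolate from the time the event was counted.
            counted_seconds = sum(duration for duration, _ in samples)
            combined[event] = round(counted * total_seconds / counted_seconds)
    return combined

combined = combine_sets(sets, total_seconds)
for event, value in sorted(combined.items()):
    print(f"{event:<16}: {value:>12}")
```

With the sample values, the summed events reproduce the reported totals exactly (380260 TLB misses and 848822932 instructions completed), and the scaled events land within a count or two of the reported values.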