The hpmstat command
The following is example output from the hpmstat command for event set 7:
# hpmstat -s 7
Execution time (wall clock time): 1.003946 seconds
Counting mode: user
PM_TLB_MISS (TLB misses) : 260847
PM_CYC (Processor cycles) : 3013964331
PM_ST_REF_L1 (L1 D cache store references) : 161377371
PM_LD_REF_L1 (L1 D cache load references) : 255317480
PM_INST_CMPL (Instructions completed) : 1027391919
PM_RUN_CYC (Run cycles) : 1495147343
Derived metric group: default
Utilization rate : 181.243 %
Total load and store operations : 416.695 M
Instructions per load/store : 2.466
MIPS : 1023.354
Instructions per cycle : 0.341
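Several of the derived metrics above can be reproduced from the raw counters. The following Python sketch is a cross-check whose formulas are inferred from the numbers in this example, not taken from the documented hpmstat definitions; the function and variable names are hypothetical. Utilization rate is omitted because it cannot be recovered from the counters shown.

```python
# Raw values copied from the "hpmstat -s 7" example output above.
WALL_CLOCK_SECONDS = 1.003946
counters = {
    "PM_CYC": 3013964331,
    "PM_ST_REF_L1": 161377371,
    "PM_LD_REF_L1": 255317480,
    "PM_INST_CMPL": 1027391919,
}


def derive_metrics(c, seconds):
    """Recompute derived metrics.

    Assumption: these formulas are inferred from the sample output,
    not from the hpmstat documentation.
    """
    load_store = c["PM_LD_REF_L1"] + c["PM_ST_REF_L1"]
    return {
        "total_load_store_millions": load_store / 1e6,
        "instructions_per_load_store": c["PM_INST_CMPL"] / load_store,
        "mips": c["PM_INST_CMPL"] / seconds / 1e6,
        "instructions_per_cycle": c["PM_INST_CMPL"] / c["PM_CYC"],
    }


metrics = derive_metrics(counters, WALL_CLOCK_SECONDS)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The printed values can be compared line by line with the "Derived metric group: default" section of the example.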
The following is example output from the hpmstat command with counter multiplexing across event sets 1 and 2:
# hpmstat -s 1,2 -d
Execution time (wall clock time): 2.129755 seconds
Set: 1
Counting duration: 1.065 seconds
PM_INST_CMPL (Instructions completed) : 244687
PM_FPU1_CMPL (FPU1 produced a result) : 0
PM_ST_CMPL (Store instruction completed) : 31295
PM_LD_CMPL (Loads completed) : 67414
PM_FPU0_CMPL (Floating-point unit produced a result) : 19
PM_CYC (Processor cycles) : 295427
PM_FPU_FMA (FPU executed multiply-add instruction) : 0
PM_TLB_MISS (TLB misses) : 788
Set: 2
Counting duration: 1.064 seconds
PM_TLB_MISS (TLB misses) : 379472
PM_ST_MISS_L1 (L1 D cache store misses) : 79943
PM_LD_MISS_L1 (L1 D cache load misses) : 307338
PM_INST_CMPL (Instructions completed) : 848578245
PM_LSU_IDLE (Cycles LSU is idle) : 229922845
PM_CYC (Processor cycles) : 757442686
PM_ST_DISP (Store instructions dispatched) : 125440562
PM_LD_DISP (Load instr dispatched) : 258031257
Counting mode: user
PM_TLB_MISS (TLB misses) : 380260
PM_ST_MISS_L1 (L1 D cache store misses) : 160017
PM_LD_MISS_L1 (L1 D cache load misses) : 615182
PM_INST_CMPL (Instructions completed) : 848822932
PM_LSU_IDLE (Cycles LSU is idle) : 460224933
PM_CYC (Processor cycles) : 757738113
PM_ST_DISP (Store instructions dispatched) : 251088030
PM_LD_DISP (Load instr dispatched) : 516488120
PM_FPU1_CMPL (FPU1 produced a result) : 0
PM_ST_CMPL (Store instruction completed) : 62582
PM_LD_CMPL (Loads completed) : 134812
PM_FPU0_CMPL (Floating-point unit produced a result) : 38
PM_FPU_FMA (FPU executed multiply-add instruction) : 0
Derived metric group: default
Utilization rate : 189.830 %
% TLB misses per cycle : 0.050 %
number of loads per TLB miss : 0.355
Total l2 data cache accesses : 0.775 M
% accesses from L2 per cycle : 0.102 %
L2 traffic : 47.276 MBytes
L2 bandwidth per processor : 44.431 MBytes/sec
Total load and store operations : 0.197 M
Instructions per load/store : 4300.145
number of loads per load miss : 839.569
number of stores per store miss : 1569.133
number of load/stores per D1 miss : 990.164
L1 cache hit rate : 0.999 %
% Cycles LSU is idle : 30.355 %
MIPS : 199.113
Instructions per cycle : 1.120
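The combined counts in the multiplexed example appear to follow a simple pattern: an event counted in both sets is reported as the sum of its per-set counts, and an event counted in only one set is scaled by the ratio of total wall clock time to that set's counting duration. The Python sketch below checks this against the sample; the scaling rule is an inference from these numbers rather than a documented hpmstat formula, and all names are hypothetical. Because the counting durations are printed only to the millisecond, the estimates are approximate.

```python
# Times copied from the "hpmstat -s 1,2 -d" example output above.
TOTAL_SECONDS = 2.129755
SET_SECONDS = {1: 1.065, 2: 1.064}


def extrapolate(raw_count, set_id):
    """Scale a count seen in one set to the whole run (inferred rule, an assumption)."""
    return raw_count * TOTAL_SECONDS / SET_SECONDS[set_id]


# (event, set, raw count in that set, combined count reported by hpmstat)
single_set_events = [
    ("PM_ST_CMPL", 1, 31295, 62582),
    ("PM_LD_CMPL", 1, 67414, 134812),
    ("PM_ST_MISS_L1", 2, 79943, 160017),
    ("PM_LD_MISS_L1", 2, 307338, 615182),
    ("PM_LSU_IDLE", 2, 229922845, 460224933),
]

# (event, count in set 1, count in set 2, combined count reported by hpmstat)
both_set_events = [
    ("PM_TLB_MISS", 788, 379472, 380260),
    ("PM_INST_CMPL", 244687, 848578245, 848822932),
    ("PM_CYC", 295427, 757442686, 757738113),
]

# Relative error between the extrapolated estimate and the reported value.
relative_errors = {}
for event, set_id, raw, reported in single_set_events:
    estimate = extrapolate(raw, set_id)
    relative_errors[event] = abs(estimate - reported) / reported
    print(f"{event}: estimated {estimate:.0f}, reported {reported}")

# Events present in both sets: the reported value is the plain sum.
sums_match = all(a + b == total for _, a, b, total in both_set_events)
print("summed events match:", sums_match)
```

Note that some derived metrics in the sample, such as "% Cycles LSU is idle", match the per-set counts rather than the scaled totals, so the scaled values should be treated as estimates.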