Evaluate performance for Linux on POWER
Analyze performance using Linux tools
Note:
- White boxes are specific POWER7 PMCs watched in a profile: Completion Stall Cycles <C>, Stall by FXU <C2>, FXU Multi-cycle Instruction <C2A>, Stall by Scalar <C3C>, Stall by Scalar Long <C3C1>, Stall by Vector <C3B>, Stall by Vector Long <C3B1>, Stall by DFU <C3A>, Stall by LSU <C1>, Stall by Reject <C1A>, Translation Stall <C1A1>, Other Reject <C1A2>, Stall by D-cache Miss <C1B>, Stall Store <C1C>, Stall due SMT <C4>, Stall due IFU <C5>, Stall due BRU <C5A>, GCT Empty Cycles <B>, GCT Empty due Icache Miss <B1>, GCT Empty due branch Mispredict <B2>, GCT Empty due branch Mispredict and Icache Miss <B3>, Completion Cycles <A>, Base Completion Cycles <A1>
- Gray boxes [marked with an asterisk (*)] are calculated (these metrics have no specific hardware counters): Stall by VSU <C3>, FXU Other, Stall by Scalar Other <C3C2>, Stall by Vector Other <C3B2>, LSU Other <C1D>, Other IFU Stall <C5B>, Other Stall <C6>, GCT Empty Other, Overhead of expansion
(Print using landscape format.)
Table 1. Partial POWER7 CBM
| Column 1 | Column 2 | Column 3 | Column 4 | Column 5 |
|---|---|---|---|---|
| Cycles (PM_RUN_CYC) | Completion Stall Cycles <C> (PM_CMPLU_STALL) | Stall by FXU <C2> (PM_CMPLU_STALL_FXU) | FXU Multi-cycle Instruction <C2A> (PM_CMPLU_STALL_DIV) | |
| | | | FXU Other * (C2 - C2A) (PM_CMPLU_STALL_FXU_OTHER) | |
| | | Stall by VSU <C3> * (C3A + C3B + C3C) (PM_CMPLU_STALL_VSU) | Stall by Scalar <C3C> (PM_CMPLU_STALL_SCALAR) | Stall by Scalar Long <C3C1> (PM_CMPLU_STALL_SCALAR_LONG) |
| | | | | Stall by Scalar Other <C3C2> * (C3C - C3C1) (PM_CMPLU_STALL_SCALAR_OTHER) |
| | | | Stall by Vector <C3B> (PM_CMPLU_STALL_VECTOR) | Stall by Vector Long <C3B1> (PM_CMPLU_STALL_VECTOR_LONG) |
| | | | | Stall by Vector Other <C3B2> * (C3B - C3B1) (PM_CMPLU_STALL_VECTOR_OTHER) |
| | | | Stall by DFU <C3A> (PM_CMPLU_STALL_DFP) | |
| | | Stall by LSU <C1> (PM_CMPLU_STALL_LSU) | Stall by Reject <C1A> (PM_CMPLU_STALL_REJECT) | Translation Stall <C1A1> (PM_CMPLU_STALL_ERAT_MISS) |
| | | | | Other Reject <C1A2> (C1A - C1A1) (PM_CMPLU_STALL_ERAT_OTHER) |
| | | | Stall by D-cache Miss <C1B> (PM_CMPLU_STALL_DCACHE_MISS) | |
| | | | Stall Store <C1C> (PM_CMPLU_STALL_STORE) | |
| | | | LSU Other <C1D> * (C1 - C1A - C1B - C1C) (PM_CMPLU_STALL_LSU_OTHER) | |
| | | Stall due SMT <C4> (PM_CMPLU_STALL_THRD) | | |
| | | Stall due IFU <C5> (PM_CMPLU_STALL_IFU) | Stall due BRU <C5A> (PM_CMPLU_STALL_BRU) | |
| | | | Other IFU Stall <C5B> * (C5 - C5A) (PM_CMPLU_STALL_IFU_OTHER) | |
| | | Other Stall <C6> * (C - C1 - C2 - C3 - C4 - C5) (PM_CMPLU_STALL_OTHER) | | |
| | GCT Empty Cycles <B> (PM_GCT_NOSLOT_CYC) | GCT Empty due Icache Miss <B1> (PM_GCT_NOSLOT_IC_MISS) | | |
| | | GCT Empty due branch Mispredict <B2> (PM_GCT_NOSLOT_BR_MPRED) | | |
| | | GCT Empty due branch Mispredict and Icache Miss <B3> (PM_GCT_NOSLOT_BR_MPRED_IC_MISS) | | |
| | | GCT Empty Other * (B - B1 - B2 - B3) (PM_GCT_EMPTY_OTHER) | | |
| | Completion Cycles <A> (PM_GRP_CMPL) | Base Completion Cycles <A1> (PM_1PLUS_PPC_CMPL) | | |
| | | Overhead of expansion * (A - A1) | | |
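
The calculated (asterisk-marked) entries in Table 1 are plain sums and differences of measured counters, so they can be derived once the raw counts are collected. The following Python sketch illustrates that arithmetic. It assumes that LSU Other is C1 - C1A - C1B - C1C and that GCT Empty Other is B - B1 - B2 - B3. The function name `derive_calculated_metrics`, the dictionary keys, and all sample numbers are hypothetical and are not part of any tool or measurement.

```python
def derive_calculated_metrics(c):
    """Return the calculated (gray-box) metrics from measured counter values.

    `c` maps the labels used in Table 1 (for example "C1A") to raw counts.
    The key names are an assumption of this sketch, not a tool format.
    """
    d = {}
    d["C3"] = c["C3A"] + c["C3B"] + c["C3C"]              # Stall by VSU
    d["FXU_OTHER"] = c["C2"] - c["C2A"]                   # FXU Other
    d["C3C2"] = c["C3C"] - c["C3C1"]                      # Stall by Scalar Other
    d["C3B2"] = c["C3B"] - c["C3B1"]                      # Stall by Vector Other
    d["C1D"] = c["C1"] - c["C1A"] - c["C1B"] - c["C1C"]   # LSU Other (assumed formula)
    d["C5B"] = c["C5"] - c["C5A"]                         # Other IFU Stall
    # Other Stall uses the derived C3, because Stall by VSU has no counter of its own.
    d["C6"] = c["C"] - c["C1"] - c["C2"] - d["C3"] - c["C4"] - c["C5"]
    d["GCT_EMPTY_OTHER"] = c["B"] - c["B1"] - c["B2"] - c["B3"]  # assumed formula
    d["EXPANSION_OVERHEAD"] = c["A"] - c["A1"]            # Overhead of expansion
    return d


# Invented counter values, for illustration only.
sample = {
    "C": 1000, "C1": 400, "C1A": 100, "C1B": 200, "C1C": 50,
    "C2": 150, "C2A": 40,
    "C3A": 10, "C3B": 30, "C3B1": 5, "C3C": 60, "C3C1": 20,
    "C4": 50, "C5": 120, "C5A": 80,
    "B": 300, "B1": 100, "B2": 120, "B3": 30,
    "A": 700, "A1": 600,
}

if __name__ == "__main__":
    for name, value in derive_calculated_metrics(sample).items():
        print(f"{name}: {value}")
```

A negative result for any derived entry would suggest that the counters were not sampled over the same interval, so it is worth checking the signs before interpreting the breakdown.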