Pinned topic: branch hints
3 replies. Latest post 2010-11-05T11:26:47Z by jadamcze

rotqbyi (3 posts)

2010-11-04T21:13:05Z
I'm trying to understand branch hinting. My C++ code:

#include <cstdio>
#include <spu_mfcio.h>  // spu_write_decrementer, spu_read_decrementer

void f ()
{
  asm volatile ("nop");
}

int main ()
{
  spu_write_decrementer (0xffffffff);
  asm ("hbra .foo, %0" :: "p" (&f));  // ***

  for (int i = 0; i < 10000; i++)
    asm (".foo: brsl $lr, %0" :: "p" (&f));

  unsigned t = 0xffffffff - spu_read_decrementer ();
  printf ("%u\n", t);
}


Here is the relevant section of the output from objdump -d for the code above:


00000168 <_Z1fv>:
 168:   40 20 00 00     nop     $0
 16c:   35 00 00 00     bi      $0

00000170 <main>:
 170:   40 ff ff 82     il      $2,-1
 174:   12 00 03 89     hbrr    198 <.foo+0x8>,190 <.foo>   # 190
 178:   24 00 40 80     stqd    $0,16($1)
 17c:   24 ff 80 81     stqd    $1,-32($1)
 180:   1c f8 00 81     ai      $1,$1,-32
 184:   21 a0 03 82     wrch    $ch7,$2
 188:   10 00 2d 02     hbra    190 <.foo>,168 <_Z1fv>
 18c:   40 93 88 02     il      $2,10000        # 2710

00000190 <.foo>:
 190:   33 7f fb 00     brsl    $0,168 <_Z1fv>  # 168
 194:   1c ff c1 02     ai      $2,$2,-1
 198:   21 7f ff 02     brnz    $2,190 <.foo>   # 190
 19c:   01 a0 04 04     rdch    $4,$ch8
 1a0:   42 03 d8 03     ila     $3,1968 # 7b0
 1a4:   00 20 00 00     lnop
 1a8:   09 21 02 04     nor     $4,$4,$4
 1ac:   33 00 10 80     brsl    $0,230 <printf> # 230
 1b0:   32 00 00 00     br      0


And with the hbr line commented out of the C++ code:


00000168 <_Z1fv>:
 168:   40 20 00 00     nop     $0
 16c:   35 00 00 00     bi      $0

00000170 <main>:
 170:   40 20 00 7f     nop     $127
 174:   12 00 03 89     hbrr    198 <.foo+0x8>,190 <.foo>   # 190
 178:   40 ff ff 82     il      $2,-1
 17c:   24 00 40 80     stqd    $0,16($1)
 180:   24 ff 80 81     stqd    $1,-32($1)
 184:   1c f8 00 81     ai      $1,$1,-32
 188:   21 a0 03 82     wrch    $ch7,$2
 18c:   40 93 88 02     il      $2,10000        # 2710

00000190 <.foo>:
 190:   33 7f fb 00     brsl    $0,168 <_Z1fv>  # 168
 194:   1c ff c1 02     ai      $2,$2,-1
 198:   21 7f ff 02     brnz    $2,190 <.foo>   # 190
 19c:   01 a0 04 04     rdch    $4,$ch8
 1a0:   42 03 d8 03     ila     $3,1968 # 7b0
 1a4:   00 20 00 00     lnop
 1a8:   09 21 02 04     nor     $4,$4,$4
 1ac:   33 00 10 80     brsl    $0,230 <printf> # 230
 1b0:   32 00 00 00     br      0

Without the hbr line in the C++ code, the program runs faster.

I think that the second hbr disables the first hbr inserted by g++ for the loop condition. How can I keep both hints active? Do I have to move them inside the loop? (Can I force g++ to do that for me or would I have to code the loop in asm?) How would pipelining the hints work?

Thank you.
  • jadamcze (219 posts)

    Re: branch hints

    2010-11-04T23:26:45Z, in response to rotqbyi
    From the CellBE Programming Handbook 1.12, page 697:


    24.3.3.5 Rules for Using Branch Hints

    The following general rules apply to the hint for branch (HBR) instructions:

    • An HBR instruction should be placed at least 11 cycles followed by four instruction pairs before the branch instructions being hinted by the HBR instruction. In other words, an HBR instruction must be followed by at least 11 cycles of instructions, followed by eight instructions aligned on an even address boundary. More separation between the hint and branch will probably improve the performance of applications on future SPU implementations.

    • If an HBR instruction is placed too close to the branch, then a hint stall will result. This results in the branch instruction stalling until the timing requirement of the HBR instruction is satisfied.

    • If an HBR instruction is placed closer to the hint-trigger address than four instruction pairs plus one cycle, then the hint stall does not occur and the HBR is not used.

    • Only one HBR instruction can be active at a time. Issuing another HBR cancels the current one.

    • An HBR instruction can be moved outside of a loop and will be effective on each loop iteration as long as another HBR or sync instruction is not executed.

    • The HBR instruction must be placed within -256 to +255 instructions of the branch instruction.

    • The HBR instruction only affects performance.


    Sections 24.3.4, 24.3.5, 24.3.6 and 24.3.7 further address the topic of branch prediction.
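    The hint that you are adding is too close to the branch to be of any use, and it cancels the hint inserted by the compiler. In your listing, the hbra at 0x188 is only two instructions before the brsl at 0x190 that it hints, and because it is issued after the compiler's hbrr at 0x174, it replaces that hint for the loop-closing brnz at 0x198.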

    For providing hints to the compiler, I'd suggest that it's generally better to use __builtin_expect() (carefully, and when you're sure the compiler is doing the wrong thing), and let the compiler provide the hints for function calls.

    If you want to experiment further with branch prediction, writing assembly directly may be more useful; that way you avoid clashing with the compiler's own hints. You can get a dump of the assembly generated by the compiler with the -S option.
    • rotqbyi (3 posts)

      Re: branch hints

      2010-11-05T09:01:39Z, in response to jadamcze
      Thank you for your answer.

      I don't fully understand the sentence "Only one HBR instruction can be active at a time. Issuing another HBR cancels the current one."

      There seems to be a pipelined hint mode. Doesn't that mean I should be able to issue a second hbr before the first branch is taken?

      
      f:      bi    $0
      main:   hbrr  .1, f
              ...   (enough instructions)
              hbrr  .2, f
              ...   (enough instructions)
      .1:     brsl  $0, f
      .2:     brsl  $0, f
      

      Will the first hbr be inactive in this case?

      Is there a better or preferred way to optimize a sequence of brsl instructions (other than inlining the functions)?
      • jadamcze (219 posts)

        Re: branch hints

        2010-11-05T11:26:47Z, in response to rotqbyi
        From what I understand of the pipelined hint mode (after re-reading 24.3.3 several times), it does not permit multiple active hints. Instead, pipelined mode negates (disables?) the hint stall, which I think reduces the lead time required between the hint and the branch.

        IMHO, the best option is to eliminate the branches, which eliminates the need for hints. Hint only if you have to.

        I highly recommend careful use of selb, __attribute__((always_inline)) and __attribute__((flatten)) :)

        Do you have an example where many brsl calls cause a significant time penalty?