6 replies Latest Post - ‏2008-09-08T12:21:08Z by SystemAdmin
SystemAdmin
10114 Posts

PPU<->SPU mailboxes unbearably slow?

‏2006-08-30T08:10:37Z |
Hi, do you know if the latency or "reaction" times are specified anywhere for PPU<->SPU mailbox communication?

I'm asking because right now I'm seeing a delay of about 5..10ms between putting something into a mailbox on the PPU side and receiving it on the SPU side. 10ms is an awful lot, especially considering that the task my SPEs carry out for each message takes only about 10us. Is 5..10ms really normal Cell mailbox behaviour?

(As an aside, spe_create_thread() also takes several hundred milliseconds. I'm not entirely sure why; it can't be a main memory -> local store bandwidth issue, can it?)

The background is that I'm attempting to set up persistent "server" threads on the SPEs and hand them tasks to process concurrently. Are mailboxes the best way of dispatching tasks, or are there faster ways?
PPU, heavily snipped C:
[code]
speid_t dispatchTask(params) {
    /* start up all threads */
    if (!spes_running) {
        for (i = 0; i < FFT_SPE_COUNT; i++) {
            spe_ids[i] = spe_create_thread(0, &fft_spe, &ctx, NULL, -1, 0);
            spe_isfree[i] = 1; /* each createthread takes several 100ms */
        }
        spes_running = 1;
    }
    /* ... get and reserve one SPE in spe_isfree[k]==1 */
    /* free_spe=spe_ids[k] */
    /* ... prepare task context 'ctx' struct */
    /* ... send context pointer to SPE */
    spe_write_in_mbox(free_spe, (unsigned int)ctx);
    /* return speid_t for later blocking wait */
    return free_spe;
}

int waitTask(speid_t id) {
    int status = spe_read_out_mbox(id);
    while (-1 == status) { /* this loops usually 0..1 times */
        usleep(10);
        status = spe_read_out_mbox(id);
    }
    /* ... update spe_isfree[] */
    return status;
}

int main(void) {
    /* the dispatch + wait takes ~10ms in total */
    spe1 = dispatchTask(1);
    waitTask(spe1);
    return 0;
}
[/code]
SPU side:
[code]
int main(void) {
    int mbox_data;
    while (1) {
        mbox_data = spu_read_in_mbox(); /* blocking wait */
        prof_clear();
        prof_start();
        /* ... DMA-in the task context from *mbox_data */
        /* ... buffered DMA-in of additional work data, simultaneous calcs */
        prof_stop();
        spu_write_out_mbox(1); // return result 1="ok"
    }
}
[/code]
Typical profiling result: SPU3: CP0, 24556(22562), 33153. So at the default SPE clock of 3200 MHz, 33153 cycles are about 10us, right?
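The arithmetic in that question can be checked with a few lines of C. This is only a sketch: it assumes, as the post does, that the last number in the profiling line is a count of 3200 MHz cycles. The function names are made up for illustration and are not part of any SDK.

```c
#include <stdio.h>

/* Convert a cycle count to microseconds for a clock given in MHz.
   1 MHz is one cycle per microsecond, so us = cycles / MHz. */
double cycles_to_us(unsigned long cycles, double clock_mhz)
{
    return (double)cycles / clock_mhz;
}

/* Print the time for the cycle count quoted in the profiling line,
   assuming (as the post does) a 3200 MHz clock. */
void print_profile_time(void)
{
    printf("%.2f us\n", cycles_to_us(33153UL, 3200.0));
}
```

33153 / 3200 comes out at roughly 10.4, so "about 10us" holds under that assumption.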
Btw as an aside aside, why do forum posting code tags not properly format and colour single line // comments, only /* */? Please fix! :-)))
Many thanks!
Updated on 2008-09-08T12:21:08Z by SystemAdmin
  • SystemAdmin
    10114 Posts

    Re: PPU<->SPU mailboxes unbearably slow?

    ‏2006-08-30T18:45:41Z  in response to SystemAdmin
    > usleep(10);

    Do a test to see how many usleep() calls you can do per second.
    You may be surprised.
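The suggested test could look roughly like the sketch below. It is plain POSIX C with no Cell-specific calls; the function name is made up for illustration, and the result depends entirely on the kernel's timer configuration.

```c
#define _DEFAULT_SOURCE /* asks glibc to declare usleep() */
#include <time.h>
#include <unistd.h>

/* Call usleep(usec) `iterations` times and return how many calls
   per second were actually achieved. */
double usleep_calls_per_second(unsigned int usec, int iterations)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++)
        usleep(usec);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return elapsed > 0.0 ? (double)iterations / elapsed : 0.0;
}
```

Calling usleep_calls_per_second(10, 1000) and printing the result shows how far the real rate is from the 100000 calls per second that a literal reading of usleep(10) would suggest.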
    • SystemAdmin
      10114 Posts

      Re: PPU<->SPU mailboxes unbearably slow?

      ‏2006-08-31T06:36:14Z  in response to SystemAdmin
      >Do a test to see how many usleep() calls you can do per second.
      >You may be surprised.

      Ouch! 10ms/100Hz system timer...

      Commented out the usleep() and now things are indeed much much faster. Many thanks for the info!

      Now it takes 70us for PPU preprocess->SPE task->PPU result, compared to 140us when the same preprocess and task steps are done on PPU only 8-))

      Using nanosleep() instead of usleep() didn't make any difference. Do you know if on the PPU side there's any blocking equivalent to the blocking spu_read_in_mbox()?
      • SystemAdmin
        10114 Posts

        Re: PPU<->SPU mailboxes unbearably slow?

        ‏2006-09-06T12:55:02Z  in response to SystemAdmin

        > Using nanosleep() instead of usleep() didn't make any
        > difference. Do you know if on the PPU side there's
        > any blocking equivalent to the blocking
        > spu_read_in_mbox()?

        The only PPU-side blocking calls that I'm aware of are
        spe_get_event() (using SPE_EVENT_MAILBOX to do a glorified
        poll() on the interrupt mailbox) and spe_wait() (which isn't
        what you're looking for :). I haven't found much in the way
        of examples for spe_get_event().

        That said, it seems that the PPU-side model is
        turn-and-burn/busy-wait; perhaps someone will jump in and
        tell us otherwise...
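A busy-wait of this kind can be sketched in portable C as a bounded spin around a non-blocking read. Everything here is hypothetical scaffolding: try_read_fn stands in for a call like spe_read_out_mbox() that returns -1 when nothing is available, and fake_mbox is just a stand-in so the loop can be exercised without Cell hardware.

```c
/* Hypothetical non-blocking read: returns 0 and fills *out when data is
   available, -1 when the mailbox is still empty. */
typedef int (*try_read_fn)(void *ctx, unsigned int *out);

/* Poll try_read up to max_spins times without sleeping.
   Returns the number of empty polls before data arrived,
   or -1 if the spin budget ran out first. */
long spin_read(try_read_fn try_read, void *ctx, unsigned int *out,
               long max_spins)
{
    for (long spins = 0; spins < max_spins; spins++) {
        if (try_read(ctx, out) == 0)
            return spins;
    }
    return -1;
}

/* Stand-in mailbox: empty for the first `empty_polls` reads,
   then delivers `value`. */
struct fake_mbox {
    int empty_polls;
    unsigned int value;
};

int fake_mbox_try_read(void *ctx, unsigned int *out)
{
    struct fake_mbox *m = ctx;

    if (m->empty_polls > 0) {
        m->empty_polls--;
        return -1;
    }
    *out = m->value;
    return 0;
}
```

The spin budget is there so a stuck SPE cannot hang the PPU thread forever; what to do when it runs out (yield, sleep, report an error) is a separate design decision.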
        • SystemAdmin
          10114 Posts

          Re: PPU<->SPU mailboxes unbearably slow?

          ‏2008-09-08T12:21:08Z  in response to SystemAdmin
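          Use spe_out_intr_mbox_read() for blocking reads on the PPU side. Note that it reads the SPU's outbound interrupt mailbox, so the SPU has to send its result with spu_write_out_intr_mbox() rather than spu_write_out_mbox().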
      • gshi
        280 Posts

        Re: PPU<->SPU mailboxes unbearably slow?

        ‏2006-09-06T20:44:28Z  in response to SystemAdmin
        The latency I get is 7~8us, round trip: ppe->spe->ppe
  • CellServ
    1346 Posts

    Re: PPU<->SPU mailboxes unbearably slow?

    ‏2006-08-31T16:12:57Z  in response to SystemAdmin
    One way to get a big improvement is to use direct problem state mailbox access.

    Even direct reads and writes of problem state have significant latency, because that area is mapped as guarded storage. Building queues of requests and letting each SPE grab its own work (instead of having the PPE push work to it) is generally a better solution.
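The pull model can be illustrated with a host-only sketch in C11. This is not Cell code: on real hardware an SPE would have to reach a queue in main memory through DMA and the MFC's atomic commands rather than C11 atomics, and all names below (work_queue, queue_push, queue_grab) are invented for the example. It assumes a single producer and a fixed-size batch that is never recycled.

```c
#include <stdatomic.h>

#define QUEUE_CAPACITY 64

struct work_queue {
    unsigned int tasks[QUEUE_CAPACITY]; /* e.g. addresses of task contexts */
    atomic_uint count;                  /* tasks published by the producer */
    atomic_uint next;                   /* index of the next unclaimed task */
};

void queue_init(struct work_queue *q)
{
    atomic_init(&q->count, 0);
    atomic_init(&q->next, 0);
}

/* Producer side (the PPE's role), single producer only.
   Returns 0 on success, -1 if the queue is full. */
int queue_push(struct work_queue *q, unsigned int task)
{
    unsigned int n = atomic_load_explicit(&q->count, memory_order_relaxed);

    if (n >= QUEUE_CAPACITY)
        return -1;
    q->tasks[n] = task;
    /* Release: the task slot is written before the new count is visible. */
    atomic_store_explicit(&q->count, n + 1, memory_order_release);
    return 0;
}

/* Worker side (an SPE's role): claim the next task, if any.
   Returns 0 and fills *task, or -1 if no work is available. */
int queue_grab(struct work_queue *q, unsigned int *task)
{
    unsigned int i = atomic_load(&q->next);

    while (i < atomic_load_explicit(&q->count, memory_order_acquire)) {
        /* Only one worker wins the claim on index i; on failure the
           compare-exchange reloads i and the loop retries. */
        if (atomic_compare_exchange_weak(&q->next, &i, i + 1)) {
            *task = q->tasks[i];
            return 0;
        }
    }
    return -1;
}
```

With this shape the producer never addresses an individual worker; it only publishes tasks, and whichever worker is idle claims the next one.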

    --
    IBM SDK Service Administrator