Before you start
The Synergistic Processor Element (SPE) is exceptionally powerful, but it has very specific requirements that must be met to unleash that power. The SPE is single instruction, multiple data (SIMD) only, so all scalars have to be presented as vectors, and all vectors have to be aligned on 16-byte boundaries. The local store memory is single-ported, so sustained heavy data traffic can starve instruction fetch unless care is taken to refill the instruction buffer regularly. Because the SPE has no hardware branch prediction, branch hints become critical to keeping the instruction buffer properly filled. Optimizing code for the even/odd dual-issue pipelines is another performance tool. This tutorial discusses these issues and ancillary ones at length.
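To give a feel for the vector and alignment requirements before the detailed discussion, here is a minimal sketch. It is not SPU code: it assumes GCC or Clang on an ordinary host and uses their generic vector extension as a stand-in for the SPU's 128-bit vector types. The names vec4f, splat4, scale_vectors, and demo_sum are hypothetical and exist only for this illustration. On a real SPE you would more likely use the spu_splats intrinsic from spu_intrinsics.h, and the compiler may use an expectation like the one below when it places branch hints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128-bit vector of four floats: a host-side stand-in (assumption)
   for the SPU's native vector type. */
typedef float vec4f __attribute__((vector_size(16)));

/* Present a scalar as a vector by copying it into all four lanes. */
vec4f splat4(float s)
{
    vec4f v = { s, s, s, s };
    return v;
}

/* Multiply n_vecs vectors in place by a scalar factor.
   The data pointer must be 16-byte aligned. */
void scale_vectors(vec4f *data, size_t n_vecs, float factor)
{
    /* Check the alignment requirement explicitly. */
    assert(((uintptr_t)data & 15u) == 0);

    /* Tell the compiler the empty case is unlikely. */
    if (__builtin_expect(n_vecs == 0, 0))
        return;

    vec4f f = splat4(factor);
    for (size_t i = 0; i < n_vecs; i++)
        data[i] = data[i] * f;   /* one operation covers four floats */
}

/* Hypothetical demo: double eight floats, then add them up. */
float demo_sum(void)
{
    vec4f buf[2] = { { 1.0f, 2.0f, 3.0f, 4.0f },
                     { 5.0f, 6.0f, 7.0f, 8.0f } };
    float sum = 0.0f;

    scale_vectors(buf, 2, 2.0f);
    for (size_t i = 0; i < 2; i++)
        for (size_t j = 0; j < 4; j++)
            sum += buf[i][j];
    return sum;
}
```

Declaring the buffer as an array of vec4f lets the compiler supply the 16-byte alignment, and the scalar factor is splatted once outside the loop so that every iteration is a pure vector operation.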
This tutorial presumes some basic familiarity with the architecture of the Cell BE and a basic understanding of computer architecture. Readers who worked through the previous tutorial should be fine.