Tech tips: SPU vector intrinsics at your fingertips

Here's a quick list to keep you on the right side of common Cell/B.E. SPU vector intrinsics

Jonathan Bartlett (johnnyb@eskimo.com), Director of Technology, New Medio
Jonathan Bartlett is the author of the book Programming from the Ground Up, an introduction to programming using Linux assembly language. He is the lead developer at New Media Worx, responsible for developing Web, video, kiosk, and desktop applications for clients.

Summary:  Know these common C/C++ language extension intrinsics and greatly simplify the arduous task of using the SPU's assembly language.

Date:  01 May 2007
Level:  Intermediate

Instead of focusing on the SPU's assembly language to help you get to know the Cell Broadband Engine (Cell/B.E.) processor intimately, this tip, excerpted from the developerWorks article "Programming high-performance applications on the Cell BE processor, Part 5," provides a quick look at C/C++ so you can let the compiler do a large amount of the work for you. To use the SPU C/C++ language extensions, the header file spu_intrinsics.h must be included at the beginning of your code.

Vector intrinsics basics

The C/C++ language extensions include data types and intrinsics that give the programmer nearly full access to the SPU's assembly language instructions. Many of the intrinsics also make the assembly language much simpler to use by coalescing several similar instructions into one intrinsic.

Instructions that differ only in the type of their operands (such as a, ai, ah, ahi, fa, and dfa for addition) are represented by a single C/C++ intrinsic, which selects the proper instruction based on the operand types. For addition, spu_add generates the a (32-bit add) instruction when given two vector unsigned ints as parameters. If given two vector floats, it generates the fa (float add) instruction instead.
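For readers without an SPU toolchain at hand, the idea of one name resolving to different instructions can be sketched in portable C11. This is only a model: vec_u32, vec_f32, model_a, model_fa, and model_add are hypothetical names invented for this illustration, and _Generic stands in for the type-based selection that the real compiler performs for spu_add.

```c
#include <stdint.h>

/* Hypothetical stand-ins for two SPU vector types: four 32-bit lanes each. */
typedef struct { uint32_t e[4]; } vec_u32;
typedef struct { float    e[4]; } vec_f32;

/* Models the `a` instruction: lane-wise 32-bit integer add. */
vec_u32 model_a(vec_u32 x, vec_u32 y) {
    vec_u32 r;
    for (int i = 0; i < 4; i++) {
        r.e[i] = x.e[i] + y.e[i];
    }
    return r;
}

/* Models the `fa` instruction: lane-wise single-precision add. */
vec_f32 model_fa(vec_f32 x, vec_f32 y) {
    vec_f32 r;
    for (int i = 0; i < 4; i++) {
        r.e[i] = x.e[i] + y.e[i];
    }
    return r;
}

/* One name, several "instructions": the type of the first operand
 * decides which model function is called, much as spu_add picks
 * between a and fa. */
#define model_add(x, y) \
    _Generic((x), vec_u32: model_a, vec_f32: model_fa)((x), (y))
```

Calling model_add on two vec_u32 values runs the integer path, while two vec_f32 values run the floating-point path; the call site looks the same in both cases.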

Note that the intrinsics generally have the same limitations as their corresponding assembly language instructions. However, in cases where an immediate value is too large for the appropriate immediate-mode instruction, the compiler will promote the immediate value to a vector and do the corresponding vector/vector operation. For instance, spu_add(myvec, 2) generates an ai (add immediate) instruction while spu_add(myvec, 2000) first loads the 2000 into its own vector using il and then performs the a (add) instruction.
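The promotion rule can be sketched as a simple range check. The following is a portable model, not compiler source: the function and enum names are hypothetical, and the field widths (a 10-bit signed immediate for ai, a 16-bit signed immediate for il) are assumptions taken from the SPU instruction set rather than from this tip.

```c
#include <stdbool.h>

/* Hypothetical labels for the code shapes described in the text. */
enum add_form {
    ADD_IMMEDIATE,      /* a single ai instruction                    */
    LOAD_THEN_ADD,      /* il to build the vector, then a             */
    WIDE_LOAD_THEN_ADD  /* constant too wide for il; not covered here */
};

/* Assumption: ai carries a 10-bit signed immediate. */
bool fits_ai_immediate(long value) {
    return value >= -512 && value <= 511;
}

/* Assumption: il carries a 16-bit signed immediate. */
bool fits_il_immediate(long value) {
    return value >= -32768 && value <= 32767;
}

/* Models which form spu_add(vec, value) would take for 32-bit elements. */
enum add_form model_add_form(long value) {
    if (fits_ai_immediate(value)) {
        return ADD_IMMEDIATE;
    }
    if (fits_il_immediate(value)) {
        return LOAD_THEN_ADD;
    }
    return WIDE_LOAD_THEN_ADD;
}
```

Under these assumptions, model_add_form(2) yields ADD_IMMEDIATE and model_add_form(2000) yields LOAD_THEN_ADD, matching the two cases described above.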

The order of operands in the intrinsics is essentially the same as in the corresponding assembly language instruction, except that the first operand (which holds the destination register in assembly language) is not specified. Instead, it becomes the return value of the function, and the compiler supplies the actual destination register in the code it generates.

For more on vector intrinsics, see "Programming on the Cell/B.E. processor, Part 5," the article from which this tip was taken.


Basic SPU intrinsics

This list covers some of the more common SPU intrinsics; types are not given because most of them are polymorphic.

  • spu_add(val1, val2)
    Adds each element of val1 to the corresponding element of val2. If val2 is a non-vector value, it adds the value to each element of val1.
  • spu_sub(val1, val2)
    Subtracts each element of val2 from the corresponding element of val1. If val1 is a non-vector value, then val1 is replicated across a vector, and then val2 is subtracted from it.
  • spu_mul(val1, val2)
    Because the multiplication instructions operate so differently, the SPU intrinsics do not coalesce them as much as they do for other operations. spu_mul handles floating point multiplication (single and double precision). The result is a vector where each element is the result of multiplying the corresponding elements of val1 and val2 together.
  • spu_and(val1, val2), spu_or(val1, val2), spu_not(val), spu_xor(val1, val2), spu_nor(val1, val2), spu_nand(val1, val2), spu_eqv(val1, val2)
    Boolean operations operate bit-by-bit, so the type of operands the boolean operations receive is not relevant except for determining the type of value they will return. spu_eqv is a bitwise equivalency operation, not a per-element equivalency operation.
  • spu_rl(val, count), spu_sl(val, count)
    spu_rl rotates each element of val left by the number of bits specified in the corresponding element of count. Bits rotated off the end are rotated back in on the right. If count is a scalar value, then it is used as the count for all elements of val. spu_sl operates the same way, but performs a shift instead of a rotate.
  • spu_rlmask(val, count), spu_rlmaska, spu_rlmaskqw(val, count), spu_rlmaskqwbyte(val, count)
    These are very confusingly named operations. They are named "rotate left and mask," but they actually perform right shifts (they are implemented by a combination of left rotates and masks, but the programming interface is for right shifts). spu_rlmask and spu_rlmaska shift each element of val to the right by the number of bits in the corresponding element of count (or the value of count if count is a scalar). spu_rlmaska replicates the sign bit as bits are shifted in. spu_rlmaskqw operates on the whole quadword at a time, but only up to 7 bits (it performs a modulus on count to put it in the proper range). spu_rlmaskqwbyte works similarly, except that count is the number of bytes instead of bits, and count is taken modulo 16 instead of 8. Following the underlying instructions, the count is written as the negative of the shift amount: spu_rlmask(val, -4) shifts each element right by four bits.
  • spu_cmpgt(val1, val2), spu_cmpeq(val1, val2)
    These intrinsics perform element-by-element comparisons of their two operands. Each element of the resulting vector is set to all ones (for true) or all zeros (for false). spu_cmpgt performs a greater-than comparison while spu_cmpeq performs an equality comparison.
  • spu_sel(val1, val2, conditional)
    This corresponds to the selb assembly language instruction. The instruction itself is bit-based, so all types use the same underlying instruction. However, the intrinsic operation returns a value of the same type as the operands. As in assembly language, spu_sel looks at each bit in conditional. If the bit is zero, the corresponding bit in the result is selected from the corresponding bit in val1; otherwise it is selected from the corresponding bit in val2.
  • spu_shuffle(val1, val2, pattern)
    This intrinsic lets you rearrange the bytes of val1 and val2 according to pattern. The instruction examines each byte of pattern in turn. If the byte starts with the bits 0b10, the corresponding byte in the result is set to 0x00. If it starts with 0b110, the result byte is set to 0xff. If it starts with 0b111, the result byte is set to 0x80. Otherwise (the most important case), val1 and val2 are concatenated into a 32-byte value, and the last five bits of the pattern byte are used as the byte index into it. This is used for inserting elements into vectors as well as performing fast table lookups.
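The comparison, select, and shuffle rules in the list above are easiest to check against a plain scalar model. The sketch below is portable C rather than SPU code: model_qword and the model_ functions are hypothetical names made up for this illustration, they cover only the 32-bit and byte cases, and byte index 0 is assumed to be the leftmost byte of the register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte stand-in for one SPU register. */
typedef struct { uint8_t b[16]; } model_qword;

/* Models spu_cmpgt on four signed 32-bit lanes:
 * all ones where a > b, all zeros elsewhere. */
void model_cmpgt_i32(const int32_t a[4], const int32_t b[4],
                     uint32_t mask[4]) {
    for (int i = 0; i < 4; i++) {
        mask[i] = (a[i] > b[i]) ? 0xFFFFFFFFu : 0u;
    }
}

/* Models spu_sel bit by bit: a 0 bit in cond takes the bit from
 * val1, a 1 bit takes it from val2. */
void model_sel_u32(const uint32_t val1[4], const uint32_t val2[4],
                   const uint32_t cond[4], uint32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (val1[i] & ~cond[i]) | (val2[i] & cond[i]);
    }
}

/* Models spu_shuffle: each pattern byte either produces a constant
 * or indexes into the 32-byte concatenation of val1 and val2. */
model_qword model_shuffle(model_qword val1, model_qword val2,
                          model_qword pattern) {
    uint8_t both[32];
    model_qword out;

    memcpy(both, val1.b, 16);
    memcpy(both + 16, val2.b, 16);

    for (int i = 0; i < 16; i++) {
        uint8_t p = pattern.b[i];
        if ((p & 0xC0) == 0x80) {
            out.b[i] = 0x00;            /* pattern byte 0b10xxxxxx  */
        } else if ((p & 0xE0) == 0xC0) {
            out.b[i] = 0xFF;            /* pattern byte 0b110xxxxx  */
        } else if ((p & 0xE0) == 0xE0) {
            out.b[i] = 0x80;            /* pattern byte 0b111xxxxx  */
        } else {
            out.b[i] = both[p & 0x1F];  /* low five bits pick a byte */
        }
    }
    return out;
}
```

Feeding the mask from model_cmpgt_i32 into model_sel_u32 gives a branch-free per-element choice between two vectors, which is the usual reason the comparison intrinsics return all ones or all zeros rather than a single flag.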

All of the intrinsics that are prefixed with spu_ try to find the best instruction match based on the types of their operands. However, not every intrinsic supports every vector type; support depends on whether an assembly language instruction exists to handle that type.

In addition, if you want a specific instruction rather than having the compiler choose one, you can perform almost any non-branching instruction with the specific intrinsics. All specific intrinsics take the form si_assemblyinstructionname, where assemblyinstructionname is the name of the assembly language instruction as defined in the SPU Assembly Language Specification. So, si_a(a, b) forces the instruction a to be used for addition.

All operands to specific intrinsics are cast to a special type called qword, which is essentially an opaque register value type. The return values from specific intrinsics are also qwords, which can then be cast to whatever vector type you wish.


But wait! There's more

In "Programming on the Cell/B.E. processor, Part 5," the article from which this tip was taken, you can learn more about how to use the vector extensions and discover how to direct the compiler to do branch prediction and to perform DMA transfers in C/C++.

