Matrix Multiply Accelerate built-in functions
IBM® Open XL C/C++ for Linux® on Power® 17.1.1 adds Matrix Multiply Accelerate (MMA) built-in functions.
MMA is embedded into the IBM Power10 processor and is designed to achieve faster AI inference for FP32, BFloat16, and INT8 calculations.
ACC is a 512-bit MMA accumulator. You can use the MMA built-in functions to exploit ACC directly on the Power10 processor and accelerate matrix multiplication computations.
Intrinsic types
IBM Open XL C/C++ for Linux on Power 17.1.1 supports the following intrinsic types:
__vector_pair: 32-byte opaque vector type
__vector_quad: 64-byte opaque vector type
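The following is a minimal sketch, not an official sample, of how a __vector_quad can hold an accumulator while a 4x4 FP32 outer product is computed. The helper name outer_product_4x4 is hypothetical. The sketch assumes that the compiler defines the __MMA__ macro when MMA code generation is enabled and that the built-in functions __builtin_mma_xvf32ger and __builtin_mma_disassemble_acc are available, as they are in GCC and Clang. When MMA is not enabled, it falls back to a plain scalar loop so that the code still builds on other targets.

```c
#include <assert.h>
#include <string.h>

#if defined(__MMA__)
/* Compile-time check of the sizes listed above. */
_Static_assert(sizeof(__vector_pair) == 32, "__vector_pair should be 32 bytes");
_Static_assert(sizeof(__vector_quad) == 64, "__vector_quad should be 64 bytes");
#endif

/* Hypothetical helper (not part of the compiler): out[i][j] = x[i] * y[j]. */
void outer_product_4x4(const float x[4], const float y[4], float out[4][4])
{
    assert(x != NULL && y != NULL && out != NULL);

#if defined(__MMA__)
    /* Assumed MMA path: requires a Power10 target. */
    __vector_quad acc;               /* 512-bit accumulator (4 rows of 4 floats) */
    __vector unsigned char vx, vy;   /* MMA built-ins take 16-byte vectors */
    __vector float rows[4];          /* aligned staging area for the result */

    memcpy(&vx, x, sizeof vx);
    memcpy(&vy, y, sizeof vy);

    /* Overwrite the accumulator with the FP32 outer product of vx and vy. */
    __builtin_mma_xvf32ger(&acc, vx, vy);

    /* Copy the four accumulator rows out to ordinary vectors. */
    __builtin_mma_disassemble_acc(rows, &acc);
    memcpy(out, rows, sizeof rows);
#else
    /* Portable fallback: same arithmetic without MMA. */
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            out[i][j] = x[i] * y[j];
        }
    }
#endif
}
```

To take the MMA path, the code would typically be compiled for a Power10 target, for example with -mcpu=power10. Because the types are opaque, the accumulator contents are read only through the disassemble built-in function. Verify the row and element order of the result on your target before relying on it.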