Introduction to the MMA (Matrix Math Accelerator) component of Power10 systems

How To

Summary

This article provides a brief guide for integrating and optimizing the MMA (Matrix Math Accelerator) component of Power10 systems.

Steps

P10 Compute & MMA Architecture

2x Bandwidth matched SIMD*
8 independent Fixed & Float SIMD engines per Core
4 – 32x Matrix Math Acceleration*
Four 512-bit engines per core = 2048 bits of results per cycle
Matrix math outer products of Single, Double & Reduced precision.
MMA Architecture support introduced in POWER ISA v3.1
Supports SP, DP, BF16, HP, Int-16, Int-8 & Int-4 precision levels.
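
The outer-product formulation listed above can be sketched in portable C. This is an illustrative scalar model, not MMA code: the type acc4x4_t and the functions outer_product_accumulate and gemm_4x4 are hypothetical names introduced here. It assumes single precision, where one 512-bit accumulator holds a 4x4 tile of 32-bit floats (16 x 32 = 512 bits), and shows how a small matrix product is built from a sequence of rank-1 updates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one 512-bit accumulator:
   a 4x4 tile of 32-bit floats (16 elements x 32 bits = 512 bits). */
typedef struct { float v[4][4]; } acc4x4_t;

/* Rank-1 update: acc += x * y^T (outer product, then accumulate). */
static void outer_product_accumulate(acc4x4_t *acc,
                                     const float x[4], const float y[4])
{
    for (size_t i = 0; i < 4; ++i)
        for (size_t j = 0; j < 4; ++j)
            acc->v[i][j] += x[i] * y[j];
}

/* C (4x4) = A (4 x k) * B (k x 4), computed as k rank-1 updates.
   a_cols holds the k columns of A back to back (4 floats each);
   b_rows holds the k rows of B back to back (4 floats each). */
static void gemm_4x4(acc4x4_t *c, const float *a_cols,
                     const float *b_rows, size_t k)
{
    assert(c != NULL && a_cols != NULL && b_rows != NULL);

    /* Start from a zeroed accumulator. */
    for (size_t i = 0; i < 4; ++i)
        for (size_t j = 0; j < 4; ++j)
            c->v[i][j] = 0.0f;

    /* One outer product per column of A and matching row of B. */
    for (size_t p = 0; p < k; ++p)
        outer_product_accumulate(c, a_cols + 4 * p, b_rows + 4 * p);
}
```

With k = 2, for example, the resulting tile is the sum of two outer products, one for each column of A paired with the matching row of B.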

P10 MMA Applications & Workload Integration

ML & HPC applications with dense linear algebra computations, matrix multiplications, convolutions, and FFTs can be accelerated with MMA.
GCC version >= 10 and LLVM version >= 12 support MMA through built-in functions.
The OpenBLAS, IBM ESSL & Eigen libraries are already optimized with MMA instructions for P10.
Enterprise applications, ML frameworks, and open community packages can integrate MMA easily through the BLAS libraries above.
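
For code that cannot go through a BLAS library, the compiler built-ins mentioned above can be used directly. The following is a minimal sketch, not a tuned kernel. The function name sgemm_tile_4x4 and its data layout are assumptions made for illustration. The MMA path relies on the __MMA__ macro, which GCC and LLVM define when MMA code generation is enabled (for example with -mcpu=power10), and on the documented built-ins __builtin_mma_xxsetaccz, __builtin_mma_xvf32gerpp, and __builtin_mma_disassemble_acc. The row order of the disassembled accumulator is assumed to follow the Best Practices Guide and should be verified on the target toolchain. On any other platform the same function falls back to scalar code.

```c
#include <assert.h>
#include <string.h>

#if defined(__MMA__)
#include <altivec.h>
typedef vector unsigned char vec_t;   /* operand type taken by the MMA built-ins */
#endif

/* Hypothetical helper: C (4x4, row-major) = A (4 x k) * B (k x 4).
   a holds the k columns of A back to back (4 floats each);
   b holds the k rows of B back to back (4 floats each). */
void sgemm_tile_4x4(const float *a, const float *b, float *c, int k)
{
    assert(a != NULL && b != NULL && c != NULL && k >= 0);

#if defined(__MMA__)
    __vector_quad acc;                /* one 512-bit accumulator */
    vector float rows[4];

    __builtin_mma_xxsetaccz(&acc);    /* zero the accumulator */
    for (int p = 0; p < k; ++p) {
        vector float va, vb;
        memcpy(&va, a + 4 * p, sizeof va);   /* column p of A */
        memcpy(&vb, b + 4 * p, sizeof vb);   /* row p of B */
        /* acc += va * vb^T (single-precision outer product) */
        __builtin_mma_xvf32gerpp(&acc, (vec_t)va, (vec_t)vb);
    }
    /* Copy the accumulator out as four 128-bit rows
       (row order assumed; check on the target toolchain). */
    __builtin_mma_disassemble_acc(rows, &acc);
    memcpy(c, rows, sizeof rows);
#else
    /* Portable fallback: the same sequence of rank-1 updates in scalar code. */
    for (int i = 0; i < 16; ++i)
        c[i] = 0.0f;
    for (int p = 0; p < k; ++p)
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                c[4 * i + j] += a[4 * p + i] * b[4 * p + j];
#endif
}
```

Keeping both paths behind one function signature lets the same source build on Power10 and on other systems, which is useful when comparing results during integration.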

Related Information

PowerPC Matrix-Multiply Assist Built-in Functions

Matrix-Multiply Assist Best Practices Guide

Document Location

Worldwide

