IBM Support

Introduction to the MMA (Matrix Math Accelerator) component of Power10 systems

How To


Summary

This article provides a brief guide to the MMA (Matrix Math Accelerator) component of Power10 systems and to integrating it into applications.

Steps

P10 Compute & MMA Architecture
  • 2x Bandwidth matched SIMD*
  • 8 independent Fixed & Float SIMD engines per Core
  • 4 – 32x Matrix Math Acceleration*
  • 4 x 512-bit engines per core = 2048-bit results per cycle
  • Matrix math outer products of Single, Double & Reduced precision.
  • MMA Architecture support introduced in POWER ISA v3.1
  • Supports SP, DP, BF16, HP, Int-16, Int-8 & Int-4 precision levels.
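The outer-product formulation listed above can be illustrated in portable C. This is a minimal reference sketch, not IBM's implementation: the function names `rank1_update` and `tile_gemm_ref` and the packed data layout are assumptions chosen for illustration. It shows how a 4x4 single-precision result tile is built as a sum of rank-1 (outer product) updates. Such a tile holds 16 x 32 = 512 bits, the size of one MMA accumulator register in Power ISA v3.1.

```c
#include <assert.h>
#include <stddef.h>

enum { TILE = 4 }; /* a 4x4 tile of 32-bit floats occupies 16 * 32 = 512 bits */

/* acc (4x4, row-major) += a * b^T for two 4-element vectors. */
static void rank1_update(float *acc, const float *a, const float *b)
{
    for (size_t i = 0; i < TILE; ++i)
        for (size_t j = 0; j < TILE; ++j)
            acc[i * TILE + j] += a[i] * b[j];
}

/* c = A * B, where A is 4 x k_len and B is k_len x 4.
   a_cols holds A column by column and b_rows holds B row by row,
   so step k reads a_cols[4k .. 4k+3] and b_rows[4k .. 4k+3]. */
void tile_gemm_ref(float *c, const float *a_cols, const float *b_rows,
                   size_t k_len)
{
    assert(c != NULL && a_cols != NULL && b_rows != NULL);

    for (size_t i = 0; i < TILE * TILE; ++i)
        c[i] = 0.0f;                      /* start from a zeroed tile */

    for (size_t k = 0; k < k_len; ++k)    /* one outer product per k */
        rank1_update(c, a_cols + k * TILE, b_rows + k * TILE);
}
```

Calling `tile_gemm_ref(c, a_cols, b_rows, k)` with a 16-element `c` fills it with the 4x4 product in row-major order. The inner double loop is the part that a single outer-product instruction replaces on MMA hardware.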
P10 MMA Applications & Workload Integration
  • ML & HPC applications with dense linear algebra computations, such as matrix multiplication, convolution and FFT, can be accelerated with MMA.
  • GCC version >= 10 & LLVM version >= 12 support MMA through built-ins.
  • OpenBLAS, IBM ESSL & Eigen libraries are already optimized with MMA instructions for P10.
  • Easy integration of MMA for enterprise applications, ML frameworks and open community packages via the above BLAS libraries.
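For code that cannot go through one of these libraries, the compiler built-ins can be called directly. The following is a minimal sketch under stated assumptions, not a tuned kernel: the function name `tile_gemm_mma` and the packed layout are hypothetical, and the MMA path is only compiled when the compiler defines `__MMA__` (for example with `-mcpu=power10`). Otherwise a scalar fallback is used so the file still builds on other machines. It uses the GCC/LLVM built-ins `__builtin_mma_xxsetaccz`, `__builtin_mma_xvf32gerpp` and `__builtin_mma_disassemble_acc`; the row order returned by the disassemble step should be confirmed on your own toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__MMA__)
#include <altivec.h>
#endif

/* c (4x4, row-major) = A * B, with A stored column by column in a_cols
   and B stored row by row in b_rows (4 floats per step k). */
void tile_gemm_mma(float *c, const float *a_cols, const float *b_rows,
                   size_t k_len)
{
    assert(c != NULL && a_cols != NULL && b_rows != NULL);

#if defined(__MMA__)
    __vector_quad acc;                 /* one 512-bit accumulator */
    __vector unsigned char va, vb;     /* built-ins take byte vectors */
    __vector float rows[4];            /* 4 rows of 4 floats */

    __builtin_mma_xxsetaccz(&acc);     /* acc = 0 */
    for (size_t k = 0; k < k_len; ++k) {
        memcpy(&va, a_cols + 4 * k, sizeof va);
        memcpy(&vb, b_rows + 4 * k, sizeof vb);
        /* acc[i][j] += a[i] * b[j], single precision */
        __builtin_mma_xvf32gerpp(&acc, va, vb);
    }
    __builtin_mma_disassemble_acc(rows, &acc);
    memcpy(c, rows, 16 * sizeof(float));
#else
    /* Portable fallback: the same sum of outer products in scalar code. */
    memset(c, 0, 16 * sizeof(float));
    for (size_t k = 0; k < k_len; ++k)
        for (size_t i = 0; i < 4; ++i)
            for (size_t j = 0; j < 4; ++j)
                c[4 * i + j] += a_cols[4 * k + i] * b_rows[4 * k + j];
#endif
}
```

In practice, linking against OpenBLAS or ESSL and calling their GEMM routines is usually the simpler route. Hand-written built-ins are mainly useful for kernels those libraries do not cover.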

Document Location

Worldwide

Document Information

Modified date:
17 September 2024

UID

ibm17155093