How To
Summary
This article provides a brief guide for integrating and optimizing the MMA (Matrix Math Accelerator) component of Power10 systems.
Steps
P10 Compute & MMA Architecture
- 2x Bandwidth matched SIMD*
- 8 independent Fixed & Float SIMD engines per Core
- 4 – 32x Matrix Math Acceleration*
- Four 512-bit engines per core = 2048 bits of results per cycle
- Matrix math outer products of Single, Double & Reduced precision.
- MMA Architecture support introduced in POWER ISA v3.1
- Supports SP, DP, BF16, HP, Int-16, Int-8 & Int-4 precision levels.
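The outer-product formulation listed above can be illustrated with a small reference model. The sketch below is plain, portable C and does not use any MMA instruction. It assumes a 4x4 single-precision tile (16 x 32 bits = 512 bits, which matches the 512-bit width quoted above) and builds a tile of C = A x B as a sequence of rank-1 (outer-product) updates. The names TILE, outer_product_accumulate and tile_gemm are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

/* Hypothetical reference model, not an IBM API.
   Assumption: a 4x4 single-precision tile (16 x 32 bits = 512 bits). */
#define TILE 4

/* One rank-1 update: acc += x * y^T (outer product of two 4-element vectors). */
void outer_product_accumulate(float acc[TILE][TILE],
                              const float x[TILE],
                              const float y[TILE])
{
    for (size_t i = 0; i < TILE; ++i)
        for (size_t j = 0; j < TILE; ++j)
            acc[i][j] += x[i] * y[j];
}

/* C (4x4) = A (4xk, row-major) * B (kx4, row-major),
   computed as k successive outer-product updates. */
void tile_gemm(float c[TILE][TILE], const float *a, const float *b, size_t k)
{
    for (size_t i = 0; i < TILE; ++i)
        for (size_t j = 0; j < TILE; ++j)
            c[i][j] = 0.0f;

    for (size_t p = 0; p < k; ++p) {
        float x[TILE], y[TILE];
        for (size_t i = 0; i < TILE; ++i) {
            x[i] = a[i * k + p];      /* column p of A */
            y[i] = b[p * TILE + i];   /* row p of B */
        }
        outer_product_accumulate(c, x, y);
    }
}
```

In this model each pass of the loop over p corresponds to one outer-product step. Larger matrices would be processed by tiling them into 4x4 blocks and running the same loop per block.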
P10 MMA Applications & Workload Integration
- ML & HPC applications with dense linear algebra computations, matrix multiplications, convolutions and FFTs can be accelerated with MMA.
- GCC version >= 10 & LLVM version >= 12 support MMA through built-ins.
- OpenBLAS, IBM ESSL & Eigen libraries are already optimized with MMA instructions for P10.
- Easy integration of MMA for enterprise applications, ML frameworks and open community packages via the above BLAS libraries.
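For code that cannot go through a BLAS library, the compiler built-ins mentioned above can be called directly. The following is a minimal sketch, not a tuned kernel. It assumes GCC with MMA enabled (for example -mcpu=power10, in which case the compiler defines __MMA__) and falls back to a scalar loop on any other target. The function name rank1_update_f32 is hypothetical. The mapping of accumulator rows to array rows after disassembly is an assumption that should be verified on the target system, especially on little-endian Linux.

```c
#include <string.h>

#if defined(__MMA__)
#include <altivec.h>
typedef vector unsigned char vec_t;   /* operand type expected by the MMA built-ins */
#endif

/* Hypothetical helper: acc (4x4) += x * y^T in single precision. */
void rank1_update_f32(float acc[4][4], const float x[4], const float y[4])
{
#if defined(__MMA__)
    __vector_quad q;        /* 512-bit MMA accumulator */
    vector float rows[4];   /* receives the four 128-bit accumulator rows */
    vec_t vx, vy;

    memcpy(&vx, x, sizeof vx);
    memcpy(&vy, y, sizeof vy);

    __builtin_mma_xxsetaccz(&q);             /* zero the accumulator */
    __builtin_mma_xvf32gerpp(&q, vx, vy);    /* q += outer product of vx and vy */
    __builtin_mma_disassemble_acc(rows, &q); /* copy accumulator into vector rows */

    /* Assumption: rows[i][j] holds x[i] * y[j]; verify on the target. */
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            acc[i][j] += rows[i][j];
#else
    /* Portable fallback with the same result, used when MMA is not enabled. */
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            acc[i][j] += x[i] * y[j];
#endif
}
```

A real kernel would keep the accumulator live across many xvf32gerpp calls and disassemble it only once per tile. For most applications, calling the MMA-optimized BLAS routines listed above is the simpler path.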
Document Location
Worldwide
Document Information
Modified date:
17 September 2024
UID
ibm17155093