Inferencing requires far less compute power than training an artificial intelligence (AI) model. It is therefore entirely possible, and often more energy efficient, to run inference without any extra hardware accelerators (such as GPUs), even on edge devices. AI inferencing models commonly run on smartphones and similar devices using only the CPU; many of the picture and face filters in social media apps are AI inference models.
IBM pioneered on-processor acceleration for inferencing with the Matrix Math Accelerator (MMA) engines built into the IBM Power10 chip. This gives the Power10 platform the ability to outperform other hardware architectures without spending a single extra watt on added GPUs. The Power10 chip can extract insights from data faster than other chip architectures while consuming much less energy than GPU-based systems, which is why it is optimized for AI.
Leveraging IBM Power10 for AI, especially for inferencing, requires no extra effort from AI DevOps teams. Data science libraries such as OpenBLAS, libATen, Eigen and MLAS, to name a few, are already optimized to make use of the MMA engines. So, AI frameworks that leverage these libraries, such as PyTorch, TensorFlow and ONNX, already benefit from the on-chip acceleration. These optimized libraries are available through the RocketCE channel on anaconda.org.
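To illustrate how transparent this is, here is a minimal sketch of ordinary CPU inference in PyTorch. The exact package names available on the RocketCE channel are an assumption (check anaconda.org/rocketce for the current list); the point is that the inference code itself contains nothing Power10-specific, because the MMA-optimized libraries sit underneath the framework.

```python
# Hypothetical environment setup (run in a shell first); the package name
# is an assumption -- see https://anaconda.org/rocketce for what is published:
#   conda install -c rocketce pytorch-cpu

import torch
import torchvision.models as models

# Load a standard pretrained model and switch it to inference mode.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# A dummy batch standing in for real input (e.g., preprocessed images).
batch = torch.randn(8, 3, 224, 224)

# Plain CPU inference: the matrix multiplications inside the model are
# dispatched to the underlying BLAS/ATen kernels, which on Power10 are
# the MMA-accelerated builds.
with torch.no_grad():
    logits = model(batch)

print(logits.argmax(dim=1))  # predicted class index per image
```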
IBM Power10 can also speed up inferencing by using reduced-precision data. Instead of feeding the inference model 32-bit floating-point data, one can feed it 16-bit floating-point data, for example, moving twice as much data through the processor at a time. For many models this works well without compromising the accuracy of the inference results.
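As a concrete sketch, PyTorch's autocast mechanism can run the example above in mixed 16-bit precision. The choice of bfloat16 here is an assumption (the article says only "16-bit floating point"), and it presumes a recent PyTorch build with bfloat16 CPU support; accuracy should be validated against full-precision results for each model before deploying.

```python
import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

batch = torch.randn(8, 3, 224, 224)

# Autocast runs eligible operations (matrix multiplies, convolutions) in
# 16-bit bfloat16 while keeping numerically sensitive ops in 32-bit float,
# halving the data moved through the reduced-precision compute paths.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(batch)

# Compare against the full 32-bit run to confirm accuracy is acceptable.
print(logits.float().argmax(dim=1))
```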
Inferencing is the last stage of the AI DevOps cycle, and the IBM Power10 platform was designed to be AI-optimized, helping clients extract insights from data more cost-effectively, both through energy efficiency and by reducing the need for extra accelerators.
If you want to learn more about inferencing on Power10, please reach out to IBM Technology Expert Labs.