For enterprise AI, horsepower changes everything

This blog post is sponsored by IBM. The author, Peter Rutten, is a Research Director for the IDC Enterprise Infrastructure Practice, focusing on high-end, accelerated, and heterogeneous infrastructure and their use cases. Information and opinions contained in this article are his own, based on research conducted for IDC.

An IDC survey that I performed in 2018 (N = 200) shows that what organizations require from an artificial intelligence (AI) platform above all else is performance, more so than affordability, a specific form factor, vendor support, or a full AI software stack on the platform. In other words, businesses want horsepower, and they want it from all the critical components: the processors, the accelerators, the interconnects, and the I/O system.

Horsepower is indeed what the industry has been focused on. Most of the disruption in the infrastructure ecosystem is caused by a renewed quest for processor and co-processor performance in the AI era. Significant progress has been made in the past two years, and there is a lot of horsepower available to organizations to run their AI workloads, including deep learning training tasks. At the same time, the bar keeps rising. The algorithms are becoming larger and more complex, and the data volumes that they train on are growing immensely.

There’s an interesting IDC chart that shows x86 servers reaching nearly 99 percent of worldwide server units in 2016 and a simultaneous sudden ramp-up in co-processor sales starting in that same year. That, of course, was the year when AI–more specifically, deep learning–entered the stage. It quickly became evident that general-purpose CPUs couldn’t handle core-hungry AI workloads.

What do I mean by “core hungry”? AI is based on sophisticated mathematical and statistical computations. Take, for example, image and video analytics. Images are converted to matrices, with each pixel being represented by a number. Millions of matrices plus their classifications are fed into a neural network for correlation. The matrices are then multiplied with each other to find the right result. To speed this process up, it must be done in parallel on many more cores than CPUs can provide.
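To make the point about parallelism concrete, here is a minimal sketch in Python. The tiny image, the weight matrix, and the function names are all hypothetical and chosen only for illustration. It shows that every cell of a matrix product is an independent dot product, so the cells can be handed to separate workers. Python threads are used here only to show that independence; they are not a stand-in for the speedup that thousands of accelerator cores provide.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 2x3 grayscale "image": each pixel is represented by one number.
image = [
    [0, 128, 255],
    [64, 32, 16],
]

# Hypothetical 3x2 weight matrix, standing in for one neural network layer.
weights = [
    [1, 0],
    [0, 1],
    [1, 1],
]


def output_cell(position):
    """Compute one cell of (image x weights) as a dot product."""
    row, col = position
    return sum(image[row][k] * weights[k][col] for k in range(len(weights)))


def matmul_parallel(n_rows, n_cols, workers=4):
    """Compute every output cell independently, then reassemble the rows."""
    positions = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(output_cell, positions))
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


result = matmul_parallel(len(image), len(weights[0]))
print(result)  # [[255, 383], [80, 48]]
```

No output cell depends on any other, which is why the same work spreads naturally across many cores. A real image has millions of pixels and a real network has many layers, so the number of independent dot products quickly exceeds what a CPU's handful of cores can process at once.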

CPUs are designed for serial processing, and they are close to reaching their maximum potential due to the size and cost of their cores. Hence the rise of different types of CPUs as well as accelerators such as GPUs and custom-designed processors (ASICs, FPGAs). These accelerators have massively parallel architectures with hundreds or even thousands of cores on a die that affordably deliver the parallel compute performance needed.

The AI performance paradigm is Massively Parallel Compute (MPC). AI workloads (but also big data analytics and simulation & modeling) require performance that can only be achieved with clustered server nodes that house multiple co-processors that contain thousands of cores–tensor cores in a GPU, for example. Co-processors–typically GPUs, FPGAs, or ASICs–are being used to improve performance across various workloads. The most common accelerated workloads today are networking, security, encryption, real-time analytics, and compression, followed closely by AI deep learning training.

[Figure: AI performance illustrated via workloads that run on acceleration technology, by deployment]

The scramble to compensate for limited host processor performance is not unique to AI

The world’s fastest supercomputer in 2019–Summit–was built with thousands of nodes that each have 2 IBM POWER9 processors and 6 NVIDIA V100 GPUs. Every year, the TOP500 supercomputer list includes more systems that leverage such co-processors rather than relying solely on CPUs.

The same survey referenced earlier shows that businesses achieve performance improvements of between 58 and 73 percent from acceleration with co-processors. They do so at the expense of an increase in CAPEX (if on premises) or OPEX (if using a CSP) of between 26 and 33 percent.
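As a rough illustration of what those ranges imply for price-performance, the sketch below divides the performance gain by the cost increase. The survey does not say how gains and cost increases pair up for any one organization, so combining the ends of the two ranges is an assumption made here for illustration, and `perf_per_cost` is a hypothetical helper name.

```python
def perf_per_cost(perf_gain_pct, cost_increase_pct):
    """Relative performance per unit of spend, where 1.0 is the unaccelerated baseline."""
    return (1 + perf_gain_pct / 100) / (1 + cost_increase_pct / 100)


# Assumed pairing: lowest gain with highest cost, and highest gain with lowest cost.
worst = perf_per_cost(58, 33)
best = perf_per_cost(73, 26)

print(f"worst case: {worst:.2f}x, best case: {best:.2f}x")
# worst case: 1.19x, best case: 1.37x
```

Under that assumption, accelerated infrastructure delivers roughly 19 to 37 percent more performance per unit of spend than the baseline.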

Those are decent statistics, but there’s a fundamental concern that has crept into the discussion: is it really enough? Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and a co-founder of AI hardware startup SambaNova Systems, has stated: “Organizations are currently settling for expensive temporary solutions that are being cobbled together to run AI applications. A radically new architecture is needed.”

There are many startups in the purpose-built AI infrastructure category, and I fully expect one or more of them to make a significant impact in the near future.

However, the large processor incumbents–IBM, Intel, AMD, Xilinx, Google, AWS, and NVIDIA–have been innovating aggressively to solve the performance gap.

What will the result of all this innovation be? First and foremost, more horsepower for AI. Much more! But also, some competitive chaos in the AI infrastructure marketplace. Processor startups will need to continue their financing efforts (processor startups are not cheap) while creating software ecosystems and server OEM and CSP partnerships. The processor incumbents are competing to get server OEMs’ and CSPs’ buy-in, the only exception being IBM, which builds its own AI-optimized servers with its own processors, both for Power Systems and IBM Z. And IT will have to evaluate which processor platform warrants investment. My advice to IT would be:

  • Don’t get core starved. Performance issues with AI are generally the result of insufficient parallelization of the infrastructure that the AI workload runs on. AI infrastructure is increasingly based on MPC, meaning clusters of accelerated server nodes with fast interconnects.
  • Keep track of the new AI processor technologies. While some are not expected for several years, others are available today, especially those from the large incumbents. Ask your server vendor what its stance is toward emerging performance requirements in terms of new processors, co-processors, interconnects, or combinations thereof.
  • Don’t be afraid to build a heterogeneous AI compute infrastructure, even if that means a bit more complexity than with a 100% homogeneous environment–AI requires it. Remember that heterogeneous infrastructure is no longer as complicated as it used to be, thanks to open source layers that abstract away the hardware (think: Linux, containers, Kubernetes, etc.).

In short: to achieve the horsepower you need for AI, embrace the diversity. Talk to your data scientists and AI developers about infrastructure parallelization. Then, investigate platforms that you may not have had in the datacenter before, platforms with different processors, co-processors, and interconnects for better parallelization. Your AI performance will depend on it.

To learn more about infrastructure for AI, you can read my IDC paper, Rethinking Your Infrastructure for Enterprise AI.