Extending 8-bit training breakthroughs to the toughest deep learning models

Over the past few years, reduced-precision techniques have proven exceptionally effective in accelerating deep learning training and inference on AI hardware. State-of-the-art hardware platforms for training deep neural networks (DNNs) have largely moved from traditional single-precision floating-point (FP32) computation toward 16-bit floating-point (FP16) precision, in large part because reduced-precision representations offer higher energy efficiency and require less storage.

IBM Research has played a leading role in developing reduced-precision technologies and has pioneered a number of key breakthroughs, including the first 8-bit training techniques (presented at NeurIPS 2018) and state-of-the-art 2-bit inference results (presented at SysML 2019).

For training, which is more computation-intensive than inference, the choices below FP16 precision are limited. The most common 8-bit solutions adopt an INT8 format and are limited to inference only, not training. In addition, it has been difficult to show that existing sub-16-bit training and inference techniques carry over to deep learning domains beyond common image classification networks such as ResNet50.

At this year’s NeurIPS conference, IBM Research continues to advance its 8-bit training platform to improve performance and maintain accuracy for the most challenging emerging deep learning models, as presented in the NeurIPS paper “Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks.”

Last year, IBM Research demonstrated an FP8 scheme that worked robustly with convolutional networks such as ResNet50. The new hybrid method for training fully preserves model accuracy across a broader spectrum of deep learning models. The Hybrid FP8 format also overcomes the training accuracy loss previously seen on models like MobileNet (vision) and Transformer (NLP), which are more susceptible to information loss from quantization. To overcome this challenge, the Hybrid FP8 scheme adopts a novel FP8 format in the forward path for higher resolution and another FP8 format for gradients in the backward path for larger range.


Figure 1: The Hybrid FP8 scheme uses two different FP8 formats, each written as (sign, exponent, mantissa) bit counts: (1,4,3) with an exponent bias for weights and activations, and (1,5,2) for gradients with loss scaling.
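To make the resolution-versus-range trade-off between the two formats concrete, here is a minimal Python sketch that rounds a value to the nearest number representable in a toy (1, exponent, mantissa) format. It is an illustration only, not IBM's implementation: the function name `quantize`, the IEEE-style default bias, and the choice to use every exponent code for finite values (no infinities or NaNs) are all assumptions, and the actual exponent bias used by HFP8 is not stated in this post.

```python
import math

def quantize(x, exp_bits, man_bits, bias=None):
    """Round x to the nearest value in a toy (1, exp_bits, man_bits) float format.

    Assumptions (not from the post): IEEE-style default bias, subnormals
    supported, all exponent codes used for finite values, saturation on overflow.
    """
    if bias is None:
        bias = 2 ** (exp_bits - 1) - 1          # IEEE-style default (assumption)
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    min_exp = 1 - bias                           # exponent of the smallest normal number
    max_exp = (2 ** exp_bits - 1) - bias         # largest exponent (no codes reserved)
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    exponent = max(math.frexp(mag)[1] - 1, min_exp)
    step = 2.0 ** (exponent - man_bits)          # spacing of representable values here
    rounded = round(mag / step) * step           # Python rounds halves to even
    return sign * min(rounded, max_val)          # saturate instead of overflowing

# Resolution: the 3-bit mantissa lands closer to 0.27 than the 2-bit one.
print(quantize(0.27, 4, 3))   # 0.28125
print(quantize(0.27, 5, 2))   # 0.25

# Range: a small gradient underflows to zero in (1,4,3) but survives in (1,5,2).
print(quantize(1e-5, 4, 3), quantize(1e-5, 5, 2))

# Loss scaling: scale up before quantizing, scale back down afterwards.
scale = 1024.0                                   # hypothetical scale factor
print(quantize(1e-7, 5, 2), quantize(1e-7 * scale, 5, 2) / scale)
```

Under these assumptions, the extra mantissa bit of (1,4,3) gives finer steps near typical weight and activation values, while the extra exponent bit of (1,5,2), combined with loss scaling, keeps small gradients from being flushed to zero.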

This new hybrid format fully preserves model accuracy across a wide spectrum of deep learning models in image classification, natural language processing, speech, and object detection.

Baseline vs. Hybrid FP8 training on Image, Language, Speech, and Object-Detection Models


Figure 2: IBM Research’s HFP8 scheme achieves comparable accuracy to FP32 across a suite of complex models for vision, speech, and language.

This new scheme offers several benefits. For existing models already trained in higher-precision formats, the new FP8 formats map such networks directly to inference deployments without re-training. Low-precision inference typically requires time-consuming re-training of the network for the target format. This 8-bit solution, however, requires no tuning or quantization-aware training prior to inference deployment.

Finally, once local computation has been accelerated by 8-bit training, the communication of weight gradients becomes the bottleneck in distributed learning. A new weight update protocol for distributed training broadcasts 8-bit weights to make better use of bandwidth. This approach further reduces training time by 30 to 60 percent.
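As a rough illustration of why narrower weights help bandwidth, the sketch below compares the size of one broadcast of a model's weights at different bit widths. It does not model the protocol itself; the helper `payload_bytes` and the parameter count are hypothetical, and the end-to-end training-time reduction quoted above depends on the whole system, not just on message size.

```python
def payload_bytes(num_params, bits_per_value):
    """Bytes needed to send num_params values at the given bit width."""
    return num_params * bits_per_value // 8

num_params = 25_000_000  # hypothetical model size, for illustration only

for bits in (32, 16, 8):
    megabytes = payload_bytes(num_params, bits) / 1e6
    print(f"{bits:>2}-bit weights: {megabytes:.0f} MB per broadcast")
```

Each halving of the bit width halves the bytes on the wire, so 8-bit weights need a quarter of the bandwidth of FP32 and half that of FP16 for the same number of parameters.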

HFP8 is part of IBM Research’s work on Digital AI Cores within the IBM Research AI Hardware Center, which opened earlier this year, and part of the center’s ambitious roadmap for AI acceleration. These advances address a critical need for AI hardware: meeting the growing processing demands of models while managing energy consumption.

Research Staff Member

Kailash Gopalakrishnan

IBM Fellow and Senior Manager, Accelerator Architectures and Machine Learning, IBM Research
