AI Hardware

Iso-accuracy Deep Learning Inference with In-memory Computing

Can analog AI hardware support deep learning inference without compromising accuracy? Our research team at IBM Research Europe in Zurich thought so when we started developing a groundbreaking technique that achieves both energy efficiency and high accuracy on deep neural network computations using phase-change memory devices. We believe this could be a way forward in advancing AI hardware-accelerator architectures. 

The Team: (top, L to R) Manuel Le Gallo, Simon Haefeli, Abu Sebastian, Irem Boybat, Martino Dazzi; (bottom, L to R) Evangelos Eleftheriou, Bipin Rajendran, S.R. Nandakumar, Vinay Joshi, Christophe Piveteau

Deep neural networks (DNNs) are revolutionizing the field of artificial intelligence as they continue to achieve unprecedented success in cognitive tasks such as image and speech recognition. However, running DNNs on current von Neumann computing architectures limits the achievable performance and energy efficiency. Because neither power efficiency nor performance should be compromised, new hardware architectures are needed to optimize deep neural network inference.

The Sky is Not the Limit

For obvious reasons, Internet giants with server farms would ideally prefer to keep running such deep learning algorithms on their existing von Neumann infrastructure. After all, what’s the harm in adding a few more servers to get the job done? This may work for a while, but server farms consume an enormous amount of energy. As deep learning continues to evolve and demand greater processing power, companies with large data centers will quickly realize that building more power plants to support, say, an additional one million times the operations needed to categorize a single image is neither economical nor sustainable.

Of course, many companies are currently turning to the cloud as a solution. Indeed, cloud computing has favorable capabilities, including faster processing, which helps improve the performance of deep learning algorithms. But cloud computing has its shortcomings too. There are data privacy issues, potential response delays associated with transmitting data to the cloud and back, continual service costs, and, in some areas of the world, slow internet connectivity.

And the problem goes well beyond data centers. Think drones, robots, mobile devices and the like. Or consumer products, such as smart cameras and augmented reality goggles. Clearly, we need to take the efficiency route going forward by optimizing microchips and hardware to get such devices running on fewer watts.

Merging Memory and Processing

While there has been significant progress in the development of hardware-accelerator architectures for inference, many of the existing set-ups physically split the memory and processing units. This means that DNN models are typically stored in off-chip memory, and that computational tasks require a constant shuffling of data between the memory and computing units – a process that slows down computation and limits the maximum achievable energy efficiency.

Our research, featured in Nature Communications, exploits in-memory computing methods using resistance-based (memristive) storage devices as a promising non-von Neumann approach for developing hardware that can efficiently support DNN inference models. Specifically, we propose an architecture based on phase-change memory (PCM) that, like the human brain, has no separate compartments to store and compute data, and therefore consumes significantly less energy.
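
To make the contrast with a conventional processor concrete, here is a minimal, purely illustrative NumPy sketch of an analog matrix-vector multiplication: the weights are assumed to be stored as device conductances, and the multiply-accumulate happens in place via Ohm’s and Kirchhoff’s laws, so the weights never travel to a separate compute unit. The noise model and the 2% noise level are assumptions for illustration, not the device characterization used in the paper.

```python
import numpy as np

def analog_matvec(weights, x, conductance_noise=0.02, rng=None):
    """Toy model of an in-memory matrix-vector multiply.

    The weight matrix stands in for an array of PCM conductances; the
    product is computed 'on the crossbar' (Ohm's law for the products,
    Kirchhoff's current law for the summation). The Gaussian term is an
    illustrative stand-in for device variability and read noise.
    """
    rng = rng or np.random.default_rng()
    scale = conductance_noise * np.abs(weights).max()
    noisy_weights = weights + scale * rng.standard_normal(weights.shape)
    return noisy_weights @ x  # output currents = sum_j G_ij * V_j

# Example: one layer with 4 outputs and 8 inputs applied to a single input vector
W = np.random.randn(4, 8)
x = np.random.randn(8)
print(analog_matvec(W, x))
```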

Accuracy is Just as Essential

The challenge in using PCM devices, however, is achieving and maintaining computational accuracy. As PCM technology is analog in nature, computational precision is limited due to device variability as well as read and write conductance noise. To overcome this, we needed to find a way to train the neural networks so that transferring the digitally trained weights to the analog resistive memory devices would not result in significant loss of accuracy.

Our approach was to explore injecting noise into the synaptic weights during the training of DNNs in software as a generic method to improve the network’s resilience against the non-idealities of analog in-memory computing hardware. Our assumption was that injecting noise comparable to the device noise during the training of DNNs would improve the robustness of the models.
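
The sketch below shows one way such noise injection can be realized in PyTorch: a linear layer whose weights are perturbed with additive Gaussian noise on every forward pass during training, while gradients still update the clean weights. The proportional scaling to the layer’s maximum weight and the 5% noise level are illustrative assumptions; the exact noise model used in our work may differ.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Linear):
    """Linear layer that perturbs its weights with additive Gaussian noise
    during training, as a generic proxy for analog PCM non-idealities.

    The noise is scaled to the maximum absolute weight of the layer; the
    5% default is an illustrative assumption, not the value from the paper.
    """
    def __init__(self, in_features, out_features, noise_scale=0.05, bias=True):
        super().__init__(in_features, out_features, bias=bias)
        self.noise_scale = noise_scale

    def forward(self, x):
        if self.training:
            w_max = self.weight.detach().abs().max()
            noise = torch.randn_like(self.weight) * self.noise_scale * w_max
            # Only the forward pass sees the perturbed weights; the optimizer
            # still updates the clean weights, which is what builds robustness.
            return nn.functional.linear(x, self.weight + noise, self.bias)
        return super().forward(x)
```

The same idea extends to convolutional layers: in a ResNet, one would replace each weight-bearing layer with its noisy counterpart during training and then transfer the resulting weights to the analog devices.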

It turned out that our assumption was correct – training ResNet-type networks this way resulted in no considerable accuracy loss when transferring the weights to PCM devices. We achieved an accuracy of 93.7% on the CIFAR-10 dataset and a top-1 accuracy of 71.6% on the ImageNet benchmark after mapping the trained weights to analog PCM synapses. And after programming the trained weights of ResNet-32 onto 723,444 PCM devices of a prototype chip, the accuracy computed from the measured hardware weights stayed above 92.6% over a period of one day. To the best of our knowledge, this is the highest accuracy experimentally reported to date on the CIFAR-10 dataset by any analog resistive memory hardware.
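
The gradual accuracy loss over that day is largely attributable to conductance drift, which in PCM is commonly described by an empirical power law. The sketch below uses that standard model with an illustrative drift exponent; the actual device characterization and exponents in the paper may differ.

```python
import numpy as np

def drifted_conductance(g0, t, t0=1.0, nu=0.03):
    """Empirical power-law drift model for PCM: G(t) = G0 * (t / t0) ** (-nu).

    g0 is the conductance measured at time t0 after programming and nu is
    the drift exponent; nu = 0.03 is a typical order of magnitude chosen
    here purely for illustration.
    """
    return g0 * (t / t0) ** (-nu)

g0 = 10.0  # illustrative programmed conductance, in microsiemens
for t in (1, 60, 3600, 86400):  # 1 s, 1 min, 1 h, 1 day after programming
    print(f"t = {t:>6d} s -> G = {drifted_conductance(g0, t):.2f} uS")
```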

Can We Do More?

Despite these unprecedented results, we still wanted to understand whether we could improve the accuracy retention over time by introducing additional techniques. So, we developed an online compensation technique that exploits the batch normalization parameters to periodically correct the activation distributions during inference. This allowed us to improve the one-day CIFAR-10 accuracy retention to 93.5% on hardware.
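
As a rough illustration of this kind of correction (not our exact procedure), one can periodically re-estimate batch-normalization statistics on a small calibration set while keeping the weights frozen, so that activation distributions shifted by device drift are re-centred. The calibration routine and schedule below are assumptions made for the sake of the sketch.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def recalibrate_batchnorm(model, calib_loader, device="cpu"):
    """Re-estimate BatchNorm running statistics on a small calibration set.

    Intended as a periodic, inference-time correction: the weights stay
    frozen (programmed on the analog devices), but the BN statistics are
    refreshed so that drifted activation distributions are re-centred.
    """
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.reset_running_stats()
            m.momentum = None  # accumulate a cumulative moving average
    model.train()  # BN layers update their statistics only in train mode
    for x, _ in calib_loader:
        model(x.to(device))
    model.eval()
```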

In parallel, the team also experimented with training DNN models using analog PCM synapses. Although training is a much more difficult problem to tackle than inference, using an innovative mixed-precision architecture we were able to achieve software-equivalent accuracies on several types of small-scale DNNs, including multilayer perceptrons, convolutional neural networks, long short-term memory networks, and generative adversarial networks. This research was recently published in the peer-reviewed journal Frontiers in Neuroscience.
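
A common formulation of the mixed-precision idea, sketched below under our own assumptions about the update granularity, is to accumulate high-precision weight updates in a digital variable and to nudge the analog device only in multiples of the smallest conductance change it can reliably realize. The details of the published architecture may differ.

```python
import numpy as np

def mixed_precision_update(w_analog, chi, grad, lr=0.01, epsilon=0.001):
    """One training step in the mixed-precision spirit.

    The update -lr * grad is accumulated in a high-precision digital
    variable chi; the analog weight is changed only in multiples of
    epsilon, the smallest conductance step the device can reliably apply.
    lr and epsilon are illustrative values.
    """
    chi = chi - lr * grad
    n_steps = np.trunc(chi / epsilon)        # whole device-granularity steps available
    w_analog = w_analog + n_steps * epsilon  # coarse update applied to the device
    chi = chi - n_steps * epsilon            # residual stays in digital memory
    return w_analog, chi
```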

In an era increasingly shaped by AI-based technologies, from battery-powered internet-of-things devices to autonomous vehicles, fast, low-power, and reliably accurate DNN inference engines are essential. The strategies developed in our studies show great potential towards realizing accurate AI hardware-accelerator architectures that support DNN training and inference in an energy-efficient manner.

Fellow researchers affiliated with King’s College London, the Swiss Federal Institute of Technology in Zurich (ETH Zürich), and École Polytechnique Fédérale de Lausanne (EPFL) also contributed to this work. Our research is part of the IBM AI Hardware Center, which was launched one year ago. The center focuses on enabling next-generation chips and systems that support the tremendous processing power and unprecedented speed that AI requires to realize its full potential.


Vinay Joshi, Manuel Le Gallo, Simon Haefeli, Irem Boybat, S. R. Nandakumar, Christophe Piveteau, Martino Dazzi, Bipin Rajendran, Abu Sebastian, Evangelos Eleftheriou, “Accurate deep neural network inference using computational phase-change memory,” Nature Communications 11, Article number: 2473 (2020).
