December 11, 2019 | Written by: Stefano Ambrogio, Praneet Adusumilli, and Evangelos Eleftheriou
Continued technological and theoretical advances in AI are making it ever more pervasive, leading to unprecedented improvements in many aspects of our lives. From facial recognition that unlocks our personal electronic devices to speech detection and machine translation that enable more natural interaction with smart speakers, AI has experienced amazing growth in the past several years. And while these AI advancements have been made possible by harnessing ever more complex artificial neural network (ANN) models, typically in a data center, they carry a large computational and energy cost that is unsustainable over the long term. Some of the biggest growth opportunities, as well as challenges, lie in moving AI from the cloud to the ‘edge’ – IoT and mobile devices, drones, robots and sensors – which operate in power- and connectivity-constrained environments.
In today’s state-of-the-art AI systems, a great deal of energy and time is spent shuttling huge amounts of data between the compute and memory blocks, an obstacle commonly referred to as the von Neumann bottleneck. An alternative approach is to compute “in-memory” – moving compute to where the data is stored – which is more energy efficient and may be critical to the continued growth of AI capabilities. Edge devices such as smartphones, IoT devices, drones and robots present an even greater challenge for computational architectures: the increased capabilities of contemporary AI models provide unprecedented recognition accuracy, but often at the expense of greater computational and energy effort. The development of novel hardware based on radically new processing paradigms is therefore crucial for AI research to progress. One solution is using analog devices to perform in-memory computing, which has recently shown significant improvements in speed and energy efficiency. Deep Neural Network (DNN) training and inference are the two main areas of application that this novel analog hardware can impact.
Proposed techniques to enhance analog in-memory computing.
At the same time, the use of emerging technologies such as Phase-Change Memory (PCM) still poses major challenges: PCM devices are susceptible to noise and resistance drift, exhibit non-symmetric and non-linear conductance changes in response to electrical stimuli, and raise reliability concerns.
To address these issues, IBM researchers from labs in Almaden, Yorktown Heights, Tokyo, and Zurich have developed new devices, new algorithmic and architectural solutions, a novel model training technique, and a full custom design. These advances, under the umbrella of the IBM AI Hardware Center, will be presented in the form of four papers at the 2019 International Electron Devices Meeting (IEDM) and Neural Information Processing Systems (NeurIPS) conferences. More information about each paper is below.
The Marriage of Training and Inference for Scaled Deep Learning Analog Hardware
By T. Gokmen et al. (IEDM)
Blindly using floating-point weights determined by state-of-the-art digital processors to perform inference with analog crossbar arrays degrades performance due to device non-idealities such as noise and programming variability. The Marriage of Training and Inference for Scaled Deep Learning Analog Hardware shows that introducing these non-idealities – device noise, analog-to-digital quantization and device failures – during the training process can mitigate their negative impact. Training this way discovers optimum weights that are more robust to these non-ideal factors, improving classification accuracy. Crucially, this novel technique also allows the requirements on the analog devices and peripheral circuitry to be relaxed.
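The hardware-aware training idea can be sketched in a few lines: Gaussian noise, standing in for device programming variability, is injected into the weights at every forward pass during training, so the optimizer converges to weights that stay accurate under perturbation. The toy model, noise level and data below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Hypothetical sketch of hardware-aware training: Gaussian "device noise"
# is injected into the weights during each forward pass, so gradient
# descent settles on weights that remain accurate under perturbation.

rng = np.random.default_rng(0)

# Toy linearly separable binary classification data (illustrative)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = (X @ true_w > 0).astype(float)

w = np.zeros(8)
noise_std = 0.05   # assumed std of analog programming noise
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(500):
    # inject device noise into the weights seen by the forward pass
    w_noisy = w + rng.normal(scale=noise_std, size=w.shape)
    p = sigmoid(X @ w_noisy)
    grad = X.T @ (p - y) / len(y)   # gradient through the noisy weights
    w -= lr * grad

# At inference, the analog weights are again perturbed by device noise
w_infer = w + rng.normal(scale=noise_std, size=w.shape)
acc = np.mean((sigmoid(X @ w_infer) > 0.5) == (y > 0.5))
print(f"accuracy under device noise: {acc:.2f}")
```

The same weights trained without the noise injection would typically lose more accuracy once perturbed, which is the effect the paper quantifies on real device models.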
Reducing the Impact of Phase-Change Memory Conductance Drift on the Inference of Large-scale Hardware Neural Networks
By S. Ambrogio et al. (IEDM)
While the technique above provides resilience to noise and quantization, the paper Reducing the Impact of Phase-Change Memory Conductance Drift on the Inference of Large-scale Hardware Neural Networks focuses on the stability of PCM conductance during inference. PCM technology suffers from conductance drift – the programmed analog conductance decays over time – which is detrimental to classification accuracy. To address this challenge, the paper presents a novel algorithmic solution: correcting the slope of the activation function at inference time, thus largely restoring the information lost to conductance decay.
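As a minimal illustration of why a slope correction works, assume the commonly used PCM drift model G(t) = G0 · (t/t0)^(−ν): since all conductances decay by roughly the same factor, a single inverse scaling of the layer output (equivalently, of the activation slope) at read time recovers the original pre-activation. The drift exponent and time points below are illustrative values, not measured device data.

```python
import numpy as np

# Sketch of drift compensation under the standard PCM drift model
# G(t) = G0 * (t / t0) ** (-nu). A uniform decay factor can be undone
# by one scalar correction of the activation slope at inference time.

nu = 0.05          # drift exponent (typical order of magnitude, illustrative)
t0, t = 1.0, 1e6   # programming time and read time, arbitrary units

G0 = np.array([1.0, 0.5, -0.8])   # programmed weights (conductances)
x = np.array([0.2, 1.0, -0.4])    # layer input

drift = (t / t0) ** (-nu)
G_t = G0 * drift                  # every conductance decays by the same factor

raw = x @ G_t                     # drifted pre-activation
compensated = raw / drift         # slope correction applied at read time

print(compensated, x @ G0)        # identical after compensation
```

In practice the decay is only approximately uniform across devices, which is why the paper reports a large but not complete recovery of accuracy.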
Reduction of the PCM drift impact on the MNIST dataset.
Metal-oxide Based, CMOS-compatible ECRAM for Deep Learning Accelerator
By Seyoung Kim et al. (IEDM)
Hardware has very different requirements for inference and for DNN training. One of the principal obstacles to high-accuracy training with analog hardware is the need for symmetric device conductance change during the weight-update phase, which, together with forward propagation and back-propagation, is one of the three essential steps of the algorithm at the heart of modern DNNs. Metal-oxide Based, CMOS-compatible ECRAM for Deep Learning Accelerator presents an innovative non-volatile memory called Electro-Chemical Random-Access Memory, or ECRAM. ECRAM demonstrates sub-microsecond programming speed, high linearity and symmetry of conductance change, and a 2×2 array configuration without access selectors. ECRAM devices, fabricated for the first time using CMOS-compatible materials, show a symmetric conductance response to thousands of electrical pulses and enable a hardware demonstration of linear regression using the stochastic gradient descent algorithm – an important building block for DNN training.
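The kind of demonstration described can be sketched with a simple device model: each weight is stored as a conductance that moves in equal-sized increments per programming pulse (the symmetric, linear response ECRAM targets), and stochastic gradient descent drives a linear regression through those pulsed updates. All constants below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

# Sketch of SGD linear regression where each weight lives in a simulated
# analog device updated by discrete programming pulses of equal magnitude
# in both directions -- a symmetric, linear conductance response.

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = X @ np.array([0.7, -1.3]) + 0.01 * rng.normal(size=400)

w = np.zeros(2)
dG = 0.002   # conductance change per pulse (assumed granularity)
lr = 0.01

for i in range(4000):
    xi, yi = X[i % len(X)], y[i % len(X)]
    err = xi @ w - yi
    # number of up/down pulses proportional to the gradient;
    # symmetric devices apply +dG and -dG with equal magnitude
    pulses = np.round(-lr * err * xi / dG)
    w += pulses * dG

print(w)  # approaches [0.7, -1.3]
```

With an asymmetric device (unequal up/down increments), the same loop accumulates a systematic bias in the weights, which is the failure mode symmetric ECRAM response avoids.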
Experimental demonstration of linear regression with stochastic gradient descent using ECRAM devices. Credit: M. Ishii et al.
On-Chip Trainable 1.4M 6T2R PCM Synaptic Array with 1.6K Stochastic LIF Neurons for Spiking RBM
By M. Ishii et al. (IEDM)
Another way to counteract device non-idealities and decrease power consumption is to implement spiking neural networks, thereby encoding the information in spikes and programming the weights with a brain-inspired algorithm such as spike-timing-dependent plasticity, or STDP. On-Chip Trainable 1.4M 6T2R PCM Synaptic Array with 1.6K Stochastic LIF Neurons for Spiking RBM demonstrates a full hardware implementation of a Restricted Boltzmann Machine that performs both training and inference on the MNIST dataset.
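For readers unfamiliar with STDP, the textbook form of the rule looks like this: a synapse is potentiated when the presynaptic spike precedes the postsynaptic one and depressed otherwise, with exponential dependence on the spike-timing difference. The amplitudes and time constant below are generic illustrative values, not those of the on-chip implementation.

```python
import numpy as np

# Illustrative spike-timing-dependent plasticity (STDP) rule.
# Causal pairings (pre before post) strengthen the synapse;
# anti-causal pairings weaken it, both decaying exponentially
# with the magnitude of the timing difference.

A_plus, A_minus = 0.01, 0.012   # potentiation / depression amplitudes (assumed)
tau = 20.0                      # STDP time constant in ms (assumed)

def stdp_dw(t_pre, t_post):
    """Weight change for a single pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt >= 0:                              # pre fires before post
        return A_plus * np.exp(-dt / tau)    # potentiate
    return -A_minus * np.exp(dt / tau)       # post fires first: depress

print(stdp_dw(10.0, 15.0))   # positive: causal pairing strengthens
print(stdp_dw(15.0, 10.0))   # negative: anti-causal pairing weakens
```

Because each update depends only on locally available spike times, the rule maps naturally onto a synaptic array where the PCM devices themselves store and update the weights.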
Mapping of ResNet-32 on a 4-by-10 array of computational memory cores. Each layer is mapped onto one core. Credit: M. Dazzi et al.
Parallel Prism: A topology for pipelined implementations of convolutional neural networks (CNNs) using computational memory
By M. Dazzi et al. (NeurIPS)
In the development of hardware for AI, the use of computational memory presents new opportunities. Researchers have mapped the synaptic weights corresponding to each network layer onto one or more computational memory (CM) cores. While this architecture could enable the execution of these networks in a highly pipelined fashion, designing an efficient communication fabric becomes a key challenge. The paper Parallel Prism: A topology for pipelined implementations of convolutional neural networks (CNNs) using computational memory presents a novel communication topology that allows unstaggered pipelined execution of all state-of-the-art CNNs, including ResNet, Inception, and DenseNet. Compared with well-established communication topologies, the one presented by Dazzi et al. yields significant improvements in throughput (7x) and bandwidth (4x).
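A back-of-the-envelope model shows why mapping each layer to its own core pays off: processing images one at a time costs layers × images cycle-equivalents, while a filled pipeline finishes one image per cycle. The numbers below are illustrative only, not measurements from the paper.

```python
# Toy model of pipelined execution across computational memory cores.
# With one core per layer, a new input can enter the pipeline every
# cycle once it is filled, instead of waiting for the previous input
# to traverse all layers.

layers = 10     # e.g. one core per layer (illustrative)
images = 100

sequential_cycles = images * layers        # one image at a time
pipelined_cycles = layers + images - 1     # fill latency + one per image

print(sequential_cycles, pipelined_cycles)  # 1000 vs 109
speedup = sequential_cycles / pipelined_cycles
print(f"throughput gain: {speedup:.1f}x")
```

Realizing this ideal schedule is exactly what requires the communication topology the paper proposes: skip connections and branches in modern CNNs mean cores must exchange activations with more than just their pipeline neighbors.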
Test accuracies over 10000 epochs for VGG-16 trained on CIFAR-10 (a) without MI-based regularizer, and (b) with MI-based regularizer. Credit: H. Jonsson et al.
Convergence of DNNs with mutual-information-based regularization
By H. Jonsson et al. (NeurIPS)
Information theory concepts are widely used to understand and improve DNNs, but estimating mutual information (MI) is notoriously difficult for high-dimensional continuous random variables. The paper Convergence of DNNs with mutual-information-based regularization investigates the convergence of MI estimates between the hidden layers and the network input/output for a state-of-the-art VGG-16. It demonstrates that MI-based regularization improves and stabilizes test accuracy and prevents the model from overfitting, especially over a large number of training epochs. It is envisaged that these methods will increase the robustness of analog implementations of DNNs.
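To make the MI quantity concrete, here is a simple binned estimator between two scalar variables, the kind of dependence measure an MI-based regularizer tracks between layers and the network input/output. The binning estimator and bin count are illustrative choices; high-dimensional activations require more sophisticated estimators, which is the difficulty the paper addresses.

```python
import numpy as np

# Simple histogram-based mutual information estimate (in nats) between
# two scalar samples. MI is zero for independent variables and grows
# with statistical dependence.

def mutual_information(x, y, bins=20):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginals
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                              # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
dependent = x + 0.1 * rng.normal(size=5000)    # strongly related to x
independent = rng.normal(size=5000)            # unrelated to x

mi_dep = mutual_information(x, dependent)      # large
mi_indep = mutual_information(x, independent)  # near zero (up to binning bias)
print(mi_dep, mi_indep)
```

The small positive value for independent variables illustrates the estimator bias that makes reliable MI estimation, and hence MI-based regularization, non-trivial at scale.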