A Biologically Plausible Learning Algorithm for Neural Networks

Despite the great success of deep learning on a range of computationally challenging tasks, questions remain about how closely the computational properties of deep neural networks resemble those of the human brain. A particularly nonbiological aspect of deep learning is supervised training with the backpropagation algorithm, which requires massive amounts of labeled data and a nonlocal learning rule for changing the weights. My colleague and I developed a learning algorithm inspired by synaptic plasticity rules conceptually similar to those operating in real biological neural networks. We describe this algorithm in our paper “Unsupervised Learning by Competing Hidden Units,” published last week in the journal Proceedings of the National Academy of Sciences of the United States of America. The proposed algorithm learns the weights of the lower layer of a neural network in a completely unsupervised fashion. These weights are agnostic about the task that the network will eventually have to solve in its higher layers. Nevertheless, they can be used to train a good classifier in the higher layers that is tailored to a specific task. The entire algorithm uses local learning rules for updating the weights.

The key idea of the algorithm is a network motif with lateral inhibition between the hidden neurons, combined with a learning rule inspired by Hebbian-like plasticity mechanisms known to exist in biology. For every data point, such as an image, the hidden neurons compete with each other so that eventually only a handful of them remain active while the majority fall below the activation threshold. The activities of those active hidden neurons, together with the activities of the visible neurons, are then used to update the weights. This learning rule uses only local information that is directly available to the two neurons connected by a given weight. The growth of the weights as the training progresses is shown in the video below.
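To make the competition and the local update more concrete, here is a minimal sketch in plain Python. It is a simplification written under stated assumptions, not the code released with the paper: the lateral inhibition is approximated by simply ranking the hidden units by their input, the strongest unit receives a Hebbian update, the k-th ranked unit receives a weaker anti-Hebbian update, and an Oja-style decay term keeps the weights bounded. The function name `local_update` and the values of `k`, `delta`, and `lr` are illustrative choices.

```python
import random


def local_update(weights, v, k=2, delta=0.4, lr=0.02):
    """Apply one local plasticity step for a single input vector v.

    weights: list of hidden-unit weight vectors (updated in place).
    k (>= 2), delta, and lr are illustrative hyperparameters.
    Returns the index of the winning hidden unit.
    """
    # Input current of each hidden unit: dot product of its weights with v.
    currents = [sum(w_i * v_i for w_i, v_i in zip(w, v)) for w in weights]
    # Competition: rank hidden units from strongest to weakest response.
    ranking = sorted(range(len(weights)), key=lambda mu: currents[mu], reverse=True)
    # Only the winner (Hebbian, +1) and the k-th unit (anti-Hebbian, -delta) change.
    gains = {ranking[0]: 1.0, ranking[k - 1]: -delta}
    for mu, g in gains.items():
        w = weights[mu]
        # Each weight sees only its own input v_i, its unit's gain g, and
        # its unit's current; the decay term keeps the weights bounded.
        weights[mu] = [w_i + lr * g * (v_i - currents[mu] * w_i) for w_i, v_i in zip(w, v)]
    return ranking[0]


# Toy run on random "images"; no labels are used anywhere.
rng = random.Random(0)
n_visible, n_hidden = 8, 4
weights = [[rng.gauss(0.0, 0.1) for _ in range(n_visible)] for _ in range(n_hidden)]
data = [[rng.random() for _ in range(n_visible)] for _ in range(50)]
for epoch in range(20):
    for v in data:
        local_update(weights, v)
```

Every quantity in the update belongs either to the input neuron, to the hidden neuron, or to the weight connecting them, which is what makes the rule local in this sketch.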

The paper compares the performance of two networks. The first is trained using a two-stage procedure: the proposed “biological” training of the first layer, followed by standard gradient descent training of the classifier in the top layer. The second is trained end-to-end with the backpropagation algorithm on a supervised task. In our paper we investigate the proposed “biological” algorithm in the framework of fully connected neural networks with one hidden layer, on the pixel-permutation-invariant MNIST and CIFAR-10 datasets.
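The two-stage procedure can be sketched as follows, again in plain Python and under simplifying assumptions. The first-layer weights are frozen and only the top layer is fitted with gradient descent. Here `frozen_weights` is a hand-written stand-in for whatever the unsupervised stage would produce, the top layer is a binary logistic-regression classifier rather than the classifier used in the paper, and the toy data and all function names are hypothetical.

```python
import math
import random


def hidden_features(frozen_weights, v):
    """Stage-1 output: rectified responses of the frozen hidden units."""
    return [max(0.0, sum(w_i * v_i for w_i, v_i in zip(w, v))) for w in frozen_weights]


def train_top_layer(features, labels, lr=0.5, epochs=200):
    """Stage 2: fit a logistic-regression top layer by gradient descent."""
    top, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for h, y in zip(features, labels):
            z = bias + sum(t * h_j for t, h_j in zip(top, h))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the cross-entropy loss with respect to z
            top = [t - lr * err * h_j for t, h_j in zip(top, h)]
            bias -= lr * err
    return top, bias


def predict(frozen_weights, top, bias, v):
    """Classify v using the frozen first layer and the trained top layer."""
    h = hidden_features(frozen_weights, v)
    return 1 if bias + sum(t * h_j for t, h_j in zip(top, h)) > 0 else 0


# Stand-in for the unsupervised first layer (assumed, not learned here).
frozen_weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# Toy labeled data: class 0 lights up the first pixel, class 1 the last.
rng = random.Random(0)
inputs, labels = [], []
for i in range(20):
    y = i % 2
    v = [rng.uniform(-0.1, 0.1) for _ in range(4)]
    v[3 if y else 0] += 1.0
    inputs.append(v)
    labels.append(y)

# The labels are used only here, and only the top layer is updated.
features = [hidden_features(frozen_weights, v) for v in inputs]
top, bias = train_top_layer(features, labels)
predictions = [predict(frozen_weights, top, bias, v) for v in inputs]
```

The contrast with end-to-end training is that no error signal from the classifier ever reaches `frozen_weights`; the labels influence only `top` and `bias`.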

In the case of MNIST, the weights of the hidden layer, together with the training and test errors for the two networks, are shown in the figure below. The network trained end-to-end with the backpropagation algorithm reproduces the well-known benchmark: training error = 0%, test error = 1.5%. The network trained in a “biological” way reaches a training error of 0.4%; thus, it never fits the training data perfectly. At the same time, its error on the held-out test set is 1.46%, essentially the same as that of the network trained end-to-end. This is surprising because the proposed “biological” algorithm learned the weights of the first layer without knowing what task these weights would be used for, unlike the network trained end-to-end. In addition, the “biological” algorithm was constrained by the requirement to use only local plasticity rules for learning those weights.

Another interesting aspect is that the weights learned by the proposed “biological” algorithm (left panel) are very different from the weights learned by the backpropagation algorithm (middle panel). At the same time, they are not just copies of individual training examples: they encode both the presence (red) and the absence (blue) of ink in the MNIST images. The full “biological” network learns a distributed representation of the training data over multiple hidden units.

A similar comparison was done on the CIFAR-10 dataset. In this case the accuracy of our “biological” algorithm is slightly worse than that of the network trained end-to-end, but it still performs well.

Learn more details by reading the full paper or watching a video lecture discussing the results. You can also download the code used for the “biological” training.

Research Staff Member, IBM Research
