Beyond Backprop: Online Alternating Minimization with Auxiliary Variables

Since its introduction in the 1970s, the backpropagation algorithm (backprop) has been the workhorse for training neural networks and has contributed to impressive successes in deep learning across a wide range of applications. Backprop plays an important role in enabling neural networks to track the errors they make, learn from those mistakes, and improve over time.

However, backprop also suffers from a number of flaws, including several well-known computational issues such as the vanishing gradient problem: as the networks get deeper, the gradients of the loss function may start approaching zero, making the network hard to train.

Other limitations of backprop include its inability to handle non-differentiable nonlinearities, e.g., in binary neural networks, which matter for memory- and energy-efficient computing, especially on mobile devices with limited hardware resources. Furthermore, the sequential nature of backprop (i.e., chain-rule differentiation) does not allow computation to be parallelized across network layers. Doing so could speed up computation considerably, especially in very deep or recurrent networks. Finally, backprop is often criticized as a biologically implausible learning mechanism that does not explicitly model neural activity: it relies on non-local synaptic updates and has several other properties that do not conform to known biological mechanisms of learning.

Various limitations of backprop continue to motivate the exploration of alternative neural net learning methods. In fact, one of its creators previously said he is “deeply suspicious of back-propagation” and that his view is “throw it all away and start again.”

Our study, “Beyond Backprop: Online Alternating Minimization with Auxiliary Variables,” in collaboration with NYU and MIT, presented this week at the 2019 ICML conference, proposes a novel alternative to backprop. This new approach shifts the focus towards explicit propagation of neuronal activity by introducing noisy “auxiliary variables,” which break the “gradient chain” into local, independent, layer-wise weight updates that can be done in a parallel manner.


The paper provides novel theoretical convergence guarantees for a general class of online alternating optimization methods. Promising empirical results using multiple datasets and network architectures demonstrate that the new approach can perform on par with state-of-the-art stochastic gradient descent (SGD) implementations of backprop and often learns faster initially, when only a small amount of data is available for training.


Our initial goal is not to outperform backprop, but rather to explore alternative learning methods that show competitive performance and, more importantly, provide new and useful properties that backprop lacks. In our method, such properties are a natural consequence of breaking backprop's gradient chains into simpler, local optimization problems. As a result, we get parallel/asynchronous weight updates, elimination of vanishing gradients, and easier handling of non-differentiable nonlinearities, which enables more energy-efficient computation in binary networks.
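The non-differentiability point can be illustrated with another toy sketch of ours (again, our own simplified construction, not the paper's algorithm): with auxiliary binary codes standing in for `sign()` activations, the codes can be improved by exact coordinate descent, and the weights by local least squares, without ever differentiating through `sign()`. The surrogate update for `W1` below is also an assumption of ours, chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data and a one-hidden-layer binary network: sign(X @ W1) @ W2 ~ Y.
X = rng.normal(size=(32, 5))
Y = rng.normal(size=(32, 2))
W1 = rng.normal(size=(5, 6))
W2 = rng.normal(scale=0.1, size=(6, 2))

# Auxiliary binary codes H replace the non-differentiable sign() output.
H = np.sign(X @ W1)
H[H == 0] = 1.0

init_loss = np.mean((H @ W2 - Y) ** 2)

for _ in range(20):
    # (1) Update codes by exact coordinate descent: for each hidden unit,
    #     keep whichever sign (+1 or -1) gives the lower output loss.
    #     No derivative of sign() is ever needed.
    pred = H @ W2
    for j in range(H.shape[1]):
        base = pred - np.outer(H[:, j], W2[j])  # prediction without unit j
        loss_plus = ((base + W2[j] - Y) ** 2).sum(axis=1)
        loss_minus = ((base - W2[j] - Y) ** 2).sum(axis=1)
        H[:, j] = np.where(loss_plus <= loss_minus, 1.0, -1.0)
        pred = base + np.outer(H[:, j], W2[j])
    # (2) Local weight updates: W2 by least squares on the codes; W1 by a
    #     least-squares surrogate pushing X @ W1 toward the codes.
    W2 = np.linalg.lstsq(H, Y, rcond=None)[0]
    W1 = np.linalg.lstsq(X, H, rcond=None)[0]

final_loss = np.mean((H @ W2 - Y) ** 2)
print(f"code-space loss: {init_loss:.4f} -> {final_loss:.4f}")
```

Both update steps are exact minimizations of the same local objective, so the code-space loss can only decrease; a gradient of the binary activation never appears anywhere.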


Auxiliary-variable methods such as the one developed in this study are also a step closer to biologically plausible learning mechanisms, due to their explicit propagation of neural activity and local synaptic updates.

Beyond Backprop: Online Alternating Minimization with Auxiliary Variables, Anna Choromanska, Benjamin Cowen, Sadhana Kumaravel, Ronny Luss, Mattia Rigotti, Irina Rish, Brian Kingsbury, Paolo DiAchille, Viatcheslav Gurev, Ravi Tejwani, Djallel Bouneffouf

Staff Member, IBM Research
