IBM and MIT researchers find a new way to prevent deep learning hacks


Deep learning may have revolutionized AI – boosting progress in computer vision and natural language processing and impacting nearly every industry. But even deep learning isn’t immune to hacking.

Specifically, it’s vulnerable to a curious form of hacking dubbed ‘adversarial examples’: a hacker very subtly changes an input in a specific way – such as imperceptibly altering the pixels of an image or the words in a sentence – forcing the deep learning system to fail catastrophically.

AI has to be robust to withstand such attacks – and adversarial robustness also extends to defenses against ‘natural’ adversaries, be it white noise, black-outs, image corruption, text typos or unseen data. While computer vision models are advancing rapidly, it’s possible to make them more robust by exposing them to subtly altered images through adversarial training. But this process is computationally expensive and imperfect; there will always be outlier images that can trip the model up.

And this is what recent research described in a paper presented at this year’s NeurIPS conference aims to change.

In the study, a team of neuroscientists from MIT and the MIT-IBM Watson AI Lab investigated how neuroscience and AI can inform one another. They’ve explored whether the human brain can offer clues on how to make deep neural networks (DNNs) even more powerful and secure. Turns out it can.

The paper describes a new biology-inspired model, dubbed VOneNet (after V1, a specific region of the brain), that draws on lessons from the brain to help defend AI models against malicious adversarial attacks.

The research was led by Harvard graduate student Joel Dapello; James DiCarlo, head of MIT’s Department of Brain and Cognitive Sciences; and Tiago Marques, an MIT postdoc. They worked together with MIT graduate student Martin Schrimpf, MIT visiting student Franziska Geiger, and MIT-IBM Watson AI Lab Co-director David Cox – to gain insight from the brain’s truly mysterious ways.

Understanding the brain

By its very nature, deep learning is loosely based on the functioning of the brain, inspired by the structure of biological nervous systems. Deep neural networks (DNNs) are composed of individual ‘cells’ – neurons – connected to each other by ‘synapses’. “Like in the brain, organizing these elements in a ‘deep’ hierarchy of successive processing stages gives the artificial deep neural networks much of their power,” says IBM researcher David Cox.

However, adversarial attacks highlight a big difference in how deep neural networks and our brains perceive the world. Humans are not fooled at all by the subtle alterations that are able to trick deep neural networks, and our visual systems seem to be substantially more robust. Animal camouflage and optical illusions are probably the closest equivalent to adversarial examples against our brains.

But with a machine, it’s possible to carefully perturb the pixels in the image of a stop sign to trick a deep learning-based computer vision system into misclassifying it as a speed limit sign or anything else the adversary chooses, even though the image looks unchanged to the human eye. It is even possible to create physical objects that will trick AI-based systems, irrespective of the direction the object is viewed from, or how it is lit.
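To make the mechanism concrete, here is a minimal sketch of one classic way such pixel perturbations are crafted – the Fast Gradient Sign Method (FGSM), which nudges each pixel slightly in the direction that increases the model’s loss. This is an illustrative example, not the attack or model used in the paper; the tiny linear “classifier” below is a stand-in for a real vision model.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Return an adversarially perturbed copy of `image` using the
    Fast Gradient Sign Method: move every pixel by at most +/- epsilon
    in the direction that increases the classification loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # The change per pixel is bounded by epsilon, so for small epsilon the
    # perturbed image looks unchanged to a human, yet the loss rises sharply.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Toy demonstration with a tiny linear "classifier" (hypothetical stand-in).
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
image = torch.rand(1, 3, 8, 8)
label = torch.tensor([3])
adv = fgsm_perturb(model, image, label, epsilon=0.01)
```

The perturbation is invisible at this magnitude, but against a real classifier it can flip a stop sign into a speed-limit sign, as described above.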

While researchers have made some progress in defending against these kinds of attacks, first described in 2013, they remain a serious barrier to wider deployment of deep learning-based systems. The current standard defense, called adversarial training, is also extremely computationally expensive. And this is exactly what the new research paper is trying to address.
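The cost of adversarial training comes from crafting fresh attacks inside every training step. A rough sketch of one such step is below – hypothetical code, not the paper’s, using a one-step FGSM attack for brevity (real pipelines often use stronger multi-step attacks such as PGD, which cost proportionally more).

```python
import torch
import torch.nn as nn

def adversarial_training_step(model, optimizer, images, labels, epsilon=0.03):
    """One step of adversarial training: craft perturbed inputs on the fly,
    then train on those instead of the clean batch. The extra forward/backward
    pass needed to build the attack is what makes this roughly twice as
    expensive per step as standard training."""
    # Inner step: build adversarial examples with one-step FGSM.
    images = images.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    adv = (images + epsilon * images.grad.sign()).clamp(0.0, 1.0).detach()

    # Outer step: ordinary gradient update, but on the adversarial batch.
    optimizer.zero_grad()
    adv_loss = nn.functional.cross_entropy(model(adv), labels)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()

# Toy usage with a hypothetical stand-in model.
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batch, labels = torch.rand(4, 3, 8, 8), torch.randint(0, 10, (4,))
loss_value = adversarial_training_step(model, opt, batch, labels)
```

Doubling (or worse) the work of every training step is exactly the overhead the VOneNet approach avoids.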

Learning from biology

The MIT-IBM collaboration has been uncovering useful tricks from neuroscience to infuse into our AI systems for years. Recently, the DiCarlo Lab has developed metrics for comparing data collected from the human brain with artificial neural networks, to understand which systems are closer or further away from biology.

In the latest study, the team explored the adversarial robustness of different models and studied if that was related to how similar they were to the brain. “To our surprise, we have found a strong relationship,” says Cox. “The more adversarially robust a model was, the more closely it seemed to match a particular brain area—V1, the first processing stage of visual information in the cerebral cortex.”

So the team decided to add some well-known elements of V1 processing at the input stage of a standard DNN. They found that this addition made any model substantially more robust. On top of that, including this block adds no extra complexity or training cost to the models. It’s computationally much cheaper than typical adversarial training, and surprisingly effective. It also confers robustness against other kinds of image degradation, such as added noise.
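The idea can be sketched as a fixed, non-trained front end prepended to an ordinary network. The simplified module below is only an illustration of the concept: a bank of Gabor filters (the classic model of V1 simple-cell receptive fields), rectification, and neuronal noise. The actual VOneBlock in the paper fits its filters and stochasticity to primate V1 data; the filter parameters, grayscale input, and noise level here are assumptions for the sketch.

```python
import math
import torch
import torch.nn as nn

def gabor_kernel(size, theta, frequency=0.25, sigma=2.0):
    """A single Gabor filter: a sinusoid windowed by a Gaussian,
    the classic model of a V1 simple-cell receptive field."""
    coords = torch.arange(size) - size // 2
    y, x = torch.meshgrid(coords, coords, indexing="ij")
    xr = x * math.cos(theta) + y * math.sin(theta)
    envelope = torch.exp(-(x**2 + y**2) / (2 * sigma**2))
    return (envelope * torch.cos(2 * math.pi * frequency * xr)).float()

class V1FrontEnd(nn.Module):
    """Fixed (untrained) Gabor filter bank + rectification + noise,
    loosely mimicking a V1-like input stage. Assumes grayscale input."""
    def __init__(self, n_orientations=8, size=9, noise_std=0.1):
        super().__init__()
        kernels = torch.stack([
            gabor_kernel(size, theta=i * math.pi / n_orientations)
            for i in range(n_orientations)
        ]).unsqueeze(1)  # shape (n_orientations, 1, size, size)
        self.register_buffer("weight", kernels)  # buffer: never trained
        self.noise_std = noise_std

    def forward(self, x):
        out = nn.functional.conv2d(x, self.weight,
                                   padding=self.weight.shape[-1] // 2)
        out = torch.relu(out)  # simple-cell rectification
        if self.training:
            # Neuronal stochasticity, applied only during training.
            out = out + self.noise_std * torch.randn_like(out)
        return out
```

The front end’s output would then feed a standard CNN backbone; because the filters are fixed, nothing about the backbone’s training loop changes, which is why the approach adds so little cost.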

Their brain-inspired model, VOneNet, outperforms state-of-the-art models under white-box attacks, where the attacker has access to the model architecture. It also does better under black-box attacks, where the attacker has no visibility inside. And it achieves this with little added cost.

While impressive, “there’s certainly more work to be done to ensure models are invulnerable to adversarial attacks,” says Cox. And it’s not just a problem for computer vision. What’s clear, Cox adds, is that this research shows the need to keep learning from neuroscience to further boost adversarial robustness – and vice versa, to understand why something works in an artificial system, and how it can possibly help improve our still limited understanding of the human brain.


IBM Research AI is proudly sponsoring NeurIPS 2020 as a Platinum Sponsor, as well as the Women in Machine Learning and Black in AI workshops. We are pleased to report that IBM has had its best year so far at NeurIPS: 46 main track papers, of which eight are spotlight papers, with one oral presentation. In addition, IBM has 26 workshop papers, six demos and is also organizing three workshops and a competition. We hope you can join us from December 6 – 12 to learn more about our research. Details about our technical program can be found here.

