#### AI

# Making Neural Networks Robust with New Perspectives

August 13, 2019 | Written by: Pin-Yu Chen and Sijia Liu

Share this post:

Making neural networks robust to adversarially modified data, such as images perturbed imperceptibly by noise, is an important and challenging problem in machine learning research. As such, ensuring robustness is one of IBM’s pillars for Trusted AI.

Adversarial robustness requires new methods for incorporating defenses into the training of neural networks. These methods need to produce models that are both accurate and robust, and also be compatible with conventional training pipelines — while simultaneously increasing the cost of an attack and maintaining the model’s generalization capability. While numerous efforts have been made in robustness for visual recognition models, improving the adversarial robustness of neural networks to new data types like text and graph data requires new approaches. Depending on the data modalities and applications, both the neural network models and the feasible adversarial actions (i.e. the threat models) may vary. IBM researchers have partnered with scientists from MIT, Northeastern University, Boston University and University of Minnesota to publish two papers on novel attacks and defenses for graph neural networks [1] and on a new robust training algorithm called hierarchical random switching [2] at IJCAI 2019.

## Adversarial Robustness of Graph Neural Networks

Graph-structured data plays a crucial role in many AI applications. It is a versatile way to model a wide variety of datasets from many domains, such as molecules, social networks, or interlinked documents with citations. Graph neural networks (GNNs) — which apply deep neural networks to graph data — have achieved significant performance for the task of semi-supervised node classification. However, despite the great success on inferring from graph data, the inherent lack of adversarial robustness in deep learning models still carries over to security-related domains such as blockchain or communication networks.

In our recent work “Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective” [1] at IJCAI’19, we evaluate the robustness of GNNs from the perspective of first-order optimization adversarial attacks. Different from adversarial examples in the image domain, recent work [3,4] suggests that first-order continuous optimization methods do not directly apply to attacks using edge manipulations (namely, topology attack) due to the discrete nature of graphs. We close this gap by studying the problem of generating topology attacks via convex relaxation so that gradient-based adversarial attacks become plausible for GNNs. Benchmarking on node classification tasks using GNNs, our gradient-based topology attacks outperform current state-of-the-art attacks subject to the same topology perturbation constraint. Most important, with the aid of the proposed gradient-based attack, we develop the first optimization-based adversarial training for GNNs. Our method yields higher robustness against both gradient-based and greedy attack methods without sacrificing classification accuracy on the original graph.

Our new attack-generation and adversarial training methods for GNNs are built upon the theoretical foundation of spectral graph theory, first-order optimization, and robust (min-max) optimization. From the attacker’s perspective, we examine two scenarios: a) attacking a pre-defined GNN and b) attacking a re-trainable GNN. This yields two new topology attacks: a projected gradient descent (*PGD*) topology attack, and a *min-max* topology attack. In Table 1, we compare the misclassification rate of four variants of our methods (namely, PGD and min-max attacks under CE and CW attack loss, respectively) to three existing attack methods (namely, DICE [5], Meta-Self attack [6] and greedy attack, a variant of Meta-Self attack without weight retraining for GNN). It is clear that the proposed attack method achieves the best misclassification rate against the natural model and has competitive performance even against the retrained model.

From the defender’s perspective, we examine the performance of our adversarial training for GNNs via robust optimization by leveraging the proposed first-order topology attacks. Figure 1 shows that our robust trained model can provide universal defense to CE-PGD, CW-PGD and greedy attacks.

## Protecting Neural Networks with Hierarchical Random Switching

Stochastic defense is a branch of defense methods aiming to improve adversarial robustness. The main objective of stochastic defense is to introduce useful randomness to the model at the inference phase to increase the difficulties for attackers to craft prediction-evasive adversarial examples. Examples of stochastic defense include enabling dropout at test time and adding Gaussian perturbations to neural networks (e.g. model weights or activation functions), to name a few. However, without proper design of the randomness, many stochastic defenses were later shown to be broken or weakened by advanced adversarial attacks [7]. For instance, the expectation over transformation (EOT) attack [8] can be used to aggregate multiple gradients of a stochastic model to assemble a stronger attack.

While many defenses improve adversarial robustness at the price of decreasing standard accuracy, and there are many technical explanations and empirical findings for the so-called robustness-accuracy trade-off, our IJCAI’19 paper, “Protecting Neural Networks with Hierarchical Random Switching: Towards Better Robustness-Accuracy Trade-off for Stochastic Defenses,” [2] aims to propose a novel stochastic defense method that can achieve a better robustness-accuracy trade-off.

Our proposed method, which we call hierarchical random switching (HRS), contains several blocks of randomly switching channels derived from a base neural network model in order to prevent adversaries from exploiting fixed model structures and parameters for their malicious purposes. A visual illustration of our HRS defense is shown in Fig. 2, where a base neural network model is divided into *M* blocks. Each block contains several parallel channels connected by a random switcher.

HRS features a switcher that randomly assigns the input of the block to one of the parallel channels in the run time. All selected channels from each block constitute an* active path* which has the same structure as the base model. Intuitively, an HRS-protected model can assure comparable performance to the base model if each possible active path is fully functional, while its random switching nature prevents the attacker from exploiting the weakness of a fixed model structure. Given the same threat model, HRS increases the attacker’s cost to craft a successful attack as its random switching nature circumvents *all* possible active paths. It is worth noting that HRS is easily compatible with current neural network training pipelines as it is derived from a single base model. The detailed training procedure for HRS can be found in the paper.

Our proposed HRS stochastic defense has the following features:

**1. Improved robustness-accuracy tradeoff:** Under the robustness-accuracy premise, we use the defense efficiency score (DES) as the performance measure, which is defined as the defense rate (fraction of correctly classified adversarial examples) divided by the drop in test accuracy. Fig. 3 shows that on the CIFAR-10 dataset and the same base model, our HRS method attains improved robustness-accuracy trade-off by at least four times greater than three other defense methods, including stochastic defenses and adversarial training [9].

**2. Diversified and decentralized stochastic defense:** When comparing to different stochastic defense methods, Fig. 4 shows our HRS method consistently attains lower attack success rates against standard adversarial attacks (e.g. the projected gradient attack) and advanced randomness-countermeasure attacks such as EOT and fixed-randomness approaches. Our improved defense performance can be explained by the more dispersed input gradient distribution as shown in Fig. 5. Since HRS is a decentralized randomization scheme, each active path of HRS has no privilege over others due to random switching. Therefore, it is fundamentally different from stochastic defenses, making the base model a centralized surrogate model, potentially leveraged by attackers to bypass these defenses.

**References:**

- Kaidi Xu*, Hongge Chen*, Sijia Liu, Pin-Yu Chen, Tsui-Wei Weng, Mingyi Hong, and Xue Lin, “Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective,” IJCAI 2019 (*equal contribution)
- Xiao Wang*, Siyue Wang*, Pin-Yu Chen, Yanzhi Wang, Brian Kulis, Xue Lin, Sang Chin. “Protecting Neural Networks with Hierarchical Random Switching: Towards Better Robustness-Accuracy Trade-off for Stochastic Defenses,” IJCAI 2019
- Dai, H., Li, H., Tian, T., Huang, X., Wang, L., Zhu, J. and Song, L., Adversarial attack on graph structured data. ICML, 2018
- Zügner, Daniel, Amir Akbarnejad, and Stephan Günnemann. “Adversarial attacks on neural networks for graph data.” KDD, 2018.
- Waniek, Marcin, Tomasz P. Michalak, Michael J. Wooldridge, and Talal Rahwan. “Hiding individuals and communities in a social network.” Nature Human Behaviour, 2018
- Zügner, Daniel, and Stephan Günnemann. “Adversarial attacks on graph neural networks via meta learning.” ICLR, 2019
- Anish Athalye, Nicholas Carlini, David Wagner. “Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples,” ICML 2019
- Anish Athalye, Logan Engstrom, Andrew Ilyas, Kevin Kwok. “Synthesizing Robust Adversarial Examples,” ICML 2019
- Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu. “Towards Deep Learning Models Resistant to Adversarial Attacks,” ICLR 2018

**Pin-Yu Chen**

Research Staff Member, IBM Research

**Sijia Liu**

Research Staff Member, IBM Research AI, MIT-IBM Watson AI Lab

### New research helps make AI fairer in decision-making

To tackle bias in AI, our IBM Research team in collaboration with the University of Michigan has developed practical procedures and tools to help machine learning and AI achieve Individual Fairness. The key idea of Individual Fairness is to treat similar individuals well, similarly, to achieve fairness for everyone.

### IBM researchers investigate ways to help reduce bias in healthcare AI

Our study "Comparison of methods to reduce bias from clinical prediction models of postpartum depression” examines healthcare data and machine learning models routinely used in both research and application to address bias in healthcare AI.

### Pushing the boundaries of human-AI interaction at IUI 2021

At the 2021 virtual edition of the ACM International Conference on Intelligent User Interfaces (IUI), researchers at IBM will present five full papers, two workshop papers, and two demos.