Thanks to the efforts of researchers who have discovered these weaknesses, countermeasures have been developed to help increase the robustness of machine learning models.
For evasion attacks of the sort just described, experts have developed a countermeasure known as adversarial training. The process involves training the model not only on “clean” data but also on data that has been perturbed in the ways an attacker might attempt, so the model learns to label even these adversarial examples correctly. This mitigation, while effective, can be costly in two senses: 1) it requires more compute, and 2) models may become slightly less accurate overall after exposure to perturbed data. “[T]raining robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy,” write the MIT researchers behind the 2018 paper, “Robustness May Be at Odds with Accuracy.”9
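To make the idea concrete, here is a minimal sketch of what an adversarial training loop might look like for a PyTorch classifier, using the Fast Gradient Sign Method as the illustrative perturbation. The model, data loader, pixel range, and hyperparameters such as epsilon are assumptions for illustration, not details from the paper cited above.

```python
# Minimal adversarial training sketch (illustrative only). Assumes a PyTorch
# classifier over inputs in the [0, 1] range; epsilon and the clean/adversarial
# mix are hypothetical choices.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Craft an adversarial example by nudging each input feature in the
    direction that most increases the loss (Fast Gradient Sign Method)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the sign of the input gradient, then clamp to a valid range.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03):
    """One epoch that mixes clean and perturbed examples, so the model
    learns to label both correctly."""
    model.train()
    for x, y in loader:
        x_adv = fgsm_perturb(model, x, y, epsilon)
        optimizer.zero_grad()  # clear gradients left over from crafting x_adv
        # Loss on clean data plus loss on the adversarial counterpart.
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

The extra forward and backward passes needed to craft each perturbed batch are one source of the added compute cost mentioned above.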
In general, the principles of good cybersecurity apply to the realm of machine learning. Operational defenses include anomaly detection and intrusion detection tools that check for unusual patterns in data or in network traffic that might indicate an attacker is attempting to meddle with an ML system at any stage of its life cycle. Additionally, red teaming, in which cybersecurity professionals deliberately expose models to controlled attacks that simulate those of adversaries, is an effective way to stress-test systems.
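As a rough illustration of the anomaly detection idea, the sketch below screens incoming inputs with scikit-learn’s IsolationForest fitted on features of known-good training data. The feature representation, contamination rate, and response to a flagged input are all assumptions; a production system would use its own features and thresholds.

```python
# Illustrative operational defense: flag inputs that look unlike the data the
# model was trained on, using IsolationForest as a stand-in anomaly detector.
import numpy as np
from sklearn.ensemble import IsolationForest

# Fit the detector on features extracted from known-good ("clean") inputs.
# Synthetic data stands in for real feature vectors here.
rng = np.random.default_rng(0)
clean_features = rng.normal(loc=0.0, scale=1.0, size=(1000, 16))
detector = IsolationForest(contamination=0.01, random_state=0).fit(clean_features)

def screen_request(features: np.ndarray) -> bool:
    """Return True if the incoming input looks anomalous and should be
    logged or blocked before it reaches the ML model."""
    # predict() returns -1 for outliers and 1 for inliers.
    return detector.predict(features.reshape(1, -1))[0] == -1

# Example: a heavily shifted input is likely to score as an outlier.
suspicious = rng.normal(loc=0.0, scale=1.0, size=16) + 8.0
print(screen_request(suspicious))
```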
In a field as fast-moving as AI, the risk landscape is constantly shifting. Organizations like the National Institute of Standards and Technology (NIST) are good sources for the latest developments. NIST’s 2024 report10 on AI risk management touches on adversarial machine learning while also encompassing approaches to AI risk more broadly, including themes like bias, hallucination, and privacy. Adopting an AI governance framework can further help secure models against adversaries.