February 23, 2020 | Written by: Heiko Ludwig and Mathieu Sinn
Categorized: Awards and Prizes
Your AI model might be telling you this is not a cat
Modern AI systems have reached human-level abilities on tasks spanning object recognition in photos, video annotation, speech-to-text conversion and language translation.
Many of these breakthrough achievements are based on a technology called Deep Neural Networks (DNNs). DNNs are complex machine learning models whose layers of interconnected units are loosely inspired by the neurons in the human brain. This structure lets them process millions of pixels of a high-resolution image, represent patterns in those inputs at various levels of abstraction, and relate those representations to high-level semantic concepts.
For all the enthusiasm about the potential of this technology, it is also vulnerable to adversarial attacks. For example, an adversary can change just a few pixels of an image of a cat and trick the AI into thinking it is an ambulance (see this interactive demonstration). Another form of threat is poisoning attacks, in which adversaries tamper with an AI model's training data before the model is trained, implanting a backdoor that can later be exploited via designated triggers such as a particular person's voice or photo (see an interactive demonstration here).
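To make the poisoning idea concrete, here is a minimal numpy sketch (a toy example of our own, not ART code) of how an attacker might stamp a trigger pattern onto a handful of training images and relabel them, so that a model trained on the poisoned set learns to associate the trigger with the attacker's target label:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 200 "grayscale images" of 8x8 pixels with labels 0 or 1.
X = rng.random((200, 8, 8))
y = rng.integers(0, 2, size=200)

def poison(X, y, n_poison=20, target_label=1):
    """Stamp a small bright-square trigger on a few samples and relabel
    them, so a model trained on this data can learn to associate the
    trigger with the target label."""
    Xp, yp = X.copy(), y.copy()
    idx = rng.choice(len(X), size=n_poison, replace=False)
    Xp[idx, :2, :2] = 1.0      # the trigger: a 2x2 bright patch in the corner
    yp[idx] = target_label     # the backdoor label
    return Xp, yp

Xp, yp = poison(X, y)
```

At test time, the attacker stamps the same patch onto any input to steer the backdoored model toward the target label, while clean inputs behave normally.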
These attacks aren’t science fiction, and they aren’t years away; they are happening now. This is why government agencies are taking action. This month, DARPA awarded IBM Research scientists a $3.4M grant that will run until November 2023: the project is initially funded for one year, with extensions for up to four, and it kicked off this past week.
The research will be based on IBM's Adversarial Robustness 360 (ART) toolbox, an open-source library for adversarial machine learning. It is essentially a weapon for the good guys: a collection of state-of-the-art tools to defend and verify AI models against adversarial attacks.
We will develop open-source extensions of ART to support the evaluation of defenses against adversarial evasion and poisoning attacks under various scenarios, such as black- and white-box attacks, multi-sensor input data, and adaptive adversaries that try to bypass existing defenses.
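ART ships ready-made implementations of such attacks; to show what a white-box evasion attack boils down to, here is a minimal, self-contained sketch of the fast gradient sign method (FGSM) against a toy logistic regression model. The weights and function names below are our own illustration, not ART's API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A fixed, already-"trained" logistic regression: p(y=1 | x) = sigmoid(w.x + b).
w = np.array([2.0, -1.5, 0.5])
b = 0.1

def predict(x):
    return sigmoid(w @ x + b)

def fgsm(x, y_true, eps=0.3):
    """White-box FGSM: step x by eps in the sign of the loss gradient,
    i.e. the direction that most increases the loss for the true label."""
    p = predict(x)
    grad = (p - y_true) * w    # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad)

x = np.array([0.5, 0.1, 0.2])   # classified as class 1 (p > 0.5)
x_adv = fgsm(x, y_true=1.0)     # a small perturbation flips the prediction
```

In a white-box setting the attacker knows `w` and can compute this gradient exactly; black-box variants, which ART also covers, must estimate it from model queries.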
Of particular interest is the evaluation against adversarial attack scenarios in the physical world. Such a scenario proceeds in four steps:

1. The attacker uses ART to generate a digital object (e.g. an STL file).
2. The digital object is synthesized into a real-world one (e.g. the STL file is printed with a 3D printer) and mounted in the designated physical-world context.
3. The real-world object is re-digitized, e.g. by taking pictures of it with a digital camera from different angles, at different distances or under controlled lighting conditions.
4. The re-digitized objects are imported into ART, where they serve as inputs to the AI models and defenses under evaluation.
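The evaluation loop above can be sketched schematically. Every function in this sketch is a hypothetical placeholder invented for illustration; none of them are ART APIs:

```python
# Schematic of the physical-world evaluation loop. All functions below are
# made-up stubs standing in for real fabrication and capture steps.

def generate_digital_object():
    return "adversarial.stl"                # stand-in for an attacker-crafted STL file

def fabricate(digital_object):
    return f"3d-print({digital_object})"    # stand-in for 3D printing the object

def photograph(physical_object, condition):
    return f"photo({physical_object}, {condition})"   # stand-in for a camera capture

def model_predict(image):
    return "cat"                            # stand-in for the AI model under evaluation

def evaluate(conditions):
    digital = generate_digital_object()     # step 1: craft the digital attack object
    physical = fabricate(digital)           # step 2: synthesize it in the real world
    predictions = {}
    for cond in conditions:                 # step 3: re-digitize under varied conditions
        image = photograph(physical, cond)
        predictions[cond] = model_predict(image)   # step 4: feed back into the model
    return predictions

results = evaluate(["30deg/1m", "frontal/low-light"])
```

The point of varying the capture conditions is to test whether an attack (or a defense) survives the noise that the physical world injects between the digital object and the model's input.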
We will publish regular updates on GitHub: https://github.com/IBM/adversarial-robustness-toolbox