Protecting the Intellectual Property of AI with Watermarking



Inventors and co-authors Jialong Zhang and Marc Ph. Stoecklin.

If we can protect videos, audio and photos with digital watermarking, why not AI models?

This is the question my colleagues and I asked ourselves as we looked to develop a technique to assure developers that their hard work in building AI, such as deep learning models, can be protected. You may be thinking, “Protected from what?” Well, for example, what if your AI model is stolen, or misused for nefarious purposes such as offering a plagiarized service built on a stolen model? This is a real concern, particularly for AI leaders such as IBM.

Earlier this month we presented our research at the AsiaCCS ’18 conference in Incheon, Republic of Korea, and we are proud to say that, in a comprehensive evaluation, our technique for addressing this challenge proved highly effective and robust. Our key innovation is that the concept can remotely verify the ownership of deep neural network (DNN) services using simple API queries.

As deep learning models are more widely deployed and become more valuable, they are increasingly targeted by adversaries. Our idea, which is patent-pending, takes inspiration from the popular watermarking techniques used for multimedia content, such as videos and photos.

Model accuracy over the training procedure (CIFAR10) under the watermarking framework

When watermarking a photo there are two stages: embedding and detection. In the embedding stage, owners can overlay the word “COPYRIGHT” on the photo (or embed watermarks invisible to human perception); if the photo is then stolen and used by others, owners can confirm this in the detection stage by extracting the watermarks as legal evidence of ownership. The same idea can be applied to DNNs.

By embedding watermarks into DNN models, we can verify ownership, if the models are stolen, by extracting the watermarks from them. However, unlike digital watermarking, which embeds watermarks into multimedia content, we needed to design a new method for embedding watermarks into DNN models.

In our paper, we describe an approach to infuse watermarks into DNN models and design a remote verification mechanism to determine the ownership of DNN models through API calls.
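In spirit, the embedding stage mixes a secret set of trigger samples, relabeled to an owner-chosen class, into the ordinary training data, so the finished model learns the hidden association. Here is a minimal sketch in NumPy; the helper name `build_trigger_set` and the 4×4 corner stamp are illustrative assumptions, not the paper’s exact recipe:

```python
import numpy as np

def build_trigger_set(images, target_label, n=100, seed=0):
    """Craft watermark (trigger) samples: stamp a small fixed pattern
    onto copies of training images and relabel them to an owner-chosen
    target class. Illustrative only -- not the paper's exact recipe."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=n, replace=False)
    triggers = images[idx].copy()
    triggers[:, :4, :4] = 1.0          # fixed 4x4 "stamp" in the corner
    labels = np.full(n, target_label)  # owner-chosen, unexpected label
    return triggers, labels

# The owner trains on the union of clean data and the trigger set,
# so the model learns the secret stamp -> label association.
images = np.random.rand(500, 28, 28)   # stand-in for MNIST inputs
trig_x, trig_y = build_trigger_set(images, target_label=7)
print(trig_x.shape, trig_y[0])         # (100, 28, 28) 7
```

The trigger set stays secret with the owner; an ordinary training loop over the combined data then produces a watermarked model with essentially unchanged accuracy on clean inputs.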

We developed three watermark generation algorithms to generate different types of watermarks for DNN models:

  1. embedding meaningful content together with the original training data as watermarks into the protected DNNs,
  2. embedding irrelevant data samples as watermarks into the protected DNNs, and
  3. embedding noise as watermarks into the protected DNNs.
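As a rough sketch of what these three flavors of watermark might look like in code (the function names, the 3×3 “logo” region, and the noise scale are my own illustrative choices, not the paper’s):

```python
import numpy as np

rng = np.random.default_rng(0)

def wm_content(images, mask):
    """(1) Meaningful content: overlay a visible mark (e.g. a logo or
    the word "TEST") on copies of real training images."""
    out = images.copy()
    out[:, mask] = 1.0            # burn the mark into every image
    return out

def wm_unrelated(n, shape):
    """(2) Irrelevant data: samples from an unrelated source (stand-in
    here: random images) that receive an owner-chosen label."""
    return rng.random((n, *shape))

def wm_noise(images, eps=0.3):
    """(3) Noise: crafted Gaussian noise added to training images."""
    noisy = images + eps * rng.standard_normal(images.shape)
    return np.clip(noisy, 0.0, 1.0)

x = rng.random((10, 28, 28))      # stand-in for a batch of MNIST images
mask = np.zeros((28, 28), dtype=bool)
mask[:3, :3] = True               # toy 3x3 "logo" region
marked = wm_content(x, mask)
```

Each generated set is paired with an owner-chosen label and folded into training, so only someone who knows the secret samples can later elicit the unexpected predictions.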

To test our watermarking framework, we used two public datasets: MNIST, a handwritten digit recognition dataset with 60,000 training images and 10,000 testing images, and CIFAR10, an object classification dataset with 50,000 training images and 10,000 testing images.

Running the experiment is rather straightforward: we simply provide the DNN with a specially crafted picture, which triggers an unexpected but controlled response if the model has been watermarked. This isn’t the first time watermarking has been considered for neural networks, but previous concepts were limited by requiring access to the model’s parameters. In the real world, however, stolen models are usually deployed remotely, and a plagiarized service would not publicize the parameters of the stolen models. In addition, the watermarks embedded in DNN models are robust and resilient to counter-watermark mechanisms such as fine-tuning, parameter pruning, and model inversion attacks.
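The verification step can be sketched as follows: query the suspect service with the secret triggers and check whether it returns the owner-chosen labels far more often than chance. Here `query_service`, the mock services, and the 0.9 threshold are all assumptions for illustration, standing in for a suspect’s public prediction API:

```python
import numpy as np

def verify_ownership(query_service, triggers, expected_labels, threshold=0.9):
    """Count how often the remote model answers a secret trigger with the
    owner-chosen label; a rate far above chance implies the watermark."""
    hits = sum(int(query_service(x) == y)
               for x, y in zip(triggers, expected_labels))
    return hits / len(triggers) >= threshold

# Mock suspect services for illustration: one "stolen" (watermarked),
# one independently trained (no watermark, so near-chance agreement).
rng = np.random.default_rng(1)
triggers = [rng.random((28, 28)) for _ in range(50)]
expected = [7] * 50

stolen_api = lambda x: 7                 # reproduces the watermark label
clean_api = lambda x: int(x.sum()) % 10  # unrelated behaviour

print(verify_ownership(stolen_api, triggers, expected))   # True
print(verify_ownership(clean_api, triggers, expected))
```

Because only black-box predictions are needed, this check works against a remotely deployed service, which is exactly the setting where parameter-based schemes fail.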

Alas, our framework does have some limitations. If the leaked model is deployed not as an online service but as an internal one, we cannot detect the theft; then again, the plagiarizer cannot directly monetize the stolen model either.

In addition, our current watermarking framework cannot protect DNN models from being stolen through prediction APIs, whereby attackers exploit the tension between query access and the confidentiality of results to learn the parameters of machine learning models. However, such attacks have only been demonstrated to work well in practice against conventional machine learning algorithms with fewer model parameters, such as decision trees and logistic regression.

We are currently looking to deploy this within IBM and explore how the technology can be delivered as a service for clients.

Protecting Intellectual Property of Deep Neural Networks with Watermarking
Jialong Zhang, Zhongshu Gu, Jiyong Jang, Hui Wu, Marc Ph. Stoecklin, Heqing Huang, Ian Molloy
