IBM Research Europe

IBM Differential Privacy Library: The single line of code that can protect your data

This year, for the first time in its 230-year history, the US Census will use differential privacy to keep citizens’ responses confidential when the data is published. But how does it work?

Differential privacy uses mathematical noise to preserve individuals’ privacy and confidentiality while allowing population statistics to be observed. This concept has a natural extension to machine learning, where we can protect models against privacy attacks, while maintaining overall accuracy.

This protection, which can often be applied in a single line, can help organisations weather the storm of hacks and unintentional data leaks.

For example, if you want to know my age (32), I can pick a random number out of a hat, say between −7 and +7, and add it to my true age before answering. From my answer you will only learn that I could be anywhere between 25 and 39. I’ve added a little noise to the data to protect my age, and the US Census will do something similar.
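The noise in the age example is picked informally. One standard way to choose it is the Laplace mechanism, sketched below for a simple count, where one person joining or leaving the data changes the result by at most one. This is a minimal illustration using only the Python standard library, not the library's own code; the function name `laplace_mechanism`, the count of 1,200 and the value of epsilon are assumptions made for the example.

```python
import random


def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Return `value` plus Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two independent exponential samples with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return value + noise


# Illustrative figure: how many respondents fall in some age band.
true_count = 1200
# One person changes a count by at most 1, so the sensitivity is 1.
# A smaller epsilon means more noise and stronger privacy.
noisy_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.2)
print(f"Published count: {noisy_count:.1f}")
```

Unlike the hat example, Laplace noise is not capped at a fixed range: large deviations are possible but increasingly unlikely.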

While the US government built its own differential privacy tool, IBM has been working on an open-source library of its own, and today we are publishing our latest release, v0.3. The IBM Differential Privacy Library boasts a suite of tools for machine learning and data analytics tasks, all with built-in privacy guarantees.

Our library differs from others in giving scientists and developers access to lightweight, user-friendly tools for data analytics and machine learning in a familiar environment – in fact, most tasks can be run with only a single line of code.

What also sets our library apart is that its machine learning functionality enables organisations to publish and share their data with rigorous guarantees on user privacy.

Technical details
With v0.3, the library now comes with a budget accountant to track privacy budget spend across different operations. Using advanced composition techniques, the budget accountant lets users extract more insight from the same budget than simpler accounting methods allow. The gain is hard to quantify in general, but under typical workloads privacy budget savings in excess of 50 percent are not uncommon.
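To give a feel for where such savings come from, the sketch below compares simple summation of privacy spend with the textbook advanced composition bound for k queries that each satisfy epsilon-differential privacy. It is an illustration only: the library may use a different, tighter bound, and the function names and parameter values here are assumptions chosen for the example.

```python
import math


def basic_composition(epsilon, k):
    """Total epsilon when the spend of k epsilon-DP queries is simply summed."""
    return k * epsilon


def advanced_composition(epsilon, k, delta_slack):
    """Total epsilon under the advanced composition theorem.

    The combined release is (total_epsilon, delta_slack)-differentially
    private, so a small failure probability buys a smaller total epsilon.
    """
    return (math.sqrt(2 * k * math.log(1 / delta_slack)) * epsilon
            + k * epsilon * (math.exp(epsilon) - 1))


# Illustrative workload: 100 queries at epsilon = 0.1 each.
k, epsilon, delta_slack = 100, 0.1, 1e-5
naive = basic_composition(epsilon, k)
advanced = advanced_composition(epsilon, k, delta_slack)
print(f"basic: {naive:.2f}, advanced: {advanced:.2f}")
print(f"saving: {100 * (1 - advanced / naive):.0f} percent")
```

The size of the saving depends heavily on the number of queries, the per-query epsilon and the slack allowed; for a handful of queries, simple summation can give the smaller total.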

Our library includes an array of functionality to extract insight and knowledge from data with robust privacy guarantees. We have focused on developing solutions for the most popular algorithms, including histograms, logistic regression, k-means clustering and principal component analysis (PCA), as well as giving developers the basic building blocks of differential privacy to allow them to develop their own custom solutions.
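As an illustration of the simplest of these algorithms, a differentially private histogram can be sketched by adding Laplace noise to each bin count. The code below uses only the Python standard library and is not the library's implementation; the function names, the age bands and the toy responses are made up for the example.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample zero-centred Laplace noise as a difference of exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_histogram(values, bins, epsilon, rng=random):
    """Return a noisy count for every label in `bins`.

    `bins` must be fixed in advance, independently of the data, so that the
    set of published bins reveals nothing about who is present.
    """
    counts = Counter(values)
    # Adding or removing one person changes exactly one count by 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    scale = 1 / epsilon
    return {b: counts.get(b, 0) + laplace_noise(scale, rng) for b in bins}


# Toy data, invented for illustration only.
age_bands = ["0-17", "18-39", "40-64", "65+"]
responses = ["18-39", "40-64", "18-39", "65+", "18-39", "0-17"]
for band, count in private_histogram(responses, age_bands, epsilon=1.0).items():
    print(f"{band}: {count:.1f}")
```

Noisy counts can be negative or fractional; whether to round or clip them afterwards is a presentation choice that does not weaken the privacy guarantee.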

The library includes the following key components which don’t exist in similar libraries currently available:

  • Accountant: Track and limit privacy spend across multiple operations;
  • Mechanisms: A comprehensive collection of the basic building blocks of differential privacy, used to build new tools and applications;
  • Machine learning: Machine learning algorithms for pre-processing, classification, regression and clustering.
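The "track and limit" role of an accountant can be pictured with a few lines of Python. The class below is a hypothetical stand-in written for this post's readers, not the library's accountant: the name `SimpleAccountant` and its methods are assumptions, and it sums spend naively rather than using advanced composition.

```python
class SimpleAccountant:
    """Hypothetical budget tracker; not the library's accountant."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def remaining(self):
        """Budget still available for future operations."""
        return self.total_epsilon - self.spent

    def spend(self, epsilon):
        """Record an operation, refusing it if it would exceed the limit."""
        if epsilon > self.remaining():
            raise ValueError("privacy budget exceeded")
        self.spent += epsilon


accountant = SimpleAccountant(total_epsilon=1.0)
accountant.spend(0.4)  # e.g. a histogram
accountant.spend(0.4)  # e.g. a mean
try:
    accountant.spend(0.4)  # a third query would overshoot the budget
except ValueError as err:
    print(f"Blocked: {err}")
print(f"Remaining budget: {accountant.remaining():.1f}")
```

The point of the design is that the limit is enforced before an operation runs, so an analysis cannot quietly spend more privacy than was agreed.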

Also included is a collection of fundamental tools for data exploration and analytics. All the details for getting started with the library can be found at IBM’s GitHub repository.

Diffprivlib: The IBM Differential Privacy Library, Naoise Holohan, Stefano Braghin, Pól Mac Aonghusa, Killian Levacher

Research Staff Member, Privacy and Security, IBM Research Europe
