GDPR and protecting data privacy with cryptographic pseudonyms

Within two years, most of today’s cybersecurity technologies will be obsolete.

Since the beginning of 2016, hackers have stolen more than 8 billion records — more than double the two previous years combined — and that doesn’t account for unreported intrusions.

The current system of patches, firewalls and blacklists isn’t working. It’s no match for the organized crime rings that carry out more than 80 percent of attacks. These groups systematically probe for weaknesses, share tools and techniques, and continually develop countermeasures for even today’s most advanced security technologies.

The best course of action is to constantly innovate.

One method is known as fully homomorphic encryption, which makes it possible to compute on data while it is still encrypted, so the processing never exposes any private information. While this could be a great solution, it is still a few years away from being practical because of processing speed.

Another innovation is called pseudonymization or, if that is a mouthful, data desensitization. The idea is simple, even obvious: transform data so that it looks and behaves like the real data but is not.

For the past several years, IBM cryptographers in Zurich have been developing this technology, which is now commercialized under the name IBM High Assurance Desensitization Engine.

The timing of this technology's availability couldn't be better, given the recent data privacy leaks and the need to meet the EU's upcoming General Data Protection Regulation (GDPR). This regulation seeks to create a harmonized data protection framework across the EU, and it imposes strict rules on anyone hosting, moving or processing the personal data of individuals in the EU, anywhere in the world.

The pseudo engine that could

The technology works by creating replicas of production data which are significantly less sensitive than the original data, but maintain all the desired characteristics needed for further use. Put simply, the data maintains its utility while also being privacy friendly.

The IBM tokenization solution works efficiently with different database technologies, provides consistent data across comprehensive application landscapes, includes advanced security functionality and scales to very large volumes (production size). These tokenized replicas can be used for various activities such as performing data analytics, protecting internal confidentiality and supporting regulatory compliance or testing.

In fact, today we are announcing that Rabobank, the Dutch multinational bank and financial services company, is successfully using the technology. It is being used both for GDPR compliance and to provide data for performance testing during the development of new technologies and services such as mobile apps and payment solutions.

This is what Peter Claassen, Delivery Manager Radical Automation at Rabobank, said publicly about the use of the technology:

“It’s critical for our DevOps team to use data which is as close as possible to production during the testing phase, so when we go live, we are confident that our services will perform. Being able to test and iterate using pseudonymized data is going to unleash new innovations from our DevOps team bringing even more security, innovation and convenience to our clients.”

GDPR compliance

What Rabobank is referring to is not uncommon. In a world where data is considered a natural resource, many enterprises use production data that includes personal client data for more than its primary purpose. The data is also used to run analytics for better customer insight, or a copy of production data may be used in testing to improve the quality of software development and minimize production incidents when deploying new releases.

Beginning 25 May 2018, GDPR will impose stricter controls than the legislation it replaces on the use of personal data and on preventing the reidentification of individuals, in particular for uses beyond the primary business need. As a result, these additional uses are no longer permitted in the same way and are possible only under restrictive constraints.

Thankfully, this is where our technology helps. It provides data that behaves like the original data but carries a significantly lower risk of reidentification of individuals. For example, a customer's name, birthday, address and bank account number would be converted to a completely random set of identifiers.

The benefit is obvious. If this protected data were to fall into the wrong hands, it would be completely useless. Therefore, the regulatory constraints for such data are considerably less restrictive and a range of activities can be executed on the data as before, subject only to some basic operational and technical controls.
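
To make the conversion of a customer record more concrete, here is a minimal Python sketch. It is not the IBM engine and makes no claim about how that product works internally. It assumes a keyed hash (HMAC-SHA256) as the pseudonym function, and the key, the field names and the helper names `pseudonym` and `pseudonymize_record` are all hypothetical. Unlike truly random identifiers, these pseudonyms are deterministic: the same input always yields the same output under the same key.

```python
import hashlib
import hmac

# Hypothetical secret key for illustration only. A real deployment would
# keep the key in a key management system, never in source code.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonym(field: str, value: str, length: int = 16) -> str:
    """Return a deterministic keyed pseudonym for one field value."""
    # Including the field name keeps equal values in different fields
    # from producing the same pseudonym.
    message = f"{field}:{value}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return digest[:length]


def pseudonymize_record(record: dict, sensitive_fields: set) -> dict:
    """Replace sensitive fields with pseudonyms and keep the rest as-is."""
    return {
        name: pseudonym(name, str(value)) if name in sensitive_fields else value
        for name, value in record.items()
    }


# Made-up example record; none of these values come from a real customer.
customer = {
    "name": "Jane Doe",
    "birthday": "1980-04-12",
    "address": "1 Main Street",
    "account": "NL00BANK0123456789",
    "segment": "retail",
}
SENSITIVE = {"name", "birthday", "address", "account"}

masked = pseudonymize_record(customer, SENSITIVE)
print(masked)
```

Without the key, the pseudonyms cannot be recomputed from guessed inputs, which is why a keyed function is used here rather than a plain hash. Note that the output of this sketch does not keep the format of the original values.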

The crypto innovation from IBM

Traditional attempts to tokenize multiple application databases have typically suffered from a tradeoff between tokenization security, interoperability between databases, and scalability or efficiency. Our technology largely eliminates these dependencies and constraints, providing high security and high performance tokenization that can scale to large volumes.

The tokenization engine at its core provides not only advanced cryptography to protect the data, but also highly efficient functionality to maintain the format and semantics of the original data. The IBM engine can also cope with reserved values, consider blacklists and whitelists, and manage exceptions and anomalies in existing data.
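
The following Python sketch illustrates what maintaining the format and handling reserved values and blacklists can mean in practice. It is an assumption-laden toy, not the IBM engine: digits are replaced by digits, letters by letters of the same case, and separators are kept, using bytes derived from HMAC-SHA256. This is not format-preserving encryption in the sense of a standardized scheme: it is not reversible, and two different inputs can map to the same token. The key, the reserved values, the blacklist and the function names are hypothetical.

```python
import hashlib
import hmac
import string

SECRET_KEY = b"demo-key-not-for-production"   # hypothetical key
RESERVED_VALUES = {"0000000000", "UNKNOWN"}    # hypothetical: passed through unchanged
BLACKLISTED_TOKENS = {"1234567890"}            # hypothetical: never emitted as a token


def _keystream(value: str, attempt: int, n: int) -> bytes:
    """Derive n pseudo-random bytes from the value and an attempt counter."""
    out = b""
    block = 0
    while len(out) < n:
        msg = f"{attempt}:{block}:{value}".encode("utf-8")
        out += hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
        block += 1
    return out[:n]


def _map_char(ch: str, b: int) -> str:
    """Map a character to another one of the same class; keep separators."""
    for alphabet in (string.digits, string.ascii_uppercase, string.ascii_lowercase):
        if ch in alphabet:
            # The modulo introduces a slight bias, acceptable for a sketch.
            return alphabet[b % len(alphabet)]
    return ch


def tokenize(value: str, max_attempts: int = 100) -> str:
    """Return a token with the same length and character classes as value."""
    if value in RESERVED_VALUES:
        return value
    for attempt in range(max_attempts):
        stream = _keystream(value, attempt, len(value))
        token = "".join(_map_char(ch, b) for ch, b in zip(value, stream))
        if token not in BLACKLISTED_TOKENS:
            return token
    raise ValueError("no acceptable token found")


print(tokenize("NL91-4170-0263"))  # same layout: 2 letters, digits, dashes kept
print(tokenize("UNKNOWN"))         # reserved value, returned unchanged
```

Because the token keeps the length and character classes of the input, column types, input validation and user interface layouts in a test system continue to work on the tokenized data.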

The simultaneous availability of these capabilities enables us to process terabytes of data and tokenize tens of billions of values quickly, exploiting the built-in data consistency to support full heterogeneous application landscapes.
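
Data consistency across an application landscape can be illustrated with a small Python sketch. It assumes, hypothetically, that two applications each hold a table keyed by the same customer identifier and that both tables are tokenized independently with the same key. The tables, the `token` function and the key are made up for illustration and say nothing about how the IBM engine is implemented.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical shared key


def token(value: str) -> str:
    """Deterministic keyed token: same input and key give the same output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def tokenize_rows(rows: list, key_column: str) -> list:
    """Return copies of the rows with the key column replaced by its token."""
    return [{**row, key_column: token(row[key_column])} for row in rows]


def total_by_segment(crm: list, payments: list) -> dict:
    """Join payments to customer segments and sum the amounts per segment."""
    segment_of = {row["customer_id"]: row["segment"] for row in crm}
    totals = {}
    for payment in payments:
        segment = segment_of[payment["customer_id"]]
        totals[segment] = totals.get(segment, 0.0) + payment["amount"]
    return totals


# Made-up data standing in for two separate application databases.
crm_rows = [
    {"customer_id": "C-1001", "segment": "retail"},
    {"customer_id": "C-1002", "segment": "private"},
]
payment_rows = [
    {"customer_id": "C-1001", "amount": 25.0},
    {"customer_id": "C-1001", "amount": 40.0},
    {"customer_id": "C-1002", "amount": 310.0},
]

# Each table is tokenized on its own, as if in a separate run.
crm_tok = tokenize_rows(crm_rows, "customer_id")
pay_tok = tokenize_rows(payment_rows, "customer_id")

# The join gives the same result on tokenized data as on the original data.
print(total_by_segment(crm_tok, pay_tok))
```

Because the token for a given identifier is the same in every table, joins and aggregations across databases keep working after tokenization, even though no original identifier appears in the tokenized rows.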

“All in” when it comes to data privacy

No industry is immune to the threat of a data privacy leak. Fortunately, pseudonymization can be applied across virtually any industry in support of regulatory compliance and the protection of confidential company information.

Typical use cases are the creation of production-grade test data (to increase quality of testing and maintain stable production systems during releases), secure analytics (development and execution of analytic queries on granular but less sensitive data, reducing the need for privileged access), or the exchange of sensitive data between parties for joint use but without disclosing the full information.

This technology is particularly good news for data scientists in sensitive fields such as healthcare who want to study aging demographics or the spread of diseases.

25 May deadline for GDPR

Tokenization is a recurring activity and is best set up in a factory mode, preceded by a pilot project for general and client-specific configuration and tuning. Based on our experience, the definition of an initial tokenization configuration usually takes a few weeks and requires tight collaboration between client and IBM experts. The time required to process a set of pilot databases depends on the availability of infrastructure and the accessibility of data in scope.

Overall, the preparation and setup of a data tokenization factory is expected to take four to six months. In factory mode, each new database needs to go through some onboarding steps. Once onboarded, a database can be reprocessed in a few days in a largely automated mode.


Associate Partner, IBM Services
