Organizations need to strike a difficult balance when launching strategic AI initiatives: data needs to be accessible without compromising compliance with privacy regulations or the speed of business innovation. Customer trust and brand reputation are key competitive advantages, so accelerated digital transformation and growth rely on businesses being smart about protecting sensitive customer data while still preserving data utility for AI and analytics teams.

Three questions organizations need to confront when it comes to leveraging customer data are:

  • How can initiatives inside and outside my organization work securely with personal information (PI) and sensitive data?
  • How can I remove PI from datasets without affecting the integrity of the data or accuracy of my projects’ results?
  • How can I actively protect PI and sensitive data whenever they are accessed, wherever they reside?

When organizations do not have ready answers to these questions, AI projects often stall and collaboration using meaningful data is limited. Gartner predicts that by 2024, the use of data protection techniques will increase industry collaborations on AI projects by 70%.

In an earlier blog post, I discussed the new IBM® AutoPrivacy framework and the key use cases delivered via IBM Cloud Pak® for Data. Today I will expand on the advanced data protection use case, which is one of the key capabilities in the AutoPrivacy framework.

Data protection and de-identification of sensitive data are not new concepts. Although they have been well known for many years, most enterprises did not apply them consistently. The enforcement of GDPR has drastically changed that, and in the post-GDPR era enterprises are hyperaware of the data protection regulations they must adhere to. With the enforcement of GDPR (Europe), CCPA (California), LGPD (Brazil) and many other data protection laws in recent months, consumers are now well aware of their privacy rights and are demanding that enterprises provide transparent privacy protection approaches.

Historically, enterprises have used many methods of sensitive data protection, including redaction and various forms of masking such as substitution, shuffling or randomization. However, with the use of deep learning neural networks in AI, data science and analytical modeling, the risk of re-identification has been increasing. Hence, there is a need for newer data protection techniques and robust encryption algorithms that enhance privacy while preserving the utility of the data.
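To make two of these traditional methods concrete, here is a minimal Python sketch of redaction and shuffling. The function names (`redact`, `shuffle_column`) and the row layout are hypothetical and chosen only for illustration; they are not part of any IBM product API.

```python
import random


def redact(value: str, keep_last: int = 4, mask_char: str = "X") -> str:
    """Mask every character except the last `keep_last` characters."""
    if keep_last <= 0:
        return mask_char * len(value)
    masked_len = max(len(value) - keep_last, 0)
    return mask_char * masked_len + value[-keep_last:]


def shuffle_column(rows: list[dict], column: str, rng: random.Random) -> list[dict]:
    """Return copies of the rows with the values of one column permuted.

    The set of values in the column is unchanged, but the link between a
    value and its original row is broken.
    """
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


if __name__ == "__main__":
    print(redact("4111111111111111"))  # XXXXXXXXXXXX1111
    customers = [
        {"id": 1, "salary": 52000},
        {"id": 2, "salary": 61000},
        {"id": 3, "salary": 78000},
    ]
    print(shuffle_column(customers, "salary", random.Random()))
```

Both methods are simple, but neither keeps values consistent across data sources, which is part of why they fall short for the requirements discussed next.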

By far, the most important requirement from IBM customers has been the consistent enforcement of data protection policies, regardless of where the data resides.

Data cannot simply be de-identified randomly; important relationships must be maintained. Format preservation is a fundamental requirement. Values must be de-identified consistently across the enterprise, respecting relationships across multiple data assets. For example, de-identification of a credit card number, personal first and last names, or any other entity identifiers must be repeatable consistently across data sources in on-premises and hybrid cloud environments.
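The consistency and format requirements can be illustrated with a small Python sketch. It assumes a keyed hash (HMAC-SHA256 from the standard library) as the source of repeatable replacement digits; the function name `deidentify_digits` and the key handling are hypothetical. This is not a standardized format-preserving encryption scheme, it is not reversible, and two different inputs could in principle map to the same output.

```python
import hashlib
import hmac
import string


def _keystream(key: bytes, message: bytes, length: int) -> bytes:
    """Derive `length` repeatable pseudo-random bytes from key and message."""
    out = b""
    counter = 0
    while len(out) < length:
        block = message + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def deidentify_digits(value: str, key: bytes) -> str:
    """Replace each ASCII digit with a keyed digit; keep all other characters.

    The same digits and the same key always give the same result, so the
    masked value can still be used to join records across data sources.
    """
    digits = "".join(ch for ch in value if ch in string.digits)
    stream = _keystream(key, digits.encode("ascii"), len(digits))
    # byte % 10 is slightly biased; acceptable for a sketch only.
    replacements = iter(str(byte % 10) for byte in stream)
    return "".join(
        next(replacements) if ch in string.digits else ch for ch in value
    )


if __name__ == "__main__":
    key = b"demo-key-do-not-use-in-production"  # assumption: shared secret
    print(deidentify_digits("4111-1111-1111-1111", key))
```

Because the replacement depends only on the digits and the key, a card number stored with dashes in one system and with spaces in another is masked to the same digits in both, provided every environment uses the same key.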

In addition, I have often encountered unique industry use cases where there is a need for special treatment of certain data elements. For example, in financial services and healthcare, the time intervals between certain dates should be the same whether unmasked or masked. The accuracy of dates of disease treatment in healthcare is critical for biomedical research, so while shifting dates, it’s very important to maintain the right intervals. Similarly, the interval between a date of birth and date of an auto policy agreement (in other words, the customer’s age) may make a very big difference in the cost and available features of auto insurance.
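One way to shift dates while keeping intervals intact is to move all dates that belong to the same person by the same offset. The Python sketch below assumes the offset is derived from a keyed hash of an entity identifier; the names `offset_for` and `shift_dates`, the identifier format and the one-year bound are all hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta


def offset_for(entity_id: str, key: bytes, max_days: int = 365) -> timedelta:
    """Derive a repeatable offset in [-max_days, +max_days] for one entity."""
    digest = hmac.new(key, entity_id.encode("utf-8"), hashlib.sha256).digest()
    days = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    # Note: the offset can be zero by chance; a real policy may exclude that.
    return timedelta(days=days)


def shift_dates(entity_id: str, dates: list[date], key: bytes) -> list[date]:
    """Shift every date of one entity by the same offset.

    Because the offset is constant per entity, the number of days between
    any two of that entity's dates is unchanged after masking.
    """
    delta = offset_for(entity_id, key)
    return [d + delta for d in dates]


if __name__ == "__main__":
    key = b"demo-key-do-not-use-in-production"  # assumption: shared secret
    treatments = [date(2020, 1, 10), date(2020, 2, 9), date(2020, 6, 1)]
    print(shift_dates("patient-001", treatments, key))
```

The same idea applies to the insurance example: if the date of birth and the policy date are shifted by the same per-customer offset, the customer's age at policy start is preserved.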

Most customers require support for custom de-identification when it comes to complex, multi-field computation using a low-code or no-code approach. There are also several use cases that require the addition of statistical noise to hide individual data and only surface group level information for analytics.
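As an illustration of the statistical-noise use case, the Python sketch below adds Laplace noise to per-group counts so that only approximate group-level figures are surfaced. Laplace noise with scale 1/epsilon is my assumption here, borrowed from the differential privacy literature; the function name `noisy_group_counts` and the record layout are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_group_counts(
    records: list[dict], group_key: str, epsilon: float, rng: random.Random
) -> dict:
    """Return per-group counts with Laplace noise added to each count.

    Adding or removing one record changes a count by at most 1, so the
    noise scale is 1 / epsilon. Smaller epsilon means more noise.
    Only groups present in the data are reported in this sketch.
    """
    scale = 1.0 / epsilon
    counts = Counter(record[group_key] for record in records)
    return {
        group: count + laplace_noise(scale, rng)
        for group, count in counts.items()
    }


if __name__ == "__main__":
    patients = [
        {"region": "north"}, {"region": "north"}, {"region": "south"},
    ]
    print(noisy_group_counts(patients, "region", epsilon=1.0, rng=random.Random()))
```

The choice of epsilon is the privacy-utility trade-off in miniature: a smaller value hides individuals better but makes the group counts less accurate.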

These rich data protection and consistent policy enforcement capabilities are available via IBM Watson® Knowledge Catalog Enterprise Edition to address a wide range of use cases.

The future is bright as the latest privacy enhancing technologies such as differential privacy, synthetic data fabrication and more are brought into the solution. These technologies, paired with the power of IBM Cloud Pak for Data, will allow data science teams to make choices along the privacy-utility spectrum and continue to push the boundaries of AI initiatives.

Read more about the IBM unified data privacy framework that can help you understand how sensitive data is used, stored and accessed throughout your organization.

To help our clients with synthetic data generation and more, we offer a studio within the IBM watsonx platform that brings together new generative AI capabilities, powered by foundation models, and traditional machine learning across the AI lifecycle.

With this studio, you can train, validate, tune and deploy generative AI, foundation models and machine learning capabilities with ease and build AI applications in a fraction of the time with a fraction of the data. Within the solution you can generate a synthetic tabular data set leveraging your existing data or a custom data schema. You can also connect to your existing database, upload a data file, anonymize columns and generate as much data as needed to address your data gaps or train your classical AI models.
