Creating secure test data for testing systems


Editor’s note: This article is authored by Tamer Salman, senior researcher of Security & Quality Technologies at IBM Research – Haifa.
How does an insurance company, financial institution, healthcare organization, or government body get personal and confidential data to develop and test new software? The challenges of managing such data are huge, especially under increasingly stringent data privacy regulations. Some data is private and confidential, other data may have been redesigned and transformed, and some may not exist at all. Typically, project leaders or database administrators set up a separate environment for development and testing. The big challenge is how to populate it with data.
With expertise in constraint satisfaction and automatic test generation, IBM researchers in Haifa developed the Data Fabrication Platform (DFP). It's a solution that efficiently creates high-quality test data while eliminating potential data security and privacy concerns. The platform is already helping a large insurance company revamp its processes around test data.
Generating masses of personal (but fabricated) data
For most organizations, generating the mass of data needed involves in-house scripting, simple data generation techniques, manual insertions and updates, and a lot of masking and data scrubbing. Even after the test data is ready, requirements can change during development, rendering the current data useless and forcing some of the work to be repeated. The result is a tedious, costly, and time-consuming process that doesn't necessarily deliver results.
To accommodate distributed and outsourced development and testing, our client needed test data that would not be susceptible to leaks or breaches in security and privacy. They also needed the ability to transform and evolve the data as business needs changed. DFP does this by allowing for rule sharing and migration. It also minimizes test-data generation effort, eliminates security and privacy concerns, and supports early development and regression tests.


Data rules

The logic of what's needed in these secure, confidential instances can be described using rules that define the relationships between different columns in your databases, resources for populating new data columns, or transformations from archived data. DFP lets companies put these rules into the system and get the needed data as output: the platform consumes the provided rules and generates the requested data, which can be inserted automatically into the target databases or exported in a variety of formats, such as XML, CSV, and DML files. A sketch of this rule-driven flow follows below.
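
To make the rule-driven approach concrete, here is a minimal sketch in Python. DFP's actual rule language is not described in this article, so the rule format, column names, and helper functions below are hypothetical illustrations of the idea: each column gets a generator, cross-column rules can reference values already assigned in the same row, and the fabricated rows are written out as CSV, one of the formats mentioned above.

```python
# Illustrative sketch only: DFP's real rule language is not shown in this
# article, so the rule format and column names here are hypothetical.
import csv
import random
import string

# One generator per column; each receives the partially built row, so a
# rule can depend on columns generated before it.
RULES = {
    "customer_id":  lambda row: random.randint(100000, 999999),
    "last_name":    lambda row: "".join(
        random.choices(string.ascii_lowercase, k=7)).title(),
    "policy_start": lambda row: random.randint(2000, 2015),
    # Cross-column rule: a policy must end after it starts.
    "policy_end":   lambda row: row["policy_start"] + random.randint(1, 10),
}

def fabricate_rows(n):
    """Generate n synthetic rows; no real customer data is ever read."""
    for _ in range(n):
        row = {}
        for column, rule in RULES.items():
            row[column] = rule(row)
        yield row

# Write the fabricated data to CSV, one of the output formats DFP supports.
with open("test_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(RULES))
    writer.writeheader()
    writer.writerows(fabricate_rows(100))
```

Because every value is produced from rules rather than copied from production systems, the output has realistic structure without containing any real person's information.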

At the heart of DFP lies a powerful Constraint Satisfaction Problem (CSP) solver, also developed in Haifa. A CSP involves so many possible combinations of values that straightforward algorithms cannot search them exhaustively within an acceptable amount of time. A form of artificial intelligence, the CSP solver from IBM tackles these complex problems using its ability to arrive at many more buildable solutions than traditional optimization approaches. The solver accelerates generation and helps eliminate errors by producing only data that is valid for the specific requirements.
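
As a rough illustration of why constraint solving suits this task, here is a minimal backtracking CSP sketch in Python. It is not IBM's solver, and the variables, domains, and constraints are invented for the example, but it shows the property the article describes: every row the solver emits satisfies all declared constraints by construction, so only valid test data is ever generated.

```python
# A toy backtracking CSP solver (not IBM's) for fabricating one valid row.
import random

# Hypothetical variables and domains for an insurance-style record.
DOMAINS = {
    "age":        range(0, 120),
    "risk_class": ["low", "medium", "high"],
    "premium":    range(100, 2001),
}

# Constraints over partial assignments; each returns True while no rule
# is violated. Both rules below are invented for illustration.
CONSTRAINTS = [
    # Drivers under 25 cannot be in the "low" risk class.
    lambda a: "age" not in a or "risk_class" not in a
              or not (a["age"] < 25 and a["risk_class"] == "low"),
    # High-risk policies carry a premium of at least 1000.
    lambda a: "risk_class" not in a or "premium" not in a
              or a["risk_class"] != "high" or a["premium"] >= 1000,
]

def solve(assignment=None):
    """Backtracking search: extend the assignment one variable at a time,
    pruning any value that violates a constraint."""
    assignment = assignment or {}
    if len(assignment) == len(DOMAINS):
        return assignment
    var = next(v for v in DOMAINS if v not in assignment)
    values = list(DOMAINS[var])
    random.shuffle(values)  # randomize so repeated calls yield varied rows
    for value in values:
        candidate = {**assignment, var: value}
        if all(check(candidate) for check in CONSTRAINTS):
            result = solve(candidate)
            if result is not None:
                return result
    return None  # no valid row exists under these constraints

print(solve())  # e.g. {'age': 47, 'risk_class': 'high', 'premium': 1430}
```

A production solver adds far stronger propagation and scales to millions of rows, but the guarantee is the same: data that violates the rules is never produced, so no post-generation validation or scrubbing pass is needed.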

In summary, the IBM Data Fabrication Platform is an easy-to-use technology that allows for rule sharing and migration, minimizes test-data generation effort, eliminates security and privacy concerns, and makes it easier for companies to outsource development and testing.
