Protect your sensitive data using InfoSphere Optim Designer

Ways to design data management services to mask sensitive data

IBM® InfoSphere® Optim™ Data Masking Solution V2.2 is a solution to protect sensitive data in data sources with data privacy functions. IBM InfoSphere Optim Data Masking Solution V2.2 provides various features for data masking as "data privacy policies." Optim Designer is a component of IBM InfoSphere Optim Data Masking Solution V2.2 that allows defining and designing "data management services" to mask sensitive data. This article explains how to protect sensitive data using Optim Designer and provides examples.

Yuka Nomura (yukanm@jp.ibm.com), Software Engineer, IBM

Yuka NomuraYuka Nomura is a software engineer at Yamato Software Development Laboratory, IBM Japan. She has been working for the quality assurance of InfoSphere Optim products.



03 November 2011

Overview

IBM InfoSphere Optim Data Masking Solution V2.2 offers data privacy protection by masking sensitive data such as personally identifying information. This solution offers effective data masking methods in development environments. This article explains how to mask sensitive data using the data masking methods (data privacy policies) in InfoSphere Optim V2.2 and how to design data management services using Optim Designer. In addition, it give examples illustrating how to mask data.


What are Optim Designer and data privacy policies?

IBM InfoSphere Optim Data Masking Solution V2.2

IBM InfoSphere Optim Data Masking Solution V2.2 includes the following components:

  • Optim Designer
  • Optim Manager
  • Optim Management Server
  • Optim Proxy
  • Optim Executor

Optim Designer is a component for designing data masking processes. Such processes are defined as data management services. The designer can invoke the Optim Executor that runs the services. Alternatively, a service can be published to the Optim Manager and passed to the Optim Management Server for execution. The Optim Management Server in turn invokes the Optim Proxy, which then starts the Optim Executor to run the service. To learn about how to install and configure the components of InfoSphere Optim V2.2, please see the "Optim Data Masking Solution Installation Guide."

Optim Designer

Optim Designer is an InfoSphere Optim V2.2 component. It is used to design data masking processes, which are called data management services. Optim Designer is implemented as an Eclipse plug-in with a user-friendly GUI (see Figure 1).

Figure 1. Optim Designer
Screen capture shows Data Project Explorer on left, logical data model editor on right

Optim Designer requires the following:

  • InfoSphere Data Architect — Optim Designer requires an activated copy of InfoSphere Data Architect (IDA) since Optim Designer is installed as one of IDA's plug-ins.
  • Optim Executor and Optim Management Server — To test the data management services using Optim Designer, it is necessary to install Optim Executor on the Optim Designer computer or on a network drive accessible from the Optim Designer computer. The location of Optim Executor must be specified in the Optim Designer Preferences. To use the Lookup option to support data masking, the Optim Management Server must also be installed and operational.

Data privacy policy

This section describes how to mask data using the InfoSphere Optim V2.2 data privacy policies. Data privacy policies are classified into the following:

  • Identity mask policies— Identity mask policies are used to mask various kinds of personal information. This list covers the typical personal information that can be masked by Optim Designer:
    • Name, including first name and last name
    • Email address
    • National ID numbers
    • Credit card numbers
    • Birthdate
    • Address

    For example, if a name mask policy is chosen, then all personal names in the specified column are replaced. The address mask policy replaces postal addresses. If separate fields are available for such address information as ZIP codes, states, and cities, that information should be masked in a consistent way. Credit card numbers can be masked in ways that preserve the key attributes of each credit card issuer. The policies for name and address are Lookup type, which replaces values from selected source entities with values from corresponding lookup table columns. The mask policies for e-mail addresses, IDs, and credit card numbers have two implementations. One is rule-based, and the other is a JavaScript type, which enables you implement custom transformations. For more details about policy types, please refer to the "IBM Optim Designer User Guide."

  • Scramble mask policies— Scramble mask policies are used to mask character and numeric data types. There are two ways to use these policies: scramble or replacement. Scramble changes the order of the characters in the data. Replacement mask each target string with characters that match the type of the replaced character, so that an alphabetic character is replaced with another character from the same alphabet, and a numeric digit is replaced with some random digit. There are two policy types for scramble mask policies: rule-based and JavaScript.
  • Numeric mask policies— Numeric mask policies are used to mask numeric data by generating random values. There are two policy types for numeric mask policies: rule-based and JavaScript.
  • Date privacy policies— Date privacy policies are used to mask dates. The random date-in-range policy generates a randomly selected date within a specified date range. It can be used for credit card expiration date or birthdates. There are two policy types for date privacy policies: rule-based and JavaScript.
  • JavaScript policies— A JavaScript policy includes JavaScript files that describe data transformations applied to attributes in a service. You can use JavaScript to mask numbers with random values, extract substrings, concatenate entity values, and perform other data transformations available by leveraging JavaScript. You can also use JavaScript functions to apply the policies described above.

Steps to design and run data management service with masking policies

Optim Designer supports masking data using data privacy policies. There are two types of data management services: copy services and data transformation services. A copy service copies data from one data model to another and can use filter criteria to select data. A data transformation service masks the data in the source data model. This section focuses on a copy service. The steps are shown in Figure 2 and listed below.

Figure 2. Steps to design and run a data management service
Diagram shows data source flowing to physical data model, to Optim logical data model, to data access plan, providing input to the data management service
  1. Define a data source connection (such as DB2® or Oracle).
  2. Create a physical data model for the source schema describing the data to be masked. This is based on the schema information of the data source.
  3. Transform the physical data model to a logical data model.
  4. Add a Data Access Plan to the logical data model, and define the data privacy policies (except JavaScript Policies) in the plan.
  5. Repeat steps B and C for each target schema for masked data.
  6. Create a data management service specifying the created source logical data model, data access plan, and target logical data model. (The JavaScript Policy can be defined after the creation of the service.)
  7. Run the created service.

Following these steps, the sensitive data will be masked and inserted into the target schema. The next section describes in detail how to implement this process, using example data.


Running a data management service with data privacy policies

This example is a data management service that masks data and runs the actual services. The example uses a DB2 database on Windows® XP and illustrates a scenario for this masking process:

  • Preparing the fabricated customer data (table name: DEMO.CUSTOMERS)
  • Masking all of the information except for the ID numbers
  • Copying the results to the output table DEMOTARGET.CUSTOMERS

Here are the steps to create the example DB2 database with the proper tables. Download the data definition language (DDL) file for the example data from the Download section of this article. Then execute these shell commands:

> db2sampl  --(if you do not have SAMPLE database)
> db2 connect to sample
> db2 -vtf create_demo_tables.txt

Project and data sources

Create a data design project for the existing data models and services, and define a connection to a database. This is done by launching Optim Designer and opening a workspace, then creating a new data design project. Select File > New > Data Design Project. On the New Data Design Project panel, enter "Sample Project" as project name (see Figure 3) and click Finish. The new project is added to the Data Project Explorer view.

Figure 3. New data design project
Screen capture shows entering project name

Next, define a data source connection to the SAMPLE database on the DB2 server. On the Data Source Explorer view, right-click on Database Connections and select New. On the New Connection panel (see Figure 4), select DB2 for Linux®, UNIX®, and Windows® and edit the definition for the SAMPLE database. When finished, click Test Connection, confirm the message "Ping succeeded!" then click OK on the New Connection panel. The data design project and data source are now ready.

Figure 4. New connection
Sreen cap shows entering connection parameters

Data model, data access plan, and data privacy policy

Source data model creation

First, define a data model named "Sample Source," which is a data model for a source table DEMO.CUSTOMERS. In the Data Project Explorer view, open the project you created in the last section. Right-click on the Data Models node and select New > Physical Data Model. On the Model File Panel (see Figure 5), enter the required information, and select (enable) the "Create from reverse engineering" radio button, then click Next. The required information:

  • Filename: Sample Source
  • Database: DB2 for Linux, UNIX, and Windows
  • Version: (select DB2 version)
Figure 5. Model file panel
Screen capture shows input destination folder, filename, database, and version

On the Source panel, choose Database and click Next. On the Select Connection panel, choose SAMPLE and click Next. Then on the Select Schema panel, check DEMO schema and click Finish (see Figure 6).

Figure 6. Select schema
Screen capture shows select schema to reverse-engineer

Next, convert the physical data model to a logical data model. Right-click on Sample Source.dbm and select Transform to Optim logical data model. Click Next, then on the Native Data Source Access panel, uncheck Use the native data source connection as the default for services. Click Next twice, confirm that the transformation to Optim logical data model is complete, then click Finish.

Applying data privacy policy

The next step is supplying the data access plan and privacy policies that will mask the data.

First, create a data access plan. Click to open the logical data model (ldm format file) in the Data Project Explorer, open the package (the default name is Package1), right-click on Data Access Plan node, and select the New > Data Access Plan context menu. Input name as Sample Plan and click Next. Select DEMO and click Next on the Select a Package panel. Select the Select entities based on relationships with a start entity radio button, and click Next. Select CUSTOMERS as the start entity and click Finish.

Next, specify the data privacy policies to mask the data according to Table 1.

Table 1. Data privacy policy
Column NameData TypeDescriptionApplying PolicyPolicy Type
CUSTOMER_IDchar(5)Customer IDN/AN/A
FAMILY_NAMEvarchar(15)Family NameIdentity Privacy Policy(personLookupType.LastName)Lookup
GIVEN_NAMEvarchar(15)Given NameIdentity Privacy Policy(personLookupType.FirstName)Lookup
CREDIT_CARD_NUMvarchar(19)Credit Card NumberMask credit card numbers from all providersRule-based
PASSPORT_NUMchar(9)Passport NumberScramble(Replace Characters)Rule-based
BIRTH_DATEdateBirthdateDate Privacy PolicyRule-based
AGEintAgeJavaScript Policy (to be defined in the Service Plan)

Identity information (lookup)

Mask the United States name and birthdate information using the hash lookup function. Open Sample Plan and click the Data Privacy button on the editor. On the left-side panel, select Executor as the platform and Mask United States personal information as the policy. Choose the GIVEN_NAME column and click Apply (see Figure 7).

Figure 7. Apply identity privacy policies
Screen capture shows data access plan editor

The Map Columns panel will appear and create the mappings as shown in Figure 8.

Figure 8. Map columns
Screen capture shows table columns and selected privacy policy

When finished, click Next. On the Lookup Key Generator panel, select Calculate and use hash value as key, and click Next. Select Sample Project/Sample Source.ldm/Package1/DEMO/CUSTOMERS/CUSTOMERS_ID by clicking the >> button on the Hash Value Inputs, then click Finish.

Credit card mask policy (rule-based)

The credit card policies mask a credit card number (CCN) from the following issuers: American Express, Diners Club, Discover, JCB, MasterCard, and VISA. The usual policies preserve the first four digits of the issuer identifier from the source CCN and mask the remaining two digits of the issuer identifier number and the account number based on the source CCN. Each policy also generates a check digit.

Open Sample Plan and click Data Privacy in the editor, in the same way as for Identity Information masking. On the left-side panel, select Executor as the platform and Mask credit card numbers from all providers as the policy. Choose the CREDIT_CARD_NUM column and click Apply (see Figure 9).

Figure 9. Apply credit card policy
Screen capture shows policies on left, attributes on right

In this example, the policy to be applied does not mask the credit card issuer, so the last two digits of the issuer identifier number will not be masked, but all six of those digits will be visible in the results.

Scramble privacy policy (rule-based)

Mask the passport numbers using the replace-characters policies. The replace-characters policies mask each character in a string with a randomly generated character that matches the type of character being replaced. Since a passport number consists of two alphabetic characters followed by seven digits, two random alphabetic characters will replace the first two, and seven random digits will replace the rest. Select Replace Characters in Scramble Privacy Policies, select the PASSPORT_NUM column, and click Apply. On the Language Character Set panel, select English from the pull-down menu, and click Finish (see Figure 10).

Figure 10. Language character set
Image shows selecting language from pull-down

Date Privacy Policy (Rule-based)

This scenario uses a random date in range policy for masking the birthdates. Select Random date in range from the Date Privacy Policies, select the BIRTH_DATE column, then click Apply. In the Random Date In Range Mask Options panel, enter the values as shown in Figure 11 and click Finish.

Figure 11. Random date in range
Image shows adding policy screen for data range parameters

In this example, the masked data will become random dates from 1 Jan 1940 to 31 Dec 2000. Also, if the date is 4 Dec 1983, the displayed format will be 1983-12-04.

All data privacy policies defined in the data access plan are applied immediately. There is one more policy: the JavaScript policy. It is defined when the service is created.

Target data model creation

The next step defines a data model named "Sample Target," which is a data model for the target table DEMOTARGET.CUSTOMERS. The procedure is the same as before. In the Data Project Explorer view, open the project that was created. Right-click on the Data Models node and select New > Physical Data Model. In the Model File panel, input the following information, select the Create from reverse engineering radio button, then click Next.

  • Filename: Sample Target
  • Database: DB2 for Linux, UNIX, and Windows
  • Version: (select DB2 version)

On the Source panel, choose Database and click Next. On the Select Connection panel, choose SAMPLE and click Next. Then on the Select Schema panel, select the DEMOTARGET schema and click Finish. Next, the physical data model must be converted into a logical data model. Right-click Sample Target.dbm and select Transform to Optim logical data model. Click Next, going to the Native Data Source Access panel, and uncheck Use the native data source connection as the default for services. Click Next twice, confirm that the transformation to the Optim logical data model is complete, and click Finish.

Data Management Service and JavaScript Policy

To create, in the Data Project Explorer view, right-click on the Service node and select the New > Service context menu. On the New Service panel (see Figure 12), choose Copy Service as the service type and input Sample Service as the service name, then click Next.

Figure 12. New service
Image shows new service

In Select a Source Optim Data Source panel (see Figure 13), uncheck Use native data source (if it is checked). Choose SAMPLE and click Next. In the Select a Source Logical Data Model panel (see Figure 14), choose the source data model created previously and click Next.

Figure 13. Select a source Optim data source
Image shows selecting a source Optim data source
Figure 14. Select a source logical data model
Image shows selecting a source logical data model

In the Select a Data Access Plan panel, choose the Sample Plan and click Next. Select Selection a target model and use an auto mapping of entities as a Target Model Option and click Next. In the Select a Source Optim Data Source panel, choose SAMPLE and click Next. In the Select a Target Logical Data Model panel, select Insert as the Target Operation Type, choose the target data model, and click Next. The source schema DEMO and the target schema DEMOTARGET will be listed in the Match Schemas panel. Choose DEMO and DEMOTARGET, then click Next (see Figure 15).

Figure 15. Matching schemas
Image shows matching schemas

Check the message All entities have been mapped and click Finish.

Defining a JavaScript policy

A JavaScript policy is optionally defined as the last step to create a service. Use a service plan to add a JavaScript policy to a service. JavaScript policies can be applied to columns in the target table. Each policy may have one or more JavaScript files bound to it. In this example scenario, a JavaScript policy is added to calculate age from the date of birth and the current date. Double-click Sample Service.svc to open the Service Editor. Click Service Plan then Add Policy (see Figure 16). Enter Sample JavaScript as the policy name. Select Target Data Source Policies in the Policy Category Selector panel and select JavaScript Policy in the Policy Selector panel, then click Finish (see Figure 17).

Figure 16. Service editor
Image shows the service editor
Figure 17. Policy selector
Image shows policy selector

Select the created policy and click Add JavaScript... (see Figure 18).

Figure 18. Add JavaScript
Image shows policy selector

Select the AGE column in the Select an attribute panel and click Next. Leave the filename as the default and click Next. Then the JavaScript editor will open. Following is the example JavaScript for this function:

new Date().getFullYear()
 - record.getItem('/DEMO/CUSTOMERS/BIRTH_DATE').get(java.util.Calendar.YEAR)

This script masks the value of the AGE column, and the new value is subtracted from the value of BIRTH_DATE column using the current date. Save the JavaScript file by clicking File > Save.

Service execution and result verification

Before executing the service, start the Optim Management Server. Right-click on Sample Service.svc and select the Execute Optim Service... context menu (see Figure 19). Then Optim Manager is launched in Designer (called the "Embedded Manager" here). Click Run in the Run Service panel (see Figure 20). Optim Executor is launched, and the service starts running. To see the execution log, click the Service Monitoring tab in the Embedded Manager (see Figure 21).

Figure 19. Execute Optim service
Image shows executing Optim service
Figure 20. Run the service
Image shows running the service
Figure 21. Service monitoring
Image shows service monitoring

When the service terminates successfully, check the DEMOTARGET.CUSTOMERS table to see that the content of DEMO.CUSTOMERS was copied. In the Data Source Explorer view, extract the DEMOTARGET schema in SAMPLE database. Right-click on the CUSTOMERS table and select the Data > Sample Contents context menu (see Figure 22).

Figure 22. Sample contents
Image shows sample contents

The table is displayed in the SQL Result view, with the data as masked from the source data. Figure 23 shows the source data, and Figure 24 is the masked data. The results will be different each time if the policy includes random masking.

Figure 23. Source data
Image shows source data
Figure 24. Masked data
Image shows masked data

Conclusion

IBM InfoSphere Optim Data Masking Solution V2.2 is a solution to protect sensitive data in data sources by using data privacy functions. Optim Designer provides a simple way to design data management services that apply various masking policies to data. This article introduced how to protect sensitive data with Optim Designer based on an example.


Download

DescriptionNameSize
DDL for create the sample table10119_download_20111031.zip---

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=769155
ArticleTitle=Protect your sensitive data using InfoSphere Optim Designer
publish-date=11032011