IBM InfoSphere Optim Data Masking Solution V2.2 offers data privacy protection by masking sensitive data such as personally identifying information. This solution offers effective data masking methods in development environments. This article explains how to mask sensitive data using the data masking methods (data privacy policies) in InfoSphere Optim V2.2 and how to design data management services using Optim Designer. In addition, it give examples illustrating how to mask data.
What are Optim Designer and data privacy policies?
IBM InfoSphere Optim Data Masking Solution V2.2
IBM InfoSphere Optim Data Masking Solution V2.2 includes the following components:
- Optim Designer
- Optim Manager
- Optim Management Server
- Optim Proxy
- Optim Executor
Optim Designer is a component for designing data masking processes. Such processes are defined as data management services. The designer can invoke the Optim Executor that runs the services. Alternatively, a service can be published to the Optim Manager and passed to the Optim Management Server for execution. The Optim Management Server in turn invokes the Optim Proxy, which then starts the Optim Executor to run the service. To learn about how to install and configure the components of InfoSphere Optim V2.2, please see the "Optim Data Masking Solution Installation Guide."
Optim Designer is an InfoSphere Optim V2.2 component. It is used to design data masking processes, which are called data management services. Optim Designer is implemented as an Eclipse plug-in with a user-friendly GUI (see Figure 1).
Figure 1. Optim Designer
Optim Designer requires the following:
- InfoSphere Data Architect — Optim Designer requires an activated copy of InfoSphere Data Architect (IDA) since Optim Designer is installed as one of IDA's plug-ins.
- Optim Executor and Optim Management Server — To test the data management services using Optim Designer, it is necessary to install Optim Executor on the Optim Designer computer or on a network drive accessible from the Optim Designer computer. The location of Optim Executor must be specified in the Optim Designer Preferences. To use the Lookup option to support data masking, the Optim Management Server must also be installed and operational.
This section describes how to mask data using the InfoSphere Optim V2.2 data privacy policies. Data privacy policies are classified into the following:
- Identity mask policies— Identity mask policies
are used to mask various kinds of personal information. This list
covers the typical personal information that can be masked by Optim
- Name, including first name and last name
- Email address
- National ID numbers
- Credit card numbers
Steps to design and run data management service with masking policies
Optim Designer supports masking data using data privacy policies. There are two types of data management services: copy services and data transformation services. A copy service copies data from one data model to another and can use filter criteria to select data. A data transformation service masks the data in the source data model. This section focuses on a copy service. The steps are shown in Figure 2 and listed below.
Figure 2. Steps to design and run a data management service
- Define a data source connection (such as DB2® or Oracle).
- Create a physical data model for the source schema describing the data to be masked. This is based on the schema information of the data source.
- Transform the physical data model to a logical data model.
- Repeat steps B and C for each target schema for masked data.
- Run the created service.
Following these steps, the sensitive data will be masked and inserted into the target schema. The next section describes in detail how to implement this process, using example data.
Running a data management service with data privacy policies
This example is a data management service that masks data and runs the actual services. The example uses a DB2 database on Windows® XP and illustrates a scenario for this masking process:
- Preparing the fabricated customer data (table name: DEMO.CUSTOMERS)
- Masking all of the information except for the ID numbers
- Copying the results to the output table DEMOTARGET.CUSTOMERS
Here are the steps to create the example DB2 database with the proper tables. Download the data definition language (DDL) file for the example data from the Download section of this article. Then execute these shell commands:
> db2sampl --(if you do not have SAMPLE database) > db2 connect to sample > db2 -vtf create_demo_tables.txt
Project and data sources
Create a data design project for the existing data models and services, and define a connection to a database. This is done by launching Optim Designer and opening a workspace, then creating a new data design project. Select File > New > Data Design Project. On the New Data Design Project panel, enter "Sample Project" as project name (see Figure 3) and click Finish. The new project is added to the Data Project Explorer view.
Figure 3. New data design project
Next, define a data source connection to the SAMPLE database on the DB2 server. On the Data Source Explorer view, right-click on Database Connections and select New. On the New Connection panel (see Figure 4), select DB2 for Linux®, UNIX®, and Windows® and edit the definition for the SAMPLE database. When finished, click Test Connection, confirm the message "Ping succeeded!" then click OK on the New Connection panel. The data design project and data source are now ready.
Figure 4. New connection
Source data model creation
First, define a data model named "Sample Source," which is a data model for a source table DEMO.CUSTOMERS. In the Data Project Explorer view, open the project you created in the last section. Right-click on the Data Models node and select New > Physical Data Model. On the Model File Panel (see Figure 5), enter the required information, and select (enable) the "Create from reverse engineering" radio button, then click Next. The required information:
- Filename: Sample Source
- Database: DB2 for Linux, UNIX, and Windows
- Version: (select DB2 version)
Figure 5. Model file panel
On the Source panel, choose Database and click Next. On the Select Connection panel, choose SAMPLE and click Next. Then on the Select Schema panel, check DEMO schema and click Finish (see Figure 6).
Figure 6. Select schema
Next, convert the physical data model to a logical data model. Right-click on Sample Source.dbm and select Transform to Optim logical data model. Click Next, then on the Native Data Source Access panel, uncheck Use the native data source connection as the default for services. Click Next twice, confirm that the transformation to Optim logical data model is complete, then click Finish.
The next step is supplying the data access plan and privacy policies that will mask the data.
First, create a data access plan. Click to open the logical data model (ldm
format file) in the Data Project Explorer, open the package (the default
name is Package1), right-click on Data Access Plan node, and select the
New > Data Access Plan context menu. Input name as
Sample Plan and
click Next. Select DEMO and click
Next on the Select a Package panel.
Select the Select entities based on relationships with a start
entity radio button, and click Next. Select
CUSTOMERS as the
start entity and click Finish.
Next, specify the data privacy policies to mask the data according to Table 1.
|Column Name||Data Type||Description||Applying Policy||Policy Type|
|CREDIT_CARD_NUM||varchar(19)||Credit Card Number||Mask credit card numbers from all providers||Rule-based|
|PASSPORT_NUM||char(9)||Passport Number||Scramble(Replace Characters)||Rule-based|
Identity information (lookup)
Mask the United States name and birthdate information using the hash lookup function. Open Sample Plan and click the Data Privacy button on the editor. On the left-side panel, select Executor as the platform and Mask United States personal information as the policy. Choose the GIVEN_NAME column and click Apply (see Figure 7).
Figure 7. Apply identity privacy policies
The Map Columns panel will appear and create the mappings as shown in Figure 8.
Figure 8. Map columns
When finished, click Next. On the Lookup Key Generator panel, select Calculate and use hash value as key, and click Next. Select Sample Project/Sample Source.ldm/Package1/DEMO/CUSTOMERS/CUSTOMERS_ID by clicking the >> button on the Hash Value Inputs, then click Finish.
Credit card mask policy (rule-based)
The credit card policies mask a credit card number (CCN) from the following issuers: American Express, Diners Club, Discover, JCB, MasterCard, and VISA. The usual policies preserve the first four digits of the issuer identifier from the source CCN and mask the remaining two digits of the issuer identifier number and the account number based on the source CCN. Each policy also generates a check digit.
Open Sample Plan and click Data Privacy in the editor, in the same way as for Identity Information masking. On the left-side panel, select Executor as the platform and Mask credit card numbers from all providers as the policy. Choose the CREDIT_CARD_NUM column and click Apply (see Figure 9).
Figure 9. Apply credit card policy
In this example, the policy to be applied does not mask the credit card issuer, so the last two digits of the issuer identifier number will not be masked, but all six of those digits will be visible in the results.
Mask the passport numbers using the replace-characters policies. The replace-characters policies mask each character in a string with a randomly generated character that matches the type of character being replaced. Since a passport number consists of two alphabetic characters followed by seven digits, two random alphabetic characters will replace the first two, and seven random digits will replace the rest. Select Replace Characters in Scramble Privacy Policies, select the PASSPORT_NUM column, and click Apply. On the Language Character Set panel, select English from the pull-down menu, and click Finish (see Figure 10).
Figure 10. Language character set
This scenario uses a random date in range policy for masking the birthdates. Select Random date in range from the Date Privacy Policies, select the BIRTH_DATE column, then click Apply. In the Random Date In Range Mask Options panel, enter the values as shown in Figure 11 and click Finish.
Figure 11. Random date in range
In this example, the masked data will become random dates from 1 Jan 1940 to 31 Dec 2000. Also, if the date is 4 Dec 1983, the displayed format will be 1983-12-04.
Target data model creation
The next step defines a data model named "Sample Target," which is a data model for the target table DEMOTARGET.CUSTOMERS. The procedure is the same as before. In the Data Project Explorer view, open the project that was created. Right-click on the Data Models node and select New > Physical Data Model. In the Model File panel, input the following information, select the Create from reverse engineering radio button, then click Next.
- Filename: Sample Target
- Database: DB2 for Linux, UNIX, and Windows
- Version: (select DB2 version)
On the Source panel, choose Database and click Next. On the Select Connection panel, choose SAMPLE and click Next. Then on the Select Schema panel, select the DEMOTARGET schema and click Finish. Next, the physical data model must be converted into a logical data model. Right-click Sample Target.dbm and select Transform to Optim logical data model. Click Next, going to the Native Data Source Access panel, and uncheck Use the native data source connection as the default for services. Click Next twice, confirm that the transformation to the Optim logical data model is complete, and click Finish.
To create, in the Data Project Explorer view, right-click on the
Service node and select
the New > Service context menu. On the New Service
panel (see Figure 12),
choose Copy Service as the service type and input
Sample Service as
the service name, then click Next.
Figure 12. New service
In Select a Source Optim Data Source panel (see Figure 13), uncheck Use native data source (if it is checked). Choose SAMPLE and click Next. In the Select a Source Logical Data Model panel (see Figure 14), choose the source data model created previously and click Next.
Figure 13. Select a source Optim data source
Figure 14. Select a source logical data model
In the Select a Data Access Plan panel, choose the Sample Plan and click Next. Select Selection a target model and use an auto mapping of entities as a Target Model Option and click Next. In the Select a Source Optim Data Source panel, choose SAMPLE and click Next. In the Select a Target Logical Data Model panel, select Insert as the Target Operation Type, choose the target data model, and click Next. The source schema DEMO and the target schema DEMOTARGET will be listed in the Match Schemas panel. Choose DEMO and DEMOTARGET, then click Next (see Figure 15).
Figure 15. Matching schemas
Check the message All entities have been mapped and click Finish.
birth and the current date. Double-click Sample Service.svc to open the
Service Editor. Click Service Plan then Add
Policy (see Figure 16). Enter
panel, then click Finish (see Figure 17).
Figure 16. Service editor
Figure 17. Policy selector
new Date().getFullYear() - record.getItem('/DEMO/CUSTOMERS/BIRTH_DATE').get(java.util.Calendar.YEAR)
Service execution and result verification
Before executing the service, start the Optim Management Server. Right-click on Sample Service.svc and select the Execute Optim Service... context menu (see Figure 19). Then Optim Manager is launched in Designer (called the "Embedded Manager" here). Click Run in the Run Service panel (see Figure 20). Optim Executor is launched, and the service starts running. To see the execution log, click the Service Monitoring tab in the Embedded Manager (see Figure 21).
Figure 19. Execute Optim service
Figure 20. Run the service
Figure 21. Service monitoring
When the service terminates successfully, check the DEMOTARGET.CUSTOMERS table to see that the content of DEMO.CUSTOMERS was copied. In the Data Source Explorer view, extract the DEMOTARGET schema in SAMPLE database. Right-click on the CUSTOMERS table and select the Data > Sample Contents context menu (see Figure 22).
Figure 22. Sample contents
The table is displayed in the SQL Result view, with the data as masked from the source data. Figure 23 shows the source data, and Figure 24 is the masked data. The results will be different each time if the policy includes random masking.
Figure 23. Source data
Figure 24. Masked data
IBM InfoSphere Optim Data Masking Solution V2.2 is a solution to protect sensitive data in data sources by using data privacy functions. Optim Designer provides a simple way to design data management services that apply various masking policies to data. This article introduced how to protect sensitive data with Optim Designer based on an example.
|DDL for create the sample table||10119_download_20111031.zip||---|
- Learn more by reading the Release notes - IBM InfoSphere Optim Data Masking Solution, Version 2, Release 2.
- Visit InfoSphere Optim data lifecycle management on developerWorks.
- Be sure to check out the InfoSphere Optim Data Masking Solution.
- Learn more about protecting data privacy and ensuring data security with InfoSphere Optim and InfoSphere Guardium® solutions.
- Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
- Follow developerWorks on Twitter.
Get products and technologies
- Build your next development project with IBM trial software, available for download directly from developerWorks.