The InfoSphere Optim Data Masking solution provides functionality to de-identify data that might be deemed sensitive, for example social security numbers or business data such as revenue figures. However, the InfoSphere Optim Data Masking solution does not support a native connection to a Netezza appliance. To mask data inside a Netezza appliance, organizations should first establish a connection using InfoSphere Federation Server. The InfoSphere Federation Server provides the functionality to correlate data from local tables and remote data sources, as if the data is stored locally in the InfoSphere Federation Server database.
To enable the connectivity from InfoSphere Federation Server to a Netezza appliance, the Netezza ODBC driver can be used. Once the connection is established, tables from the Netezza appliance can be mapped to a local database residing in the InfoSphere Federation Server.
Once the table mapping is complete, data from a Netezza appliance can be extracted into a file using the InfoSphere Optim Data Masking solution. The extracted data can be masked utilizing a variety of built-in masking algorithms or custom masking routines. The InfoSphere Optim Data Masking solution can either insert the masked data back into the Netezza appliance, or load it into a local database residing on the InfoSphere Federation Server. The diagram in Figure 1 shows how connectivity is achieved across InfoSphere Federation Server, InfoSphere Optim Data Masking solution and a Netezza appliance.
Figure 1. Connectivity between a Netezza Data Warehouse appliance and InfoSphere Optim Data Masking solution
Establishing the connection to the Netezza appliance
Setting up the ODBC DSN connection
On a Microsoft Windows machine, do the following steps, also shown in Figure 2.
- Install the Netezza ODBC driver.
- Create the ODBC System DSN: Select Control panel -> Administrative Tools -> DataSources(ODBC).
Figure 2. Configuring connection to Netezza appliance using Windows ODBC Data Source Administrator
Connect to a Netezza appliance
Enter the connection information to connect to a Netezza appliance as follows.
- Provide an IP address or host name of the Netezza appliance.
- Enter the database name that contains the data to be de-identified.
- Enter a Netezza database user name and password. Note: The remote user ID is a Netezza database user account that has permission to insert and access the data from the tables in the Netezza database.
- Provide a name to identify this data source configuration.
- Enter the optional description parameter for this configuration.
- Test your connection.
- Click OK to exit.
Set up InfoSphere Federation Server
Perform the following steps to set up InfoSphere Federation Server.
- Open DB2 control center.
- Right click All Databases and create a standard
database as shown in Figure 3.
Figure 3. Creating the setup using InfoSphere Federation Server
(View a larger version of Figure 3.)
- Expand the database that was created. Click Federated Database Objects and create a new wrapper.
- Select ODBC from the drop-down box to create the ODBC wrapper.
- Click the Settings tab. Change DB2_FENCED value to
Y, and then click OK, as shown
in Figure 4.
Figure 4. Creating a wrapper using InfoSphere Federation Server
- Go to Tools -> Command editor.
Select the database and create the server definition using the
following SQL shown in Listing 1.
Listing 1. SQL for server definition
CREATE SERVER "NZSQL" TYPE ODBC VERSION '3.0' WRAPPER "ODBC" OPTIONS (DB2_CHAR_BLANKPADDED_COMPARISON 'Y' ,DB2_IUD_ENABLE 'Y' ,DB2_MAXIMAL_PUSHDOWN 'Y' ,DB2_ONE_REQUEST_PER_CONNECTION 'N' ,DB2_POSITIONED_UD 'N' ,DB2_TUPLEID_TYPE 'U' ,DB2_VARCHAR_BLANKPADDED_COMPARISON 'Y' ,IUD_APP_SVPT_ENFORCE 'N' ,NODE 'NZSQL' ,PUSHDOWN 'Y' ,VARCHAR_NO_TRAILING_BLANKS 'Y' );
Note: The NODE name here MUST be the same as the ODBC DSN name that connects with the remote database.
- Refresh and expand the Server Definition.
- Highlight User Mappings. Right click and select Create.
- Enter the remote user ID and password values under the
Settings tab as shown in Figure 5.
Figure 5. User mapping using InfoSphere Federation Server
Note: The remote user ID is a Netezza database user account that has permission to insert and access the data from the tables in the Netezza database.
- Push your local user ID to the Selected User ID section under the Users tab.
To insert the de-identified data (described in the final step) into a table different from the source table, create that table on the Netezza appliance as a container. The nickname must include that table's name. To create a table (customer_masked) with the same schema as the source table (customer), connect to the database on Netezza appliance and use the following SQL query shown in Listing 2.
Listing 2. SQL for creating table customer_masked
create table customer_masked as select * from customer limit 0;
This SQL query will create a table named
customer_masked on the Netezza appliance.
To create the nickname, complete the following steps.
- Highlight Nicknames and right-click Create.
- Click the Discover button and add the user ID and password to connect to the remote database.
- Select the table(s) that contains the data that needs transformation, and then click OK.
Setting up InfoSphere Optim
InfoSphere Optim needs to be configured as follows.
- Open Optim Configuration -> Tasks -> Configure the first workstation.
- Enter the new Optim directory name and click the Proceed button.
- Enter the database connection parameters, the connection string (same as the name of the DB2 database) when prompted, and then click the Proceed button.
Creating DB alias
- Open Optim Configuration -> Tasks -> Create DB alias.
- Enter the Optim directory name and click the Proceed button.
- Provide a DB alias name (or use existing if one was already created
earlier) as shown in Figure 6, and then click the
Figure 6. Creating/updating DB alias
- Enter the database connection parameters, the connection string (same as the name of the DB2 database) and then click the Proceed button.
Create the access definition
To extract the data, the access definition must be created first as follows.
- Open InfoSphere Optim -> Definitions -> Access Definition.
- From the File menu, click New, and
as shown in Figure 7, choose a default qualifier and a start table
from which the data will be extracted for conversion.
Figure 7. Creating Access definition
(View a larger version of Figure 7.)
- From the File menu, click Save.
Create the column map
After creating the access definition, column map is configured as follows.
- In InfoSphere Optim, select Definitions -> Column Map.
- From the File menu, click New, and then select the Move/Archive for Validation Rules option.
- As shown in Figure 8, choose the source table for which the access
definition was created. Choose a destination table name also.
Figure 8. Creating the column map
(View a larger version of Figure 8.)
- Save the column map and exit.
Create the table map
Note the following information about destination and source tables.
- If the destination table name in the table map definition is the same as the source table name, then data will be inserted into the source table on the Netezza appliance.
- If the destination table name is different from the source table name and the destination table name does not exist in the nickname, then InfoSphere Federation Server will create the table in the InfoSphere Federation Server database and will insert the data into that table.
- If the destination table name is different from the source table name and the destination table name exists in the nickname on InfoSphere Federation Server, then the data will be inserted into the destination table on Netezza appliance.
Perform the following steps to create the table map.
- In InfoSphere Optim, select Definitions -> Table Map.
- From the File menu, click New, and then choose the access definition that was saved for the table.
- As shown in Figure 9, select the destination qualifier and then modify
the destination table to a name of preference (optional). Select the
column map (right click List column maps) that was
saved for that table.
Figure 9. Creating table map
(View a larger version of Figure 9.)
- Save the table map and exit.
Note: To avoid problems when extracting the data in the next step, adjust the product options by going to InfoSphere Optim -> Options -> Product.
Extract the data
This step explains the data extraction from the Netezza appliance into a file.
- From InfoSphere Optim, select Actions -> Extract.
- From the File menu, click New, and then choose a file where the extracted data will reside.
- As shown in Figure 10, choose the Access Definition
option, and then from the File menu, click
Figure 10. Extracting the data from source
Configure the column maps
The following steps explain how to configure the column maps to de-identify the data for one or more columns.
- In InfoSphere Optim, select Definitions -> Column map.
- Enter the applicable function for the source columns, then save and exit.
- Figure 11 shows the use of the TRANS COL and
Figure 11. Configuring the column maps
(View a larger version of Figure 11.)
Convert the data
As shown in Figure 12, perform the following steps to convert the data.
- From InfoSphere Optim, select Actions -> Convert.
- Choose the source file where the extracted data resides.
- Choose a control file and a destination file.
- Choose the table map options.
- Click the Report Options tab and choose the option to notify the errors.
- From the File menu, click Run.
Figure 12. Data conversion
The following steps explain how to insert the de-identified data into the source table residing in the Netezza appliance. Alternatively, data can be inserted into a different table on Netezza appliance or into a table residing on the InfoSphere Federation Server database. See the Create the table map section shown previously for details.
- From InfoSphere Optim, select Actions -> Insert.
- As shown in Figure 13, choose the source file where the converted data resides.
- Choose a control file.
- Choose a table map option (Local or named). Select the Always View Table Map check box to view the table maps before the insert starts.
- Pick a desirable process option.
- Click the Report Options tab and choose the option to notify the errors.
- From the File menu, click Run.
Figure 13. Data insertion
Note: It is good practice to update the statistics by running the GENERATE STATISTICS SQL query on the tables updated with the new de-identified data. This may be beneficial in improving the query performance.
This article demonstrated how the InfoSphere Optim Data Masking solution can be used to de-identify data that might be deemed sensitive residing on a Netezza appliance. This is very useful for customers and developers using the Netezza appliance to configure the InfoSphere Optim Data Masking solution and InfoSphere Federation Server, de-identify the data, and insert it into a table residing in a Netezza appliance or into a table in InfoSphere Federation Server.
- Learn more about Data privacy functions.
- Refer to the InfoSphere Optim Data Masking solution web page for more information.
- Learn about the InfoSphere Federation Server.
- Stay current with IBM Netezza Data Warehouse Appliances.
- Visit the developerWorks Information Management zone: Find more resources for DB2 developers and administrators.
- Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
- Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and tools as well as IT industry trends.
- Follow developerWorks on Twitter.
- Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
Get products and technologies
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.
- Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
Dig deeper into Information management on developerWorks
Experiment with new directions in software development.
Read and subscribe for the best and latest technical info to help you deal with your development challenges.
Software development in the cloud. Register today and get free private projects through 2014.
Evaluate IBM software and solutions, and transform challenges into opportunities.