Data transformation with IBM Netezza appliance using IBM InfoSphere Data Masking solution

Protecting privacy in the data warehouse

Learn about using the IBM® InfoSphere® Optim® Data Masking solution with IBM InfoSphere Federation Server to de-identify data residing in an IBM Netezza Data Warehouse appliance. In this article, you will follow a series of detailed steps to configure InfoSphere Federation Server and InfoSphere Optim, establish a connection to a Netezza® appliance, extract and de-identify data from Netezza appliance using InfoSphere Optim, and upload the masked data back to the Netezza appliance.

Share:

Harpreet Singh (singhh@us.ibm.com), Quality Assurance Engineer, IBM China

Photo of author Harpreet SinghHarpreet Singh is a member of the Netezza software quality assurance team, specializing in the transaction manager and SQL Optimizer areas. He was previously part of the Informix quality assurance team. His projects over the past five years included leading Informix client SDK testing, leading testing for various porting efforts including Mac OS X and Linux for zSeries, leading embeddability, virtualization, and IBM cloud testing. He currently leads the quality assurance effort for major components and features in Netezza's optimizer area. He is an IBM Certified Specialist for Netezza Platform Software v6. In his spare time, he enjoys traveling, composing music, and playing guitar.



Rajshekar Iyer (iyerr@us.ibm.com), Quality Assurance Engineer, IBM China

Photo of Rajshekar IyerRajshekar Iyer is a quality assurance lead at IBM Netezza. His expertise includes Netezza appliance system management and appliance fault tolerance testing, and he is an IBM Certified Specialist for Netezza Performance Server v6. Previously, he worked on IBM Informix for more than 12 years in the development, quality assurance, and project management roles.



26 April 2012

Also available in Russian

Introduction

Intended versions

The instructions in this article are intended for those using Optim version 8 or below to connect to a Netezza appliance. For instructions on using Optim 9.1 or above to connect directly to a Netezza appliance (PureData System for Analytics), please see the article Use the InfoSphere Optim 9.1 Test Data Management and Data Masking solution for Netezza Data Warehouse.

The InfoSphere Optim Data Masking solution provides functionality to de-identify data that might be deemed sensitive, for example social security numbers or business data such as revenue figures. However, the InfoSphere Optim Data Masking solution does not support a native connection to a Netezza appliance. To mask data inside a Netezza appliance, organizations should first establish a connection using InfoSphere Federation Server. The InfoSphere Federation Server provides the functionality to correlate data from local tables and remote data sources, as if the data is stored locally in the InfoSphere Federation Server database.

To enable the connectivity from InfoSphere Federation Server to a Netezza appliance, the Netezza ODBC driver can be used. Once the connection is established, tables from the Netezza appliance can be mapped to a local database residing in the InfoSphere Federation Server.

Once the table mapping is complete, data from a Netezza appliance can be extracted into a file using the InfoSphere Optim Data Masking solution. The extracted data can be masked utilizing a variety of built-in masking algorithms or custom masking routines. The InfoSphere Optim Data Masking solution can either insert the masked data back into the Netezza appliance, or load it into a local database residing on the InfoSphere Federation Server. The diagram in Figure 1 shows how connectivity is achieved across InfoSphere Federation Server, InfoSphere Optim Data Masking solution and a Netezza appliance.

Figure 1. Connectivity between a Netezza Data Warehouse appliance and InfoSphere Optim Data Masking solution
The figure describes connectivity between a Netezza Data Warehouse Appliance and InfoSphere Optim Data Masking Solution using IBM InfoSphere Federation server

Establishing the connection to the Netezza appliance

Setting up the ODBC DSN connection

On a Microsoft Windows machine, do the following steps, also shown in Figure 2.

  1. Install the Netezza ODBC driver.
  2. Create the ODBC System DSN: Select Control panel -> Administrative Tools -> DataSources(ODBC).
Figure 2. Configuring connection to Netezza appliance using Windows ODBC Data Source Administrator
The figure describes Configuring connection to Netezza appliance using Windows ODBC Data Source Administrator

Connect to a Netezza appliance

Enter the connection information to connect to a Netezza appliance as follows.

  1. Provide an IP address or host name of the Netezza appliance.
  2. Enter the database name that contains the data to be de-identified.
  3. Enter a Netezza database user name and password. Note: The remote user ID is a Netezza database user account that has permission to insert and access the data from the tables in the Netezza database.
  4. Provide a name to identify this data source configuration.
  5. Enter the optional description parameter for this configuration.
  6. Test your connection.
  7. Click OK to exit.

Set up InfoSphere Federation Server

Perform the following steps to set up InfoSphere Federation Server.

  1. Open DB2 control center.
  2. Right click All Databases and create a standard database as shown in Figure 3.
    Figure 3. Creating the setup using InfoSphere Federation Server
    The figure describes Creating the setup using InfoSphere Federation Server

    (View a larger version of Figure 3.)

  3. Expand the database that was created. Click Federated Database Objects and create a new wrapper.
  4. Select ODBC from the drop-down box to create the ODBC wrapper.
  5. Click the Settings tab. Change DB2_FENCED value to Y, and then click OK, as shown in Figure 4.
    Figure 4. Creating a wrapper using InfoSphere Federation Server
    The figure describes Creating wrapper using InfoSphere Federation Server
  6. Go to Tools -> Command editor. Select the database and create the server definition using the following SQL shown in Listing 1.
    Listing 1. SQL for server definition
    CREATE SERVER "NZSQL"
    TYPE ODBC
    VERSION '3.0'
    WRAPPER "ODBC"
    OPTIONS
    (DB2_CHAR_BLANKPADDED_COMPARISON 'Y'
    ,DB2_IUD_ENABLE 'Y'
    ,DB2_MAXIMAL_PUSHDOWN 'Y'
    ,DB2_ONE_REQUEST_PER_CONNECTION 'N'
    ,DB2_POSITIONED_UD 'N'
    ,DB2_TUPLEID_TYPE 'U'
    ,DB2_VARCHAR_BLANKPADDED_COMPARISON 'Y'
    ,IUD_APP_SVPT_ENFORCE 'N'
    ,NODE 'NZSQL'
    ,PUSHDOWN 'Y'
    ,VARCHAR_NO_TRAILING_BLANKS 'Y'
    );

    Note: The NODE name here MUST be the same as the ODBC DSN name that connects with the remote database.

  7. Refresh and expand the Server Definition.
  8. Highlight User Mappings. Right click and select Create.
  9. Enter the remote user ID and password values under the Settings tab as shown in Figure 5.
    Figure 5. User mapping using InfoSphere Federation Server
    The figure describes User mapping using InfoSphere Federation Server

    Note: The remote user ID is a Netezza database user account that has permission to insert and access the data from the tables in the Netezza database.

  10. Push your local user ID to the Selected User ID section under the Users tab.

Creating nicknames

To insert the de-identified data (described in the final step) into a table different from the source table, create that table on the Netezza appliance as a container. The nickname must include that table's name. To create a table (customer_masked) with the same schema as the source table (customer), connect to the database on Netezza appliance and use the following SQL query shown in Listing 2.

Listing 2. SQL for creating table customer_masked
create table customer_masked as select * from customer limit 0;

This SQL query will create a table named customer_masked on the Netezza appliance.

To create the nickname, complete the following steps.

  1. Highlight Nicknames and right-click Create.
  2. Click the Discover button and add the user ID and password to connect to the remote database.
  3. Select the table(s) that contains the data that needs transformation, and then click OK.

Setting up InfoSphere Optim

InfoSphere Optim needs to be configured as follows.

  1. Open Optim Configuration -> Tasks -> Configure the first workstation.
  2. Enter the new Optim directory name and click the Proceed button.
  3. Enter the database connection parameters, the connection string (same as the name of the DB2 database) when prompted, and then click the Proceed button.

Creating DB alias

  1. Open Optim Configuration -> Tasks -> Create DB alias.
  2. Enter the Optim directory name and click the Proceed button.
  3. Provide a DB alias name (or use existing if one was already created earlier) as shown in Figure 6, and then click the Proceed button.
    Figure 6. Creating/updating DB alias
    The figure describes Creating/Updating DB Alias
  4. Enter the database connection parameters, the connection string (same as the name of the DB2 database) and then click the Proceed button.

Data extraction

Create the access definition

To extract the data, the access definition must be created first as follows.

  1. Open InfoSphere Optim -> Definitions -> Access Definition.
  2. From the File menu, click New, and as shown in Figure 7, choose a default qualifier and a start table from which the data will be extracted for conversion.
    Figure 7. Creating Access definition
    The figure describes Creating Access Definition

    (View a larger version of Figure 7.)

  3. From the File menu, click Save.

Create the column map

After creating the access definition, column map is configured as follows.

  1. In InfoSphere Optim, select Definitions -> Column Map.
  2. From the File menu, click New, and then select the Move/Archive for Validation Rules option.
  3. As shown in Figure 8, choose the source table for which the access definition was created. Choose a destination table name also.
    Figure 8. Creating the column map
    The figure describes Creating Column Map

    (View a larger version of Figure 8.)

  4. Save the column map and exit.

Create the table map

Note the following information about destination and source tables.

  • If the destination table name in the table map definition is the same as the source table name, then data will be inserted into the source table on the Netezza appliance.
  • If the destination table name is different from the source table name and the destination table name does not exist in the nickname, then InfoSphere Federation Server will create the table in the InfoSphere Federation Server database and will insert the data into that table.
  • If the destination table name is different from the source table name and the destination table name exists in the nickname on InfoSphere Federation Server, then the data will be inserted into the destination table on Netezza appliance.

Perform the following steps to create the table map.

  1. In InfoSphere Optim, select Definitions -> Table Map.
  2. From the File menu, click New, and then choose the access definition that was saved for the table.
  3. As shown in Figure 9, select the destination qualifier and then modify the destination table to a name of preference (optional). Select the column map (right click List column maps) that was saved for that table.
    Figure 9. Creating table map
    The figure describes Creating Table Map

    (View a larger version of Figure 9.)

  4. Save the table map and exit.

Note: To avoid problems when extracting the data in the next step, adjust the product options by going to InfoSphere Optim -> Options -> Product.

Extract the data

This step explains the data extraction from the Netezza appliance into a file.

  1. From InfoSphere Optim, select Actions -> Extract.
  2. From the File menu, click New, and then choose a file where the extracted data will reside.
  3. As shown in Figure 10, choose the Access Definition option, and then from the File menu, click Run.
    Figure 10. Extracting the data from source
    The figure describes Extracting the data from source

Data conversion

Configure the column maps

The following steps explain how to configure the column maps to de-identify the data for one or more columns.

  1. In InfoSphere Optim, select Definitions -> Column map.
  2. Enter the applicable function for the source columns, then save and exit.
  3. Figure 11 shows the use of the TRANS COL and Shuffle functions.
    Figure 11. Configuring the column maps
    The figure describes Configuring the column maps

    (View a larger version of Figure 11.)

Convert the data

As shown in Figure 12, perform the following steps to convert the data.

  1. From InfoSphere Optim, select Actions -> Convert.
  2. Choose the source file where the extracted data resides.
  3. Choose a control file and a destination file.
  4. Choose the table map options.
  5. Click the Report Options tab and choose the option to notify the errors.
  6. From the File menu, click Run.
    Figure 12. Data conversion
    The figure describes Data Conversion

Data insertion

The following steps explain how to insert the de-identified data into the source table residing in the Netezza appliance. Alternatively, data can be inserted into a different table on Netezza appliance or into a table residing on the InfoSphere Federation Server database. See the Create the table map section shown previously for details.

  1. From InfoSphere Optim, select Actions -> Insert.
  2. As shown in Figure 13, choose the source file where the converted data resides.
  3. Choose a control file.
  4. Choose a table map option (Local or named). Select the Always View Table Map check box to view the table maps before the insert starts.
  5. Pick a desirable process option.
  6. Click the Report Options tab and choose the option to notify the errors.
  7. From the File menu, click Run.
    Figure 13. Data insertion
    The figure describes Data Insertion

    Note: It is good practice to update the statistics by running the GENERATE STATISTICS SQL query on the tables updated with the new de-identified data. This may be beneficial in improving the query performance.

Conclusion

This article demonstrated how the InfoSphere Optim Data Masking solution can be used to de-identify data that might be deemed sensitive residing on a Netezza appliance. This is very useful for customers and developers using the Netezza appliance to configure the InfoSphere Optim Data Masking solution and InfoSphere Federation Server, de-identify the data, and insert it into a table residing in a Netezza appliance or into a table in InfoSphere Federation Server.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

  • Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=811585
ArticleTitle=Data transformation with IBM Netezza appliance using IBM InfoSphere Data Masking solution
publish-date=04262012