Use the InfoSphere Optim 9.1 Test Data Management and Data Masking solution for Netezza Data Warehouse

IBM® Netezza® is known as the simple data warehouse appliance for serious analytics. Customers using Netezza have been looking at ways to leverage InfoSphere® Optim® to create their right-sized environments with realistic data for application testing and development. InfoSphere Optim 9.1 now has native support for Netezza to leverage Test Data Management (TDM) and Data Masking (DM) capabilities. This developerWorks article will give you an understanding of InfoSphere Optim solutions for Netezza, help you understand the technical architecture between the two products, and show you how Netezza can leverage the InfoSphere Optim solutions to reduce costs, reduce risks, and speed up data delivery.

Share:

Pirabu Pathmasenan (pirabu@ca.ibm.com), IM Premium Support Manager, IBM

Author photo of Pirabu PathmasenanPirabu Pathmasenan, is an IBM Premium Support Manager in the Information Management organization of IBM for DB2 for Linux, UNIX, and Windows, as well as InfoSphere Optim. Prior to this position, Pirabu was a technical leader in the DB2 QA organization where he was responsible for leading, developing, and inventing new tools and processes to improve the quality of the DB2 database product. Pirabu has been with IBM for over 6 years and currently has two papers published on ip.com. He has over 8 years of working experience in the software industry, mainly in software quality assurance and software project management.



Raki Robert (raki@ca.ibm.com), InfoSphere Optim and Guardium Technical Consultant, IBM

Author photo of Raki RobertRaki Robert is an InfoSphere Optim and Guardium Technical Consultant Team Lead within the IMTE Optim and Guardium team at IBM Canada. Raki specializes in delivering end-to-end Optim and Guardium technical enablements, training sessions, proof of concepts, product demos, and best practices to accelerate partner and customer successes. Raki has been part of the IMTE Optim and Guardium team since 2008, where he is an active leader in writing Optim training materials and mastery exams for partners and customers of IBM.



28 February 2013

Netezza for data analytics

IBM Netezza is known as the simple data warehouse appliance for serious analytics, which can plug and play into any industry to provide speedy answers to complex and interrelated business questions. Everything around us is changing fast, and technology is opening the doors for solution miracles for all sorts of small, unique, and diverse problems that once seemed impossible to solve. Businesses today are looking at ways to expand laterally while increasing revenue, and Netezza advanced analytics will be able to give businesses the ability to get answers they need to make smarter decisions to stay on top of their competitors and increase their market footprint.

The IBM Netezza appliance will be able to provide quick and effective answers to the most sophisticated business questions and demands. IBM Netezza has made it easier for enterprises to solve their business questions by allowing the questions and answers to be built and deployed right where the data resides, in the data warehouse. By doing this, Netezza is able to build and deploy analytic models much quicker. Netezza provides a simple, smarter, and quicker solution compared to the traditional method, where analytics are built and deployed across separate analytical servers. All-in-all, Netezza's warehouse appliance (combining server, database, and storage) into a single out-of-the-box solution will allow enterprises the following advantages.

  • Serious analytics: Ability to answer questions that were previously too complex, required too much data, or took too much time to analyze.
  • On-demand analytics: Quickly respond to dynamic business conditions to choose the best course of action.
  • Simplicity: An appliances stack that has low maintenance with high availability.
  • Pure performance: Using IBM Netezza's data warehouse Asymmetric Massively Parallel Processing (AMPP) architecture to execute large data sets quickly.

Test Data Management (TDM) and Data Masking (DM) with InfoSphere Optim 9.1

Testing applications can be, and usually is a very time consuming and expensive phase in any application development cycle. Organizations continually search for ways to improve their application testing processes to build quality and reliable applications within schedule, on budget, and with allocated resources.

Copying or generating data, masking confidential data, creating or refreshing test environments, executing test runs, and validating the test executions are steps within the testing cycle that are time consuming. IBM InfoSphere Optim provides capabilities that help to speed up these steps using its Test Data Management and Data Masking capabilities.

InfoSphere Optim Test Data Management and Data Masking capabilities provide testers, developers, and DBAs a mechanism to do the following with ease.

  • Create right-sized, referentially intact data sets for test scenarios.
  • Mask confidential data.
  • Create test databases on the fly.
  • Edit data before test execution to meet certain conditional checks in the test scenario.
  • Compare the before and after data to validate the test executions.

These capabilities allow a faster and easier mechanism to access data for test environments from production databases. InfoSphere Optim is also capable of doing subsets of data and eliminates costly cloning processes, and also keeps the data relationally intact.


Netezza with InfoSphere Optim 9.1

Now that you understand both Netezza and InfoSphere Optim Test Data Management and Data Masking capabilities, you can better understand how Netezza can utilize Optim to manage and protect the data movement from production to non-production environments.

Testing and validating new and updated BI reports against data in Netezza is an essential part for enterprises to optimize their business decisions. To accomplish the movement of analytical data from production to development, InfoSphere Optim 9.1 now has native connection to directly plug and play with Netezza. This article will provide step-by-step instructions on the following capabilities.

  • Installing and configuring Netezza Client on Optim Server.
  • Configuring Optim for Netezza.
  • Test Data Management and Masking process for Netezza that lets you do the following:
    • Extract
    • Convert
    • Compare
    • Insert

InfoSphere Optim 9.1 TDM and DM integration for Netezza using IBM Federation Server

In previous releases, InfoSphere Optim relied on the IBM InfoSphere Federation Server to manage the data in Netezza. The diagram in Figure 1 shows the connectivity between InfoSphere Optim and Netezza using the IBM InfoSphere Federation Server.

Figure 1. InfoSphere Optim and Netezza integration prior to InfoSphere Optim 9
This figure shows you the InfoSphere Optim and Netezza Integration prior to InfoSphere Optim 9

To understand the details of InfoSphere Optim and Netezza integration using the IBM InfoSphere Federation Server, read the "Data transformation with IBM Netezza appliance using IBM InfoSphere Data Masking solution" developerWorks article from the Resources section.

InfoSphere Optim 9.1 now supports a native connection to Netezza. Figure 2 shows the new native connectivity between InfoSphere Optim and Netezza.

Figure 2. InfoSphere Optim 9.1 and Netezza native integration
This figure shows InfoSphere Optim 9.1 and Netezza Native Integration

The following systems with the components specified are assumed to be installed, configured and running for this article.

  • Netezza Data Warehouse Appliance
  • Optim Client Components (Optim Designer, Repository Services, Optim for Linux, UNIX, and Windows)
  • Optim Server Components (Optim Manager, Optim Proxy, WAS-CE)
  • Optim Repository VM

The following section will show you how to set up communication between Netezza Data Warehouse appliance and InfoSphere Optim using ODBC and JDBC.


Install and configure Netezza ODBC driver

The Netezza ODBC executable can be obtained from your Netezza Administrator (nzodbcsetup.exe). During installation of the driver do the following.

  1. As Optim is a 32 bit application, it requires the Netezza 32-bit driver. Select the 32-bit option and click Next, as shown in Figure 3.
    Figure 3. Netezza 32-bit driver
    Since Optim is a 32 bit application, it requires the Netezza 32-bit driver
  2. The 32-bit Netezza driver is installed in the destination folder C:\windows\SysWOW64, as shown in Figure 4.
    Figure 4. Netezza ODBC 32-bit driver install directory
    Netezza 32-bit driver Install Directory

    In order to create and configure the ODBC data source for Netezza, launch the odbcad32.exe from C:\windows\SysWOW64 directory.

  3. Select NetezzaSQL driver from the list of drivers and click Finish, as shown in Figure 5.
    Figure 5. NetezzaSQL driver selection
    NetezzaSQL driver must be selected from the list of drivers
  4. Enter a data source name, a Netezza server IP or hostname, a Netezza port, a Netezza database name, and the database login credentials, and then click Test Connection, as shown in Figure 6. Once you have a successful connection, click OK.
    Figure 6. Netezza database configuration parameters for ODBC
    Data Source Name, Netezza Server IP or Hostname, Netezza Port, Netezza Database name, and database login credentials are required
  5. Once the data source has been successfully created, it will appear in the System DSN section of the ODBC Data Source Administrator. This article uses two System DSNs, one for the source database and one for the test database (NZSource and NZTest), as shown in Figure 7.
    Figure 7. NZSource and NZTest Data Source
    Once the data source has been successfully created, it will appear in the System DSN section of the ODBC Data Source Administrator

Install and configure Netezza JDBC driver

InfoSphere Optim Designer 9.1 requires the Netezza JDBC driver to communicate with Netezza. In order to install the Netezza JDBC Driver you should obtain nzjdbcsetup.exe from the Netezza Admin. Once you launch the executable, it will take you through the Netezza JDBC install wizard for installation. In this wizard, you will be required to define the install path for the Netezza JDBC Driver. The default path will be C:\JDBC, as shown in Figure 8.

Figure 8. Netezza JDBC driver install directory
This figure shows the InfoSphere Optim Designer 9.1 requires the Netezza JDBC driver to communicate with Netezza

Create and configure Netezza data store alias

Launch InfoSphere Optim Designer to create and configure your Netezza data store aliases using the JDBC driver that has been installed.

  1. Open the Designer Explorer, under Optim Repository, right-click on Data Store Aliases, and select New Data Store Alias, as shown in Figure 9.
    Figure 9. New Data Store Alias
    This figure shows creating a new data store alias
  2. Enter a data source name, description, database type and version, connection string, and username/password login credentials, which are the required information for your database client connection, as shown in Figure 10. Click Next.
    Figure 10. Database client connection properties
    This figure shows the Data Source Name, Description of the Data Source, Database type and version, Connection String, and Username/Password are the required database client connection information
  3. The JDBC connection properties require the Netezza hostname, port number, database name, and the location to nzjdbc.jar. The nzjdbc.jar can be uploaded by clicking the Upload button. Click Test Connection to verify the connection, as shown in Figure 11.
    Figure 11. JDBC connection properties
    This figure shows the JDBC Connection properties require the Netezza hostname, port number, database name, and the location to nzjdbc.jar
  4. Once the Netezza data store aliases have been created, it can be viewed in the Optim Designer. Figure 12 shows the data store aliases for NZSource and NZTest.
    Figure 12. NZSource and NZTest data store aliases
    This figure show how, once the Netezza data store aliases have been created, it can be viewed in the Optim Designer

TDM and DM using InfoSphere Optim Designer 9.1

In a test data management process, Optim is capable of extracting a relationally-intact subset of data from the production environment using Optim's Extract feature. This subset of data can be masked using Optim Convert and compared to the original production database with Optim's Compare utility for correctness. It can then either be loaded or inserted into a non-production environment using Optim Load or Insert, respectively.

You can then use Optim's Edit utility to modify the test data for boundary scenarios and special cases. Another extract can be performed to create a gold extract file, which captures the state of the testing environment before any application testing is executed against it. After application testing is performed, customers can then compare the state of the testing environment with the gold extract file. Once the comparison is complete, Optim can then refresh the test database using the gold extract file upon user demand.

This article will cover the extract, convert, compare, create and insert steps through the Optim Designer for Netezza. Figure 13 shows the end-to-end test data management and data masking process.

Figure 13. TDM process
This figure shows how, in a test data management process, Optim is capable of extracting a relationally-intact subset of data from the production environment using Optim's Extract feature

You need to create a folder to store the Optim definitions for the Netezza project within Optim Designer. Right-click Optim Repository, select New, and click Folder, then specify the name of the project folder, as shown in Figure 14.

Figure 14. Create and define TDM/DM project folder
This figure shows how to start, create a folder to store the Optim definitions for the Netezza project within Optim Designer

Optim extract service

Under the Netezza project folder, a new extract service needs to be defined.

  1. Right-click Services and click New Services, as shown in Figure 15.
    Figure 15. Define extract service
    This figure shows how to look under the Netezza project folder to define a new extract service
  2. For an extract service, an Optim Access Definition (AD) is required. For the purpose of this article, a local AD is used and the OPTIM_CUSTOMERS sample table will be selected as the start table, as shown in Figure 16.
    Figure 16. Select data store alias and start table
    This figure shows how an extract service an Optim Access Definition (AD) is required. For the purpose of this article a local AD is used and the OPTIM_CUSTOMERS sample table will be selected as the start table
  3. The Find related tables option is used as the table selection method to populate the AD. All related tables can be found by clicking the Find related tables button, as shown in Figure 17.
    Figure 17. Table Selection method and Find related tables
    This figure shows the Find Related Tables option is used as the table selection method to populate the AD
  4. Enter the extract file name to store the extracted Netezza data, and then click Finish, as shown in Figure 18.
    Figure 18. Target file data store name
    This figure shows how to enter the extract file name to store the extracted Netezza data
  5. Optim provides options to extract a subset of data which can be defined through an AD. Figure 19 highlights an AD defined to extract every 5th record to max of 500 records from the OPTIM_CUSTOMERS table.
    Figure 19. Access Definition Editor
    This figure shows how Optim provides options to extract subset of data which can be defined through an AD

    (View a larger version of Figure 19.)

  6. Once an extract service is aligned with an AD, this service can be executed from the Optim Designer. Figure 20 shows an extract service with an AD that is ready to be executed.
    Figure 20. Execute Extract Service
    This figure show how, once an extract service is aligned with an AD, this service can be executed from the Optim Designer
  7. Optim Designer has an embedded Optim Manager component that allows you to manage and monitor the Optim services. Click the Service Management tab of the Optim Manager to see that the extract service is ready for execution. This service can be started by clicking the RUN button, as shown in Figure 21.
    Figure 21. Optim Manager – Service Management
    This figure shows the Optim Designer has an embedded Optim Manager component that allows us to manage and monitor the Optim services

    (View a larger version of Figure 21.)

  8. The results of the extract service can be viewed through the Service Monitoring tab of Optim Manager. Figure 22 displays the status of the extract service and the output of the Optim process report.
    Figure 22. Optim Manager – Service Monitoring
    This figure show the results of the extract service can be viewed through the Service Monitoring tab of Optim Manager

    (View a larger version of Figure 22.)

Optim convert service

Once the subset of data has been extracted for the Netezza production environment, it will need to be masked to protect the sensitive data before inserting this data into the Netezza test environment. This section will show how to mask the data using the convert service.

  1. A new Optim convert service needs to be defined through Optim Designer. You need to enter a service name and select a Service, as shown in Figure 23, and then click Next.
    Figure 23. New convert service
    This figure shows a new Optim Convert Service needs to be defined through Optim Designer
  2. Select the extract file that consists of the Netezza production data, which will be the source in the convert service, and click Next, as shown in Figure 24.
    Figure 24. Source data for convert service
    This figure shows how to select the extract file that consists of the Netezza production data, which will be the source in the convert service
  3. Define the target extract file that will store the masked version of the source data, and click Finish, as shown in Figure 25.
    Figure 25. Convert extract file
    This figure shows how to define the target extract file that will store the masked version of the source data
  4. In an Optim convert process, a table map and column map are required. This article uses local table and column map definitions. Figure 26 shows a table map editor which maps the source tables to target tables. In the table map editor, a column map can be added by highlighting a table and clicking on the Add Column Map button.
    Figure 26. Table Map Editor for convert service
    This figure shows how, in an Optim Convert process a table map and column map are required
  5. The Column Map Editor allows you to select Optim masking functions to help mask the data in the sensitive columns. Figure 27 shows examples of a few Optim masking functions.
    Figure 27. Column Map Editor with Optim Masking Functions
    This figure shows the Column Map Editor allows us to select Optim masking functions to help mask the data in the sensitive columns
    A full list of the Optim masking functions can be found in the CommonElmts.pdf, which is located in your Optim install path under \rt\DOC directory.
  6. Once the table and column map have been assigned to a convert service, it is ready to be executed. The execution and monitoring of the convert service can be performed from the Optim Manager by clicking Run Service, as shown in Figure 28.
    Figure 28. Execute Convert Service
    This figure shows how, once the table and column map have been assigned to a convert service, it is ready to be executed

Optim Compare request

Optim compare service can be used to verify that sensitive data has been masked according to privacy standards.

  1. Right-click on Compare, and then lick New Compare, as shown in Figure 29.
    Figure 29. New Optim Compare Request
    this figure shows how the Optim compare service can be used to verify that sensitive data has been masked according to privacy standards
  2. In the Compare Request editor, you need to define the compare file name, browsing options, and the type of comparison, as shown in Figure 30.
    Figure 30. Compare Request Editor
    This figure shows how, in the Compare Request Editor a compare file name, browsing options, and the type of comparison needs to be defined
  3. You can click the Source tab to specify the sources that you want to compare. The Netezza source data (Source 1) is being compared with the masked data (Source 2). Click the running man icon to execute the compare request, as shown in Figure 31.
    Figure 31. Sources for comparison
    This figurem shows the Netezza source data (source 1) being compared with the masked data (source 2)
  4. The compare file can be browsed to view the results of the compare request. In the browse compare file output, users can view unmatched, equal, and different row counts, as shown in Figure 32.
    Figure 32. Browse Compare file
    This figure shows the compare file can be browsed to view the results of the compare request
  5. Exact details of the differences can be viewed by double clicking on the tables in the compare file output. Figure 33 shows the compare results between extracted customers data and masked customers data from Netezza. Once the output is audited, the data is ready for the Netezza test environment.
    Figure 33. Audit Masked Data
    Compare results between extracted customers data and masked customers data from Netezza

    (View a larger version of Figure 33.)

Creating a Netezza test environment using Optim

You can use the Optim Create utility to create the database objects on the Netezza test environment. The database object definitions are stored with the data in the Optim Extract files.

Now you will see how Optim will use these extract files to create database objects on the Netezza environment.

  1. Click Optim Create utility from Optim Designer, as shown in Figure 34.
    Figure 34. Create button on the designer
    This figure shows how the Optim Create utility can be launched from Optim Designer
  2. Selected an extract file with a table map in the Create Options window, and then click OK, as shown in figure 35.
    Figure 35. Create Options
    This figure shows how an extract file must be selected with a table map in the create options window
  3. The table map requires a destination qualifier which identifies the Netezza test database. In the editor shown in Figure 36, you can map the destination tables to the source tables.
    Figure 36. Table Map Editor for Create utility
    This figure shows how the table map requires a destination qualifier which identifies the Netezza test database
  4. The Create window will show the status of the objects on the Netezza test database. Based on the object status, you will have the ability to select which objects to create or drop, as shown in Figure 37.
    Figure 37. Object Creation
    This figure shows how the create window will show the status of the objects on the Netezza test database
  5. The Optim Create utility will generate the Netezza DDL statements that will be executed on the Netezza database, as shown in Figure 38. You have the ability to modify the DDL if needed.
    Figure 38. Generated Netezza DDL
    This figure shows how the Optim Create utility will generate the Netezza DDL statements that will be executed on the Netezza database
  6. Once the DDL is executed against the Netezza database, you will be able to view the status of the objects on the create window, as shown in Figure 39.
    Figure 39. Updated object creation
    This figure shows how, once the DDL is executed against the Netezza database, you will be able to view the status of the objects on the create window

Optim Insert Service

Optim Insert Service is used to populate the Netezza test environment with the masked version of the source data.

  1. A new Insert Service needs to be defined through Optim Designer, as shown in Figure 40, and then click Next.
    Figure 40. New Optim Insert Service
    this figure shows how a new Insert Service needs to be defined through Optim Designer
  2. The converted extract file needs to be selected as the source data store alias for the Insert Service, as shown in Figure 41.
    Figure 41. Select Data Source Alias for Insert Service
    This figure shows how the converted extract file needs to be selected as the source data store alias for the Insert Service
  3. A table map is required to define where the masked data will be inserted into. As shown in Figure 42, the target data store alias is selected to the Netezza test environment. The table map also shows the existence of the required tables on the Netezza test database.
    Figure 42. Table Map Editor for Insert Service
    This figure shows how a table map is required to define where the masked data will be inserted into
  4. Once a table map is associated with an Insert Service, it is ready for execution through Optim Manager, as shown in Figure 43.
    Figure 43. Execute Insert Service
    This figure shows how, once a table map is associated with an insert service, it is ready for execution through Optim Manager
  5. Click the Service Monitoring tab of the Optim Manager to analyze the Insert Service status and process report output. Figure 44 shows a sample of a successful Insert Service.
    Figure 44. Insert Service status via Optim Manager
    This figure shows a sample of a successful Insert Service

Conclusion

This article gave you a high-level overview of the Optim Test Data Management and Data Masking process used to move a subset of data from a Netezza production database to a test database.

The Optim Test Data Management and Data Masking process consisted of using Optim Designer to do the following.

  • Extract a subset of data and its database objects from the Netezza production database.
  • Mask sensitive customer data using Optim masking functions.
  • Audit the masked data using Optim Compare Request.
  • Create a Netezza test database using Optim Create utility.
  • Insert masked data to the Netezza test database.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

  • Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=859144
ArticleTitle=Use the InfoSphere Optim 9.1 Test Data Management and Data Masking solution for Netezza Data Warehouse
publish-date=02282013