Netezza for data analytics
IBM Netezza is known as a simple data warehouse appliance for serious analytics, one that can plug into any industry and deliver fast answers to complex, interrelated business questions. Technology is changing rapidly and opening the door to solutions for small, unique, and diverse problems that once seemed impossible to solve. Businesses today look for ways to expand laterally while increasing revenue, and Netezza advanced analytics gives them the answers they need to make smarter decisions, stay ahead of competitors, and increase their market footprint.
The IBM Netezza appliance provides quick and effective answers to the most sophisticated business questions. IBM Netezza makes it easier for enterprises to answer these questions by allowing analytic models to be built and deployed right where the data resides, in the data warehouse. As a result, Netezza can build and deploy analytic models much more quickly than the traditional approach, where analytics are built and deployed across separate analytical servers. All in all, combining server, database, and storage into a single out-of-the-box appliance gives enterprises the following advantages.
- Serious analytics: Ability to answer questions that were previously too complex, required too much data, or took too much time to analyze.
- On-demand analytics: Quickly respond to dynamic business conditions to choose the best course of action.
- Simplicity: An appliance stack with low maintenance and high availability.
- Pure performance: IBM Netezza's Asymmetric Massively Parallel Processing (AMPP) data warehouse architecture processes large data sets quickly.
Test Data Management (TDM) and Data Masking (DM) with InfoSphere Optim 9.1
Testing applications can be, and usually is, a very time-consuming and expensive phase in any application development cycle. Organizations continually search for ways to improve their application testing processes so they can build quality, reliable applications on schedule, on budget, and with the allocated resources.
Copying or generating data, masking confidential data, creating or refreshing test environments, executing test runs, and validating the results are all time-consuming steps in the testing cycle. IBM InfoSphere Optim helps speed up these steps with its Test Data Management and Data Masking capabilities.
InfoSphere Optim Test Data Management and Data Masking capabilities give testers, developers, and DBAs an easy way to do the following.
- Create right-sized, referentially intact data sets for test scenarios.
- Mask confidential data.
- Create test databases on the fly.
- Edit data before test execution to meet certain conditional checks in the test scenario.
- Compare the before and after data to validate the test executions.
These capabilities provide a faster and easier way to populate test environments with data from production databases. InfoSphere Optim can also subset data, eliminating costly cloning processes while keeping the data relationally intact.
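As a simple illustration of what "referentially intact" means, the following Python sketch (hypothetical CUSTOMERS and ORDERS tables, not Optim's actual engine) keeps only child rows whose foreign keys point at parents selected for the subset:

```python
# Illustrative only: a toy version of referentially intact subsetting.
# CUSTOMERS is the parent table, ORDERS the child; CUST_ID is the key.
def subset_referentially(customers, orders, keep_every=2):
    """Sample parent rows, then keep only child rows that reference them."""
    selected = customers[::keep_every]
    kept_ids = {c["CUST_ID"] for c in selected}
    related_orders = [o for o in orders if o["CUST_ID"] in kept_ids]
    return selected, related_orders

customers = [{"CUST_ID": i, "NAME": f"Customer {i}"} for i in range(1, 7)]
orders = [{"ORDER_ID": 100 + i, "CUST_ID": (i % 6) + 1} for i in range(12)]

subset_customers, subset_orders = subset_referentially(customers, orders)
# No order in the subset points at a customer that was left out.
assert all(o["CUST_ID"] in {c["CUST_ID"] for c in subset_customers}
           for o in subset_orders)
```

A clone would copy all 6 customers and 12 orders; the subset keeps 3 customers and only their 6 orders, yet every foreign key still resolves.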
Netezza with InfoSphere Optim 9.1
Now that you understand both Netezza and InfoSphere Optim Test Data Management and Data Masking capabilities, you can better understand how Netezza can utilize Optim to manage and protect the data movement from production to non-production environments.
Testing and validating new and updated BI reports against the data in Netezza is essential for enterprises that want to optimize their business decisions. To move analytical data from production to development, InfoSphere Optim 9.1 now offers a native connection to Netezza. This article provides step-by-step instructions on the following.
- Installing and configuring Netezza Client on Optim Server.
- Configuring Optim for Netezza.
- Running the Test Data Management and Data Masking process for Netezza.
InfoSphere Optim 9.1 TDM and DM integration for Netezza using IBM Federation Server
In previous releases, InfoSphere Optim relied on the IBM InfoSphere Federation Server to manage the data in Netezza. The diagram in Figure 1 shows the connectivity between InfoSphere Optim and Netezza using the IBM InfoSphere Federation Server.
Figure 1. InfoSphere Optim and Netezza integration prior to InfoSphere Optim 9
To understand the details of InfoSphere Optim and Netezza integration using the IBM InfoSphere Federation Server, read the "Data transformation with IBM Netezza appliance using IBM InfoSphere Data Masking solution" developerWorks article from the Resources section.
InfoSphere Optim 9.1 now supports a native connection to Netezza. Figure 2 shows the new native connectivity between InfoSphere Optim and Netezza.
Figure 2. InfoSphere Optim 9.1 and Netezza native integration
The following systems, with the specified components, are assumed to be installed, configured, and running for this article.
- Netezza Data Warehouse Appliance
- Optim Client Components (Optim Designer, Repository Services, Optim for Linux, UNIX, and Windows)
- Optim Server Components (Optim Manager, Optim Proxy, WAS-CE)
- Optim Repository VM
The following section will show you how to set up communication between Netezza Data Warehouse appliance and InfoSphere Optim using ODBC and JDBC.
Install and configure Netezza ODBC driver
You can obtain the Netezza ODBC installer (nzodbcsetup.exe) from your Netezza administrator. During installation of the driver, do the following.
- Because Optim is a 32-bit application, it requires the 32-bit Netezza
  driver. Select the 32-bit option and click
  Next, as shown in Figure 3.
Figure 3. Netezza 32-bit driver
- The 32-bit Netezza driver is installed in the destination folder
C:\windows\SysWOW64, as shown in Figure 4.
Figure 4. Netezza ODBC 32-bit driver install directory
To create and configure the ODBC data source for Netezza, launch odbcad32.exe from the C:\windows\SysWOW64 directory.
- Select NetezzaSQL driver from the list of drivers and
click Finish, as shown in Figure 5.
Figure 5. NetezzaSQL driver selection
- Enter a data source name, a Netezza server IP or hostname, a Netezza
port, a Netezza database name, and the database login credentials, and
then click Test Connection, as shown in Figure 6.
Once you have a successful connection, click OK.
Figure 6. Netezza database configuration parameters for ODBC
- Once the data source has been successfully created, it will appear in
the System DSN section of the ODBC Data Source
Administrator. This article uses two System DSNs, one for the source
database and one for the test database (NZSource and NZTest), as shown
in Figure 7.
Figure 7. NZSource and NZTest Data Source
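Although this article drives the DSNs through Optim, a System DSN can be reached from any ODBC consumer. The sketch below only assembles a standard ODBC connection string for the NZSource DSN created above; the UID and PWD values are placeholders, and no connection is attempted:

```python
# Assemble a standard ODBC connection string for the System DSN created above.
# "NZSource" is the DSN from Figure 7; "admin"/"password" are placeholders.
def odbc_conn_string(dsn, uid, pwd):
    return ";".join([f"DSN={dsn}", f"UID={uid}", f"PWD={pwd}"])

conn_str = odbc_conn_string("NZSource", "admin", "password")
# With an ODBC module such as pyodbc, this string would be passed to
# connect(); no connection is attempted here.
```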
Install and configure Netezza JDBC driver
InfoSphere Optim Designer 9.1 requires the Netezza JDBC driver to communicate with Netezza. To install it, obtain nzjdbcsetup.exe from your Netezza administrator. Launching the executable starts the Netezza JDBC install wizard, where you define the install path for the driver. The default path is C:\JDBC, as shown in Figure 8.
Figure 8. Netezza JDBC driver install directory
Create and configure Netezza data store alias
Launch InfoSphere Optim Designer to create and configure your Netezza data store aliases using the JDBC driver that has been installed.
- Open the Designer Explorer, under Optim Repository,
right-click on Data Store Aliases, and select
New Data Store Alias, as shown in Figure 9.
Figure 9. New Data Store Alias
- Enter a data source name, description, database type and version,
connection string, and username/password login credentials, which are
the required information for your database client connection, as shown
in Figure 10. Click Next.
Figure 10. Database client connection properties
- The JDBC connection properties require the Netezza hostname, port
number, database name, and the location to nzjdbc.jar. The nzjdbc.jar
can be uploaded by clicking the Upload button. Click
Test Connection to verify the connection, as
shown in Figure 11.
Figure 11. JDBC connection properties
- Once the Netezza data store aliases have been created, they can be
  viewed in Optim Designer. Figure 12 shows the data store aliases
  for NZSource and NZTest.
Figure 12. NZSource and NZTest data store aliases
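For reference, the connection string Optim Designer assembles follows the usual JDBC URL pattern for Netezza. The sketch below builds it as a string; the jdbc:netezza://host:port/database form is the commonly documented format, but verify it against your driver's documentation. The host and database names here are placeholders:

```python
# Build a Netezza JDBC URL of the assumed form
# jdbc:netezza://host:port/database. Host, port, and database are
# placeholders for your environment.
def netezza_jdbc_url(host, port, database):
    return f"jdbc:netezza://{host}:{port}/{database}"

url = netezza_jdbc_url("nzhost.example.com", 5480, "NZSOURCE")
```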
TDM and DM using InfoSphere Optim Designer 9.1
In a test data management process, Optim is capable of extracting a relationally-intact subset of data from the production environment using Optim's Extract feature. This subset of data can be masked using Optim Convert and compared to the original production database with Optim's Compare utility for correctness. It can then either be loaded or inserted into a non-production environment using Optim Load or Insert, respectively.
You can then use Optim's Edit utility to modify the test data for boundary scenarios and special cases. Another extract can be performed to create a gold extract file, which captures the state of the testing environment before any application testing is executed against it. After application testing is performed, customers can then compare the state of the testing environment with the gold extract file. Once the comparison is complete, Optim can then refresh the test database using the gold extract file upon user demand.
This article will cover the extract, convert, compare, create and insert steps through the Optim Designer for Netezza. Figure 13 shows the end-to-end test data management and data masking process.
Figure 13. TDM process
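The flow in Figure 13 can be summarized as a pipeline of four steps. The following sketch uses trivial stand-in functions whose names mirror the Optim services; it is illustrative only, not Optim's implementation:

```python
# Toy stand-ins for the Optim services in the Figure 13 flow.
def extract(source):            # Extract: pull a subset of production rows
    return list(source)

def convert(rows, mask):        # Convert: apply a masking function per row
    return [mask(row) for row in rows]

def compare(before, after):     # Compare: count rows that changed
    return sum(1 for b, a in zip(before, after) if b != a)

def insert(target, rows):       # Insert: populate the test database
    target.extend(rows)

production = [{"NAME": "Alice"}, {"NAME": "Bob"}]
extracted = extract(production)
masked = convert(extracted, lambda row: {"NAME": "XXXX"})
changed = compare(extracted, masked)   # both rows were masked, so 2 changed
test_db = []
insert(test_db, masked)
```

The remainder of this article walks through each of these steps in Optim Designer.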
You need to create a folder to store the Optim definitions for the Netezza project within Optim Designer. Right-click Optim Repository, select New, and click Folder, then specify the name of the project folder, as shown in Figure 14.
Figure 14. Create and define TDM/DM project folder
Optim extract service
Under the Netezza project folder, a new extract service needs to be defined.
- Right-click Services and click New
Services, as shown in Figure 15.
Figure 15. Define extract service
- For an extract service, an Optim Access Definition (AD) is required.
For the purpose of this article, a local AD is used and the
OPTIM_CUSTOMERS sample table will be selected as the start table, as
shown in Figure 16.
Figure 16. Select data store alias and start table
- The Find related tables option is used as the table
selection method to populate the AD. All related tables can be found
by clicking the Find related tables button, as shown
in Figure 17.
Figure 17. Table Selection method and Find related tables
- Enter the extract file name to store the extracted Netezza data, and
then click Finish, as shown in Figure 18.
Figure 18. Target file data store name
- Optim provides options to extract a subset of data, which can be
  defined through an AD. Figure 19 highlights an AD defined to extract
  every 5th record, up to a maximum of 500 records, from the OPTIM_CUSTOMERS table.
Figure 19. Access Definition Editor
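The subset criteria in Figure 19 amount to simple sampling with a cap. One plausible reading of "every 5th record, up to 500 records" can be sketched as:

```python
# One plausible reading of the Figure 19 criteria: take every 5th row,
# stopping at a maximum of 500 rows.
def every_nth_up_to(rows, n=5, limit=500):
    return rows[::n][:limit]

rows = list(range(10_000))
subset = every_nth_up_to(rows)
assert len(subset) == 500        # capped at the maximum
assert subset[:3] == [0, 5, 10]  # every 5th row
```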
- Once an extract service is aligned with an AD, this service can be
executed from the Optim Designer. Figure 20 shows an extract service
with an AD that is ready to be executed.
Figure 20. Execute Extract Service
- Optim Designer has an embedded Optim Manager component that allows you
to manage and monitor the Optim services. Click the Service
Management tab of the Optim Manager to see that the
extract service is ready for execution. This service can be started by
clicking the RUN button, as shown in Figure 21.
Figure 21. Optim Manager – Service Management
- The results of the extract service can be viewed through the
Service Monitoring tab of Optim Manager. Figure
22 displays the status of the extract service and the output of the
Optim process report.
Figure 22. Optim Manager – Service Monitoring
Optim convert service
Once the subset of data has been extracted for the Netezza production environment, it will need to be masked to protect the sensitive data before inserting this data into the Netezza test environment. This section will show how to mask the data using the convert service.
- A new Optim convert service needs to be defined through Optim
  Designer. Enter a service name and select the service type, as
  shown in Figure 23, and then click Next.
Figure 23. New convert service
- Select the extract file that consists of the Netezza production data,
which will be the source in the convert service, and click
Next, as shown in Figure 24.
Figure 24. Source data for convert service
- Define the target extract file that will store the masked version of
  the source data, and click Finish, as shown in Figure 25.
Figure 25. Convert extract file
- An Optim convert process requires a table map and a column map.
  This article uses local table and column map definitions. Figure 26
  shows the table map editor, which maps the source tables to target
  tables. In the table map editor, a column map can be added by
  highlighting a table and clicking the Add Column Map button.
Figure 26. Table Map Editor for convert service
- The Column Map Editor allows you to select Optim masking functions to
help mask the data in the sensitive columns. Figure 27 shows examples
of a few Optim masking functions.
Figure 27. Column Map Editor with Optim Masking Functions
A full list of the Optim masking functions can be found in the CommonElmts.pdf, which is located in your Optim install path under \rt\DOC directory.
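To give a feel for what a column-map masking function does, here is an illustrative Python sketch (this is not Optim's algorithm): it deterministically replaces each digit of a value while leaving non-digit characters in place, so the value's format survives and repeated runs produce the same masked output:

```python
import hashlib

# Illustrative masking function (NOT Optim's algorithm): deterministically
# substitute each digit of a value while keeping non-digit characters,
# so the format is preserved and masking is repeatable across runs.
def mask_digits(value, seed="demo-seed"):
    digest = hashlib.sha256((seed + value).encode()).hexdigest()
    replacement = iter(ch for ch in digest if ch.isdigit())
    return "".join(next(replacement, "0") if ch.isdigit() else ch
                   for ch in value)

masked = mask_digits("123-45-6789")
assert masked[3] == masked[6] == "-"          # format preserved
assert masked == mask_digits("123-45-6789")   # deterministic
```

Repeatability matters in test data management: the same production value must mask to the same test value in every extract, or foreign keys and joins break across masked tables.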
- Once the table and column map have been assigned to a convert service,
it is ready to be executed. The execution and monitoring of the
convert service can be performed from the Optim Manager by clicking
Run Service, as shown in Figure 28.
Figure 28. Execute Convert Service
Optim Compare request
The Optim compare service can be used to verify that sensitive data has been masked according to privacy standards.
- Right-click on Compare, and then click New
  Compare, as shown in Figure 29.
Figure 29. New Optim Compare Request
- In the Compare Request editor, define the compare file
  name, browsing options, and the type of comparison, as shown in Figure 30.
Figure 30. Compare Request Editor
- Click the Source tab to specify the sources
  that you want to compare. The Netezza source data (Source 1) is
  compared with the masked data (Source 2). Click the running
  man icon to execute the compare request, as shown in Figure 31.
Figure 31. Sources for comparison
- The compare file can be browsed to view the results of the compare
request. In the browse compare file output, users can view unmatched,
equal, and different row counts, as shown in Figure 32.
Figure 32. Browse Compare file
- Exact details of the differences can be viewed by double-clicking
  the tables in the compare file output. Figure 33 shows the compare
  results between the extracted customer data and the masked customer data
  from Netezza. Once the output is audited, the data is ready for the
  Netezza test environment.
Figure 33. Audit Masked Data
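The unmatched, equal, and different row counts in the compare output can be sketched as a small classification keyed on a primary key (illustrative only; CUST_ID is an assumed key column):

```python
# Classify rows from two sources into equal, different, and unmatched
# counts, keyed on an assumed primary key column (CUST_ID).
def compare_rows(source1, source2, key="CUST_ID"):
    s1 = {row[key]: row for row in source1}
    s2 = {row[key]: row for row in source2}
    common = s1.keys() & s2.keys()
    equal = sum(1 for k in common if s1[k] == s2[k])
    different = len(common) - equal          # key matches, values differ
    unmatched = len(s1.keys() ^ s2.keys())   # key exists in only one source
    return {"equal": equal, "different": different, "unmatched": unmatched}

src = [{"CUST_ID": 1, "NAME": "Alice"}, {"CUST_ID": 2, "NAME": "Bob"}]
msk = [{"CUST_ID": 1, "NAME": "XXXX"}, {"CUST_ID": 3, "NAME": "Carol"}]
result = compare_rows(src, msk)
# {'equal': 0, 'different': 1, 'unmatched': 2}
```

An audit would flag any sensitive column that still counts as "equal" between the source and masked data.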
Creating a Netezza test environment using Optim
You can use the Optim Create utility to create the database objects on the Netezza test environment. The database object definitions are stored with the data in the Optim Extract files.
Now you will see how Optim will use these extract files to create database objects on the Netezza environment.
- Click Optim Create utility from Optim Designer, as
shown in Figure 34.
Figure 34. Create button on the designer
- Select an extract file with a table map in the Create
  Options window, and then click OK, as
  shown in Figure 35.
Figure 35. Create Options
- The table map requires a destination qualifier which identifies the
Netezza test database. In the editor shown in Figure 36, you can map
the destination tables to the source tables.
Figure 36. Table Map Editor for Create utility
- The Create window will show the status of the objects on the Netezza
test database. Based on the object status, you will have the ability
to select which objects to create or drop, as shown in Figure 37.
Figure 37. Object Creation
- The Optim Create utility will generate the Netezza DDL statements that
will be executed on the Netezza database, as shown in Figure 38. You
have the ability to modify the DDL if needed.
Figure 38. Generated Netezza DDL
- Once the DDL is executed against the Netezza database, you will be
able to view the status of the objects on the create window, as shown
in Figure 39.
Figure 39. Updated object creation
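Conceptually, the Create utility turns stored object definitions into DDL like the following sketch produces. The column types and the Netezza DISTRIBUTE ON clause here are illustrative, not taken from Optim output; always review generated DDL before executing it:

```python
# Generate Netezza-style CREATE TABLE DDL from a list of column
# definitions. Column types and the DISTRIBUTE ON clause are illustrative.
def create_table_ddl(table, columns, distribute_on=None):
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE TABLE {table} (\n  {cols}\n)"
    if distribute_on:
        # Netezza spreads rows across data slices on the distribution key.
        ddl += f"\nDISTRIBUTE ON ({distribute_on})"
    return ddl + ";"

ddl = create_table_ddl(
    "OPTIM_CUSTOMERS",
    [("CUST_ID", "INTEGER NOT NULL"), ("CUSTNAME", "VARCHAR(40)")],
    distribute_on="CUST_ID",
)
print(ddl)
```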
Optim Insert Service
Optim Insert Service is used to populate the Netezza test environment with the masked version of the source data.
- Define a new Insert Service through Optim Designer, as
  shown in Figure 40, and then click Next.
Figure 40. New Optim Insert Service
- The converted extract file needs to be selected as the source data
store alias for the Insert Service, as shown in Figure 41.
Figure 41. Select Data Source Alias for Insert Service
- A table map is required to define where the masked data will be
  inserted. As shown in Figure 42, the target data store alias is
  set to the Netezza test environment. The table map also confirms that
  the required tables exist on the Netezza test database.
Figure 42. Table Map Editor for Insert Service
- Once a table map is associated with an Insert Service, it is ready for
execution through Optim Manager, as shown in Figure 43.
Figure 43. Execute Insert Service
- Click the Service Monitoring tab of the Optim Manager
to analyze the Insert Service status and process report output. Figure
44 shows a sample of a successful Insert Service.
Figure 44. Insert Service status via Optim Manager
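At its core, an insert service issues parameterized INSERT statements against the target tables. A minimal sketch of building such a statement (illustrative, not Optim's implementation; the table and column names come from this article's sample schema):

```python
# Build a parameterized INSERT statement for a target table; parameter
# markers (?) keep the masked values out of the SQL text itself.
def insert_statement(table, columns):
    markers = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({markers})"

stmt = insert_statement("OPTIM_CUSTOMERS", ["CUST_ID", "CUSTNAME"])
# INSERT INTO OPTIM_CUSTOMERS (CUST_ID, CUSTNAME) VALUES (?, ?)
```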
Conclusion
This article gave you a high-level overview of the Optim Test Data Management and Data Masking process used to move a subset of data from a Netezza production database to a test database.
The Optim Test Data Management and Data Masking process consisted of using Optim Designer to do the following.
- Extract a subset of data and its database objects from the Netezza production database.
- Mask sensitive customer data using Optim masking functions.
- Audit the masked data using Optim Compare Request.
- Create a Netezza test database using Optim Create utility.
- Insert masked data to the Netezza test database.
Resources
- Learn more about the IBM InfoSphere Optim Test Data Management Solution and accelerate your application delivery.
- Watch the Application Quality is Critical: InfoSphere Optim Test Data Management Solution video for more information.
- Learn more about InfoSphere Optim, which is part of an overall information integration and governance strategy.
- Explore Netezza as the simple data warehouse appliance for serious analytics.
- Learn more about data masking in the "Data transformation with IBM Netezza appliance using IBM InfoSphere Data Masking solution" developerWorks article.
- Secure and harden the Netezza data warehouse appliance with InfoSphere Guardium.
- Watch the IBM Netezza video IBM Netezza: Simplicity Drives Performance to learn more about the product.
- Visit the developerWorks Information Management zone to find more resources for DB2 developers and administrators.
- Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
- Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and tools as well as IT industry trends.
- Follow developerWorks on Twitter.
- Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
Get products and technologies
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.
- Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.