Introduction

Target audience

This package is intended to help ML developers and data scientists run in-database analytics, ranging from data exploration functions to complex ML algorithms, through simple Python module invocations.
Note: The nzpyida package was previously called ibmdbpy4nps.

Need for the nzpyida package

Netezza in-database analytics (INZA) is a powerful and comprehensive analytics package whose SQL routines cover most steps of an ML workflow. However, users are limited to what the package provides. For example, you might want to solve an ML problem with the latest algorithm from your favorite Python ML library rather than with the available INZA algorithms, or you might want to apply a custom transformation to the data set that INZA does not offer. That is where the nzpyida package comes in. Built on top of the Netezza Analytics Executable (AE) technology, it allows you to run custom ML code directly inside the database through simple Python function calls. This gives you flexibility, because you are not constrained in what you run inside the database, and speed, because the code runs inside the database rather than on the client. The nzpyida package also lets you connect to database tables through a Pandas dataframe-style abstraction and use them for data exploration; SQL translations for several standard Pandas dataframe operations are already included.
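To illustrate what an SQL translation of Pandas-style operations involves, the following is a minimal sketch, not the nzpyida API. A hypothetical LazyTable class records dataframe-style calls (column selection, query, head) and renders them as a single SELECT statement, so no rows leave the database until that statement runs. The class name, its methods, and the exact SQL it emits are assumptions made for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyTable:
    """Hypothetical Pandas-style handle on a database table (not part of nzpyida).

    Each operation returns a new immutable handle; to_sql() renders the
    accumulated operations as one SELECT statement.
    """
    table: str
    columns: Tuple[str, ...] = ("*",)
    filters: Tuple[str, ...] = ()
    limit: Optional[int] = None

    def __getitem__(self, cols):
        # df["A"] or df[["A", "B"]] -> column projection
        if isinstance(cols, str):
            cols = [cols]
        return replace(self, columns=tuple(cols))

    def query(self, condition: str):
        # df.query("AGE > 30") -> WHERE clause; the condition is passed through verbatim
        return replace(self, filters=self.filters + (condition,))

    def head(self, n: int = 5):
        # df.head(n) -> LIMIT n
        return replace(self, limit=n)

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({f})" for f in self.filters)
        if self.limit is not None:
            sql += f" LIMIT {self.limit}"
        return sql


df = LazyTable("CUSTOMERS")
print(df[["NAME", "AGE"]].query("AGE > 30").head(10).to_sql())
# SELECT NAME, AGE FROM CUSTOMERS WHERE (AGE > 30) LIMIT 10
```

Because every call returns a new handle instead of fetching data, chained operations compose into one query, and intermediate steps never pull rows to the client.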

While you could use the AE framework directly to obtain most of the benefits of the nzpyida package, doing so is laborious and has a steep learning curve. First, you must learn the low-level AE abstractions to write an AE, and then run a series of console commands (deploy, register, run) to start it. This also means that you cannot build and run the AE from your favorite Python IDE or a notebook environment. As a result, you cannot easily intersperse visualization code with model-building code, which hinders an end-to-end workflow: analyzing the data set, building and running the models, and analyzing the model results, all in one place.

The nzpyida invocation

Architecturally, three layers work together to push client code into the database as AEs. You interact only with the AE client layer, through the Python module invocations that the nzpyida package provides.
AE client layer
Provides the Python modules that you can import and invoke with the required parameters.
AE generation layer
Translates your Python invocation into AE-based code and handles the SQL that is needed to invoke the preexisting user-defined functions.
AE execution layer
Provides the preexisting user-defined functions, executes the SQL code and runs the AE launcher that handles AE execution.
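The hand-off between the three layers can be modeled with a toy, single-process sketch. Everything in it is hypothetical: the runner UDF name NZPYIDA_RUN_AE, the transform entry-point convention, the SQL shape, and the in-memory FAKE_DB table are invented for illustration, and exec() merely stands in for the AE launcher that actually runs the code inside the database.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical name standing in for a preexisting user-defined function
# provided by the execution layer; not taken from nzpyida.
RUNNER_UDF = "NZPYIDA_RUN_AE"

# Stand-in for table data that, in reality, stays inside the database.
FAKE_DB: Dict[str, List[Row]] = {
    "CUSTOMERS": [{"ID": 1, "AGE": 70}, {"ID": 2, "AGE": 30}],
}


@dataclass(frozen=True)
class AERequest:
    table: str  # input table the AE reads
    code: str   # user's Python source, shipped to the database as text
    sql: str    # SQL statement that would trigger the AE


def generate(table: str, code: str) -> AERequest:
    """AE generation layer: wrap the user's code in SQL that calls the runner UDF."""
    literal = "'" + code.replace("'", "''") + "'"  # escape quotes for a SQL string literal
    sql = f"SELECT {RUNNER_UDF}({literal}) FROM {table}"
    return AERequest(table=table, code=code, sql=sql)


def execute(request: AERequest) -> List[Row]:
    """AE execution layer, simulated in-process: load the shipped code and apply it per row.

    A real deployment would run request.sql, and the AE launcher would start the
    code inside the database instead of this exec() call.
    """
    namespace: Dict[str, object] = {}
    exec(request.code, namespace)
    transform = namespace["transform"]  # entry-point name assumed by this sketch
    return [transform(row) for row in FAKE_DB[request.table]]


def run_in_db(table: str, code: str) -> List[Row]:
    """AE client layer: the single call the user makes (hypothetical signature)."""
    return execute(generate(table, code))


user_code = '''
def transform(row):
    return {"ID": row["ID"], "AGE_GROUP": "senior" if row["AGE"] >= 65 else "adult"}
'''
print(run_in_db("CUSTOMERS", user_code))
# [{'ID': 1, 'AGE_GROUP': 'senior'}, {'ID': 2, 'AGE_GROUP': 'adult'}]
```

The point of the split is that the user only ever touches run_in_db; the code-to-SQL translation and the actual execution stay hidden behind it.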

Features

With the nzpyida package, you can:
  1. Connect to database tables through a Pandas-style dataframe abstraction.
  2. Explore data in-database with Pandas-style dataframe operations, which the built-in SQL translations convert to SQL.
  3. Run custom ML code directly in the database through simple Python module invocations.