IBM PureData-Netezza Developer Network (NDN)
Many a times we come across situations where in we want to accomplish a task for which there is an already built in function or a procedure. Most of the database vendors provide numerous such functions and procedures to achieve such tasks. But there are also occasions that none of the existing functions meet the actual goal that one desires. In such cases a user ends up in writing a function. Herein this article, I try to highlight as to how one can come up with his/her own C++ function which can be invoked using the Netezza SQL.
A user can create custom functions, aggregates, and shared libraries using the user-defined extensions feature. The custom functions can be run on the Netezza Performance Server (NPS) systems and used to perform specific types of analysis for business reporting and data queries. These objects leverage the massively parallel processing (MPP) environment to accelerate analysis of data. User-defined functions enable data processing directly on the system, leading to elimination or reduction of data movement to other systems for analysis, which ensures performance gains.
In other words, User defined Extensions (UDX) enhances Netezza functionality by enabling a user to write a C++ coded modules which can be invoked using the SQL queries.
There different types of UDX’s supported by Netezza are –
A typical UDX development cycle will include –
Let’s take a look on how UDF is created and used. UDF takes 0 or more input parameters and returns one output value. It is called once for every row and can be used in all SQL queries where built in functions are supported.
How to create the C++ file for UDF?
The file must include the header file udxinc.h in addition to any standard header file. The UDX classes and functions for API version 2 are defined in a namespace called nz:udx_ver2.
using namespace nz::udx_ver2;
Create a class object derived from the udf base class. The code will look something like this :
virtual nz::udx_ver2::ReturnValue evaluate()
// Function Logic
The instantiate method creates the object dynamically.
This CPP file will have to be compiled using the nzudxcompile utility.
The following object files are created
Once the UDF is compiled, it needs to be registered with the NPS system. The same utility can be used
Alternatively, the UDF can be registered using SQL, For ex –
These registered functions can be invoked like any function in a SQL statement.
Apoorv Kapse 270003FSX5 Tags:  connectivity for system odbc jdbc client oledb analytics puredata 15,105 Views
This post brings the insights on client connectivity for ‘PureData System for Analytics’ at one place. It does not cover the information on installation and configuration of client drivers, details of which can be found in:
Let us categorize the details as follows
Before moving on to the details, here’s a figure that shows the analogy between different types of drivers like ODBC, JDBC and OLEDB.
Coming back to the categories, let’s discuss in details
A session is a handshake protocol for negotiation between the client and server that starts with authentication. Session parameters like database, user, password, protocol etc, can be configured while creating the Data sources in these drivers or can be provided as part of connection string. There are few advanced options available which can be configured while initiating the connection. To take ODBC as example, refer below link
On the server (PureData System for Analytics) side, nzsession command can be used to view and manage sessions. Refer link below for details
Transactions refer to unit of work that operates on database-related objects, data or both.
Transactions provide the following benefits:
The transaction processing task contains five steps:
Client drivers provide explicit APIs to start/end transactions (e.g. SQLTransact, SQLEndTran), control on isolation level( e.g. SQLSetStmtAttr, SQLSetConnectAttr), query cancellation(e.g. SQLCancel), etc.
Once again, nzsession on server is handy to show and manage transactions.
Query execution from client driver involves multiple phases as listed below:
Minimal parsing is done is client drivers just to identify the type of query (parameter markers, multi-statement batch), normalization (white spaces, comments) and processing of escape sequences.
Preparation of statement involves processing of vendor specific escape clauses, translation based on server supported syntax, creating the access plan and getting some metadata information for result-sets (column information – types, lengths and others). These steps are relevant for SELECT queries and no-op for rest of the SQL.
This step deals with binding of data and actual execution of the query on the server. In case of multi-row operations (supported for INSERT/UPDATE/DELETE), column-wise or row-wise binding is done. This also involves any codepage conversion and fetching the result-sets (if generated) on client. Once the result-sets are available, it can be fetched in the application into bound variables.
Client drivers provides APIs to execute (e.g. SQLExecute, SQLExecDirect), fetch (e.g. SQLFetch, SQLFetchScroll), getting the row-counts (e.g. SQLRowCount) etc.
Everything is executed synchronously as there is wait/block for response from server. Also the server does not support for multiple queries to be active on a single session.
Metadata information refers to data about data. It describes the specific characteristics of an individual data item like table, column, stored procedure, etc which is usually stored in the system catalogs.
Client drivers provide APIs to retrieve metadata information, e.g.
There is no client API for creating/modifying the metadata. Server provides it’s own set of system views like _v_odbc_tables1/2/3, _v_jdbc_tables1/2/3 and _v_oledb_tables1/2/3 to be utilized by client having different version suffix for cross db and schema support.
As far as linguistics are concerned, client drivers provide full unicode support for queries and data with UTF8, UTF16 or UTF32 encoding though internally everything is in UTF8 (same as server encoding).
For locale specific encoding to unicode, drivers make use of ICU libraries (except JDBC).
Error messages are in english only which means that drivers are fully internalized but not localized.
Protocol can be talked about in two ways here
This refers to the custom protocol for initial negotiations while creating sessions.
This refers to protocol for any information to sent/received within a session. State transitions are part of it. It is partly inherited from postgres protocol and consists of two parts
Both the types of protocols are versioned for extensibility.
Hope this helps – questions/comments are welcome.