Usually, for moving data into a Data Warehouse system like Netezza, there is an extract, transform and loading phase involved commonly referred to as ETL.
Extraction is extracting data from various transactional sources, transformation involves operations like computing new values based on some source value or translating coded values, selecting subset of data and loading is moving the data to the Data Warehouse.There are several tools available in the market to do ETL, however if extraction of data is already complete from source systems and the requirement is to perform transformations and loading to Netezza, Netezza provides a very flexible capability called External Tables.
External Tables provide a SQL interface to the data that exists in a flat file or a pipe. Since SQL is a very rich language with several transformational capabilities, External Table inherits most of the transformational capabilities of SQL. For example using External Tables, users can select only certain columns of data from the file, perform operations like casting, truncation, mathematical operations on the selected data and even join this data with other existing table before loading it into the data warehouse.
Further External Tables in conjunction with User Defined Functions(UDF) can be used to manipulate the data read in from the data file with custom user code. This custom user code can be similar to the rules specified in the ETL tools to transform the data.Because of these strong transformational capabilities, External Table is a very good choice to do the Transformation & Loading of an ETL process. In no way External Tables can solve all the ETL problems , however it may satisfy the needs of several applications which may find the ETL tools an overkill.
Let us consider an example of selecting certain columns, performing decoding transformations on that data, deriving new values from some of the columns and then loading this data into a Netezza Table.
CREATE EXTERNAL TABLE customer_ET(NAME CHAR(20),AGE INT4,GENDER CHAR(1),BALANCE INT4,CITY CHAR(20),ADDRESS CHAR(10))USING (dataobject('/home/mycustomer.dat'));
/* Load the data into Netezza Table customer_DW from External Table customer_ET */
insert into customer_DW select name , generation(age) , rewards(balance), citycode(city) from customer_ET;
Below is a sample of how the evaluate method for citycode() UDF which returns city code based on city name will look like. Similarly UDFs for generation() and rewards() can be defined. Refer to Netezza UDF development guide for more details.
These UDFs can do complex pre-processing on your data before its loaded into Netezza.
/* evaluate method for citycode() UDF. The code under this evaluate method is executed for invocation of citycode UDF. */
virtual nz::udx_ver2::ReturnValue evaluate()
str = stringArg(0);
int lengths = str->length;
char *datas = str->data;
int32 citycode = 0;
if (memcmp("Chicago",datas,7) == 0)
citycode = 105;
if (memcmp("Dallas",datas,6) == 0)
citycode = 207;
if (memcmp("Washington",datas,10) == 0)
citycode = 306;
If your requirement is to perform some transformations on the data before loading it into Netezza, then I would recommend you to explore External Tables with UDF to perform these transformations along with loading the data. You would be surprised to see how powerful a simple SQL based interface can be for doing such transformations.