In this following entry I am going to walk you through how to write a User Defined Table Function ( UDTF ) in C . A User Define Function ( UDF ) allows a database user to extend the capability of their DB2 installation by writing custom functions which can then be used in queries . If the value returned by a UDF is a table, they are User Defined Table Function.
As a step one we have to come up with a code for the Table Function. I also thought of picking up an example from TPC-DS. Query 96 in TPC-DS is as follows
The above query computes the sales count in a specific store for a given 30 minute time window. So we have to join store_sales to time_dim table to be able to achieve this. Lets take a look at how to rewrite that query with a UDTF . Before we do that let us understand the time_dim table.
The following is the definition of time_dim table
If we count the number of entries in this table we notice that there are 86400 which corresponds to the number of seconds in a day. So if we took a look at this table we can find that the t_time_sk value changes from 0 to 86399. So the above query would be equivalent to a table function which generates a sequence of integers starting at 30600 for a total of 1800 values. Let us define a UDTF generate_times(BIGINT,BIGINT) that will take two values
- A starting point
- A range
and then return a table of integers. Lets define the function as follows.
The above command defines a UDTF in the catalogs . It defines among other things
- The signature of the function
- The return value
- The language
- The external name.
Note that the external name has two parts seperated by ! . The first part gives the binary that contains the function. The binary should be available under 'function' folder under the instance root. For eg. If your instance root is /home/user1/sqllib/ then the binary should be copied under /home/user1/sqllib/function folder.
So far so good. Except we need to implement the function that would then allow us to invoke it in the query. Lets have a look at the implementation.
Lets look at the code in a little detail. The first thing you need to do is to be include the header file sqludf.h. This file would be located under the 'include' folder under your instance root.
All the arguments to the function are all pointers to the input values to the function. We have three arguments, two input arguments , one to hold the starting value of time, the second to hold the number of iterations and the third to hold the output. In our case these are
Corresponding to each of these you would require an indicator to indicate to your code if any of these are Null. The type of SQLUDF_BIGINT and SQLUDF_SMALLINT are defined in sqludf.h
You can use the macro SQLUDF_NOTNULL to check if they are null. Ideally the input indicators should be check for validity before processing. Since I wanted to keep the example code very minimal , I have skipped this step in the code above.
#define SQLUDF_NOTNULL(nullptr) (*nullptr >= 0)
A table function is treated just like a Table. You can open the table, fetch rows from the table and close the table. You have to preserve the state of the computation between successive fetch. So the way to do it would be to make use of the routine infrastructure provided scratch pad. It is a 100 byte scratch pad provided to you to keep some state between fetch. If additional memory is allocated , care should be taken to free them upon closing the table function or upon error.
Upon entering the body of the function we initialize a pointer to the scratch area . Then we enter the processing step where we determine the call type . Upon opening or closing the table function we initialize the counter to the value pointed by start. With each fetch you copy the current counter value to the output variable , increment the counter value and check for the terminating condition. If we reached the terminating condition we set the SQLUDF_STATE to "02000" which indicates an end of table. Other values for SQLUDF_STATE are
Now we are ready to compile the program and copy the program to ~/sqllib/function folder. The program can be compiled and linked as follows
gcc -m64 -fpic -I$DB2PATH/include -c $tpcds.c -D_REENTRANT
Now we have all the pieces ready to rewrite the query above.
The results should be same !