The power of user-defined table functions
Writing and using UDTFs
Within a SELECT statement, multiple tables can be joined together, columns can be selected, new columns can be generated, and functions and global variables can be used. It is even possible to nest multiple SELECT statements or to add common table expressions (CTEs).
In this way, SELECT statements get more and more complex, but the same SELECT statement must be run from within different programs written in different programming languages. Instead of duplicating the same SQL code over and over again, those SQL statements should be stored within a single place that can be reused. SELECT statements can be reused by embedding them in a SQL view or stored procedure.
A UDTF is another mechanism that allows SELECT statements to be reused by many programs and also provides the ability for multiple statements to be performed before returning the final result set.
A UDTF can also help to work around the limitation that traditional SELECT statements can only directly access data stored in DB2 tables. If data in non-DB2 objects, such as data areas or stream files located within the IFS must be processed, programs written in a high-level language (HLL) can be registered as an external UDTF.
DB2 supports two methods of user-defined functions:
- User-defined scalar function
- User-defined table function
A scalar user-defined function (UDF) can be written with SQL or can be an HLL program that is registered with the CREATE FUNCTION statement.
A scalar UDF can be used within SQL statements (SELECT, UPDATE, INSERT, DELETE, SET, and so on) similar to any built-in scalar function, such as SUBSTR, ABSVAL, or DAYOFWEEK_ISO.
User-defined table functions
A user-defined table function is implemented similar to a UDF but can return a set of values instead of a single value. This returned set of values can consist of multiple columns and multiple rows similar to a DB2 table.
A UDTF is also created or registered with the CREATE FUNCTION statement. Input parameters may or may not be defined. The RETURNS TABLE clause is used to define the attributes of the output columns.
A UDTF can be written either in SQL or HLL which means that there are two UDTF types:
- SQL UDTF
The UDTF is created with the CREATE FUNCTION statement and can consist of either a single or multiple SQL statements. A SQL UDTF example is shown in Listing 3.
The UDTF result set is determined by the SELECT statement that is specified within the RETURN statement. The number of columns specified within the SELECT statement must match the number of columns defined in the RETURNS TABLE clause. The data types of the selected columns must also be at least compatible with the output column data types.
Note: If all data can be accessed with a single SELECT statement (independent of the number of views, CTEs, or nested sub-selects that are involved) creating a view instead of a UDTF is the better choice. The running of a UDTF produces some overhead compared with accessing a view.
- External UDTF:
An external UDTF is a HLL program that is registered with the CREATE FUNCTION statement as a UDTF.
A UDTF must be specified within the FROM clause of an SELECT statement in composition with the TABLE-function clause.
In Listing 1 the USERS UDTF provided by IBM and located in the QSYS2 schema is run. The USERS UDTF is invoked without any parameters, and therefore, the function name must be followed by an empty parenthesis. When using the TABLE-function clause, specifying a correlation name is mandatory. The correlation name used in Listing 1 is
Listing 1: Basic UDTF call
SELECT * FROM TABLE( USERS() ) u;
The USERS UDTF returns a table containing all user profiles and profile descriptions in a second column. An excerpt of the result table returned by the USERS UDTF is shown in Figure 1.
Figure 1: Excerpt from the result table of the USERS UDTF
Because a UDTF is specified within the FROM clause, its result set can be handled similar to any table or view. For example, a UDTF result set can be joined with other tables or views, columns can be selected, WHERE clauses can be added for reducing the data to be returned, data can be condensed by using aggregate functions, and data can be sorted in any sequence.
The examples in Listing 2 show different invocations of UDTFs.
In the first example, the USERS UDTF is run again, but only the user profiles beginning with "TM" are returned.
In the second example, the
ListMember_Fnc UDTF is called.
ListMember_Fnc is an external UDTF that returns all members located in a source file. The source file and the library are both passed as parameter values to the UDTF. Depending on the WHERE conditions, only members with the member types RPGLE and SQLRPGLE as well as the member name containing LIST are returned. The source code for the
ListMember_Fnc UDTF is published and it is explained later in this article see (External UDTF returning data out of a user space filled by a System API).
In the third example, the
ListMember_Fnc UDTF is called twice. With the first execution, all members in the QRPGLESRC and the HSCOMMON10 library are returned. With the second execution, all members in the QRPGLESRC file and the HSCOMMON05 library are returned. The results of both UDTF calls are joined together with a FULL OUTER JOIN by connecting the member names to find out the members that are in one source file and not in the other one.
Listing 2: Advanced UDTF calls
-- 1. Add Where Condition SELECT * FROM TABLE(USERS()) u WHERE ODOBNM LIKE 'TM%'; -- 2. Calling a UDTF with Parameters: ListMember_Fnc (List Member) SELECT * FROM TABLE(ListMember_Fnc('QRPGLESRC', 'HSCOMMON10')) a WHERE MbrType LIKE '%RPGLE%' AND Mbr LIKE '%LIST%' ; -- 3. Joining UDTFs SELECT * FROM TABLE(ListMember_Fnc('QRPGLESRC', 'HSCOMMON10')) a FULL JOIN TABLE(ListMember_Fnc('QRPGLESRC', 'HSCOMMON05')) b ON a.Mbr = b.Mbr WHERE a.MbrDescr LIKE '%File%' OR b.MbrDescr LIKE '%File%';
Writing a SQL UDTF
The easiest way to explain something is to present an example.
SQL UDTF – Log users running the UDTF
You may have some critical queries, where you need to capture who is running a specific query and when the query is used. So, one solution to this requirement is to write a UDTF for each of these queries that inserts an audit row into a log file before running the query. Instead of running the query directly, the user can run the appropriate UDTF either directly or through a view that includes the UDTF call.
Listing 3 contains an example of this type of UDTF. The
LogUser_Fnc UDTF returns a subset of columns from the address master table (2). Before running the query, an INSERT statement is performed that writes information about the user and run time into the LOGFILE audit table (1).
Listing 3: LocUser_Fnc UDTF – Log users running the query / UDTF
CREATE FUNCTION LogUser_Fnc () RETURNS TABLE (CustNo Char(15) , CustName1 Char(35) , CustName2 Char(35) , Contact Char(35) , Street Char(35) , ZipCode Char(9 ) , City Char(35) , Country Char(4 ) ) LANGUAGE SQL MODIFIES SQL DATA BEGIN INSERT INTO LogFile (1) VALUES(Default, Session_User, 'ADDRESSX', 'LOGUSER_FNC', Current_Timestamp) WITH NC; RETURN SELECT CustNo, CustName1, CustName2, Contact, (2) Street, ZipCode, City, Country FROM AddressX; END;
SQL UDTF – Display owned objects by user
Another need may be analyzing information provided by control language (CL) commands. A lot of CL commands provide an option for returning the selected information into an output file (*OUTFILE) that itself can be processed easily with SQL.
The DSPOBJD (display object description), DSPUSRPRF (display user profile), DSPJOBLOG (display job log) and DSPFFD (display field file description) commands are only a few of them.
A SQL UDTF allows you to first transparently run the CL command writing the required information into an output file. The content of this temporary output file can then be processed by coding a SELECT statement that accesses the temporary table in conjunction with the RETURN statement.
Listing 4 shows the SQL for the
DspObjOwn_Fnc UDTF. This UDTF returns all objects owned by the user profile that is passed as an input parameter value.
DspObjOwn_Fnc UDTF returns a subset of the columns out of the generated output file. The columns to be returned are defined in the RETURNS TABLE clause.
The UDTF first checks the input user profile value. If a value is not passed, the UDTF uses the session user (current user profile) (1). The DSPUSRPRF command is run by calling the QCMDEXC (Execute Command) stored procedure (2). The result out of the CL command is written to the TMPOBJOWN file located in the QTEMP library. The values are retrieved from in the temporary table by running the RETURN statement with the associated SELECT statement (3). The numeric date and time column values from the command output file are converted into SQL timestamp values (4).
Listing 4: DspObjOwn_Fnc UDTF – Display owned objects by user
CREATE FUNCTION DspObjOwn_Fnc(ParUser VarChar(10)) RETURNS TABLE (UsrPrf Char(10) , Obj Char(10) , ObjLib Char(10) , ObjType Char(8) , AutHldr Char(10) , System Char(8) , ObjASP Char(10) , DspTime Timestamp ) LANGUAGE SQL MODIFIES SQL DATA BEGIN DECLARE CLCmd VarChar(256) Not NULL Default ''; DECLARE CONTINUE HANDLER FOR SQLEXCEPTION BEGIN DECLARE ERROR_HIT INTEGER; DECLARE CONTINUE HANDLER FOR SQLEXCEPTION SET ERROR_HIT = 1; CREATE TABLE QTEMP/TMPOBJOWN ( OODDAT CHAR(6), OOUSR CHAR(10), OOOBJ CHAR(10), OOLIB CHAR(10), OOTYPE CHAR(8), OOAHLR CHAR(1), OOSYST CHAR(8), OODTIM CHAR(6), OOASPL CHAR(10)); END; SET ParUser = CASE WHEN LENGTH(TRIM(ParUser)) = 0 (1) THEN SESSION_USER ELSE UPPER(TRIM(ParUser)) END; SET CLCmd = 'DSPUSRPRF USRPRF(' concat ParUser concat ') '(2) Concat ' TYPE(*OBJOWN) ' Concat ' OUTPUT(*OUTFILE) ' Concat ' OUTFILE(QTEMP/TMPOBJOWN) ' Concat ' OUTMBR(*FIRST *REPLACE) '; CALL QCMDEXC(CLCmd, LENGTH(CLCmd)); RETURN SELECT OOUSR, OOOBJ, OOLIB, OOTYPE, (3) OOAHLR, OOSYST, OOASPL, TIMESTAMP('20' concat OODDAT concat OODTIM)(4) FROM QTEMP/TMPOBJOWN; END;
In Listing 5, the
DspObjOwn_Fnc is run to return all objects owned by the HAUSER user profile. Based on the WHERE conditions only, data areas, binding directories, and modules for the input user are returned.
Listing 5: Running the DspObjOwn_Fnc UDTF
SELECT ObjLib, Obj, ObjType, ObjASP, DspTime FROM TABLE(DspObjOwn_Fnc('HAUSER')) x WHERE ObjType in ('*DTAARA', '*BNDDIR', '*MODULE');
The result table returned by this query is shown in Figure 2.
Figure 2: Result table from the DspObjOwn_Fnc UDTF
Running the CL commands that populate data into an output file can add some overhead to the system. So you might need to consider creating an external UDTF to directly use data from system APIs instead of CL command output files.
Writing an external UDTF
Even though UDTFs can be easily written with pure SQL, a SQL UDTF can only access data stored in externally described DB2 tables.
Accessing data in non-DB2 objects, such as data areas or program-described files cannot be done easily with SQL.
On the other hand, programming languages might provide functions for accessing and processing data in non-database objects. With External UDTFs these additional features provided by the IBM i programming languages can be wrapped and for using them from within SQL.
When writing programs to be registered as external UDTFs, there are two things that must be considered:
- Callback processing
- Parameter style SQL
In computer programming, a callback is a piece of executable code that is passed as an argument to other code, which is expected to call back (run) the argument at some convenient time.
The chart in Figure 3 demonstrates how callback processing works.
The PGM program calls the
ReadFile procedure located in the SC service program and passes a reference (procedure pointer) to the
SingleRow procedure located in the SCB service program as a parameter value.
The ReadFile procedure in the SC service program reads the MyFile file in a loop. For each record that is accessed, the procedure whose reference was passed as parameter is run (the
SingleRow procedure in this example, but might be any other procedure with the same parameter definition). A data structure containing the previously read record is passed to the called procedure (
After the called procedure (
SingleRow) is ended, control goes back to the invoking procedure (
ReadFile) that continues reading the next record. The
ReadFile procedure is terminated after the last record is processed.
Figure 3: Callback Processing
IBM uses callback processing methods for activating external UDTFs.
The program registered as an external UDTF is called repeatedly, but at least for three times:
- At the beginning (open call)
- For each row to be returned (fetch call)
- After all rows have been returned (close call).
Registering external UDTFs
A program or procedure located in a service program can be registered as an external UDTF by running the CREATE FUNCTION statement.
When registering an external UDTF with the CREATE FUNCTION statement, the following information must be specified:
- Input parameters
All input parameters must be defined.
This definition must exactly match the data type, length, and sequence of the input parameters within the HLL program.
Note: The SQL interpretation of data types and lengths may differ from the definitions allowed within the HLL. For example the maximum length of a character field within RPG is 64 KB (release V5R4M0) and 16 MB (higher releases), while the maximum length for a character variable accepted by SQL statements is 32 KB.
There may be also some issues when passing date values. For example, in RPG a date format other than ISO can be used, while SQL always expects ISO formatted dates to be passed. To avoid any problems with date parameters, the DATFMT(*ISO) keyword should be added to all date parameter definitions within the appropriate prototypes and procedure interfaces.
- Output parameters – Table definition
For a UDTF, all columns to be returned must be defined (name and data type and length depending on the data type) in composition with the RETURNS TABLE keyword.
The column definitions within the RETURNS TABLE clause must be an exact match (to the data type, length, and sequence) of the output parameters defined in the HLL program.
The programming language in which the program to be registered is written must be specified, for example RPGLE, COBOLLE, or CLLE.
If a program containing embedded SQL is registered, the programming language value to be specified is still RPGLE or CLE and not SQLRPGLE or SQLCLE.
- External name
Within the EXTERNAL NAME clause, the program or service program to be registered must be specified. If Java is used, the EXTERNAL NAME clause would contain the Java class and method.
Qualifying the program or service program with the library is not required. If no library was specified when registering the function, the SQL path is searched when running the UDTF. The SQL path is set with the SET PATH statement. The special value *LIBL is allowed when setting the SQL path.
A procedure located within a service program must be specified surrounded with parenthesis immediately after the service program name:
EXTERNAL NAME 'MYSRVPGM(MYPROCEDURE)'
- Parameter style SQL
Parameter style SQL is mandatory for all external UDTFs that are not written in Java.
The parameter style SQL not only requires all input and output parameters to be defined in the parameter list of the HLL program, but also requires NULL indicator variables for all input and output parameters. The SQL parameter style also requires several other parameters for exchanging information between the invocation program and the UDTF, among them the SQLSTATE and the call type.
- Final call / No final call
Normally, a UDTF is called at least three times with three different call types, that is, OPEN, FETCH (that can be repeated x times) and CLOSE.
When specifying the FINAL CALL option, two additional invocations will occur. The first one before the open call (as first call) and the second one after the close call (as final call). The FINAL CALL option is used if additional operations must be performed before the open or after the close call.
For example, before opening your files that are accessed with native I/O, you might intent to set the library list first. Or, when using List APIs, you might plan to create and populate the user space with the first call and delete the user space with the final call.
The SCRATCHPAD attribute provides the ability to define a memory area that can be shared across calls to the registered program.
The SCRATCHPAD attribute is not mandatory. Instead, the information can be stored in either global variables or local static variables within procedures to keep data persistent between different program calls.
When using the SCRATCHPAD attribute, the appropriate parameter must be defined as data structure in the HLL consisting of an integer sub-field containing the length of the second sub-field, a fixed length character variable.
Listing 6: Reference data structure of a scratchpad
D RefDSScratch DS Qualified Template D Length 10I 0 Inz(%Size(Info)) D Info 256A
Listing 7shows the registration of the LISTMEMBER procedure located within the LISTMBR service program in the HSUDTF library (8) as
ListMember_FncUDTF expects two varying length input parameters representing the file and library names (2) and returns a table consisting of seven columns (3). The procedure is written in RPG (4) without using any SQL (5). When running the procedure, a first and final call (6) must be performed. The DISALLOW PARALLEL and NOT FENCED attributes should be specified if the procedure is written in a programming language that is not multithreading-capable (7).
Listing 7: Registering the ListMember_Fnc UDTF
CREATE FUNCTION ListMember_Fnc (1) (ParFile VarChar(10), (2) ParLib VarChar(10)) RETURNS TABLE (MbrFile Char(10) , (3) MbrFileLib Char(10) , Mbr Char(10) , MbrType Char(10) , MbrDescr Char(50) , MbrCrtTS Timestamp, MbrChgTS Timestamp) LANGUAGE RPGLE (4) NO SQL (5) FINAL CALL (6) DISALLOW PARALLEL (7) NOT FENCED EXTERNAL NAME 'HSUDTF/LISTMBR(LISTMEMBER)' (8) PARAMETER STYLE SQL ; (9)
Parameter definition in HLL programs to be registered as external UDTFs
Besides all input and output column parameters that are specified in the CREATE FUNCTION statement, the SQL parameter style requires additional parameters to be included in the procedure interface or entry parameter list. These additional parameters are passed by the DB2 invocation program for exchanging information with the HLL program.
The following additional parameters are required with the SQL parameter style:
- Indicator variables for all input and output parameters
An indicator variable, passed as 2-byte binary value (5I 0) must be defined for each input and output parameter for indicating whether a NULL value was passed.
- SQLSTATE – Output parameter
SQLSTATE is a 5-character output parameter that can be set within the HLL program or procedure for indicating the success or failure of the execution. The programmer can set the SQLSTATE parameter to any valid value.
As soon as the last row is consumed, the SQLSTATE output parameter must be set to '02000' for indicating to the invocation program that all data retrieval is complete and the close call can be initiated.
You can find detailed information about the SQLSTATE concepts and messages at the IBM i information center.
- Function name – Input parameter
The function name parameter is a 517-character varying length input parameter representing the fully qualified function name.
- Specific name – Input parameter
The specific name parameter is a 128-character varying length input parameter representing the specific name.
- Message text – Output parameter
The message text parameter is a 1000-character varying length output parameter for returning an individual error message text in conjunction with an individual SQLSTATE.
- Scratchpad – Output parameter
If the SCRATCHPAD attribute is set for storing information between the different program calls, the SCRATCHPAD parameter must be defined after the message text parameter. Otherwise the call type parameter follows immediately after the message text parameter.
- Call type – Input parameter
The call type input parameter must be defined as an integer parameter and is the most important parameter. It is set by the invocation program immediately before calling the HLL program/procedure to one of the following constant values:
- -2 = First call will be passed only if the FINAL CALL option is explicitly set within the CREATE FUNCTION statement.
- -1 = Open call is the first invocation of the program if the FINAL CALL option was not specified. When using global or static variables for keeping information persistent between the program or procedure calls, the open call is a good place for initializing them. If the SCRATCHPAD attribute is used, memory is allocated and initialized before the open call.
When using record level access (RLA) within the external UDTF, the files to be accessed should be defined as user opened. The files should be opened and positioned at the beginning of the data with the open call and closed if the close call is performed.
- 0 = Fetch call will be run repeatedly until the SQLSTATE parameter is set for indicating the end of processing (SQLSTATE 02000) or for indicating an error.
- 1 = Close call is run after all data is processed and it allows to perform any external end processing, for example closing user opened files, deleting temporary objects such as work files or user spaces, and so on.
- 2 = Final call will only be performed if the FINAL CALL attribute is explicitly set within the CREATE FUNCTION statement.
To make the source code easier to read, I defined named constants for the call types and stored them in a separate source member (QSRC,CALLTYPE) that is integrated as copy member in all of the subsequent examples.
Listing 8: Copy member constant values for call types
D ConstFirstCall C const(-2) D ConstOpenCall C const(-1) D ConstFetchCall C const(0) D ConstCloseCall C const(1) D ConstFinalCall C const(2)
Listing 9 displays the prototype (parameter definition) for the
ListMember procedure in the LISTMBR service program. The CREATE FUNCTION statement for registering this HLL procedure is shown in Listing 7.
Within the prototype, first the two input parameters are defined (1), followed by the output parameter definition, to match the columns specified in the RETURNS TABLE clause.
In the CREATE FUNCTION statement, only the input and output parameters are defined. When using the SQL parameter style, the DB2 invocation program passes additional parameters for communicating with the HLL program. These additional parameters must be defined in the parameter list.
For each input and output parameter, a NULL indicator parameter variable (3) must be included.
Additional parameters for the SQLSTATE parameter (4), function name (5), specific name (6), message text (7) and call type (8) must be defined.
Listing 9: Prototype ListMember procedure to be registered as UDTF
D ListMember PR // Input Parameters D ParFile 10A Varying (1) D ParLib 10A Varying // Columns to be returned D POutFile 10A (2) D POutLib 10A D POutMbr 10A D POutMbrType 10A D POutDescr 50A D POutCrtTS Z D POutUpdTS Z // NULL Indicators for all input and output parameters D ParFileInd 5I 0 (3) D ParLibInd 5I 0 D POutFileInd 5I 0 D POutLibInd 5I 0 D POutMbrInd 5I 0 D POutMbrTypeInd... D 5I 0 D POutDescrInd 5I 0 D POutCrtTSInd 5I 0 D POutUpdTSInd 5I 0 // SQL Function Parameters D POutSQLState 5A (4) D ParFctName 517A varying Const (5) D ParSpecName 128A varying Const (6) D POutMsgText 1000A varying (7) D ParCallType 10I 0 (8)
After having explained all components for writing external UDTFs, let's review the source code for several external UDTF examples.
To keep the source code as simple as possible, prototypes, reference fields, data structures, and call types are externalized in copy members and are integrated by using either the /COPY or /INCLUDE compiler directive.
The copy members are all available in the attached downloadable source code.
External UDTF consuming a program-described file
The following screen capture displays the content of the program-described file SPLITF located in the HSUDTF schema. This file consists of three character columns, representing the first name, last name, and the city. The zip code is stored between the last name and the city values in a 5-digit packed format. The last value in the record is an 8-digit packed value representing a numeric date (birthday) in the format YYYYMMDD.
Figure 4: Display program-described file, SPLITF
Accessing this table directly, splitting the row information into columns, and converting the data into character, numeric, and date values with pure SQL is rather complex. With programming languages such as RPG and COBOL on the other hand, the records can be easily read into internally described data structures and the data is magically returned as columns.
Listing 10 shows the source code for the SPLITFU program in which the program-described file is read sequentially into an internally described data structure and the content of the data structure sub-fields is moved into the output parameters. The SPLITFU program is registered with a function name,
The program-described file SPLITF is defined as a user-opened internally described input file within the global F Specs. (1)
In this UDTF example, there wasn't a requirement for input parameters, and therefore, none are defined. (2).
An output parameter is defined for all of the columns to be returned (First Name, Last Name, Zip Code, City, Birthday). Even though the character fields and the data structure subfields are defined as fixed length character fields, the output columns are defined as varying length variables. Leading and trailing blanks are trimmed off the subfield values. Converting fixed length character values into varying length character values is not required, but used in this example, mainly because varying length values can be concatenated easier.
The zip code is returned as a packed value.
The numeric birthday is converted and returned as SQL date value. To prevent problems with different date formats, the DATFMT(*ISO) keyword is added to the birthday's parameter definition.(3)
Five NULL indicator parameters for indicating NULL values are defined for the five output parameters.
The SQLSTATE, function name, specific name, message text, and call type parameters follow the indicator parameters and are defined as described before (5).
DSSplitF data structure for receiving the data is defined as a qualified data structure. The data structure subfields are defined by only specifying the data types and data lengths (6).
With each invocation, all output parameters are initialized independent of the current call type (just to make sure that no invalid data from any earlier call is returned) (7).
When the open call is performed, the SPLITF file is (user controlled) opened and positioned before the first record. The program ends after the file is opened (RETURN opcode) (8).
Each time the fetch call is performed, the next record is read into the
DSSplitF data structure (9). As long as the end of file is not yet reached, the content of the data structure subfields is moved into the output parameters. Because the character columns are defined as varying length parameters, leading and trailing blanks are trimmed off the fixed format subfield values using the %TRIM built-in-function. The numeric date is converted into a date value by using the %DATE built-in-function. For invalid numeric dates, the default value (0001-01-01) is returned (10).
If the end of the file is reached or the close call is performed, the SQLSTATE output parameter value is set to '02000'. The SPITF file is closed and the program is ended (11).
Listing 10: Program SPLITFU consuming a program-described file
FSplitF IF F 70 Disk UsrOpn (1) *--------------------------------------------------------------------- * Prototype * --> Release 7.1: Prototypes for programs that are not called from * other RPG programs/procedures are no longer required /INCLUDE QUDTFSRC,CALLTYPE * Procedure Interface D SPLITFU PI * Input Parameter (2) * Output Parameter - Return Table Columns (3) D POutFirstName 50A Varying D POutName 50A Varying D POutZipCode 5P 0 D POutCity 50A Varying D POutBirthDay D DatFmt(*ISO) * Null Indicators for Input Parameters * Null Indicators for Return Table Columns (4) D POutFirstNameInd... D 5I 0 D POutNameInd 5I 0 D POutZipCodeInd... D 5I 0 D POutCityInd 5I 0 D POutBirthDayInd... D 5I 0 * SQL Function Parameters D POutSQLState 5A (5) D ParFunction 517A varying Const D ParSpecName 128A varying Const D POutMsgText 1000A varying D ParCallType 10I 0 *---------------------------------------------------------------------- * Data Structure – program-described file D DSSplitF DS 70 Qualified Inz (6) D FirstName 15A D Name 15A D ZipCode 5P 0 D City 15A D BirthDay 8P 0 ************************************************************************ /Free Monitor; //Initialize Output Parameters POutSQLState = *Zeros; Clear POutMsgText; (7) clear POutFirstName; -------- 10 lines excluded. ------- //1. Open Call --> Open program defined file If ParCallType = ConstOpenCall; (8) Open(E) SplitF; SetLL *Start SplitF; Return; EndIf; //2. Fetch Calls --> Read next record from program defined file If ParCallType = ConstFetchCall; Read SplitF DSSplitF; (9) If Not %EOF(SplitF); POutFirstName = %Trim(DSSplitF.FirstName); (10) POutName = %Trim(DSSplitF.Name); POutZipCode = DSSplitF.ZipCode; POutCity = %Trim(DSSplitF.City); Monitor; POutBirthDay = %Date(DSSplitF.BirthDay: *ISO); On-Error; Clear POutBirthDay; EndMon; Return; EndIf; EndIf; //3. Close Call --> Close File / End Program If ParCallType = ConstCloseCall or %EOF(SplitF); (11) Close(E) SplitF; POutSQLState = '02000'; EndIf; On-Error; POutSQLState = '38X11'; POutMsgText = 'Error occurred!'; EndMon; Return; /End-Free
After the SPLITFU program is compiled successfully, it is registered as an external UDTF by running the following CREATE FUNCTION statement.
The RETURNS TABLE clause must match the output parameter definition within the HLL program. Because the character columns are defined as varying length character fields within the RPG program (15A Varying), the VARCHAR data type must be used. The packed data type defined in RPG is identical to the DECIMAL data type.
Listing 11: Registering the SPLITFU program as an external UDTF
CREATE FUNCTION HSUDTF/SPLITF_FNC ( ) RETIURNS TABLE (ParFirstName VarChar(50) , ParName VarChar(50) , ParZipCode Decimal(5, 0), ParCity VarChar(50) , ParBirthDay Date) LANGUAGE RPGLE NOT DETERMINISTIC NO SQL DISALLOW PARALLEL NOT FENCED EXTERNAL NAME 'HSUDTF/SPLITFU' PARAMETER STYLE DB2SQL;
An invocation of the
SplitF_Fnc UDTF is shown in Listing 12. While running the UDTF, the SPLITFU program-described file is read sequentially and the record data is extracted into columns.
SplitF_Fnc UDTF is based on a traditional RPG program that means the library list in the environment where the UDTF is run must be set correctly. In environments where SQL naming conventions are used, the default or the current schema is searched to find unqualified, specified tables or views in SQL statements. However, the current schema is not used to find files accessed with native I/O in HLL programs.
Listing 12: Running an external SplitF_Fnc UDTF
SELECT * FROM TABLE(SplitF_Fnc()) x WHERE ParZipCode > 75000 AND ParBirthDay > '1975-01-01';
The following example shows the result table returned by the previous query. By specifying a WHERE clause, the result is limited to rows with a zip code higher than 75000 and a birthday after January 1, 1975.
Figure 5: Result table from the SplitF_Fnc UDTF
External UDTF reading data from a comma-separated file
Accessing data located in stream files such as (comma separated values) CSV directly with SQL might not be easy, because the content is normally not stored in columns with an identical length. The data has to be read sequentially and the data splits depending on the embedded separators. Most HLL include predefined functions for reading and writing stream files and even SQL provides a method by using file reference variables for accessing data located within the integrated file system.
Figure 6 shows the Sales1.csv file located within the /home/Hauser directory within the integrated file system. Each row of the Sales1.csv file consists of five columns (Customer No, Item No, Item Description, Sales Date, and Amount) separated by a semi colon (;). Each row ends with carriage return-line feed (CRLF). Text information and date values are enclosed within double quotes. Trailing blanks have been trimmed so the column information does not always start in the same position.
Figure 6: Sales1.csv file contents
Listing 13 contains the source code of the SalesCsv procedure located in the READCSV service program. This procedure will be registered as the SalesCsv_Fnc UDTF.
Within the SalesCsv procedure, the IFS file, /home/Hauser/Sales1.csv is accessed. The data between the current position and the next CRLF is read, split into columns, and the surrounding double quotes removed. The sales date is converted into an SQL date value, while the amount is converted into a numeric value.
In this example, the data in the IFS file is accessed with embedded SQL by using a character large object (CLOB) file reference variable.
Instead of using embedded SQL, stream file APIs might have been used for accessing the IFS file data. The stream file APIs might be more flexible than the file reference variables and may also perform better. But, file reference variables are easier to implement and understand because pointer handling is not required.
Because there is no native data type for CLOB file variables in RPG, the variable must be defined with the SQLTYPE(CLOB_File) keyword. The SQL precompiler converts a file reference variable into a data structure consisting of four subfields, all beginning with the name of the declared variable but followed by different suffixes:
- Suffix _NL Length of the IFS file name (Integer = 10I 0)
- Suffix _DL Length of the data returned (output) (Integer = 10I 0)
- Suffix _FO File operation (must be set within the program) (Integer = 10I 0)
- Suffix _Name IFS file name (255 byte character with fixed length = 255A)
CLOB file variables can be used within (embedded) SQL statements similar to any character variable. In this way, a file reference variable (or better the content of the file where it points to) can be scanned with the LOCATE scalar function to find a specific string or the SUBSTR function can be used for moving a part of the stream file content into a host variable.
Within the source code, the
GblCsvFile global variable is defined as the CLOB_File (1) variable to be used to get access to the IFS file. The variable is defined globally because the information must remain persistent across different procedure calls.
Defining a local static CLOB_FILE variable within the procedure will not work, because the SQL precompiler ignores the STATIC keyword.
LocStrPosData variable is defined as a local static variable (2), because it contains the start position of the next data block to be read. Instead of using static variables, this information could have been stored in the scratchpad memory by using the SCRATCHPAD attribute and defining a SCRATCHPAD parameter variable.
When the open call is performed, the CLOB_File variable and the respective data structure subfields are initialized with the IFS file name (GblCsvFile_Name), the length of the IFS file Name (GblCsvFile_NL), and the file operation (GblCsvFile_FO).
SQFRD is a constant value that is included by the SQL precompiler indicating that the IFS file is read only. The
LocStrPosData static variable is set to 1, resulting in the pointer being set at the beginning of the IFS file data. (3)
When the fetch call is performed, the position of the next CRLF (Hex value x'0D25') is determined by using the LOCATE scalar function. (4)
If a CRLF is found, the data length between the current start position and the position of the next CRLF is determined and the data beginning with the current start position and the calculated length is read into a host variable by using the SUBSTR scalar function. If no CRLF is found, all data beginning with the current start position is read into a host variable by using the SUBSTR scalar function (5).
After the data is transferred into a host variable, it can be processed with the native RPG operation codes and built-in functions. In this example, the next separator is searched by using the %SCAN function (6). The column data is transferred into the output parameters. Blank spaces and double quotes are removed with the %TRIM function (7).
The date character value is converted into a SQL date value by using the %DATE function (8). For invalid date values, a NULL value is returned by setting the appropriate indicator parameter variable to -1 (9).
The character representation of the amount is converted into a numeric value with the %DEC function (10). For invalid numeric values, a NULL value is returned.
On the close call, the SQLSTATE is set to '02000' and the procedure is ended.
Listing 13: Procedure SalesCsv – Accessing a csv file in the IFS
H NoMain *---------------------------------------------------------------------- * Prototypes and other Copy Member /Include QIDTFSRC,ReadCsvPro /Include QUDTFSRC,CallType *---------------------------------------------------------------------- * Defining *.csv file to be consumed D GblCsvFile S SQLType(CLOB_File) (1) *********************************************************************** * Read Data from *.csv file *********************************************************************** P SalesCsv B Export D SalesCsv PI * Input Parameters for UDTF * Output Paramters - Columns to be returned D POutCustNo 15A Varying D POutItemNo 22A Varying D POutSalesDate D DatFmt(*ISO) D POutAmount 11P 2 * NULL Indicators for all input parameters * NULL Indicators for output parameters D POutCustNoInd 5I 0 D POutItemNoInd 5I 0 D POutSalesDateInd... D 5I 0 D POutAmountInd 5I 0 * SQL Function Parameters D POutSQLState 5A D ParFctName 517A varying Const D ParSpecName 128A varying Const D POutMsgText 1000A varying D ParCallType 10I 0 * Local Variables D LocIndex S 3U 0 D LocData S 256A Varying D LocDataLen S 10I 0 D LocNextCRLF S 10I 0 D LocColData S 256A Varying D LocColDataLen S 10I 0 D LocNextSep S 10I 0 D LocStrPosSep S 10I 0 inz(1) D LocStrPosData S 10I 0 Static (2) *---------------------------------------------------------------------- /Free Monitor; //Initialize Output Parameters POutSQLState = *Zeros; Clear POutMsgText; clear POutCustNo; ----- 8 lines excluded. ----- //1. Open Call --> Initialize CLOB File and Start Position If ParCallType = ConstOpenCall; (3) LocStrPosData = 1; Clear GblCsvFile; GblCsvFile_Name = '/home/Hauser/Sales1.csv'; GblCsvFile_NL = %Len(%Trim(GblCsvFile_Name)); GblCsvFile_FO = SQFRD; Return; EndIf; //2. Fetch Calls - Sub-Sequent Calls --> Read data from *.csv file If ParCallType = ConstFetchCall; Clear LocNextCRLF; Clear LocData; // 2.1. Determine next CRLF (4) Exec SQL Set :LocNextCRLF = Locate(x'0D25', :GblCsvFile, :LocStrPosData); // 2.2. Retrieve Data between the current position // and next CRLF or End of File If LocNextCRLF > *Zeros; (5) LocDataLen = LocNextCRLF - LocStrPosData; //Data Length Exec SQL Set :LocData = Substr(:GblCsvFile, :LocStrPosData, :LocDataLen); Else; Exec SQL Set :LocData = Substr(:GblCsvFile, :LocStrPosData); EndIf; LocStrPosData += LocDataLen + 2; //Start position next data // 2.3. Split retrieved Data into columns // (depending on the ';' separators) If %Len(%Trim(LocData)) > *Zeros; ) LocStrPosSep = 1; For LocIndex = 1 to 5; LocNextSep = %Scan(';': LocData: LocStrPosSep); (6) LocColDataLen = LocNextSep - LocStrPosSep; If LocNextSep > *Zeros; LocColData = %Trim(%Subst(LocData: LocStrPosSep: (7) LocColDataLen): '" '); Else; LocColData = %Trim(%Subst(LocData: LocStrPosSep): '" '); EndIf; LocStrPosSep += LocColDataLen + 1; Select; When LocIndex = 1; POutCustNo = LocColData; When LocIndex = 2; POutItemNo = LocColData; When LocIndex = 3; Iter; When LocIndex = 4; Monitor; POutSalesDate = %Date(LocColData: *ISO); (8) On-Error; POutSalesDateInd = -1; (9) EndMon; When LocIndex = 5; Monitor; POutAmount = %Dec(LocColData: 11: 2); (10) On-Error; POutAmountInd = -1; EndMon; EndSL; EndFor; Return; EndIf; EndIf; //3. End Processing If ParCallType = ConstCloseCall or %Len(%Trim(LocData)) = *Zeros; POutSQLState = '02000'; Return; EndIf; On-Error; POutSQLState = '38X011'; POutMsgText = 'Error occurred when executing the UDTF'; EndMon; Return; /End-Free P SalesCsv E
Because SalesCsv is an exported procedure within a service program, a two-step compilation is required. The READCSV module will be bound to the READCSV service program.
The SalesCsv procedure in the READCSV service program is registered as an external UDTF with the CREATE FUNCTION statement in Listing 14.
Because embedded SQL is used within the procedure, READS SQL DATA must be specified.
Both, procedure name and the service program name must be specified with the EXTERNAL NAME option. The procedure name must follow the service program name and must be embedded in parenthesis (1). In an RPG procedure, names are internally converted into uppercase (except when a case-sensitive procedure name is specified in the prototype with the EXTPROC keyword), and therefore, the procedure name must be specified in capital letters.
Listing 14: Registering SalesCsv as an external UDTF
CREATE FUNCTION HSUDTF/SalesCsv_Fnc () RETURNS TABLE (CustNo VarChar(15) , ItemNo VarChar(22) , SalesDate Date , Amount Dec(11, 2)) LANGUAGE RPGLE READS SQL DATA DISALLOW PARALLEL NOT FENCED EXTERNAL NAME 'HSUDTF/READCSV(SALESCSV)' (1) PARAMETER STYLE DB2SQL ;
When running the SELECT statement in Listing 15, the Sales1.csv file is directly read with the
SalesCsv_Fnc UDTF and returned as a table. The result table returned by the
SalesCsv_Fnc UDTF is joined with the Address Master table to get the customer number and name to return the total amount by customer.
Listing 15: Running an external UDTF - SalesCsv_Fnc
SELECT CustNo, CustName1, Sum(Amount) TotalSales FROM TABLE(SalesCsv_Fnc()) x join AddressX USING (CustNo) GROUP BY CustNo, CustName1 ORDER BY CustNo, CustName1
Figure 7: Result table from SalesCsv_Fnc
External UDTF returning data out of a user space filled by a System API
A lot of APIs provide the same information as CL commands. Quite often, however, those APIs provide information and functions that are not available through CL commands.
Instead of writing the data into a physical file, the List APIs return information to user spaces. A user space is an object consisting of a collection of bytes that can be used for storing any user-defined information.
User spaces are created, accessed, and deleted by using the following APIs:
- QUSCRTUS – Create User Space API
- QUSPTRUS – Retrieve Pointer to User Space API
- QUSRTVUS – Retrieve User Space API
- QUSDLTUS – Delete User Space API
Before calling a List API, the user space has to be created with the QUSCRTUS API.
To provide a consistent design, List APIs use a general data structure. This general data structure starts with generic header information that provides among others the offset of the list data section, the number of entries, and the size of each list entry.
Note: Within the API and data structure descriptions, the offset of the data structure subfields is used. The offset position is 1 before the beginning of the data structure subfield, that is, Offset 124 means, the appropriate data structure subfield must begin in position 125.
The example in Listing 16 shows the source code for the ListMember procedure located in the LISTMBR service program to be registered as a UDTF.
Within the ListMember procedure, the QUSLMBR (List Database File Members) API is called for returning the information provided by the MBRL0200 format in a user space.
According to the API description, the MBRL0200 format returns the following information that can be read from the user space and retuned as result from the UDTF:
- Member name
- Source type (for example, RPGLE or CLLE)
- Creation date and time (in the CYYMMDDHHMMSS format)
- Last source change date and time (in the CYYMMDDHHMMSS format)
- Member text description
- Member text description CCSID
For receiving the data out of the user space, global data structures based on reference data structures (located within the copy members) are created and associated with pointers. The generic header information will be stored within the
GblDSListhdr data structure (2). The list data will be read into the
GblDSMBRL0200 data structure (3). Those data structures are defined globally for keeping the information persistent between the different procedure calls.
GblDSUsrSpc data structure contains the name and library for the user space to be generated. (4).
GblDSFileQual (5) data structure, the incoming parameter values (file and library) will be stored, because the QUSLMBR API requires the information being passed in a single 20-character string.
GblDSErrCde (6) is a data structure that is required in a lot of APIs for returning information if an error occurs.
ListMember procedure or better the UDTF requires two input parameters, the file name and the library name (7). Both parameters are defined as varying length character fields, which is the best practice for passing input strings.
Contrary to the previous examples, the UDTF is registered with the FINAL CALL option.
When the first call is performed, the user space will be created by running the QUSCRTUS API and will be filled with the list member information by calling the QUSLMBR API. The counter variable (
LocCount) is initialized. (8)
At the open call, a pointer on the user space is resolved by running the QUSPTRUS API and the data is read into the
GblDSGenHdr generic header data structure (9).
When the fetch call is performed for the first time, the pointer is positioned at the beginning of the list data based on the information returned in the generic header data structure. For all subsequent fetch calls, the pointer is set to the next entry by adding the length of the data to the current pointer position and the data set is read into the
GblDSMBRL0200 data structure. (10)
The subfield values in
GblDSMBRL0200 are transferred to the output parameters (11).
The 13-character date values for the creation and change date are converted into SQL timestamp values. If an invalid date or time value is encountered, a NULL value is returned. (12).
On the close call invocation or as soon as all list entries are processed, SQLSTATE '02000' is returned.
When the final call is performed, the user space is deleted by running the QUSDLTUS API. (13)
Listing 16: Procedure ListMember – Member information returned by the QUSLMBR API
H Debug NoMain *********************************************************************** * Prototypes and Copy Members (1) /Include QUDTFSRC,ListMbrPro /Include QUDTFSRC,ProUSAPI /Include QUDTFSRC,CallType *********************************************************************** * Global Variables * Data Structure Generic Header for List APIs D GblDSListHdr DS LikeDS(RefDSGenHdr) (2) D Based(GblPtrDSGenHdr) * Data Structure for List API QUSLMBR – Format MBRL0200 D GblDSMBRL0200 DS LikeDS(RefDSMBRL0200) (3) D Based(GblPtrDSMBRL0200) D GblDSUsrSPc DS Qualified (4) D Name 10A Inz('MBRLIST') D Lib 10A Inz('QTEMP') D GblDSFileQual DS LikeDS(RefDSObjQual) (5) D Inz(*LikeDS) * Error Data Structure – Format ERRC0100 (6) D GblDSErrCde DS LikeDS(RefDSErrCde) Inz(*LikeDS) ********************************************************************** * UDTF based on API QUSLMBR (List Database Member) and Format MBRL0200 *********************************************************************** P ListMember B Export D ListMember PI * Input Parameters for UDTF (7) D ParFile 10A Varying D ParLib 10A Varying * Output Parameters: Columns to be returned D POutFile 10A D POutLib 10A D POutMbr 10A D POutMbrType 10A D POutDescr 50A D POutCrtTS Z D POutUpdTS Z * NULL Indicators for all input parameters D ParFileInd 5I 0 D ParLibInd 5I 0 * Null Indicators for all output parameters D POutFileInd 5I 0 D POutLibInd 5I 0 D POutMbrInd 5I 0 D POutMbrTypeInd... D 5I 0 D POutDescrInd 5I 0 D POutCrtTSInd 5I 0 D POutUpdTSInd 5I 0 // SQL Function Parameters D POutSQLState 5A D ParFctName 517A varying Const D ParSpecName 128A varying Const D POutMsgText 1000A varying D ParCallType 10I 0 * Local Variables D LocDSAPIDate DS 13 Qualified Inz D Cty 1A D YMD 6A D HMS 6A D LocDate S D DatFmt(*ISO) D LocCount S 10I 0 Static *---------------------------------------------------------------------- /Free Monitor; //Initialize Output Parameters POutSQLState = *Zeros; Clear POutMsgText; clear POutFile; ----- 13 lines excluded. ----- //1. First Call --> Create/Fill User Space If ParCallType = ConstFirstCall; (8) QUSCRTUS(GblDSUsrSpc: *Blanks: 256: x'00': '*USE': 'Member List': '*YES': GblDSErrCde); GblDSFileQual.Name = ParFile; GblDSFileQual.Lib = ParLib; QUSLMBR(GblDSUsrSpc: 'MBRL0200': GblDSFileQual: '*ALL': *Zeros: GblDSErrCde); Clear LocCount; Return; EndIf; //2. Open Call --> Read Header Information If ParCallType = ConstOpenCall; QUSPTRUS(GblDSUsrSPc: GblPtrDSGenHdr: GblDSErrCde); (9) Return; EndIf; //3. Fetch Calls - Sub-Sequent Calls --> Read User Space Entries if ParCallType = ConstFetchCall; LocCount += 1; If LocCount = 1; (10) GblPtrDSMBRL0200 = GblPtrDSGenHdr + GblDSListHdr.OffsetData; elseIf LocCount <= GblDSListHdr.NbrListEntry; GblPtrDSMBRL0200 += GblDSListHdr.SizeListEntry; endif; POutFile = ParFile; POutLib = ParLib; POutMbr = GblDSMBRL0200.Member; (11) POutMbrType = GblDSMBRL0200.SourceType; POutDescr = GblDSMBRL0200.TextDescr; Monitor; LocDSAPIDate = GblDSMBRL0200.CrtDate; (12) LocDate = %Date(LocDSAPIDate.YMD: *YMD0); POutCrtTS = LocDate + %Time(LocDSAPIDate.HMS: *HMS0); On-Error; POutCrtTSInd = -1; EndMon; Monitor; LocDSAPIDate = GblDSMBRL0200.UpdDate; (12) LocDate = %Date(LocDSAPIDate.YMD: *YMD0); POutUpdTS = LocDate + %Time(LocDSAPIDate.HMS: *HMS0); On-Error; POutUpdTSInd = -1; EndMon; If LocCount < GblDSListHdr.NbrListEntry; Return; Endif; EndIf; //4. Close Call - End Processing If ParCallType = ConstCloseCall or LocCount >= GblDSListHdr.NbrListEntry; POutSQLState = '02000'; Return; EndIf; //5. Final Call – Delete User Space If ParCallType = ConstFinalCall; QUSDLTUS(GblDSUsrSpc: GblDSErrCde); Return; EndIf; On-Error; POutSQLState = '38X11'; POutMsgText = 'Error Occurred'; EndMon; Return; /End-Free P ListMember E
The service program containing the exported procedure
ListMember is created successfully. The CREATE FUNCTION statement shown in Listing 7 can be run for registering the
ListMember procedure as an external UDTF.
In Listing 17, the
ListMember_Fnc UDTF is performed for returning all the members with a member type that includes RPGLE that are modified on the current date. The result table returned by running this query is shown in Figure 8.
Listing 17: Running the ListMember_Fnc external UDTF
SELECT MbrFileLib, MbrFile, Mbr, MbrChgTS, MbrDescr FROM TABLE(ListMember_Fnc('QSRC', 'HSUDTF')) x WHERE MbrType like '%RPGLE%' AND DATE(MbrChgTS) = CURRENT_DATE;
Figure 8: Result table from the ListMember_Fnc UDTF
Based on the examples presented in this article, you should now be able to write your own UDTFs. The combination of SQL and external UDTF support makes it easy for you to access data in traditional DB2 objects and data in non-database objects, such as program-described files, comma separated files, or user spaces.
And now, have fun in writing and using your own UDTFs.
- IBM i – DB2 for i SQL Reference - 7.1
- IBM i – Database SQL programming - 7.1
- Embedded SQL Programming
- Stored Procedures, Triggers, and User-Defined Functions on DB2 Universal Database for iSeries
- UDTFs: The Unsung DB2 Function
- Rational Development Studio for i - ILE RPG Language Reference – 7.1
- List APIs overview
- IBM DeveloperWorks DB2 for i - Forum
- IBM DeveloperWorks - IBM i Technology Updates