Data Web Services on DB2 for z/OS, Part 2: Integrate z/OS data sets using user-defined functions

Transform files to information

Learn by example how to create an external user-defined function (UDF) on DB2® for z/OS® that can read z/OS Unix® System Services (USS) files and z/OS data sets. You can then use existing SQL built-in functions and your own UDFs to perform data manipulation on the information stored in the data sets and files. You can also expose these "file-reading" SQL statements as IBM Data Web Services, which results in the data sets and files being available to Web service operations.

Josef Klitsch (Josef.Klitsch@ch.ibm.com), IT Specialist, IBM

Josef Klitsch photoJosef Klitsch is an IT Specialist with IBM Software Services, currently assigned to work for IBM PSSC Montpellier, New Technology Center. Josef provides consulting services related to DB2 for z/OS. You can reach him at josef.klitsch@ch.ibm.com.



09 July 2009

Also available in Spanish

Overview

With IBM Data Web Services you can easily integrate database operations in a Service Oriented Architecture (SOA) by transforming single SQL operations into Web services. Valid SQL operations that you can transform include SQL DML-initiated user-defined function (UDF) invocations.

This article provides an introduction to DB2 for z/OS external UDFs and shows how they help to integrate z/OS file and data set information with Data Web Services. The article then shows:

  • Examples of external UDFs, written in C and implemented in DB2 for z/OS, that read z/OS UNIX System Services (USS) files and z/OS data sets
  • How to integrate the UDF results with SQL operations
  • How to use the power of SQL to map the file record field structure to result table columns

While reviewing the ReadFile C program, you may be pleasantly surprised to see how complex programming tasks such as dynamic data set allocation and QSAM and VSAM access method selection can be transparently handled by the z/OS XL C/C++ runtime environment.

The article then demonstrates how to use IBM Optim Development Studio (formerly Data Studio Developer) to expose SQL statements that read z/OS data set and USS file information as Web services.


Prerequisites

This article assumes that you know what IBM Data Web Services are and that you are familiar with the basics of using IBM Optim Development Studio (or Data Studio, or Data Studio Developer) to create, test, and deploy IBM Data Web Services. If you are unfamiliar with either of these topics, refer to the Resources section for links to additional information.

You may also want to consider reading "Making Operating System Calls from SQL." This article provides useful information on how to read files from external user-defined functions.


Introduction to DB2 for z/OS user-defined functions

A user-defined function (UDF) is an extension to the SQL language and is similar to a host language program or function. UDFs are invoked from within SQL statements, thus providing integration with SQL DML. DB2 for z/OS supports the following types of UDFs:

  • Sourced UDFs are based on existing user-defined functions and built-in functions.
  • SQL UDFs are defined in SQL and can return an SQL expression.
  • External UDFs can be written in different programming languages including Assembler, C, COBOL, PL/1, and Java®.

UDFs are divided into two categories:

  • User-defined scalar functions return a single-value answer each time they are invoked.
  • User-defined table functions return a table to the invoking SQL statement. Table UDF functionality in DB2 for z/OS is only available with external UDFs.

DB2 for z/OS external UDFs

DB2 for z/OS external UDFs can be developed as scalar or as table functions. They have many attributes in common with external stored procedures. Like external stored procedures they:

  • Are server-side applications that can be written in different programming languages
  • Can access non-DB2 resources such as QSAM and VSAM data sets, message queues, CICS programs, IMS transactions, and databases like any other application program executing in the z/OS environment
  • Can make use of RACF security to control access to non-DB2 resources
  • Can reuse existing business logic by calling existing applications as subprograms
  • Are executed in an environment managed by z/OS Workload Manager (WLM)
  • Can apply external UDF changes using DB2 for z/OS and WLM interfaces without impacting service availability
  • Require DB2 EXECUTE authority to be held by the UDF client

In contrast to stored procedures, DB2 for z/OS external UDFs provide integration with SQL DML operations. For instance:

  • External UDFs can be invoked from SQL DML operations that are executed by tools such as DB2I SPUFI, DSNTEP2, DSNTIAD, DSNTIAUL, DB2 for z/OS CROSSLOADER cursor load, and by the DB2 LUW EXPORT utility.
  • For table population, an external table UDF can be referenced in a fullselect or a common table expression embedded in an SQL DML insert operation.
  • A table UDF does not require a DB2 table for building the result table. The result table is handled internally by the DB2 for z/OS DBM1 address space.
  • SQL language elements such as expressions, predicates, functions, and subselect clauses (WHERE, GROUP BY, HAVING, ORDER BY, FETCH FIRST) can be applied to the result table returned by user-defined table functions.
  • DB2 for z/OS built-in functions and user-defined functions can be applied to results returned by external UDFs and vice versa.

Reading files through an external UDF (ReadFile UDF)

Significant amounts of unstructured information are stored in z/OS files and data sets that are unavailable for SQL DML processing. You can provide SQL DML integration of these z/OS files and data sets by either loading them into DB2 tables or by providing either file or data set access from SQL through external UDFs. With the latter option you avoid the overhead of creating and loading additional DB2 tables, which can help to save time and resources especially if you want to perform ad hoc analysis on information that is only stored in files or data sets and not yet available in DB2 tables. With files and data sets integrated with SQL DML through UDFs, you can easily apply the power of SQL to information that is stored in z/OS data sets and USS files. Throughout this article, the term ReadFile UDF refers to an external UDF that reads a USS file or z/OS data set.

The ReadFile UDF provides two ways in which the file content can be returned by the UDF and subsequently available to the SQL operation:

  • Scalar value— the external scalar UDF used in this article returns a CLOB(1M) scalar value.
  • Result table— the file to be read by the UDF has a well known record structure, and this record structure will be mapped by the ReadFile UDF to a result table. The ReadFile table UDF used in this article returns a result table that contains one result table row for each file record. Each result table row has one VARCHAR (32600) column containing one file record. The table UDF does not perform any mapping of record fields to result table columns. The mapping will be performed outside the UDF within the same SQL statement. The external table UDF in that sense provides a generic interface for reading file records from SQL. This means that you can use the external table UDF for sequentially accessing file records of different sources such as QSAM, VSAM, and USS files and then perform record field-to-column mapping individually via SQL.

ReadFile scalar UDF

This scenario addresses a situation in which you want to perform ad hoc analysis on XML data that is stored in a file (the term file refers to z/OS data sets or USS files). The file that you want to analyze is illustrated in Listing 1.

Listing 1. ReadFile scalar UDF XML file
<zNTC>
     <member>
         <fstName>Eric</fstName><name>Cichiello</name><Area>Manager</Area>
     </member>
     <member>
         <fstName>Cedrine</fstName><name>Madera</name><Area>BI Specialist</Area>
     </member>    
     <member>
         <fstName>Michael</fstName><name>Schapira</name><Area>Cognos System z</Area>
     </member>
     <member>
         <fstName>Josef</fstName><name>Klitsch</name><Area>DB2 for z/OS</Area>
     </member>    
</zNTC>

Apply pureXML to file content

For this scenario, you perform the analysis using DB2 for z/OS pureXML. This requires that the XML document is accessible as a scalar value in the select list of the SELECT clause of an SQL statement. The pureXML query and its result are shown in Figure 1. The scenario uses a DB2 for z/OS common table expression (CTE) to invoke the ReadFile scalar UDF from SQL. The CTE uses pureXML to map XML elements to DB2 for z/OS result table columns.

Figure 1. ReadFile scalar UDF scenario overview
Diagram showing the pureXML query and results as described above

ReadFile scalar UDF execution environment

The ReadFile UDF execution environment involves the DB2 client with its SQL query, the DB2 for z/OS DBM1 and DDF address spaces, the WLM application environment WLMENV, the external ReadFile UDF program, the z/OS XL C/C++ runtime environment, and the external file storing the XML document that is to be analyzed.

A WLM application environment (WLM APPLENV) is required for scheduling and executing external UDF programs. A WLM APPLENV is normally created at DB2 installation as part of the DB2 for z/OS installation verification procedure (IVP). For example, the IVP job DSNTEJ2U prepares and executes the C/C++ sample UDFs. Upon successful IVP completion, a WLM APPLENV with a name of WLMENV exists in your DB2 for z/OS environment. You can use that WLM APPLENV for executing the external UDFs that are used in this article. For more information on setting up WLM for external UDFs, see "Chapter 4, Setting up and managing Workload Manager" of Redbook SG24-7604, DB2 9 for z/OS Stored Procedures: Through the CALL and Beyond, which is linked to in the Resources section.

Figure 2 illustrates the chronological processing flow of the external UDF execution environment components.

Figure 2. ReadFile scalar UDF scenario execution environment
Diagram depicting the 8 steps of the processing flow described in detail below
  1. A local or remote DB2 for z/OS client issues an SQL query that invokes the ReadFile UDF by referencing the UDF in the select list of the SELECT clause. For remote IBM Data Server Client connections the following additional processing is performed by DB2:
    1. The IBM Data Server Client connects to DB2 for z/OS DDF (DB2 for z/OS distributed data facility) through TCP/IP and DRDA. The DB2 for z/OS DDF address space implements the TCP/IP – DRDA interface in DB2 for z/OS.
    2. DB2 for z/OS DDF passes the SQL request for SQL processing to the DB2 for z/OS DBM1 address space (DB2 for z/OS database manager). The enclave that is created and used by the DDF address space for processing the SQL request runs in a WLM service class that corresponds with the service classification specified in the WLM policy.
  2. The DB2 for z/OS DBM1 address space determines which WLM application environment to use for scheduling the ReadFile UDF program by obtaining the WLM APPLENV name from the DB2 catalog and directory. The WLM APPLENV name in this scenario is WLMENV.
  3. The DB2 for z/OS DBM1 address space interfaces with WLM to execute the UDF program in the WLMENV WLM application environment.
  4. The WLM policy contains the WLMENV application environment attributes such as invocation parameters and the name of the JCL procedure that is to be used by WLM for address space creation. For the ReadFile UDF, the JCL procedure name is WLMENV.
  5. The WLMENV JCL procedure references the DB2.DBLN.RUNLIB.LOAD load module library in the STEPLIB library DD statement. There is no need to define a JCL DD-statement in the WLMENV JCL procedure for the JOSEF.XML data set because the external UDF program uses the z/OS XL C/C++ runtime environment. The z/OS XL C/C++ runtime environment transparently performs dynamic data set allocation during data set open, and data set deallocation during data set close processing.
  6. The DB2.DBLN.RUNLIB.LOAD library contains the ReadFile UDF executable, which is loaded by z/OS prior to UDF program execution.
  7. The ReadFile UDF program receives control. It uses the z/OS XL C/C++ functions fopen, fread, and fclose to open, read, and close the file JOSEF.XML. Then it returns the content of the file JOSEF.XML as a scalar value to the DB2 for z/OS DBM1 address space.
  8. The DB2 for z/OS DBM1 address space uses the scalar value returned by the ReadFile UDF to continue with its SQL processing and returns the SQL result to the DB2 client application.

Building user-defined functions

The external ReadFile UDFs described in this article are written in C. To build the UDF load modules you have to compile the C source files and link-edit the load modules into the load library that is used by the ReadFile WLM application environment. DB2 package binds are not required because these UDFs do not contain SQL operations. You can use the JCL procedure DSNHC that is provided by DB2 for building the ReadFile UDFs. In a subsequent job step you can use the DB2 IVP program DSN8ED6 to refresh the WLMENV application environment. This is required in order to reflect the up-to-date level of the ReadFile UDF load module in the WLM application environment. An example of the JCL is shown in Listing 2.

Listing 2. UDF compile and link-edit JCL sample
//PH02US03 EXEC DSNHC,MEM=READFLES,COND=(4,LT),
//         PARM.PC=(HOSTC,CCSID(1047),MARGINS(1,72),STDSQL(NO)
//             SOURCE,XREF),
//         PARM.C=SOURCE RENT XREF MARGINS(1,72),
//         PARM.LKED=MAP,RENT,REUS,AMODE=31,RMODE=ANY
//PC.DBRMLIB   DD DISP=SHR,DSN=JOSEF.DBRMLIB.DATA(READFLES)
//PC.SYSLIB    DD DISP=SHR,DSN=JOSEF.SOURCE.DATA
//PC.SYSIN     DD DISP=SHR,DSN=JOSEF.SOURCE.DATA(READFLES)
//LKED.SYSLMOD DD DISP=SHR,DSN=DB2.DBLN.RUNLIB.LOAD(READFLES)
//LKED.SYSIN   DD *
INCLUDE SYSLIB(DSNRLI)
NAME READFLES(R) 
//* ----------------------------------------------------
//* Refresh WLMENV WLM APPLENV
//* Parameters: WLMENV  --> WLM APPLENV to be refreshed
//*                       DBLN         --> DB2 subsystem ID 
//*                       DB2GRP    --> RACF Group to use for refresh
//* ----------------------------------------------------
//WLMREFR  EXEC PGM=IKJEFT01,DYNAMNBR=20,COND=(4,LT) 
//SYSTSPRT DD  SYSOUT=*
//SYSPRINT DD  SYSOUT=*
//SYSUDUMP DD  SYSOUT=*
//SYSTSIN  DD  *
DSN SYSTEM(DBLN)
 RUN PROGRAM(DSN8ED6) PLAN(DSN8ED6) -
 LIB(DSN910.RUNLIB.LOAD) -
 PARMS(WLMENV DBLN DB2GRP)
END
//

For more information on preparing C/C++ external UDFs and using DB2 sample program DSN8ED6 for refreshing UDF WLM application environments, refer to the DB2 9 for z/OS Installation Guide, which is linked to in the Resources section.

Defining the ReadFile scalar UDF to DB2 for z/OS

The ReadFile scalar UDF returns the entire file content as a single CLOB value. The UDF program reads and stores the entire file in a LOB value, which is then returned to the DB2 for z/OS DBM1 address space. The input for that function is a file name that adheres to the z/OS XL C/C++ file naming rules. Therefore, a z/OS data set name is embedded in single quotes and preceded by // (for example, //'JOSEF.XML'). A USS file name is an absolute file name (for example, /tmp/filename.txt). For more details on z/OS XL C/C++ file naming rules, refer to one of the “Opening Files” sections in the z/OS XL C/C++ Programming Guide, which is linked to in the Resources section.

You can use the SQL DDL statement shown in Listing 3 to create the ReadFile scalar UDF in DB2 for z/OS.

Listing 3. ReadFile scalar UDF SQL DDL
CREATE FUNCTION C.READFILESCALAR (DSN VARCHAR(255))  /*1*/
RETURNS CLOB(1M)  /*2*/
SPECIFIC “0001 READFILE_SCALAR”  /*3*/
LANGUAGE C  /*4*/
DETERMINISTIC  /*5*/
NO SQL  /*6*/
EXTERNAL NAME READFLES  /*7*/
PARAMETER STYLE DB2SQL
RETURNS NULL ON NULL INPUT
NO EXTERNAL ACTION
NO SCRATCHPAD
NO COLLID 6
ASUTIME LIMIT 5  /*8*/
STAY RESIDENT YES  /*9*/
PROGRAM TYPE SUB  /*10*/
WLM ENVIRONMENT WLMENV  /*11*/
SECURITY USER  /*12*/
DBINFO
DISALLOW PARALLEL

The external UDF attributes used in the SQL DDL statement shown in Listing 3 are explained below:

  1. The input parameter contains a file name that adheres to the file naming rules of the z/OS XL C/C++ runtime environment.
  2. The function returns a CLOB value that is a maximum of 1 megabyte in length.
  3. The numeric part of the UDF specific name is used by the UDF program as a runtime parameter to determine the maximum size of the return area. In the above DDL example, the begin value of 0001 tells the UDF program that the return area can be up to 1 megabyte in size. This value has to correspond with the length information provided in the RETURN expression.
  4. The external function is written in C/C++.
  5. Repetitive function invocations with the same input parameter return the same scalar value.
  6. The external function contains no SQL and therefore has no DB2 package collection ID.
  7. The external UDF load module name is READFLES.
  8. The ReadFile scalar UDF has an ASUTIME limit of 5 service units (SUs). This is the total amount of processor time that the external UDF can use. If the limit is exceeded, DB2 for z/OS sends message DSNX908I to the console and terminates the UDF.
  9. The external UDF program is to stay in memory once it has been loaded.
  10. PROGRAM TYPE SUB UDF C-programs do not have a main function. The z/OS Language Environment (LE) will not perform program re-initialization. The UDF program has to take care of variable initialization.
  11. The name of the WLM application environment that the external UDF is to be executed in is WLMENV.
  12. DB2 uses z/OS external security (RACF) for controlling access to non-DB2 resources. For the ReadFile UDF RACF DATASET, access control is exercised for the user that has been authenticated to DB2 for z/OS.

The C code that implements the ReadFile scalar function is shown in Listing 4.

Listing 4. ReadFile scalar UDF C code
#pragma linkage(READFLES,fetchable)
#include <stdio.h>
#include <stdarg.h>
#include <sqludf.h>
void SQL_API_FN READFLES(
SQLUDF_VARCHAR *fileName,  /*1*/
SQLUDF_CLOB    *fileData,
SQLUDF_NULLIND *fileName_ind,
SQLUDF_NULLIND *fileData_ind,
SQLUDF_TRAIL_ARGS_ALL)
{
 int rc = 0;long fileSize = 0; long maxSize = 0;
 size_t readCnt = 0; FILE *f = NULL;

 f = fopen(fileName,”rb,byteseek”);  /*2*/
if (!f) {
   strcpy(SQLUDF_MSGTX, “Could not open file “);
   strncat(SQLUDF_MSGTX, fileName,
   SQLUDF_MSGTEXT_LEN - strlen(SQLUDF_MSGTX)-1);
   strncpy(SQLUDF_STATE, “38100”, SQLUDF_SQLSTATE_LEN);
   return;
    }
 rc = fseek(f, 0, SEEK_END);  /*3*/
 if (rc) {
   sprintf(SQLUDF_MSGTX, “fseek() failure rc = %d”, rc);
   strncpy(SQLUDF_STATE, “38101”, SQLUDF_SQLSTATE_LEN);
   return;
    }
 fileSize = ftell(f);  /*4*/
 maxSize = atoi(SQLUDF_FSPEC)*1024000 ;
 if (fileSize > maxSize ) {
   sprintf(SQLUDF_MSGTX,
   “File too large,max = %d,size=%d”, maxSize,fileSize);
   strncpy(SQLUDF_STATE, “38102”, SQLUDF_SQLSTATE_LEN);
   fclose(f);
   return;
    }
 rc = fseek(f, 0, 0);  /*5*/
 if (rc) {
   sprintf(SQLUDF_MSGTX, “fseek() failed with rc = %d”, rc);
   strncpy(SQLUDF_STATE, “38103”, SQLUDF_SQLSTATE_LEN);
   fclose(f);
   return;
    }
 readCnt = fread(fileData->data, 1, fileSize, f);  /*6*/
 if (readCnt != fileSize) {
   sprintf(SQLUDF_MSGTX, “Could not read entire file “
   “(%d vs %d)”, readCnt, fileSize);
   strncpy(SQLUDF_STATE, “38104”, SQLUDF_SQLSTATE_LEN);
   *fileData_ind = -1;
    }
 else {
   fileData->length = readCnt;
   *fileData_ind = 0;
    }
 fclose(f);  /*7*/
}

The external UDF program shown in Listing 4 performs the following processing steps:

  1. Receive the file name in z/OS XL C/C++ file name notation.
  2. Use the file name to open the file. The z/OS XL C/C++ runtime environment implicitly handles data set allocation.
  3. Position current record pointer at file end.
  4. Determine the file size to ensure no data truncation occurs. The numeric part of the function-specific name is used to determine the maximum size that is supported by the return parameter.
  5. Position current record pointer at file begin.
  6. Read the entire file into the CLOB return column and provide CLOB length information.
  7. Close the file. The z/OS XL C/C++ runtime environment implicitly handles data set deallocation.

You can install the UDF program in DB2 for z/OS by executing the JCL procedure that is explained in the Building user-defined functions section of this article.

Further usage scenarios

The scenario described in this section applied pureXML to the file content that was returned as a scalar value by the ReadFile UDF. pureXML is just one of several options you have. For example, you could also use any of the rich set of DB2 for z/OS provided built-in functions and your own existing scalar UDFs in your ad hoc analyses.

Besides ad hoc analysis, you can use the ReadFile scalar UDF to insert file content from within dynamically prepared SQL insert statements directly into LOB or XML columns. The scenario shown in Listing 5 directly inserts the file content illustrated in Listing 1 into an XML table column and subsequently uses pureXML to re-read the document for validation purposes.

Listing 5. ReadFile scalar UDF for document insert
CREATE TABLE RF.NTCTAB (XMLDOC XML); 
---------+---------+---------+---------+---------+---------+--
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
---------+---------+---------+---------+---------+---------+--
Insert into rf.ntctab
values ( c.readFileScalar(//JOSEF.XML));
---------+---------+---------+---------+---------+---------+--
DSNE615I NUMBER OF ROWS AFFECTED IS 1
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
---------+---------+---------+---------+---------+---------+--
SELECT zNTCR.* FROM RF.NTCTAB,
XMLTABLE ($d/zNTC/member passing XMLDOC as ”d”
COLUMNS 
   ”1stName”           Varchar(15)       PATH fstName/text()
  ,”Name”              Varchar(15)       PATH Name/text()
  ,”Title”             Varchar(15)       PATH Area/text()
) AS zNTCR ;
---------+---------+---------+---------+---------+---------+----
1stName          Name             Title 
---------+---------+---------+---------+---------+---------+----
Eric             Cichiello        Manager
Cedrine          Madera           BI Specialist
Michael          Schapira         Cognos System z
Josef            Klitsch          DB2 for z/OS
DSNE610I NUMBER OF ROWS DISPLAYED IS 4

Using the insert technique shown in Listing 5 can be useful because file reference variable support for XML and LOB columns is unavailable in DB2 for z/OS tools such as DB2I SPUFI, DSNTEP2, DSNTIAD, and DSNTIAUL.

RACF security consideration

The ReadFile UDF accesses data sets and files that are protected by RACF. With SECURITY USER activated for the ReadFile UDF, you use RACF to control access to files and data sets by the primary authorization ID of the UDF invoker. The ReadFile UDF receives a RACF access failure if it tries to open a file for which the UDF invoker does not have the appropriate access privilege. A corresponding RACF message detailing the security violation is sent to the z/OS SYSLOG (see Listing 6). In this situation, the ReadFile UDF is not able to open the file and returns SQLSTATE 38100 to DB2 for z/OS, which in turn provides SQLCODE -443 to the DB2 client application indicating a processing failure (see Listing 7).

Listing 6. ReadFile UDF RACF message security violation
------ Data Set ---------------------------------
ICH408I USER(SYSADM  ) GROUP(ADMIN   ) NAME( DB2                )
JOSEF.XML CL(DATASET ) VOL(DMTU03)                             
INSUFFICIENT ACCESS AUTHORITY
FROM JOSEF.** (G) 
ACCESS INTENT(READ ) ACCESS ALLOWED(NONE )

------ Unix System Services file --------------- 
ICH408I USER(SYSADM  ) GROUP(ADMIN   ) NAME( DB2                )
 /u/josef/zntc_bi_team.xml   
 CL(DIRSRCH ) FID(01C4D4E3E4F0F200042F000000000003)             
 INSUFFICIENT AUTHORITY TO OPEN                                 
 ACCESS INTENT(--X)  ACCESS ALLOWED(OTHER      ---)             
 EFFECTIVE UID(0000000607)  EFFECTIVE GID(0000000011)
Listing 7. SQLCODE -443 RACF security violation
SQL0443N  Routine “READFILESCALAR” (specific name “0001 READFILE_SCALAR”) has returned an
error SQLSTATE with diagnostic text “Could not open file //JOSEF.XML”.  SQLSTATE=38100

ReadFile table UDF

This scenario addresses a situation in which you want to perform ad hoc analysis on data that is stored in file records that have a well known record structure. During analysis you use existing SQL built-in functions or UDFs to map file record field information to result table columns. The files that you want to analyze are of two different record structures:

  • Fixed — each field can be found at fixed record locations
  • Delimited — fields are separated by delimiters and their location within each record varies

Map fixed file record structure to result table columns

This scenario uses the ReadFile UDF together with SQL built-in functions to map file record fields to result table columns. Listing 8 shows the file records that are used in the scenario.

Listing 8. Table UDF fixed record structure file content
----+----1----+----2----+----3----+----4----+----5----+----6
Eric               Cicchiello          zNTC Manager         
Cedrine            Madera              zNTC BI specialist   
Michael            Schapira            zNTC Cognos System z 
Josef              Klitsch             zNTC DB2 for z/OS

The structure of these file records is illustrated in Table 1.

Table 1. ReadFile table UDF fixed record structure
Column Start Column End Description
01 19 First name
20 39 Name
40 44 Department
45 60 Job role

For the analysis part of this scenario, you use DB2 for z/OS built-in functions to unstring the file records. The query that you run and its query result are shown in Figure 3. In the scenario illustrated below, you use a DB2 for z/OS common table expression (CTE) to invoke the ReadFile table UDF from SQL. Within the same CTE, you use DB2 built-in functions SUBSTR, CHAR, and STRIP to map the structure of the UDF result table rows to result table columns.

Figure 3. ReadFile table UDF scenario overview
Diagram showing ReadFile table UDF and results as described above

The ReadFile table UDF returns the file records in a result table with one variable length result table row for each file record. You may have noticed that the UDF program itself does not perform any column mapping. Instead, it reads and returns file records as they are to the SQL query. The record field structure to column mapping is performed by DB2 for z/OS built-in functions outside the ReadFile table UDF and within the same DB2 for z/OS common table expression (CTE). The ReadFile table UDF provides a generic interface for reading file records from SQL. Therefore, you can use the ReadFile table UDF for reading file records coming from a variety of sources including QSAM, VSAM, and USS files.

ReadFile table UDF execution environment

The ReadFile table UDF uses the same infrastructure as the ReadFile scalar UDF. Therefore, the information provided in ReadFile scalar UDF execution environment applies accordingly, except for the following differences:

  • The table UDF is invoked via a table function reference that you specify in the FROM clause of the SQL select statement.
  • Rather than a scalar value, DB2 for z/OS returns a result table to the UDF invoker.
  • DB2 for z/OS and the UDF program interact with each other using function call types. For example, DB2 for z/OS invokes the ReadFile table UDF program once for the call types FIRST, OPEN, CLOSE, and FINAL and invokes the external table UDF repeatedly for the call type FETCH until the UDF program returns SQLSTATE 02000 signaling end of result table processing. The ReadFile UDF program then translates these call types into fopen, fread, and fclose API invocations.
  • The ReadFile table UDF returns SQLSTATE 02000 once the file read operation has reached an end of file (EOF) condition.
Figure 4. ReadFile table UDF execution environment
Diagram depicting the 8 steps of the processing flow. Text explanation of steps follows figure .

Defining the ReadFile table UDF to DB2 for z/OS

The input for the ReadFile table function is a filename with the same format as the input parameter of the ReadFile scalar UDF. The information provided in Defining the ReadFile scalar UDF to DB2 for z/OS also applies to this scenario.

You can use the SQL DDL statement shown in Listing 9 to create the ReadFile table UDF in DB2 for z/OS.

Listing 9. ReadFile table UDF DDL
CREATE FUNCTION C.READFILE(DSN VARCHAR(255))  /*1*/
RETURNS TABLE
(FILERECORD VARCHAR(32600 ))  /*2*/
SPECIFIC READFILE 
PARAMETER VARCHAR STRUCTURE  /*3*/
LANGUAGE C   /*4*/
DETERMINISTIC  /*5*/
NO SQL  /*6*/
EXTERNAL NAME READFILE  /*7*/
PARAMETER STYLE DB2SQL
RETURNS NULL ON NULL INPUT
NO EXTERNAL ACTION 
SCRATCHPAD  /*8*/
FINAL CALL
NO COLLID 6
ASUTIME LIMIT 100  /*9*/  
STAY RESIDENT YES  /*10*/
PROGRAM TYPE SUB  /*11*/
WLM ENVIRONMENT WLMENV  /*12*/
SECURITY USER  /*13*/
DBINFO
DISALLOW PARALLEL

The external UDF attributes used in the SQL DDL statement shown in Listing 9 are explained below:

  1. The input parameter contains a file name that adheres to the file naming rules of the z/OS XL C/C++ runtime environment.
  2. The UDF returns a result table. The maximum length of a result table row is 32600 bytes. This value is derived from the maximum total length of columns of a query operation requiring sort and evaluating column functions (MULTIPLE DISTINCT and GROUP BY).
  3. PARAMETER VARCHAR STRUCTURE is required to hexadecimal '00' values returned in VARCHAR result table columns. This is a valid requirement because a file record can contain integer or packed-decimal field values stored in internal binary format.
  4. The function is written in C/C++.
  5. Repetitive function invocations with the same input parameter return the same scalar value.
  6. The external function contains no SQL and therefore has no DB2 package collection ID.
  7. The UDF program load module name is READFILE.
  8. The UDF uses a scratchpad to save and pass the file pointer reference that was obtained by the fopen API between UDF calls.
  9. The ReadFile table UDF has an ASUTIME limit of 100 service units (SUs). This is the total amount of processor time that the external UDF can use. If the limit is exceeded, DB2 for z/OS sends message DSNX908I to the console and terminates the UDF.
  10. The external UDF program is to stay in virtual storage once it has been loaded.
  11. PROGRAM TYPE SUB UDF C-programs do not have a main function. z/OS Language Environment (LE) will not perform program re-initialization. The UDF program has to take care of variable initialization. This setting is required for the ReadFile table UDF in order for it to have full control over file closure and deallocation. PROGRAM TYPE MAIN causes the LE to close the file between UDF program invocations.
  12. The name of the WLM application environment that the external UDF is to be executed in is WLMENV.
  13. DB2 uses z/OS external security (RACF) for controlling access to non-DB2 resources. RACF performs DATASET access control for the user that has been authenticated to DB2 for z/OS. The information in the RACF security consideration section applies accordingly.

The C code that implements the ReadFile table function is shown in Listing 10.

Listing 10. ReadFile table UDF C code
#pragma linkage(READFILE,fetchable)
#include <stdio.h>
#include <stdarg.h>
#include <sqludf.h>
struct SCRATCHDATA {  FILE *p;};
void SQL_API_FN  READFILE(SQLUDF_VARCHAR_FBD *fileName,  /*1*/
SQLUDF_VARCHAR_FBD *clob32600,
SQLUDF_NULLIND *fileNameInd,
SQLUDF_NULLIND *clob32600IND,
SQLUDF_TRAIL_ARGS_ALL)
{ long i = 0; int num = 0; char errMsg[0100]; char fileNameTmp[0255]; 
  struct SCRATCHDATA *sp; sp = (struct SCRATCHDATA *) SQLUDF_SCRAT->data;
  switch (SQLUDF_CALLT)
  {  case SQLUDF_TF_FIRST:
      if (*fileNameInd == -1)
       {strcpy( SQLUDF_STATE, “38700”);
        sprintf (errMsg, “file name missing”);
        strcpy( SQLUDF_MSGTX, errMsg);
        break;}
      strncpy(fileNameTmp,fileName->data,fileName->length);
      memcpy( fileNameTmp+fileName->length,”\0”,1);
      sp->p = fopen(fileNameTmp,”rb,type=record”);  /*2*/
      if (sp->p == NULL)
       {strcpy( SQLUDF_STATE, “38701”);
        sprintf (errMsg, “data set cannot be opened”, fileName);
        strcpy( SQLUDF_MSGTX, errMsg);
        break; }
     case SQLUDF_TF_OPEN:
      break;
     case SQLUDF_TF_FETCH: 
      num = fread(clob32600->data ,1, 32600, sp->p) ;  /*3*/
      if (num == 0) 
       {strcpy( SQLUDF_STATE, “02000”);break; }  /*4*/
        clob32600->length  = num   ;
        *clob32600IND = 0;
        break;
     case SQLUDF_TF_CLOSE:
      fclose(sp->p); break;  /*5*/
     case SQLUDF_TF_FINAL:
     break;
  }
}

The external UDF program shown in Listing 10 performs the following main processing steps:

  1. DB2 for z/OS treats VARCHAR parameter values for C-language UDF programs as NULL terminated strings. For the ReadFile table UDF, this behavior is undesirable because file records can contain packed-decimal or integer field values that are stored in internal binary format. The ReadFile table UDF is therefore defined with the attribute PARAMETER VARCHAR STRUCTURE. This attribute causes DB2 for z/OS to treat external C-UDF VARCHAR parameters like real VARCHAR structures for which small integer length information is maintained by DB2 for z/OS for input VARCHAR parameters and by the C-UDF program for output VARCHAR parameters. The SQLUDF_VARCHAR_FBD (for bit data) structure definition is therefore used for VARCHAR parameters.
  2. Open the file when invoked with call type SQLUDF_TF_FIRST. The fopen API opens the file referred to by the fileName variable. fopen mode "rb,type=record" opens the file in read binary mode for record processing. The binary mode was chosen to suppress character translations. Data set allocation is transparently performed by the z/OS XL C/C++ runtime environment during fopen processing.
  3. Read the next record and provide the VARCHAR length information when invoked with call type SQLUDF_TF_FETCH.
  4. Indicate end of result table processing when there is no more data to read.
  5. Close the file when invoked with call type SQLUDF_TF_CLOSE. Data set deallocation is transparently performed by the z/OS XL C/C++ runtime environment during fclose processing.

You can install the UDF program in DB2 for z/OS by executing the JCL procedure that is explained in the Building user-defined functions section of this article.

Map delimited file record structure to result table columns

This scenario shows you how to use ReadFile table UDF together with the COBOL RF.GETCOL UDF to map character delimited file record fields to result table columns. The file records that you process in this scenario are shown in Listing 11.

Listing 11. Table UDF delimited record structure file content
Josef!Klitsch!DB2 for z/OS!zNTC          
Michael!Schapira!Cognos on System z!zNTC 
Cedrine!Madera!BI Specialist!zNTC        
Eric!Cicchiello!Manager!zNTC

The file record structure in this case is not fixed. The field information contained in the file records are at varying record locations and are delimited by a single character delimiter — the "!" character. The challenge in this scenario is to use SQL to locate the exclamation point and extract the word that is to be mapped to a result table column. Achieving this by using DB2 for z/OS built-in functions is rather complex. To ease this complexity, this scenario implements the RF.GETCOL function.

The RF.GETCOL scalar function is an external UDF written in COBOL. The COBOL code is provided in the DWSReadFile.zip file that is linked to in the Downloads section. You can define the RF.GETCOL function using the SQL DDL shown in Listing 12.

Listing 12. RF.getCol scalar UDF DDL
CREATE FUNCTION RF.getCol
(inString VARCHAR(2048),
inDelim  VARCHAR(0001),
inWordPos   INTEGER  )
RETURNS VARCHAR(2048)
EXTERNAL NAME GETCOL
LANGUAGE COBOL
DETERMINISTIC
PARAMETER STYLE DB2SQL
FENCED
RETURNS NULL ON NULL INPUT
NO EXTERNAL ACTION
NO SCRATCHPAD
NO FINAL CALL
ALLOW      
DBINFO
NO COLLID
WLM ENVIRONMENT WLMENV3
ASUTIME NO LIMIT
STAY RESIDENT YES
PROGRAM TYPE SUB
SECURITY DB2
STOP AFTER SYSTEM DEFAULT FAILURES
INHERIT SPECIAL REGISTERS

Usage of the RF.GETCOL function is illustrated in Listing 13.

Listing 13. RF.getCol function usage sample
select                                                           
rf.GETCOL(word1!word2!word3!word4,!,3) from sysibm.sysdummy1;
---------+---------+---------+---------+---------+---------+-----
word3                                                            
DSNE610I NUMBER OF ROWS DISPLAYED IS 1                           
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

The function usage is simple and straightforward:

  • Parameter 1 contains the input string that contains the word that is to be extracted.
  • Parameter 2 contains a one-byte delimiter.
  • Parameter 3 contains the word position (for example, if a value of "4" is specified, the fourth word delimited by parameter 2 is extracted from parameter 1 and returned in a scalar value).

Once you have installed the RF.GETCOL UDF, you are ready to run the query shown in Figure 5. In this scenario you use a DB2 for z/OS common table expression (CTE) to invoke the ReadFile table UDF from SQL. Within that CTE you use the RF.GETCOL scalar UDF to map the delimited record structure of the result table rows returned by the ReadFile table UDF to result table columns.

Figure 5. ReadFile delimited record structure scenario overview
Diagram depicting the processing flow for the ReadFile delimited record scenario

The execution environment in this scenario is similar to the execution environment illustrated in Figure 4. The information provided in section ReadFile table UDF execution environment applies accordingly.


Example: HTTPD access log analysis

This scenario addresses a situation in which you want to perform ad hoc analysis on an HTTPD access log that is stored in the USS file system. The scenario requires that the ReadFile table UDF and the RF.GETCOL scalar UDF are implemented and available for use. You will see that these two UDFs are sufficient to perform a more complex analysis on the HTTPD access log file. An extract of the HTTPD access log file that is to be analyzed is illustrated in Listing 14.

Listing 14. HTTPD access log
10.254.1.10 - - [07/Nov/2008:10:39:42 +0100] "GET /images/backgr.gif HTTP/1.1" 200 183099
10.254.1.10 - - [07/Nov/2008:10:39:45 +0100] "GET /favicon.ico HTTP/1.1" 404 281
127.0.0.1 - - [12/Nov/2008:14:56:42 +0100] "POST /cognos8/cgi/cognos.cgi HTTP/1.1" 404 294
10.254.1.10 - - [12/Nov/2008:17:09:31 +0100] "GET /http_srv_styles.css HTTP/1.1" 200 1018
10.254.1.10 - - [12/Nov/2008:17:09:31 +0100] "GET /images/odot.jpg HTTP/1.1" 404 285

The record layout of the above log entries is known in the Apache HTTP Server as the Common Log Format (CLF). For further details on the Apache CLF standard, use the Apache HTTP Server Version 2.0 Log Files link in the Resources section.

For the analysis of the HTTP access log, you map the log entry fields to result table columns. Because the CLF format provides the field information at varying locations, you use the RF.GETCOL UDF for field extraction. The fields you want to extract, the one byte field delimiter and the record field's relative word position inside the log entries, are shown in Table 2.

Table 2. HTTPD access log fields and delimiters
Field Name Delimiter Word position Remark
Server blank 1 IP Address / Domain Name
User blank 3 User ID, can contain -
DateTime ] 2 Date, time, time offset
Status Blank 9 Status code
Size Blank 10 Size of object returned
Request X'7f' --> “ 2 Method, resource, protocol

The scenario illustrated in Figure 6 uses a DB2 for z/OS common table expression (CTE) to invoke the ReadFile table UDF from SQL. Within that CTE, the RF.GETCOL scalar UDF is used to locate and map log entry fields to result table columns. For accessing the log entry field information, the RF.GETCOL UDF is invoked and passes the delimiters and the relative word positions as input parameters.

Figure 6. ReadFile table UDF HTTPD access log scenario overview
Diagram depicting the processing flow for the ReadFile table UDF HTTPD access log scenario

SQL native procedure for multiple query analysis

The scenario shown in Figure 6 illustrates just the base reporting capability. Because you can use the ReadFile table UDF to have the access log file integrated with SQL processing, you can now extend the analysis by any report that can be created using SQL. For example, you can determine:

  • Time of the day or day of the week the Web server was most active
  • IP address that issued the most requests to the Web server
  • Most popular URL on the Web server
  • Number of hits within a given period, per URL or IP address
  • Total number of bytes sent to remote browsers

The HTTPD access log scenario illustrated in Figure 7 invokes the HTTPD_ACCESS_LOG SQL native procedure to create a declared global temporary table (DGTT), invokes the ReadFile table UDF from within an SQL INSERT statement for DGTT population, and uses SQL cursors to return the following information to the DB2 client application:

  1. The top three log entries with the highest number of bytes sent to the Web client
  2. The top three client IP addresses with the highest number of bytes sent to the Web client
  3. The top three HTTP requests with the highest number of bytes sent to the Web client
Figure 7. HTTPDLOG SQL native procedure scenario
Diagram depicting the processing flow for the HTTPDLOG SQL native procedure scenario

The DDL for defining the HTTPD_Access_Log SQL native procedure is provided in the DWSReadFile.zip file, which is linked to in the Downloads section.

Multiple result set SQL procedure as a Data Web Service

For the next step, you expose the HTTP_ACCESS_LOG SQL procedure as a Web service using the methodology described in Part 1 of this article series. Use Data Studio Developer to perform the implementation tasks outlined below:

  1. Set up and establish a database connection.
  2. Create the UDFReadFile data development project.
  3. Create the HTTPD_Access_Log Web Services project.
  4. Drag and drop the HTTPD_Access_Log SQL procedure into the Web Services project.
  5. Build a Web Application Archive file (WAR file).
  6. Install the WAR-file in your J2EE application server environment.

Upon successful implementation of the HTTPD_Access_Log WAR file, you can use a Web service client of your choice to invoke the Data Web service. The XML response document shown in Figure 8 was created using a REST-style HTTP RPC request that was issued through the curl utility. The access_log file that was used in the REST-style HTTP RPC request is provided in the DWSReadFile.zip file, which is linked to in the Downloads section.

Figure 8. HTTP REST RPC XML response
Screen shot from Data Studio Developer showing the HTTP REST RPC XML response document

HTML-based reports with IBM DWS and XSLT transformation

Next, use the XSLT style sheet shown in Listing 15 to transform the XML response illustrated in Figure 8 into HTML. This turns the HTTPD_Access_Log Web service into a HTML-based browser application. The XSLT style sheet generates three HTML table tags, one for each stored procedure result set.

Listing 15. HTTPD Access Log XSLT style sheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="html" encoding="UTF-8" media-type="text/html"/>
    <xsl:template match="/*">
	<html>
       	  <head><title>HTTPD Access Log Analysis</title></head>
          <body>
            <h1>HTTPD Access Log Analysis</h1>
            <h2>Top 3 Log Entries by Bytes sent</h2>
            <table border="1"> <tr bgcolor="#9acd32">
                 <td>IPAddr</td><td>Userid</td><td>Date-Time</td>
                 <td>Response Code</td><td>Byte sent</td><td>Request</td>
               </tr>
               <xsl:for-each select="rowset/row">
                  <tr>
                    <td><xsl:value-of select="IPADDR/text()"/></td>
                    <td><xsl:value-of select="USERID/text()"/></td>
                    <td><xsl:value-of select="DATETIME/text()"/></td>
                    <td><xsl:value-of select="STATUS/text()"/></td>
                    <td><xsl:value-of select="SIZE"/></td>
                    <td><xsl:value-of select="REQUEST/text()"/></td>
                  </tr>
               </xsl:for-each>
            </table>
            <h2>Top 3 Client IP addresses by Bytes sent</h2>
            <table border="1"> <tr bgcolor="#9acd32">
               	 <td>Hits</td><td>Byte sent</td><td>Client IP-Address</td>
               </tr>
               <xsl:for-each select="rowset2/row">
                  <tr>
                    <td><xsl:value-of select="HITS"/></td>
                    <td><xsl:value-of select="TOTSIZE"/></td>
                    <td><xsl:value-of select="IPADDR/text()"/></td>
                  </tr>
               </xsl:for-each>
            </table>
            <h2>Top 3 HTTP Requests by Bytes Sent</h2>
            <table border="1"> <tr bgcolor="#9acd32">
               <td>Hits</td><td>Byte sent</td><td>AVG Byte sent</td><td>Request</td>
               </tr>
               <xsl:for-each select="rowset3/row">
                  <tr>
                    <td><xsl:value-of select="HITS"/></td>
                    <td><xsl:value-of select="TOTSIZE"/></td>
                    <td><xsl:value-of select="AVGSIZE"/></td>
                    <td><xsl:value-of select="REQUEST/text()"/></td>
                  </tr>
               </xsl:for-each>
            </table>
          </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

To activate the style sheet shown in Listing 15 for XSLT output message transformation, use the methodology explained in Part 1 of this article series. This involves the following implementation tasks:

  1. Assign the XSLT style sheet shown in Listing 15 to the HTTPD_ACCESS_LOG Web service operation for output message transformation.
  2. Build and re-deploy the HTTPD_ACCESS_LOG WAR file.

When you invoke the HTTPD_Access_Log Web service from your browser using a REST-style HTTP RPC request, you see the screen shown in Figure 9.

Figure 9. HTTPD_Access_Log report in Browser
Screen shot of the access log report displayed in a browser

AccountInquiry DWS accessing a VSAM data set

This part of the article shows you how to use the ReadFile table UDF to provide random access to VSAM data. For illustration purposes, the sample banking scenario that was introduced in Part 1 of this article series was changed to store the account detail information in a VSAM data set. The ListCustomer and AccountSummary information remained in DB2 tables. Therefore, the AccountDetail stored procedure had to be changed to invoke the ReadFile table UDF to random-read the VSAM data set from SQL and retrieve the account detail information of a given account number. The AccountDetail stored procedure interface remained unchanged allowing for the AccountInquiry DWS to be reused as far as the rest of the DWS infrastructure was concerned. An AccountInquiry overview reflecting the infrastructure change is provided in Figure 10.

Figure 10. AccountInquiry DWS with VSAM access
DWS runtime environment overview with VSAM access

ReadFile table UDF VSAM key-sequential read capability

In a typical customer environment you are likely to encounter VSAM data sets that are big in size. Reading such data sets sequentially is not advisable because of the cost associated with the lengthy response times. This part of the article shows you a way to use a key range for key-sequential reading of VSAM data by the ReadFile UDF. Besides the data set name, the UDF program simply receives two additional parameters (start-key and end-key) and uses these for the key-sequential read. An extract of the ReadFile C source program is shown in Listing 16 and used to explain the additional changes that were performed in the ReadFile UDF program. The complete C source program is provided in the DWSReadFile.zip file, which is linked to in the Downloads section.

Listing 16. ReadFileUDF key-sequential read capability
struct SCRATCHDATA
{
  FILE *p;
  long vsamkeylen;  /*4*/
  long vsamRKP   ;
};

void SQL_API_FN READFILK(SQLUDF_CHAR *fileName,
              SQLUDF_CHAR *VSAMkey,  /*1*/
              SQLUDF_CHAR    *Keyend,
              SQLUDF_VARCHAR_FBD *clob32600,
              SQLUDF_NULLIND *fileNameInd,
              SQLUDF_NULLIND *VSAMkeyInd,
              SQLUDF_NULLIND *KeyendInd,
              SQLUDF_NULLIND *clob32600IND,
              SQLUDF_TRAIL_ARGS_ALL)
{
  int num = 0; char errMsg[0100]; char filenameTmp[0100];
  fldata_t fileinfo; struct SCRATCHDATA *sp;
  sp = (struct SCRATCHDATA *) SQLUDF_SCRAT->data;
  switch (SQLUDF_CALLT)
  {case SQLUDF_TF_FIRST:
      sp->p = fopen(fileName,"rb,type=record");
      if (sp->p == NULL)
      {  strcpy( SQLUDF_STATE, "38700");
         sprintf (errMsg, "data set cannot be opened", fileName);
         strcpy( SQLUDF_MSGTX, errMsg);
         break;}
      if (fldata(sp->p, filenameTmp, &fileinfo))  /*2*/
      {  strcpy( SQLUDF_STATE, "38701");
         strcpy( SQLUDF_MSGTX,"fldata() failed");
         fclose(sp->p);
         break; }
      if ((fileinfo.__dsorgVSAM == 1 &&  /*3*/
           (fileinfo.__vsamtype != __KSDS_PATH  &&
            fileinfo.__vsamtype != __ESDS_PATH  &&
            fileinfo.__vsamtype != __KSDS)) ||
          ( fileinfo.__dsorgVSAM != 1))
      {  strcpy( SQLUDF_STATE, "38702");
         strcpy( SQLUDF_MSGTX,"No VSAM KSDS");
         fclose(sp->p);
         break; }
      sp->vsamkeylen = fileinfo.__vsamkeylen;  /*4*/
      sp->vsamRKP    = fileinfo.__vsamRKP   ;
      if (flocate(sp->p,VSAMkey,sp->vsamkeylen,__KEY_EQ))  /*5*/
      {  strcpy( SQLUDF_STATE, "38703");
         sprintf (SQLUDF_MSGTX,"VSAMkey %s length %d not found",
                           VSAMkey,sp->vsamkeylen);
         fclose(sp->p); break; }
    case SQLUDF_TF_OPEN:
      break;
    case SQLUDF_TF_FETCH:
      num = fread(clob32600->data ,1, 32600, sp->p) ;
      if ((num == 0) ||
       (memcmp(clob32600->data+sp->vsamRKP,Keyend,sp->vsamkeylen)>0))  /*6*/
        {  strcpy( SQLUDF_STATE, "02000"); break; }
      clob32600->length  = num   ; *clob32600IND = 0;
      break;
    case SQLUDF_TF_CLOSE:
       fclose(sp->p); break;
    case SQLUDF_TF_FINAL:
       break;
  }
}

For VSAM random read processing, the changes described below were implemented in the ReadFile table UDF:

  1. The UDF interface provides two additional parameters: start-key and end-key.
  2. The fldata() API retrieves data set information including VSAM attributes.
  3. If the data set type does not support key-sequential access (only KSDS and VSAM PATH are supported), SQLSTATE 38702 is returned to DB2 for z/OS.
  4. Key length and key record offset information obtained by the fldata API are saved in the scratchpad area.
  5. Position current record pointer at start-key.
  6. If end-of-file (EOF) is reached or the record read is beyond the end-key value, SQLSTATE 02000 is returned to DB2 for z/OS.

You can install the UDF program in DB2 for z/OS by executing the JCL procedure that is explained in the Building user-defined functions section of this article.

The VSAM key-sequential enabled version of the ReadFile table UDF was defined with the parameters shown in Listing 17.

Listing 17. DDL VSAM ReadFile table UDF
CREATE FUNCTION
   C.READFILE(DSN CHAR(255),   /*1*/
              KEYSTART DECIMAL(15,0),   /*2*/  
              KEYEND   DECIMAL(15,0))   /*3*/
 RETURNS TABLE
    (FILERECORD VARCHAR(32600 ))   /*4*/

Following is an explanation of the parameters used in Listing 17:

  1. DSN— input parameters to provide the z/OS data set or USS file name
  2. KEYSTART— start key value for VSAM random read
  3. KEYEND— end key value for VSAM random read
  4. FILERECORD— result table structure

Requirement for additional scalar UDFs

As illustrated in Figure 11, the VSAM data set stores DECIMAL and SMALLINT values in internal binary format (see fields AccountID, Debit, Credit, and Description VARCHAR length information).

Figure 11. Account detail VSAM record structure
A breakdown of the fields in VSAM record

You can create and populate the VSAM data set shown in Figure 11 by running the IDCAMS VSAM Access Method Services utility. An IDCAMS job control language (JCL) sample as well as the account detail data used in this scenario is provided in the DWSReadFile.zip file, which is linked to in the Downloads section.

The AccountDetail SQL procedure transforms the internal binary formats shown in Figure 11 to their corresponding DB2 for z/OS DECIMAL and SMALLINT column formats. Because DB2 for z/OS does not provide casting functions from CHAR/VARCHAR values that are stored in internal binary format to DB2 for z/OS INTEGER/DECIMAL column format, the scalar UDFs are used in the AccountDetail SQL procedure as shown in Table 3.

Table 3. Additional scalar UDFs
UDF Name Input Parameter Scalar Return Value
RF.BIGINT VARCHAR(8) BIGINT
RF.DEC0 VARCHAR(8) DECIMAL(15,0)
RF.DEC2 VARCHAR(8) DECIMAL(15,2)

The RF.BIGINT, RF.DEC0, and RF.DEC2 scalar functions are implemented by C-program READCAST. The C source code of the UDF program is provided in the DWSReadFile.zip file, which is linked to in the Downloads section.

AccountDetail stored procedure modifications

The AccountDetail stored procedure that was used in Part 1 of this article series was changed as shown in Listing 18.

Listing 18. AccountDetail SQL procedure with ReadFile invocation
CREATE PROCEDURE RF.ACCOUNTDETAIL
(INOUT ACCOUNTID DECIMAL(15, 0), OUT DOB DATE, OUT CUSTNAME VARCHAR(32) )
VERSION V001 ISOLATION LEVEL UR RESULT SETS 1 LANGUAGE SQL
P1:BEGIN
 DECLARE DOB_TMP DATE DEFAULT 2009-03-20;
 DECLARE CUSTNAME_TMP VARCHAR(32) DEFAULT  ;
 DECLARE ACCOUNTID_TMP DECIMAL(15,0);



DECLARE CURSOR1 CURSOR WITH RETURN FOR
   WITH
   QRY1 (RECORD) AS
   (SELECT FILERECORD FROM
    TABLE (C.READFILK  /*1*/
          (CHAR(//JOSEF.DWSUDFRD.ACCTNO.PATH),  /*2*/
           ACCOUNTID,  /*3*/
           ACCOUNTID)) A),  /*4*/
   QRY2 (ACCTNO, BOOK_DATE, DEBIT, CREDIT, DSC) AS
   (SELECT  /*5*/
      RF.DEC0(SUBSTR(RECORD,01,08)) 
     ,DATE(SUBSTR(RECORD,10,10))
     ,RF.DEC2(SUBSTR(RECORD,21,08)) 
     ,RF.DEC2(SUBSTR(RECORD,30,08))
     ,VARCHAR(SUBSTR(RECORD,41,INTEGER(RF.BIGINT(SUBSTR(RECORD,39,02))))) 
    FROM QRY1)
    SELECT * FROM QRY2;

 -- Cursor left open for client application
 OPEN CURSOR1;
 SELECT
     STRIP(SUBSTR(NM,1,POSSTR(NM,,)-1)) CONCAT , CONCAT
     STRIP(SUBSTR(NM,POSSTR(NM,,)+1,LENGTH(NM)-POSSTR(NM,,))),
     BRTH_DT   INTO CUSTNAME_TMP,DOB_TMP
     FROM ACCT.IP A
     WHERE A.IP_ID IN
     (SELECT IP_ID FROM ACCT.AR WHERE AR_ID = ACCOUNTID FETCH FIRST ROW ONLY);
 SET ACCOUNTID_TMP = ACCOUNTID;
 SET DOB = DOB_TMP;
 SET CUSTNAME = CUSTNAME_TMP;
 SET ACCOUNTID = ACCOUNTID_TMP;
END P1

Following are the highlights of the changes that were made:

  1. A DB2 for z/OS common table expression (CTE) invokes the ReadFile table UDF for VSAM random read processing.
  2. The VSAM data set name is passed as input parameter.
  3. The start-key is passed as input parameter.
  4. The end-key is passed as input parameter.
  5. Within the same CTE, DB2 for z/OS built-in functions and scalar UDFs are used to map the data set record field structure to result table columns.

Creating and deploying the AccountInquiry DWS

Now with the ReadFile table UDF enabled for VSAM key-sequential read operations, and the AccountDetail SQL procedure modified to use that ReadFile table UDF, you can recreate the AccountInquiry DWS, and you can re-build and re-deploy the WAR-file in your J2EE application server environment. To do this, perform the following implementation tasks in Optim Development Studio (or Data Studio):

  1. In the existing UDFReadFile project, create the AccountInquiry_VSAM Web Services project.
  2. Drag and drop the existing AccountSummary and ListCustomer SQL procedures that were used in Part 1 of this article series into the AccountInquiry_VSAM Web Services project.
  3. Drag and drop the AccountDetail SQL procedure shown in Listing 18 into the AccountInquiry_VSAM Web Services project.
  4. Use the XSLT style sheets provided in part 1 of this article to activate XSLT output message to HTML transformation for the ListCustomer, AccountSummary, and AccountDetail Web service operations.
  5. Build the AccountInquiry_VSAM Web Application Archive file (WAR file).
  6. Install the AccountInquiry_VSAM WAR file in your J2EE application server environment.

Running the HTML based AccountInquiry_VSAM application

After you have successfully installed the AccountInquiry_VSAM application, it uses the modified AccountDetail SQL procedure to access the account detail VSAM data set through the ReadFile table UDF.

You can invoke the ListCustomer Web service operation using a REST style http URL in the same way as described in Part 1 of this article series. Even though the account detail information now is retrieved from a VSAM data set, you will not notice any difference — except for the HTTP URL. The invocation returns the HTML-based Web page shown in Figure 12.

Figure 12. ListCustomer XSLT transformed HTML Web page
Screen shot of a browser showing two records in a list of banking customers

From the page shown in Figure 12, click Customer Number 36505633552 to view account summary information for that customer, as shown in Figure 13.

Figure 13. AccountSummary XSLT transformed HTML Web page
Screen shot of a browser showing the record detail for banking customer number 36505633552

From the page shown in Figure 13, click Account Number 9000000002 to view the account detail information for that account number, as shown in Figure 14.

Figure 14. AccountDetail XSLT transformed HTML Web page
Screen shot of a browser showing the detail for banking account number 9000000002

Conclusion

User-defined functions like the ones described in this article provide an easy way of accessing files and data sets from SQL. Once the files and data sets are accessible from SQL, you can use existing SQL built-in functions and your own UDFs to perform data manipulation on the information stored in the data sets and files. You can also expose these "file-reading" SQL statements as IBM Data Web Services, which results in the data sets and files being available to Web service operations.


Downloads

DescriptionNameSize
Sample code for this articleDWSReadFile.zip209KB
Data Studio project for this articleUDFReadFile.zip505KB

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=407390
ArticleTitle=Data Web Services on DB2 for z/OS, Part 2: Integrate z/OS data sets using user-defined functions
publish-date=07092009