Author and developer Knut Stolze shows you how to use DB2 user-defined functions to perform tasks outside of the database server. In particular, he uses C language functions to show you how to access files and to invoke another executable.

Knut Stolze, DB2 Information Integration, IBM Germany

Knut Stolze started his work with DB2 when he joined IBM as a visiting scientist at the Silicon Valley Lab where he worked on the DB2 Image Extender. He moved on to the DB2 Spatial Extender Version 8 and was responsible for several enhancements to improve the usability, the performance, and the standard-conformance of the Extender for more than two years.

Currently, he works as a teaching assistant at the University of Jena, Germany, and continues his work for IBM in the area of federated databases. He can be reached through the newsgroups comp.databases.ibm-db2 and ibm.software.db2.udb.spatial or at stolze@de.ibm.com.



27 March 2003

Introduction

Sometimes you need to access operating system functionality from within an SQL statement. For example, you may want to write some information to a file, establish a communication link with a special process, or execute another program residing on the database server. Conveniently, IBM® DB2® Universal DatabaseTM (UDB) has user-defined functions (UDFs) that enable you to do these tasks. You can implement UDFs in a variety of different programming languages, including SQL, JavaTM, or C. In this article, I will show you some examples of how UDFs, written in C, can be used to perform "external" tasks directly on the database server. In particular, I'll show you how to:

  • Access files residing on the database server
  • Invoke another executable.

There are basically two ways to deal with files in the file system: reading from a file, or writing to a file. Both of these are easily made available in SQL with UDFs. For example, to read a file you can specify the name of the file as an expression in an SQL statement, and a UDF can take that filename as input and return the contents of the file as its result. This result can be treated as any other value in the SQL statement: It can be inserted into a table, returned to the client in the result set, or further processed by other functions. Likewise, the writing of a string to a file can be initiated from SQL by specifying the filename and the string itself. The function opens the file, writes the string, and then returns the success or error status.

Calling an executable or a script on the database server from SQL might come in handy. For example, you might want to start some tasks based on events or conditions in the database. For instance, there could be too many rows in a table and you may want to kick off a backup, or you may want to use another program to perform a certain task such as sending an email to the administrator. Another possible option is to execute a SQL script in a separate transaction.

Before I show you these UDFs, let's briefly go over how you can compile and link the C/C++ code of a UDF into a shared library so that DB2 can later use the function.


Building user-defined functions

To build the function described in this article, compile the source file os_calls.c (see Download) and link it into a shared library. Optionally, you can define an export file that defines the entry points for the library. Depending on the platform, you have to use a different style for the export file. Windows® systems expect the following content in an export file library.def as shown in Listing 1.

Listing 1. Export file on Windows systems

LIBRARY systemCallUDF 
EXPORTS 
	readFileToClob 
	readFileToTable 
	writeFileFromParams 
	systemCall

UNIX® systems use the extension .exp for the export file library.exp, and the structure in the file is very simple. It only lists the entry points for the functions to be exported and no further information. Listing 2 shows such an example.

Listing 2. Export file on UNIX and Linux systems

readFileToClob 
readFileToTable 
writeFileFromParams 
systemCall

Use the respective export file and the source file to compile and link the code. To complete this task, you have to ensure that a C compiler and the DB2 Application Development Client are installed on the system where you develop the functions. Note that you do not need a C compiler or the DB2 Application Development Client on the production system where you want to use the functions. To simplify the build process, you can use the script sqllib/samples/c/bldrtn in your instance directory (bldrtn stands for build routine). If you use the script, then you must provide an export file.

Note: For version DB2 UDB 7.x, use the bldudf tool on UNIX and the bldmudf tool on Windows systems. The syntax is slightly different; refer to the description in the file for more details.

To build the UDFs in the file os_calls.c, simply execute:

INSTHOME/sqllib/samples/c/bldrtn os_calls

where INSTHOME is the path to the instance home directory. For example:

	c:\program files\ibm\sqllib\samples\c\bldrtn os_calls

or

	/home/stolze/sqllib/samples/c/bldrtn os_calls

The final step of the script copies the shared library to the sqllib/function directory in your instance directory.


Reading from files

When reading files, there are several options how the content of the file will be represented in SQL. For example, the entire file content can become a scalar VARCHAR or CLOB value (binary files can be transformed to BLOBs), or the file has a well-defined structure, and this structure is to be mapped to a relational table. In the first case, an external scalar function can do the job; in the second case we use DB2 table functions to do perform the mapping from the file's internal structure to the structure of a table in the database system.

First, let's deal with the basic case where the entire content of the file is to be represented as a single CLOB value. This means that the entire content of the file is read and stored in a LOB value, which is then returned to the database engine. The input for that function is the absolute file name, and the output is the CLOB that contains the result. Use the SQL statement in Listing 3 to create the function in the database.

Listing 3. Create function to read a file into a CLOB

CREATE FUNCTION readFile(fileName VARCHAR(255))  
   RETURNS CLOB(1M)  
   SPECIFIC readFileToClob  
   EXTERNAL NAME 'os_calls!readFileToClob'  
   LANGUAGE C  
   PARAMETER STYLE SQL  
   DETERMINISTIC  
   NOT FENCED  
   RETURNS NULL ON NULL INPUT  
   NO SQL  
   NO EXTERNAL ACTION  
   NO SCRATCHPAD  
   ALLOW PARALLEL  
   NO FINAL CALL;  
    
GRANT EXECUTE ON FUNCTION readFile TO PUBLIC;

Note that the GRANT EXECUTE statement is only supported by DB2 UDB Version 8. Similarly, the PARAMETER STYLE SQL clause must be replaced with PARAMETER STYLE DB2SQL for DB2 UDB Version 7.

The corresponding C code that implements the function is also rather straightforward as you can see in Listing 4. It opens the specified file, verifies the size of the file to make sure all the data can be copied to the CLOB, and then actually copies the data to the CLOB before closing the file.

Listing 4. C code to read a file into a CLOB

#include <stdio.h>  
#include <sqludf.h>  
 
void SQL_API_FN readFileToClob(  
        SQLUDF_VARCHAR *fileName,  
        SQLUDF_CLOB    *fileData,    /* output */ 
        /* null indicators */  
        SQLUDF_NULLIND *fileName_ind,  
        SQLUDF_NULLIND *fileData_ind,  
        SQLUDF_TRAIL_ARGS)  
{  
    int rc = 0;  
    long fileSize = 0;  
    size_t readCnt = 0;  
    FILE *f = NULL;  
     
    f = fopen(fileName, "r");  
    if (!f) {  
        strcpy(SQLUDF_MSGTX, "Could not open file ");  
        strncat(SQLUDF_MSGTX, fileName,  
                SQLUDF_MSGTEXT_LEN - strlen(SQLUDF_MSGTX)-1);  
        strncpy(SQLUDF_STATE, "38100", SQLUDF_SQLSTATE_LEN);  
        return;  
    }  
     
    rc = fseek(f, 0, SEEK_END);  
    if (rc) {  
        sprintf(SQLUDF_MSGTX, "fseek() failed with rc = %d", rc);  
        strncpy(SQLUDF_STATE, "38101", SQLUDF_SQLSTATE_LEN);  
        return;  
    }  
     
    /* verify the file size */  
    fileSize = ftell(f);  
    if (fileSize > fileData->length) {  
        strcpy(SQLUDF_MSGTX, "File too large");  
        strncpy(SQLUDF_STATE, "38102", SQLUDF_SQLSTATE_LEN);  
        return;  
    }  
    /* go to the beginning and read the entire file */  
    rc = fseek(f, 0, 0);  
    if (rc) {  
        sprintf(SQLUDF_MSGTX, "fseek() failed with rc = %d", rc);  
        strncpy(SQLUDF_STATE, "38103", SQLUDF_SQLSTATE_LEN);  
        return;  
    }  
     
    readCnt = fread(fileData->data, 1, fileSize, f);  
    if (readCnt != fileSize) {  
        /* raise a warning that something weird is going on */  
        sprintf(SQLUDF_MSGTX, "Could not read entire file " 
                "(%d vs %d)", readCnt, fileSize);  
        strncpy(SQLUDF_STATE, "01H10", SQLUDF_SQLSTATE_LEN);  
        *fileData_ind = -1;  
    }  
    else {  
        fileData->length = readCnt;  
        *fileData_ind = 0;  
    }  
}

After the compilation and installation of the shared library that contains the UDF, let's perform some very basic tests for the success and error cases:

/home/stolze $ echo "test" > test_file  
/home/stolze $ db2 "VALUES readFile( '/home/stolze/test_file' )"  
 
1  
--------------------------------------------------------  
test  
 
  1 record(s) selected.  
   
/home/stolze $ db2 "VALUES readFile( '/home/stolze/test_file123' )"  
 
1  
--------------------------------------------------------  
SQL0443N  Routine "READFILE" (specific name "READFILETOCLOB") has  
returned an error SQLSTATE with diagnostic text "Could not open  
file /home/stolze/test_file123".  SQLSTATE=38100

Subsequently, the UDF can be used like any other function in an SQL statement, for example in the following INSERT:

/home/stolze $ db2 "CREATE TABLE test_table ( text CLOB(2M) )"  
DB20000I  The SQL statement completed successfully.  
 
/home/stolze $ db2 "INSERT INTO test_table VALUES 
( readFile( '/home/stolze/test_file' ) )"  
DB20000I  The SQL statement completed successfully.  
 
/home/stolze $ db2 "SELECT SUBSTR(text, 2, 3) FROM test_table"  
 
1  
---  
est  
 
  1 record(s) selected.

Using table functions

The next step illustrates how to represent a file with a well-defined structure as a relational table. In this case, I implemented a mapping of the file's content to a table, as opposed to the simple copying of the entire file into a single LOB, treating it as opaque data.

For our example, the following structure in the file is assumed:

  • The file consists of a sequence of uniformly structured records.
  • Each record contains a total of 60 bytes.

A straightforward mapping of the file's structure to a relational structure is to return all the records in the file as a table with two columns. The first column is of type INTEGER, and the second of type VARCHAR(50). Each record in the file will become a single row in the table. Note that other files might follow different file formats, and each format requires its own specialized mapping. You would need to implement separate functions to support each mapping.

With the mapping defined, let's create the function in the database system. Listing 5 shows you how to do this.

Listing 5. Create a function to read from a file and represent it as a table

CREATE FUNCTION readFileTable(fileName VARCHAR(256))  
   RETURNS TABLE ( intCol INTEGER, string VARCHAR(50) )  
   SPECIFIC readFileToTable  
   EXTERNAL NAME 'os_calls!readFileToTable'  
   LANGUAGE C  
   PARAMETER STYLE SQL  
   DETERMINISTIC  
   NOT FENCED  
   RETURNS NULL ON NULL INPUT  
   NO SQL  
   NO EXTERNAL ACTION  
   SCRATCHPAD  
   NO FINAL CALL  
   DISALLOW PARALLEL;  
    
GRANT EXECUTE ON FUNCTION readFileTable TO PUBLIC;

Listing 6 shows the C code that implements the mapping of the file format to a relational table.

Listing 6. C code to represent the content of a file as table

#include <stdio.h>  
#include <sqludf.h>  
#include <sqlstate.h> 
  
void SQL_API_FN readFileToTable(  
        SQLUDF_VARCHAR *fileName,    /* input */  
        SQLUDF_INTEGER *intCol,      /* output */  
        SQLUDF_VARCHAR *charCol,     /* output */  
        /* null indicators */  
        SQLUDF_NULLIND *fileName_ind,  
        SQLUDF_NULLIND *intCol_ind,  
        SQLUDF_NULLIND *charCol_ind,  
        SQLUDF_TRAIL_ARGS_ALL)  
{  
    size_t readCnt = 0;  
    char intValue[10+1] = { '\0' };  
    char strValue[50+1] = { '\0' };  
     
    struct scratchMap {  
        FILE *f;  
        int rowNumber;  
    }; 
      
    /* map the scratchpad */  
    struct scratchMap *scratch =  
        (struct scratchMap *)SQLUDF_SCRAT->data;  
     
    *intCol_ind = -1;  
    *charCol_ind = -1; 
      
    switch (SQLUDF_CALLT) {  
      case SQLUDF_TF_OPEN:  
          /* open file and store the pointer on the scratchpad */  
          scratch->f = fopen(fileName, "r");  
          if (!scratch->f) {  
              strcpy(SQLUDF_MSGTX, "Could not open file ");  
              strncat(SQLUDF_MSGTX, fileName, SQLUDF_MSGTEXT_LEN - 
                      strlen(SQLUDF_MSGTX)-1);  
              strncpy(SQLUDF_STATE, "38200", SQLUDF_SQLSTATE_LEN);  
              return;  
          }  
          scratch->rowNumber = 0; 
            
          break; 
            
      case SQLUDF_TF_FETCH:  
          /* count the row */  
          scratch->rowNumber++; 
            
          /* read the integer */  
          readCnt = fread(intValue, 1, 10, scratch->f);  
          if (readCnt == 0) {  
              /* end of file reached */  
              strncpy(SQLUDF_STATE, SQL_NODATA_EXCEPTION,  
                      SQLUDF_SQLSTATE_LEN);  
              return;  
          }  
          else if (readCnt != 10) {  
              sprintf(SQLUDF_MSGTX, "Could not read int value "  
                      "in line %d", scratch->rowNumber);  
              strncpy(SQLUDF_STATE, "38201", SQLUDF_SQLSTATE_LEN);  
              return;  
          }  
          intValue[10] = '\0';  
          if (sscanf(intValue, "%d", intCol) != 1) {  
              sprintf(SQLUDF_MSGTX, "Invalid integer value %s "  
                      " in row %d", intValue, scratch->rowNumber);  
              strncpy(SQLUDF_STATE, "38202", SQLUDF_SQLSTATE_LEN);  
              return;  
          }  
          *intCol_ind = 0;  
           
          /* read the string (allow truncations at EOF) */  
          readCnt = fread(strValue, 1, 50, scratch->f);  
          strValue[readCnt] = '\0';  
          strcpy(charCol, strValue);  
          *charCol_ind = 0; 
            
          break; 
            
      case SQLUDF_TF_CLOSE:  
          /* close the file */  
          fclose(scratch->f);  
          scratch->f = NULL;  
          scratch->rowNumber = 0;  
    }  
}

Now let's construct a file that adheres to the format specified above. With such a file, we can verify the proper functioning of the UDF. Note that we must not append line terminators in the file, because this might not conform to the required file format. The -n option used for the echo command means we do not terminate the strings. Depending on the operating system you use, the way to generate the file might be different.

/home/stolze $ echo -n "1234567890" > test_file  
/home/stolze $ echo -n "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx" 
>> test_file  
/home/stolze $ echo -n "    987654" >> test_file  
/home/stolze $ echo -n "12345678901234567890123456789012345678901234567890" 
>> test_file  
/home/stolze $ echo -n "        -2" >> test_file  
/home/stolze $ echo -n "   test string   " >> test_file

Table functions are called with a different syntax than scalar functions. Here is a SELECT statement that shows how to use it in the FROM clause. Following that, you will find some very basic tests for the function.

/home/stolze $ db2 "SELECT * FROM TABLE 
( readFileTable( '/home/stolze/test_file' ) ) AS t" 
  
INTCOL      STRING  
----------- --------------------------------------------------  
 1234567890 abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx  
     987654 12345678901234567890123456789012345678901234567890  
         -2    test string 
           
  3 record(s) selected.  
   
/home/stolze $ db2 "SELECT * FROM TABLE 
( readFileTable( '/home/stolze/test_file123' ) ) AS t"  
 
INTCOL      STRING  
----------- --------------------------------------------------  
SQL0443N  Routine "READFILETABLE" (specific name "*LETOTABLE") has  
returned an error SQLSTATE with diagnostic text "Could not open  
file /home/stolze/test_file123".  SQLSTATE=38200  
   
/home/stolze $ echo -n "123" > test_file  
/home/stolze $ db2 "SELECT * FROM TABLE 
( readFileTable( '/home/stolze/test_file' ) ) AS t"  
 
INTCOL      STRING  
----------- --------------------------------------------------  
SQL0443N  Routine "READFILETABLE" (specific name 
"READFILETOTABLE") has returned an error SQLSTATE with diagnostic 
text "Could not read int value in line 1".  SQLSTATE=38201  
   
/home/stolze $ touch empty_test  
/home/stolze $ db2 "SELECT * FROM TABLE 
( readFileTable( '/home/stolze/empty_test' ) ) AS t" 
  
INTCOL      STRING  
----------- --------------------------------------------------  
 
  0 record(s) selected.

Writing files

The article A UDF for File Output and Debugging from SQL shows a common situation where it is very helpful to write data to a file from inside an SQL statement. That article uses the concepts presented in this section.

Writing to a file is basically just the reverse process of reading from one. The mapping between the file format and the information to be written has to be implemented. Again, user-defined functions allow a very convenient way to perform the tasks. In the following example, I assume the same definition for the file format that we already used above.

  • The file consists of a sequence of uniformly structured records.
  • Each record contains a total of 60 bytes.

The user-defined function that implements the write operation takes two input parameters. The first parameter is of type INTEGER and the second of type VARCHAR(50). This gives us the SQL statement from Listing 7 to register the function in the database.

Listing 7. Create a function to write data to a file

CREATE FUNCTION writeFile( fileName VARCHAR(255), 
      intValue INTEGER, strValue VARCHAR(50) )  
   RETURNS INTEGER 
   SPECIFIC writeFile  
   EXTERNAL NAME 'os_calls!writeFileFromParams' 
   LANGUAGE C  
   PARAMETER STYLE SQL  
   DETERMINISTIC  
   NOT FENCED  
   RETURNS NULL ON NULL INPUT  
   NO SQL  
   EXTERNAL ACTION  
   SCRATCHPAD  
   DISALLOW PARALLEL  
   FINAL CALL;  
    
GRANT EXECUTE ON FUNCTION writeFile TO PUBLIC;

Use a scratchpad to carry the open file descriptor from one invocation of the UDF to the next. That way, we can save the time for repeated opening and closing of the file if multiple records are to be written in a single SQL statement. The C code to implement the write operation is also straightforward. Listing 8 shows the details. Special consideration has to be given for the call type, but beyond that it is only necessary to write the data to the file in the correct format.

Listing 8. C code to write data to a file

#include <stdio.h> 
#include <string.h> 
#include <sqludf.h> 
 
void SQL_API_FN writeFileFromParams( 
        SQLUDF_VARCHAR *fileName,    /* input */ 
        SQLUDF_INTEGER *intCol,      /* input */ 
        SQLUDF_VARCHAR *charCol,     /* input */ 
        SQLUDF_INTEGER *result,      /* output */ 
        /* null indicators */ 
        SQLUDF_NULLIND *fileName_ind,  SQLUDF_NULLIND *intCol_ind, 
        SQLUDF_NULLIND *charCol_ind,   SQLUDF_NULLIND *result_ind, 
        SQLUDF_TRAIL_ARGS_ALL) 
{ 
    int rc = 0; 
    int writtenLen = 0; 
    char row[60+1] = { '\0' }; 
 
    struct scratchMap { 
        FILE *f; 
    }; 
 
    /* map the scratchpad */ 
    struct scratchMap *scratch = 
        (struct scratchMap *)SQLUDF_SCRAT->data; 
    *result_ind = -1; 
 
    switch (SQLUDF_CALLT) { 
      case SQLUDF_FIRST_CALL: 
          /* open file and store the pointer on the scratchpad */ 
          scratch->f = fopen(fileName, "wb"); 
          if (!scratch->f) { 
              strcpy(SQLUDF_MSGTX, "Could not open file "); 
              strncat(SQLUDF_MSGTX, fileName, SQLUDF_MSGTEXT_LEN - 
                      strlen(SQLUDF_MSGTX)-1); 
              strncpy(SQLUDF_STATE, "38300", SQLUDF_SQLSTATE_LEN); 
              rc = -1; 
              goto cleanup; 
          } 
 
      case SQLUDF_NORMAL_CALL: 
          if (*intCol < -999999999) { 
              /* a number of -1000000000 or smaller needs 
                 11 bytes, which is too long for our format */ 
              sprintf(SQLUDF_MSGTX, "The number %d is too long.", 
                      *intCol); 
              strncpy(SQLUDF_STATE, "38301", SQLUDF_SQLSTATE_LEN); 
              rc = -2; 
              goto cleanup; 
          } 
          if (strlen(charCol) > 50) { 
              /* strings must not be longer than 50 characters */ 
              sprintf(SQLUDF_MSGTX, "The string %s is too long.", 
                      charCol); 
              strncpy(SQLUDF_STATE, "38302", SQLUDF_SQLSTATE_LEN); 
              rc = -3; 
              goto cleanup; 
          } 
 
          /* write the given integer and string */ 
          writtenLen = fprintf(scratch->f, "%10d%-50s", *intCol, 
                  charCol); 
          if (writtenLen != 60) { 
              sprintf(SQLUDF_MSGTX, "Did not write 60 bytes " 
                      "(wrote %d).", writtenLen); 
              strncpy(SQLUDF_STATE, "38303", SQLUDF_SQLSTATE_LEN); 
              rc = -4; 
              goto cleanup; 
          } 
           
          *result = 0; 
          *result_ind = 0; 
          break; 
    } 
 
 cleanup: 
    switch (SQLUDF_CALLT) { 
      case SQLUDF_FIRST_CALL: 
          if (rc == 0) { 
              break; 
          } 
          /* we must close the file descriptor upon an error in 
             the FIRST call; so we fall through to the final call 
             processing below */ 
      case SQLUDF_FINAL_CALL: 
      case SQLUDF_FINAL_CRA: 
          /* close the file */ 
          fclose(scratch->f); 
          scratch->f = NULL; 
    } 
}

After compiling and linking the code into a shared library, test the function. Together with the UDF readFileTable() that was introduced above, we can also verify if the data written to the file is correctly formatted by reading it again from the file and representing it as a table in the database.

/home/stolze $ db2 "VALUES writeFile
( '/home/stolze/write.file', 10, 'some text' )" 
 
1 
----------- 
          0 
 
  1 record(s) selected. 
 
/home/stolze $ more write.file 
        10some text 
 
/home/stolze $ db2 "VALUES writeFile
( '/home/stolze/write.file', -1234567890, 'some text' )" 
 
1 
----------- 
SQL0443N  Routine "STOLZE.WRITEFILE" (specific name "WRITEFILE") has 
returned an error SQLSTATE with diagnostic text "The number 
-1234567890 is too long."  SQLSTATE=38301 
 
/home/stolze $ db2 "CREATE TABLE t ( id INTEGER, string VARCHAR(50) )" 
/home/stolze $ db2 "INSERT INTO t VALUES ( 1, 'the first text' ), 
( 2, 'another text' )"

The file-write operation that is performed by the UDF in the following SELECT statement processes two rows in the table named "t". If the table would contain more rows, then the other rows would be written out as well. Please note that I do not have any line termination characters in the resulting file write.file. Thus, the file will only show one long string and not multiple lines when dumped to the standard output.

/home/stolze $ db2 "SELECT writeFile
( '/home/stolze/write.file', id, string ) FROM t" 
/home/stolze $ more write.file 
         1the first text                              2another text 
 
/home/stolze $ db2 "SELECT * FROM TABLE 
( readFileTable( '/home/stolze/write.file' ) ) AS x" 
 
INTCOL      STRING 
----------- -------------------------------------------------- 
          1 the first text 
          2 another text 
 
  2 record(s) selected.

Making system calls

In this section I present a simple function that can execute other commands on the database server. The C function system() is used for the task. The UDF takes the command to be executed as input parameter and executes it. The return code of the system call is passed back to the caller of the SQL function and can be further evaluated there. The corresponding SQL statement to register the function in the database system is shown in Listing 9.

Listing 9. Create a function to execute a system command

CREATE FUNCTION systemCall( command VARCHAR(2000) ) 
   RETURNS INTEGER 
   SPECIFIC systemCall  
   EXTERNAL NAME 'os_calls!systemCall' 
   LANGUAGE C  
   PARAMETER STYLE SQL  
   DETERMINISTIC  
   NOT FENCED  
   RETURNS NULL ON NULL INPUT  
   NO SQL  
   EXTERNAL ACTION  
   NO SCRATCHPAD  
   DISALLOW PARALLEL  
   FINAL CALL;  
    
GRANT EXECUTE ON FUNCTION systemCall TO PUBLIC;

As already described, the implementation of the function itself is very simple, as Listing 10 illustrates. The input parameter is just passed to the system() call and the result is passed back.

Listing 10. C code for the function to make a system call

#include <stdlib.h> 
#include <sqludf.h> 
 
void SQL_API_FN systemCall( 
        SQLUDF_VARCHAR *command,     /* input */ 
        SQLUDF_INTEGER *result,      /* output */ 
        /* null indicators */ 
        SQLUDF_NULLIND *command_ind, 
        SQLUDF_NULLIND *result_ind, 
        SQLUDF_TRAIL_ARGS) 
{ 
    int rc = 0; 
 
    if (SQLUDF_NULL(command_ind)) { 
        *result_ind = -1; 
        return; 
    } 
 
    /* execute the command */ 
    rc = system(command); 
 
    *result_ind = 0; 
    *result = rc; 
}

As with all new functionality, it has to be tested to ensure its proper functioning. The first statement to be executed runs an ls command on a certain directory to list its content. The output is redirected to a file, which can be examined at the end.

/home/stolze $ ls -F 
create_tab.sql  lost+found/  os_calls.c    os_calls.exp  sqllib/ 
drop_tab.sql    misc/        os_calls.def  os_calls.o    stolze/  
 
/home/stolze $ db2 "VALUES systemCall
('ls /home/stolze > /home/stolze/ls.out')" 
 
1 
----------- 
          0 
           
/home/stolze $ cat ls.out 
create_tab.sql 
drop_tab.sql 
lost+found/ 
ls.out 
misc/ 
os_calls.c 
os_calls.def 
os_calls.exp 
os_calls.o 
sqllib/ 
stolze/

In the next situation, I attempt to execute a SQL statement to create a table. The function call can be initiated, but the return code of 1024 indicates that the statement was not executed successfully. The executable I used is the DB2 command line process (CLP). Thus, the return code of 1024 corresponds to the SQLCODE -1024 and message SQL1024, which indicates that no database connection exists. Therefore I use a small script that establishes a connection to a database and then executes the original CREATE TABLE statement.

/home/stolze $ db2 "VALUES systemCall
('db2 \"create table t ( col int )\"')" 
 
1 
----------- 
       1024 
 
  1 record(s) selected. 
 
/home/stolze $ cat create_tab.sql 
connect to test; 
create table t ( col int ); 
connect reset; 
 
/home/stolze $ db2 "VALUES systemCall('db2 -tsf /home/stolze/create_tab.sql')" 
 
1 
----------- 
          0 
 
  1 record(s) selected. 
 
/home/stolze $ db2 describe table t 
 
Column                         Type      Type 
name                           schema    name               Length   Scale Nulls 
------------------------------ --------- ------------------ -------- ----- ------ 
COL                            SYSIBM    INTEGER                   4     0 Yes 
 
  1 record(s) selected.

Please consider that the previous example established a new connection to the database. From the perspective of DB2 UDB, there is no relation between the connection that invokes the UDF and the connection that used the SQL script. Both transactions are executed concurrently, and the usual concurrency issues might be applicable.

As the last example, I use a dynamic compound statement to execute different commands on the database server, based on the information in some table. The DB2 catalog is checked for the existence of the table named T. If it exists the table is dropped; otherwise the table is created.

/home/stolze $ db2 "BEGIN ATOMIC 
> IF ( SELECT COUNT(*) FROM syscat.tables WHERE tabname = 'T' ) < 0 THEN 
>    VALUES systemCall('db2 -tsf /home/stolze/drop_tab.sql'); 
> ELSE 
>    VALUES systemCall('db2 -tsf /home/stolze/create_tab.sql'); 
> END IF; 
> END" 
DB20000I  The SQL command completed successfully. 
/home/stolze $ db2 describe table t 
 
Column                         Type      Type 
name                           schema    name               Length   Scale Nulls 
------------------------------ --------- ------------------ -------- ----- ------ 
 
  0 record(s) selected. 
 
SQL0100W  No row was found for FETCH, UPDATE or DELETE; or the result of a 
query is an empty table.  SQLSTATE=02000 
 
/home/stolze $ db2 "BEGIN ATOMIC 
> IF ( SELECT COUNT(*) FROM syscat.tables WHERE tabname = 'T' ) > 0 THEN 
>    VALUES systemCall('db2 -tsf /home/stolze/drop_tab.sql'); 
> ELSE 
>    VALUES systemCall('db2 -tsf /home/stolze/create_tab.sql'); 
> END IF; 
> END" 
DB20000I  The SQL command completed successfully. 
/home/stolze $ db2 describe table t 
 
Column                         Type      Type 
name                           schema    name               Length   Scale Nulls 
------------------------------ --------- ------------------ -------- ----- ------ 
COL                            SYSIBM    INTEGER                   4     0 Yes 
 
  1 record(s) selected.

Things to be aware of

There are several issues to be considered when writing UDFs. This section describes the more important considerations, but it is by no means complete nor conclusive.

Developing UDFs

Whenever you develop a new user-defined function or if you modify an existing one, you should always test the function by registering it as FENCED in the database. A NOT FENCED function provides better performance for the communication between DB2 and the function itself, but an error or bug in the function code can cause damage to the function itself and, in unlucky circumstances, also to DB2 even though the DB2 engine tries to protect itself as much as possible. The use of FENCED functions protects the DB2 engine from such problems.

So you should always test your function sufficiently before running it as NOT FENCED in a production database.

Authorization

A UDF is executed inside a process of DB2. Depending on the definition of in the CREATE FUNCTION statement, either a process of the DB2 engine or a fenced mode process is used. If you create the UDF as NOT FENCED, then a DB2 engine process will execute the function; otherwise, a fenced mode process will be employed.

The respective process space defines implicitly the operating system authorizations for the user-defined function. If the UDF is executed as a fenced function, then it has access to only those files, directories, and other operating system resources that can be accessed by the fenced user (defined for your instance). In particular, if the fenced user is not the same user as the instance owner, which is the recommended configuration, then a fenced UDF cannot access the resources or files of the DB2 instance. This protects the DB2 resources from faulty or malicious UDFs.

The implications for our example UDFs presented above are straightforward. If you define the function to run as FENCED, then you must choose a file that is accessible for the fenced user. When writing to the file, then the fenced user must have the privileges to create a file in the directory where you want to write the data to. Similarly, if you run the function as NOT FENCED, then the same restrictions apply, but this time the privileges for the DB2 instance owner are considered. For example, the systemCall() UDF would be able to execute commands reserved for the DB2 instance owner.

Path names for files

In all our examples we have shown that the UDFs take an absolute path name to the file to be read or written as input parameter. There is a very good reason for not using relative path names. As already mentioned, a UDF is executed in a DB2 process. A relative path would always be relative to the directory from where the DB2 process was started. (This does not have to be the directory where the executable resides in the file system.) Thus, users usually do not know where the starting point of a relative path would be. Even if they know, they should never rely on it, because things might fail unexpectedly in the future. Using absolute path names ensures consistent and correct results.

DISALLOW PARALLEL and EXTERNAL ACTION

The UDFs readFileTable(), writeFile(), and systemCall() are defined using the DISALLOW PARALLEL clause. This is necessary to tell the DB2 that the function must not be executed in parallel. In the case of readFileTable(), it prevents the same row from being read multiple times from different DB2 processes. Obviously, when writing to a file you only want to write the data once and not multiple times, which can happen if multiple DB2 processes execute the SQL statement. Likewise, an executable started with the systemCall() function should only be run once per invocation and not multiple times in parallel. These considerations are especially important for multi-node configurations (MPP), where only a single node must process the UDF in order to get a consistent output.

The clause EXTERNAL ACTION is also defined for the writeFile() and systemCall() functions, because these function clearly perform an action, which changes the state of the system outside the database manager's control. With this clause defined, DB2 will take care to not change the number of calls made to the function inside the statement. No call will be omitted, and no additional call will be inserted. (Such situations could occur if the query rewrite component of DB2 tries to transform certain constructs in the SQL statement.)

Use LOB locators

When dealing with large objects (LOBs), it might be worthwhile to use locators. Locators have the advantage that only as much space is allocated as is really needed. For example, if you define a UDF with a return parameter of BLOB(100M), then DB2 has to make sure that the UDF gets a buffer of size 100M to write its result to. If the actual result is only a couple of hundred bytes, a lot of space would have been wasted, not to mention that the allocation of the 100M buffer is expensive.

Multiple UDFs in the same SQL statements

When writing to a new file using the above writeFile() function, make sure that there are no conflicts on the file system level. That means you should not write to the same file in different occurrences of the UDF in the same SQL statement or run several SQL statements in parallel that would write to the same file. For example, you cannot rely on any well-defined behavior for the resulting file in the following SQL statement:

/home/stolze $ db2 "SELECT writeFile( 
'/home/stolze/write.file', id, string ), writeFile( 
'/home/stolze/write.file', id + 10, string ) FROM t" 
 
/home/stolze $ more write.file 
        11the first text                            12another text

In our example, the second instance of the function could write its results, and the results from the first instance were simply lost. In a different situation, the result could be completely different.

A way to avoid such problems is to not write the data to the same file; that is, use a different file for each of the occurrences of the UDF writeFile() in the SQL statement. The following SQL statement illustrates this, and the results from both occurrences of the UDF can be found in the respective files. Thus, the concurrency issues do not arise in the first place.

/home/stolze $ db2 "SELECT writeFile( 
'/home/stolze/file.1', id, string ), writeFile( 
'/home/stolze/file.2', id + 10, UPPER(string) ) FROM t" 
 
/home/stolze $ more file.1 
         1the first text                             2another text 
 
/home/stolze $ more file.2 
        11the FIRST TEXT                            12ANOTHER TEXT

Alternatively, you can use synchronization mechanisms to prevent the data loss. I leave the implementation of such mechanisms to the interested user.

Massive parallel processing (multiple partitions)

When dealing with files in the server's file system, you must consider where the file is to be found. The question comes up especially in multi-partition configurations of DB2. (This was called DB2 Extended Enterprise Edition (EEE) in Version 7.) In such an environment you do not know on which node the UDF will be executed. You must ensure that the file you want to read with the UDFs readFile() or readFileTable() is accessible from all nodes with the same absolute path name. Or, when writing to a file, be aware that the file could be created on any of the nodes.

An easy approach to address these issues is the use of shared or network file systems. You can employ links on UNIX systems to provide identical absolute path names on all of the nodes if this becomes necessary.


Summary

This article presented a way to access files in the file system of the database server. I showed a UDF readFile() that read an entire file into a single CLOB value, followed be a more advanced table function readFileTable() that parsed single records from a file and represented them as rows consisting of multiple columns in the database. The UDF writeFile() showed how the reverse operation -- the writing of data to a file -- can be implemented and used directly in a SQL statements. I concluded with a brief overview on some of the issues that have to be carefully considered when implementing user-defined functions that access operating system resources.


Download

DescriptionNameSize
Code sampleos_calls.zip6 KB

Resources

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=13923
ArticleTitle=Making Operating System Calls from SQL
publish-date=03272003