XML is becoming more and more important to the enterprise. Frequently, users need to create XML documents from the contents of a relational database. This can be done in many ways, but the extensibility features of IBM Informix, Version 11 give you the opportunity to create your own functions to generate XML output from your database.
This article explains a simple approach to generating XML that can answer many needs and provide a building block for more elaborate solutions. The example that accompanies this article uses the stores7 demonstration database provided with the Informix product. It is an update to an earlier article. Since the original article, Informix has implemented XML functions with the name genxml in the server. To avoid having the example code clash with the Informix implementation, the provided functions use a suffix of "2".
You may want to generate XML-formatted data using multiple types of SQL statements. The data could be in a single table, or could require the joining of multiple tables. What these SQL statements have in common is that they all return rows of data. The solution is then the same for manipulating one table or multiple tables.
Informix provides support for a data type called rows. This means that an entire row result can be passed to a user-defined function (UDF) as a single argument. The row is defined by columns that have names and types. What is left to do is pass an additional argument that identifies a row name to encapsulate the row data in the XML representation. In order to accomplish this, consider a function that would have the following definition:
CREATE FUNCTION genxml2(varchar(30), ROW) RETURNING lvarchar . . .
The first argument is the name we want to give to the row, and the second argument is the row itself. A row includes its definition, so the genxml2 function can find how many columns are in the row, their names, and so on. The following statement generates an XML representation for each row of the customer table:
SELECT genxml2("customer", customer) FROM customer;
The second argument is the table name, which means that the entire row is passed as the second argument. The first row returned looks like this (reformatted for this article):
<customer> <customer_num>101<//customer_num> <fname>Ludwig </fname> <lname>Pauli </lname> <company>All Sports Supplies </company> <address1>213 Erstwild Court </address1> <city>Sunnyvale </city> <state>CA</state> <zipcode>94086</zipcode> <phone>408-789-8075 </phone> </customer>
The column names were extracted from the row definition by the genxml2 function and used as tags for their respective columns. But what if we want a partial list of columns? We can dynamically create a row with the desired columns as the second argument to the genxml2 function as follows:
SELECT genxml2("customer", ROW(customer_num, fname, lname)) FROM customer;
The resulting XML row is similar to the previous output with fewer fields. Since we can build a row dynamically, we can also join multiple tables and create output based on all tables, as shown in the following example:
SELECT genxml2("customer_calls", ROW(A.customer_num, fname, lname, call_dtime, call_code, call_descr, res_dtime, res_dtime, res_descr) ) FROM customer a, cust_calls b WHERE a.customer_num = b.customer_num;
These examples cover many of the requirements for generating XML-formatted rows. There may also be a requirement to generate XML output based on the results of an SQL statement that includes an aggregation. Aggregate expressions can be passed to the genxml2 function, as you see here:
SELECT genxml2("stats", ROW(customer_num, COUNT(*)) ) FROM cust_calls GROUP BY customer_num;
With this basic tool, we are now ready to complete the building of an XML document by adding its header and footer.
Handling NULL values
If a column contains a NULL value, the genxml2 function, and all other functions described in this article, does not process the column. This makes sense, as the only things that would appear would be the opening and closing tags for the column.
Row type processing
A table may contain a column that is a row type. In that case, the row passed to the genxml2 function would contain a named or unnamed row type. For example, we could have a definition as follows:
CREATE ROW TYPE address_t ( name varchar(20), address1 varchar(20), address2 varchar(20), city varchar(15), state char(2), zipcode char(5) ); CREATE TABLE employee ( name varchar(30), address address_t, phone varchar(18) );
The function will process the row type, named or unnamed, just like other base types. The execution and result of a query on the employee table would look like this:
SELECT genxml2("employee", employee) FROM employee; <employee> <name>Roy</name> <address> <address1>123 first street</address1> <city>Denver</city> <state>CO</state> <zipcode>80111</zipcode> </address> <phone>303-555-1212</phone> </employee>
As you can see, the
address column tags suround
the definition of the row, and the row columns names are used to identify
the values just as in the higher level. Note that the
address2 column is a
NULL value and therefore does not appear in the
There is one more case we must consider. The Informix product supports the definition of table hierarchies that are based on named row types. Figure 1 shows an example of a hierarchy.
Figure 1. Hierarchy example
One reason to have such a hierarchy is to be able to do risk calculations that differ depending on the type of loan. By creating multiple risk functions that operate on specific row types, we can submit a query such as the following:
SELECT loan_number, risk(loans) FROM loans WHERE branch_id = 127;
This query goes through all the rows for
branch_id 127 and calls the appropriate risk
function depending on the type of row being returned. If we want to
generate an XML representation for each loan for a specific branch using
our existing genxml2 function, we must submit one SQL statement for each
sub-table. We must do this to give the proper row type name for each
Since we are dealing with named row types, the genxml2 function could retrieve this information and use it implicitly. Note that Informix differentiates functions based on their names and arguments types. Therefore we can create a new, additional genxml2 function that takes only the row as its argument and retrieves the name from the row definition. We can then generate an XML representation of loans for a specific branch with an SQL statement such as this:
SELECT genxml2(loans) FROM loans WHERE branch_id = 127;
If you were to use this function on an unnamed row type, the header would be the complete definition of the row. For the customer table, you would get the following header:
<ROW(customer_num integer, fname char(15), lname char(15), company char(20), address1 char(20), address2 char(20), city char(15),state char(2), zipcode char(5), phone char(18) )>
The same row name would be used as the row closing tag.
Completing the XML document
XML is used to describe data. It can also include information that identifies the document type and the definitions of valid documents of this type. These validation files are called document type declarations or DTDs for short. DTDs are themselves XML documents.
XML describes the data but not the way it should be presented. The presentation of the information can be described in an extensible stylesheet language (XSL).
Finally, you should identify the document as an XML document that follows a specific XML standard. This means that you should wrap the generated rows with at least with the identification that it is an XML document and a tag for the complete document. The result could be something such as this:
<?xml version="1.0" encoding="ISO-8859-1" ?> <!DOCTYPE customer_set SYSTEM "/home/dtd/customer_set.dtd"> <?xml-stylesheet type="text/xsl" href="/home/xsl/customer_set.xsl" ?> <customer_set> . . . (XML-formatted rows) </customer_set>
CREATE PROCEDURE xmlcustomerset() RETURNING LVARCHAR DEFINE result LVARCHAR; DEFINE ressql LVARCHAR; LET result = '<?xml version="1.0" encoding="ISO-8859-1" ?>'; LET result = result || '<!DOCTYPE customer_set SYSTEM "/home/dtd/customer_set.dtd">'; LET result = result || '<?xml-stylesheet type="text/xsl" href="/home/xsl/customer_set.xsl" ?>'; LET result = result || '<customer_set>'; FOREACH SELECT genxml2('customer', customer) INTO ressql FROM customer LET result = result || ressql; END FOREACH; LET result = result || '</customer_set>'; RETURN result; END PROCEDURE; EXECUTE PROCEDURE xmlcustomerset();
The procedure is very simple. It simply initializes the result variable to
include the header information. It then executes the SQL statement that
returns multiple XML-formatted rows and appends the result to the result
variable. Finally, it closes the document by adding the closure of the
customer_set tag and returns the result. The
last line shows how the procedure is executed.
Keeping track of header information
The creation of an XML document from a stored procedure is always the same. It would be nice to generalize the process instead of having to create multiple stored procedures. The first step is to find a way to keep track of the document type and the stylesheet that should be used. We can use a table to keep track of this information:
CREATE TABLE genxmlinfo ( name varchar(30) PRIMARY KEY, dtypepath lvarchar, xslpath lvarchar );
The name column is the document type used. In the example above, it is
dtypepath column contains the path to the DTD
xslpath contains the path to the XSL
document to use. Both these columns allow NULLs. If the column is NULL,
then the header line does not appear in the document. This gives us the
flexibility to decide what must be included.
Generalizing the creation of XML documents
If we were to use stored procedures to generate XML documents, we would have to have one procedure for each SQL statement that we want to use. This is required because the stored procedure language only supports static SQL statements.
Now that we have a way to retrieve the header information, we can create
additional user-defined functions that will execute SQL statements
dynamically and add the header information. We can approach this process
in one of two ways. Either we simply deal with the output of
genxml2() or we accept a generic SQL statement
and format the rows that are returned by the statement. Let's look at the
We need to develop a function that takes a document type and an SQL statement as arguments and returns a complete XML document:
CREATE FUNCTION genxmlhdr2(LVARCHAR, LVARCHAR) RETURNING LVARCHAR . . .
We can then generate the
document by executing:
EXECUTE FUNCTION genxmlhdr2("customer_set", "SELECT * FROM customer");
This will generate an XML document that is slightly different from the XML
document generated by the
procedure: we have no way to provide a name for the returned rows. We have
to give them a generic name. In this case: row. This is not a problem. We
simply have to take that into consideration when we put together the DTDs
and XLS documents that relate to our document types.
We can provide a little more flexibility by adding a function similar to
genxml2(lvarchar, lvarchar) function that
assumes that the output of the SQL statement will be XML-formatted rows.
The creation of the
customer_set document would
EXECUTE FUNCTION addxmlhdr2("customer_set", "SELECT genxml2('customer', customer) FROM customer");
To complete our set of functions, we need to add one function that takes an SQL statement and returns all the rows as an XML document without the header information. With this function, it is then possible to execute dynamic SQL statements within a stored procedure (Dynamic SQL statements was added to SPL in Informix, Version 11.50). Since the function signature of this function is different from the one of genxml2(lvarchar, row), we can use the same name:
CREATE FUNCTION genxml2(LVARCHAR, LVARCHAR) RETURNING LVARCHAR . . .
The first argument represents the row name and the second, an SQL
statement. The generation of the
document without the header is then:
EXECUTE FUNCTION genxml2("customer_set", "SELECT * FROM customer");
Iformix provides the capability to create your own aggregation functions. Since the creation of an XML document involves processing a set of rows to return one result, the user-defined aggregate (UDA) capabilities are well suited to XML processing.
By definition, a relational database does not guarantee any order in the
rows. The only way to order the rows is by processing them after an SQL
statement that includes an ORDER BY clause. You can work around this
problem by using a
SELECT from SELECT-type
statement or a temporary table to contain the result you want to generate
Row order not withstanding, you could generate an XML document by using a UDA. This UDA could be used in the following manner:
SELECT aggrxml2(customer, "customer_set") FROM customer;
The first argument of the UDA is the row we want to process. The second
argument is an initialization value that represents the document type name
that is generated. This means that the output is the same as the one
generated by the
functions discussed above. The usage is more natural than using an EXECUTE
FUNCTION statement. Since this is an aggregate function, you can take
advantage of the GROUP BY SQL clause. This capability can be useful if you
have to generate multiple XML documents. For example, this could reduce
the number of SQL statements executed within a stored procedure by
removing an embedded FOR EACH loop.
All the functions discussed above have been implemented in C and are available as example code. Here is a list of the functions including their argument types:
- addxmlhdr2(lvarchar, lvarchar)
- aggrxml2(ROW, lvarchar)
- genxml2(lvarchar, ROW)
- genxml2(lvarchar, lvarchar)
- genxmlhdr2(lvarchar, lvarchar)
Note that the
genxml2(lvarchar, lvarchar) and
genxmlhdr2(lvarchar, lvarchar) functions can
only be executed by an EXECUTE FUNCTION statement.
The source code for these functions is around 300 lines of C code. The code was tested on Informix, Version 11.50 on Linux®.
The implementation also includes a function that's useful if you intend to
modify the code: set_tracing. This function allows you to turn on the
execution tracing of a function. You will need to remove
"-DMITRACE_OFF=1" from the make files and add
tracing code to the source. For more information on tracing, please see
the book Server-Side Programming in C listed in the Resources section or the Infomix
The installation of the example code involved three steps:
- Loading the code: All the files included in the example should
be loaded in a directory under the
$INFORMIXDIR/extenddirectory. The directory name must be
- Compiling the code: You must compile the source code for your
specific platform. The example code includes two makefiles:
Winnt.makis used for compilation under Microsoft windows 2000. It assumes the use of Microsoft Visual C++. The file
makefileis used to compile the code for UNIX#x422-type platforms. You will need to modify the makefile include statement (line 2) to reflect the appropriate platform
- Register the functions: Before you can use the functions
described above, they must be registered in the databases where the
functions are needed. The script named
genxml2.sqlregisters all the required functions and creates the
genxmlinfotable that is used to keep track of the information about document types.
To remove the functions, you can execute the
genxml2_d.sql script in the appropriate
database. It removes all the functions and the
genxmlinfo table. If you want to keep the
table, you should modify the installation script and the removal script
and comment out the lines that involve the table.
The easiest way to register and de-register the functions is to execute the scripts through the command line:
dbaccess -e jroy genxml2 >errlog 2>&1
This example assumes a database name of
redirects the output of
dbaccess to a file
errlog. If an error occurs, the script
continues the execution at the next statement in the script file. The -e
dbaccess to echo the commands to
the output so you can see what command generated which result.
The example code provided with this article contains two major limitations:
- The XML output of a row cannot exceed 2048 characters.
- The XML document output is limited to around 30KB characters.
These limitations can be removed by adding the code to handle these limits.
You can find example code on a type called
idn_mrlvarchar at the IBM web site listed in
the Resources section. You can also find the code
and the description of the type and how to use it in the Open Source
Components book listed also in the Resources section. By using an implementation similar to
idn_mrlvarchar, your XML document limit would
jump to 4TB.
This implementation supports most of the Informix data types except for collection types (LIST, MULTISET, SET) and the large object types (BLOB, BYTE, CLOB, and TEXT).
The ability to generate XML documents directly from the database server can
greatly simplify the implementation of a solution. It simplifies the
interaction between the application and the database. This application can
be code running in an application server, code running in a web server, a
CGI program, or a stand-alone application. Since the
genxml2 code runs in the database server, it is
readily available to any application. This availability fosters code
genxml2 code demonstrates the power of
object-relational technology. And it demonstrates at least two more
- You don't always need to wait for the database vendor to provide functionality to solve your business problems.
- Some interesting functionality can be added to the database server with little effort and provide significant business advantage.
|genxml2 sample code||genxml2.zip||128 KB|
- IBM Informix product page on developerWorks: Get the resources you need to advance your Informix skills.
- developerWorks Information Management zone: Learn more about Information Management. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
- Informix Dynamic Server.2000: Server-Side Programming in C, Jacques Roy, ISBN 0-13-013709-X, Prentice-Hall (Informix Press), 2000
- Object-Relational Databases: A Plumber Guide, Paul Brown, ISBN 0-13-019460-3, Prentice-Hall (Informix Press), 2001
- Open Source Components for Informix Dynamic Server 9.x, Jacques Roy, William W. White, Jean T. Anderson, Paul G. Brown, ISBN 0-13-042827-2, Prentice-Hall (Informix Press), 2002
Get products and technologies
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Participate in developerWorks blogs and get involved in the My developerWorks community; with your personal profile and custom home page, you can tailor developerWorks to your interests and interact with other developerWorks users.