This article describes how to use the extensibility features of IBM Informix to generate XML-formatted data. Sample code is included.

Editor's note: This article and the download have been updated for changes to the product since this article was originally published.

Share:

Jacques Roy (jacquesr@us.ibm.com), Architect, I.B.M.

Jacques Roy photoJacques Roy is a member of IBM Informix development lab. He has over 25 years of industry experience and over 10 years of experience with database extensibility. He is the author of Informix Dynamic Server.2000: Server-Side Programming in C and co-author of Open-Source Components for the Informix Dynamic Server 9.x.



19 August 2010 (First published 25 February 2003)

Introduction

XML is becoming more and more important to the enterprise. Frequently, users need to create XML documents from the contents of a relational database. This can be done in many ways, but the extensibility features of IBM Informix, Version 11 give you the opportunity to create your own functions to generate XML output from your database.

This article explains a simple approach to generating XML that can answer many needs and provide a building block for more elaborate solutions. The example that accompanies this article uses the stores7 demonstration database provided with the Informix product. It is an update to an earlier article. Since the original article, Informix has implemented XML functions with the name genxml in the server. To avoid having the example code clash with the Informix implementation, the provided functions use a suffix of "2".


Building block

You may want to generate XML-formatted data using multiple types of SQL statements. The data could be in a single table, or could require the joining of multiple tables. What these SQL statements have in common is that they all return rows of data. The solution is then the same for manipulating one table or multiple tables.

Informix provides support for a data type called rows. This means that an entire row result can be passed to a user-defined function (UDF) as a single argument. The row is defined by columns that have names and types. What is left to do is pass an additional argument that identifies a row name to encapsulate the row data in the XML representation. In order to accomplish this, consider a function that would have the following definition:

CREATE FUNCTION genxml2(varchar(30), ROW)
RETURNING lvarchar
. . .

The first argument is the name we want to give to the row, and the second argument is the row itself. A row includes its definition, so the genxml2 function can find how many columns are in the row, their names, and so on. The following statement generates an XML representation for each row of the customer table:

SELECT genxml2("customer", customer)
FROM customer;

The second argument is the table name, which means that the entire row is passed as the second argument. The first row returned looks like this (reformatted for this article):

<customer>
<customer_num>101&lt//customer_num>
<fname>Ludwig        </fname>
<lname>Pauli         </lname>
<company>All Sports Supplies </company>
<address1>213 Erstwild Court  </address1>
<city>Sunnyvale      </city>
<state>CA</state>
<zipcode>94086</zipcode>
<phone>408-789-8075      </phone>
</customer>

The column names were extracted from the row definition by the genxml2 function and used as tags for their respective columns. But what if we want a partial list of columns? We can dynamically create a row with the desired columns as the second argument to the genxml2 function as follows:

SELECT genxml2("customer", ROW(customer_num, fname, lname))
FROM customer;

The resulting XML row is similar to the previous output with fewer fields. Since we can build a row dynamically, we can also join multiple tables and create output based on all tables, as shown in the following example:

SELECT genxml2("customer_calls",
        ROW(A.customer_num, fname, lname, call_dtime, call_code,
            call_descr, res_dtime, res_dtime, res_descr)
       )
FROM customer a, cust_calls b
WHERE a.customer_num = b.customer_num;

These examples cover many of the requirements for generating XML-formatted rows. There may also be a requirement to generate XML output based on the results of an SQL statement that includes an aggregation. Aggregate expressions can be passed to the genxml2 function, as you see here:

SELECT genxml2("stats", ROW(customer_num, COUNT(*)) )
FROM cust_calls
GROUP BY customer_num;

With this basic tool, we are now ready to complete the building of an XML document by adding its header and footer.


Handling NULL values

If a column contains a NULL value, the genxml2 function, and all other functions described in this article, does not process the column. This makes sense, as the only things that would appear would be the opening and closing tags for the column.


Row type processing

A table may contain a column that is a row type. In that case, the row passed to the genxml2 function would contain a named or unnamed row type. For example, we could have a definition as follows:

CREATE ROW TYPE address_t (
  name     varchar(20),
  address1 varchar(20),
  address2 varchar(20),
  city     varchar(15),
  state    char(2),
  zipcode  char(5)
);
CREATE TABLE employee (
  name varchar(30),
  address  address_t,
  phone  varchar(18)
);

The function will process the row type, named or unnamed, just like other base types. The execution and result of a query on the employee table would look like this:

SELECT genxml2("employee", employee) FROM employee;

<employee>
<name>Roy</name>
<address>
  <address1>123 first street</address1>
  <city>Denver</city>
  <state>CO</state>
  <zipcode>80111</zipcode>
</address>
<phone>303-555-1212</phone>
</employee>

As you can see, the address column tags suround the definition of the row, and the row columns names are used to identify the values just as in the higher level. Note that the address2 column is a NULL value and therefore does not appear in the output.


Table hierarchies

There is one more case we must consider. The Informix product supports the definition of table hierarchies that are based on named row types. Figure 1 shows an example of a hierarchy.

Figure 1. Hierarchy example
Figure 1

One reason to have such a hierarchy is to be able to do risk calculations that differ depending on the type of loan. By creating multiple risk functions that operate on specific row types, we can submit a query such as the following:

SELECT loan_number, risk(loans) FROM loans WHERE branch_id = 127;

This query goes through all the rows for branch_id 127 and calls the appropriate risk function depending on the type of row being returned. If we want to generate an XML representation for each loan for a specific branch using our existing genxml2 function, we must submit one SQL statement for each sub-table. We must do this to give the proper row type name for each row.

Since we are dealing with named row types, the genxml2 function could retrieve this information and use it implicitly. Note that Informix differentiates functions based on their names and arguments types. Therefore we can create a new, additional genxml2 function that takes only the row as its argument and retrieves the name from the row definition. We can then generate an XML representation of loans for a specific branch with an SQL statement such as this:

SELECT genxml2(loans) FROM loans WHERE branch_id = 127;

If you were to use this function on an unnamed row type, the header would be the complete definition of the row. For the customer table, you would get the following header:

<ROW(customer_num integer, fname char(15), lname char(15),
     company char(20), address1 char(20), address2 char(20),
     city char(15),state char(2), zipcode char(5),
     phone char(18)
    )>

The same row name would be used as the row closing tag.


Completing the XML document

XML is used to describe data. It can also include information that identifies the document type and the definitions of valid documents of this type. These validation files are called document type declarations or DTDs for short. DTDs are themselves XML documents.

XML describes the data but not the way it should be presented. The presentation of the information can be described in an extensible stylesheet language (XSL).

Finally, you should identify the document as an XML document that follows a specific XML standard. This means that you should wrap the generated rows with at least with the identification that it is an XML document and a tag for the complete document. The result could be something such as this:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE customer_set SYSTEM "/home/dtd/customer_set.dtd">
<?xml-stylesheet type="text/xsl"
                 href="/home/xsl/customer_set.xsl" ?>
<customer_set>
. . . (XML-formatted rows)
</customer_set>
CREATE PROCEDURE xmlcustomerset()
RETURNING LVARCHAR
  DEFINE result LVARCHAR;
  DEFINE ressql LVARCHAR;
  LET result = '<?xml version="1.0" encoding="ISO-8859-1" ?>';
  LET result = result ||
      '<!DOCTYPE customer_set SYSTEM "/home/dtd/customer_set.dtd">';
  LET result = result ||
'<?xml-stylesheet type="text/xsl" href="/home/xsl/customer_set.xsl" ?>';
  LET result = result || '<customer_set>';
  FOREACH SELECT genxml2('customer', customer) INTO ressql FROM customer
    LET result = result || ressql;
  END FOREACH;
  LET result = result || '</customer_set>';
  RETURN result;
END PROCEDURE;
EXECUTE PROCEDURE xmlcustomerset();

The procedure is very simple. It simply initializes the result variable to include the header information. It then executes the SQL statement that returns multiple XML-formatted rows and appends the result to the result variable. Finally, it closes the document by adding the closure of the customer_set tag and returns the result. The last line shows how the procedure is executed.


Keeping track of header information

The creation of an XML document from a stored procedure is always the same. It would be nice to generalize the process instead of having to create multiple stored procedures. The first step is to find a way to keep track of the document type and the stylesheet that should be used. We can use a table to keep track of this information:

CREATE TABLE genxmlinfo (
  name       varchar(30) PRIMARY KEY,
  dtypepath  lvarchar,
  xslpath    lvarchar
);

The name column is the document type used. In the example above, it is customer_set. The dtypepath column contains the path to the DTD and the xslpath contains the path to the XSL document to use. Both these columns allow NULLs. If the column is NULL, then the header line does not appear in the document. This gives us the flexibility to decide what must be included.


Generalizing the creation of XML documents

If we were to use stored procedures to generate XML documents, we would have to have one procedure for each SQL statement that we want to use. This is required because the stored procedure language only supports static SQL statements.

Now that we have a way to retrieve the header information, we can create additional user-defined functions that will execute SQL statements dynamically and add the header information. We can approach this process in one of two ways. Either we simply deal with the output of genxml2() or we accept a generic SQL statement and format the rows that are returned by the statement. Let's look at the latter first.

We need to develop a function that takes a document type and an SQL statement as arguments and returns a complete XML document:

CREATE FUNCTION genxmlhdr2(LVARCHAR, LVARCHAR)
RETURNING LVARCHAR
. . .

We can then generate the customer_set XML document by executing:

EXECUTE FUNCTION genxmlhdr2("customer_set",
                           "SELECT * FROM customer");

This will generate an XML document that is slightly different from the XML document generated by the xmlcustomerset() procedure: we have no way to provide a name for the returned rows. We have to give them a generic name. In this case: row. This is not a problem. We simply have to take that into consideration when we put together the DTDs and XLS documents that relate to our document types.

We can provide a little more flexibility by adding a function similar to the genxml2(lvarchar, lvarchar) function that assumes that the output of the SQL statement will be XML-formatted rows. The creation of the customer_set document would become:

EXECUTE FUNCTION addxmlhdr2("customer_set",
        "SELECT genxml2('customer', customer) FROM customer");

To complete our set of functions, we need to add one function that takes an SQL statement and returns all the rows as an XML document without the header information. With this function, it is then possible to execute dynamic SQL statements within a stored procedure (Dynamic SQL statements was added to SPL in Informix, Version 11.50). Since the function signature of this function is different from the one of genxml2(lvarchar, row), we can use the same name:

CREATE FUNCTION genxml2(LVARCHAR, LVARCHAR)
RETURNING LVARCHAR
. . .

The first argument represents the row name and the second, an SQL statement. The generation of the customer_set document without the header is then:

EXECUTE FUNCTION genxml2("customer_set", "SELECT * FROM customer");

User-defined aggregate

Iformix provides the capability to create your own aggregation functions. Since the creation of an XML document involves processing a set of rows to return one result, the user-defined aggregate (UDA) capabilities are well suited to XML processing.

By definition, a relational database does not guarantee any order in the rows. The only way to order the rows is by processing them after an SQL statement that includes an ORDER BY clause. You can work around this problem by using a SELECT from SELECT-type statement or a temporary table to contain the result you want to generate as XML.

Row order not withstanding, you could generate an XML document by using a UDA. This UDA could be used in the following manner:

SELECT aggrxml2(customer, "customer_set") FROM customer;

The first argument of the UDA is the row we want to process. The second argument is an initialization value that represents the document type name that is generated. This means that the output is the same as the one generated by the genxml2(lvarchar, lvarchar) functions discussed above. The usage is more natural than using an EXECUTE FUNCTION statement. Since this is an aggregate function, you can take advantage of the GROUP BY SQL clause. This capability can be useful if you have to generate multiple XML documents. For example, this could reduce the number of SQL statements executed within a stored procedure by removing an embedded FOR EACH loop.


Implementation

All the functions discussed above have been implemented in C and are available as example code. Here is a list of the functions including their argument types:

  • addxmlhdr2(lvarchar, lvarchar)
  • aggrxml2(ROW, lvarchar)
  • genxml2(lvarchar, ROW)
  • genxml2(ROW)
  • genxml2(lvarchar, lvarchar)
  • genxmlhdr2(lvarchar, lvarchar)

Note that the addxmlhdr2(lvarchar, lvarchar), genxml2(lvarchar, lvarchar) and genxmlhdr2(lvarchar, lvarchar) functions can only be executed by an EXECUTE FUNCTION statement.

The source code for these functions is around 300 lines of C code. The code was tested on Informix, Version 11.50 on Linux®.

The implementation also includes a function that's useful if you intend to modify the code: set_tracing. This function allows you to turn on the execution tracing of a function. You will need to remove "-DMITRACE_OFF=1" from the make files and add tracing code to the source. For more information on tracing, please see the book Server-Side Programming in C listed in the Resources section or the Infomix documentation.


Installation

The installation of the example code involved three steps:

  • Loading the code: All the files included in the example should be loaded in a directory under the $INFORMIXDIR/extend directory. The directory name must be genxml2.
  • Compiling the code: You must compile the source code for your specific platform. The example code includes two makefiles: winnt.mak and makefile. Winnt.mak is used for compilation under Microsoft windows 2000. It assumes the use of Microsoft Visual C++. The file makefile is used to compile the code for UNIX#x422-type platforms. You will need to modify the makefile include statement (line 2) to reflect the appropriate platform
  • Register the functions: Before you can use the functions described above, they must be registered in the databases where the functions are needed. The script named genxml2.sql registers all the required functions and creates the genxmlinfo table that is used to keep track of the information about document types.

To remove the functions, you can execute the genxml2_d.sql script in the appropriate database. It removes all the functions and the genxmlinfo table. If you want to keep the table, you should modify the installation script and the removal script and comment out the lines that involve the table.

The easiest way to register and de-register the functions is to execute the scripts through the command line:

dbaccess -e jroy genxml2 >errlog 2>&1

This example assumes a database name of jroy and redirects the output of dbaccess to a file named errlog. If an error occurs, the script continues the execution at the next statement in the script file. The -e option tells dbaccess to echo the commands to the output so you can see what command generated which result.


Limitations

The example code provided with this article contains two major limitations:

  • The XML output of a row cannot exceed 2048 characters.
  • The XML document output is limited to around 30KB characters.

These limitations can be removed by adding the code to handle these limits. You can find example code on a type called idn_mrlvarchar at the IBM web site listed in the Resources section. You can also find the code and the description of the type and how to use it in the Open Source Components book listed also in the Resources section. By using an implementation similar to idn_mrlvarchar, your XML document limit would jump to 4TB.

This implementation supports most of the Informix data types except for collection types (LIST, MULTISET, SET) and the large object types (BLOB, BYTE, CLOB, and TEXT).


Conclusion

The ability to generate XML documents directly from the database server can greatly simplify the implementation of a solution. It simplifies the interaction between the application and the database. This application can be code running in an application server, code running in a web server, a CGI program, or a stand-alone application. Since the genxml2 code runs in the database server, it is readily available to any application. This availability fosters code reuse.

The genxml2 code demonstrates the power of object-relational technology. And it demonstrates at least two more things:

  • You don't always need to wait for the database vendor to provide functionality to solve your business problems.
  • Some interesting functionality can be added to the database server with little effort and provide significant business advantage.

Download

DescriptionNameSize
genxml2 sample codegenxml2.zip128 KB

Resources

Learn

  • IBM Informix product page on developerWorks: Get the resources you need to advance your Informix skills.
  • developerWorks Information Management zone: Learn more about Information Management. Find technical documentation, how-to articles, education, downloads, product information, and more.
  • Stay current with developerWorks technical events and webcasts.
  • Informix Dynamic Server.2000: Server-Side Programming in C, Jacques Roy, ISBN 0-13-013709-X, Prentice-Hall (Informix Press), 2000
  • Object-Relational Databases: A Plumber Guide, Paul Brown, ISBN 0-13-019460-3, Prentice-Hall (Informix Press), 2001
  • Open Source Components for Informix Dynamic Server 9.x, Jacques Roy, William W. White, Jean T. Anderson, Paul G. Brown, ISBN 0-13-042827-2, Prentice-Hall (Informix Press), 2002

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, XML
ArticleID=95791
ArticleTitle=Generating XML from the Informix server
publish-date=08192010