Get off to a fast start with DB2 9 pureXML, Part 2: Create and populate a DB2 XML database

The IBM® DB2® V9 for Linux®, UNIX®, and Windows® features significant new support for storing, managing, and searching XML data, referred to as pureXML®. This series helps you master these new XML features quickly through several step-by-step articles that explain how to accomplish fundamental tasks. In this article, learn how to create database objects for managing your XML data and how to populate your DB2 database with XML data. [18 Mar 2010: Originally written in 2006, this article has been updated to include changes in DB2 versions 9.5 and 9.7.--Ed.]

Cynthia M. Saracco, Senior Software Engineer, IBM

C. M. Saracco works at IBM's Silicon Valley Laboratory in the DB2 XML organization. She works on database management, XML, Web application development, and related topics.



18 March 2010 (First published 02 March 2006)

Also available in Russian Japanese Vietnamese Portuguese

You've probably heard the buzz about DB2's V9 -- IBM's first database management system to support both tabular (SQL-based) and hierarchical (XML-based) data structures. If you're curious about DB2's new native support for XML and want to get off to a fast start, you've come to the right place.

To help you quickly get up to speed on DB2's native XML features, this article walks through several common tasks, such as:

  • Creating database objects for managing XML data, including a test database, sample tables, and views
  • Populating the database with XML data using INSERT and IMPORT statements
  • Validating your XML data. Develop and register your XML schemas with DB2, and use the XMLVALIDATE option when importing data.

Future articles will cover other topics, such as querying, updating, and deleting DB2 XML data with SQL, querying DB2 XML data with XQuery, and developing Java applications and Web components that access DB2 XML data.

Creating database objects

To get started, create a single DB2 Unicode database. (With DB2 V9.1, a Unicode database is required for XML. DB2 V9.5 and later no longer require a Unicode database.) Later, you'll create objects within this database to manage both XML and other types of data.

Creating a test database

To create a new DB2 Unicode test database, open a DB2 command window and issue a statement specifying a Unicode codeset and a supported territory, as shown in Listing 1.

Listing 1. Creating a database for storing XML data
create database test using codeset UTF-8 territory us

Once you create a database, you don't need to issue any special commands or take any further action to enable DB2 to store XML data in its native hierarchical format. Your DB2 system is ready to go.

Creating sample tables

To store XML data, you create tables that contain one or more XML columns. These tables serve as logical containers for collections of documents. Behind the scenes, DB2 actually uses a different storage scheme for XML and non-XML data. However, using tables as a logical object for managing all forms of supported data simplifies administration and application development issues, particularly when different forms of data need to be integrated in a single query.

You can define DB2 tables to contain only XML columns, only columns of traditional SQL types, or a combination of both. This article models the latter. The example in Listing 2 connects to the test database and creates two tables. The first is an Items table that tracks information about items for sale and comments that customers have made about them. The second table tracks information about Clients, including contact data. Note that Comments and Contactinfo are based on the new DB2 XML data type, while all other columns in the tables are based on traditional SQL data types.

Listing 2. Creating tables for XML data
connect to test;

create table items (
  id          int primary key not null, 
  brandname   varchar(30), 
  itemname    varchar(30), 
  sku         int, 
  srp         decimal(7,2), 
  comments    xml
);

create table clients(
  id          int primary key not null, 
  name        varchar(50), 
  status      varchar(10), 
  contactinfo xml
);

If you look closely at these table definition examples, you'll notice that neither specified the internal structure of the XML documents to be stored in the Comments or Contactinfo columns. This is an important DB2 feature. Users do not need to pre-define an XML data structure (or, more accurately, an XML schema) in order to store their data. Indeed, DB2 can store any well-formed XML document in a single column, which means that XML documents of different schemas (or documents not associated with any registered schema) can be stored within the same DB2 column. This article discusses this feature more when it discusses how to store data in DB2.

The option to store smaller XML documents inline was introduced in V9.5. If the XML document is small enough to fit into the page size, it can be stored with the other SQL elements. If it is not small enough to fit into a page, it will be stored separately. Along with the inline keyword, you supply the maximum size of the XML to be inlined. Base this value on the page size and on the size of the other relational columns. Listing 3 shows the code snippet to do this:

Listing 3. Creating tables for XML data with the inline option
connect to test;

create table items (
  id          int primary key not null, 
  brandname   varchar(30), 
  itemname    varchar(30), 
  sku         int, 
  srp         decimal(7,2), 
  comments    xml inline length 10240
);

Creating views

Optionally, you can create views over tables containing XML data, just as you can create views over tables containing only traditional SQL data types. The example in Listing 4 creates a view of clients with a Gold status:

Listing 4. Creating a view that contains XML data
create view goldview as 
select id, name, contactinfo 
from clients where status='Gold';

A note about indexes

Finally, note that you can create specialized indexes on your XML columns to speed searches of your data. Because this is an introductory article and the sample data is small, this article will not be covering that topic. However, in production environments, defining appropriate indexes can be critical to achieving optimal performance. See Resources for help on how to learn more about DB2's new indexing technology.


Storing XML data

With your tables created, you can now populate them with data. Issue SQL INSERT statements directly or by invoking the DB2 IMPORT facility, which issues INSERT statements behind the scenes. With DB2 V9.5, the LOAD facility also supports XML data.

Using INSERT statements

With INSERT, you supply DB2 with the raw XML data directly. That's perhaps easiest to do if you've written an application and stored the XML data in a variable. But if you're just getting started with DB2 and don't want to write an application, you can issue your INSERT statements interactively. (I find it convenient to use the DB2 Command Editor, although you can also use the command line processor, if you'd prefer.)

To use the DB2 Command Editor, launch the DB2 Control Center. From the Tools pull-down menu at the top, select Command Editor. A separate window appears, as shown in Figure 1.

Figure 1. DB2 Command Editor
DB2 Command Editor

Type the following statements into the upper pane:

Listing 5. Inserting XML data interactively
connect to test;

insert into clients values (77, 'John Smith', 'Gold', 
   '<addr>111 Main St., Dallas, TX, 00112</addr>')

Click the green arrow at left to execute the command.

In this case, the input document was quite simple. If the document was large or complex, it would be impractical to type the XML data into the INSERT statement as shown. In most cases, you'd write an application to insert the data using a host variable or a parameter marker. You'll find a brief Java coding example that accompanies this article. However, this introductory tutorial does not cover application development topics in detail. Instead, we'll discuss another option for populating DB2 XML columns with data—using the IMPORT facility.

Using DB2 IMPORT

If you already have your XML data in files, the DB2 IMPORT facility provides a simple way for you to populate your DB2 tables with this data. You don't need to write an application. You just need to create a delimited ASCII file containing the data you want to load into your table. For XML data stored in files, a parameter specifies the appropriate file names.

You can create the delimited ASCII file using the text editor of your choice. (By convention, such files are usually of type .del.) Each line in your file represents a row of data to be imported into your table. If your line contains an XML Data Specifier (XDS), IMPORT will read the data contained in the referenced XML file and import that into DB2. For example, the first line in Listing 6 contains information for Ella Kimpton, including her ID, name, and customer status. Her contact information is included in the Client3227.xml file.

Listing 6. clients.del file
3227,Ella Kimpton,Gold,<XDS FIL='Client3227.xml' />
8877,Chris Bontempo,Gold,<XDS FIL='Client8877.xml' />
9077,Lisa Hansen,Silver,*lt;XDS FIL='Client9077.xml' />
9177,Rita Gomez,Standard,<XDS FIL='Client9177.xml' />
5681,Paula Lipenski,Standard,<XDS FIL='Client5681.xml' />
4309,Tina Wang,Standard,<XDS FIL='Client4309.xml' />

The content of the Client3227.xml file is shown in Listing 7. The file contains XML elements for Ella Kimpton's address, phone numbers, fax number, and email.

Listing 7. Client3227.xml file
<?xml version="1.0"?>
<Client
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance
xsi:noNamespaceSchemaLocation="http://bogus">
	<Address>
		<street>5401 Julio Ave</street>
		<city>San Jose</city>
		<state>CA</state>
		<zip>95116</zip>
	</Address>
	<phone>
		<work>4084630000</work>
		<home>4081111111</home>
		<cell>4082222222</cell>
	</phone>
	<fax>4087776666</fax>
	<email>love2shop@yahoo.com</email>
</Client>

Perhaps you're curious about importing data if you don't have XML files for all the rows you wish to insert. That's easy to do. Omit the XDS information from your input file. For example, the items.del file in Listing 8 omits the name of an XML file for Item 3641 (the Dress to Impress suit). As a result, the XML column for this row will not contain any data.

items.del file
3926,NatureTrail,Walking boot, 38112233,64.26,<XDS FIL='Comment3926.xml' />
4023,NatureTrail,Back pack,552238,34.99,<XDS FIL='Comment4023.xml' />
3641,Dress to Impress,Syutm7811421,149.99,
4272,Classy,Cocktail dress,981140,156.99,<XDS FIL='Comment4272.xml' />

With your XML files and delimited ASCII files available, you're now ready to use DB2 IMPORT. The statement in Listing 9 imports the contents specified in the clients.del file into the C:/XMLFILES directory into the clients table.

Listing 9. Importing data into the clients table
import from clients.del of del xml from C:/XMLFILES insert into user1.clients;

The clients.del file shown in Listing 6 contains data for six rows, including references to six XML files. Successfully executing an IMPORT command results in output similar to Listing 10.

Listing 10. Sample output of DB2 IMPORT
import from clients.del of del xml from C:/XMLFiles insert into saracco.clients
SQL3109N  The utility is beginning to load data from file "clients.del".

SQL3110N  The utility has completed processing.  "6" rows were read from the
input file.

SQL3221W  ...Begin COMMIT WORK. Input Record Count = "6".

SQL3222W  ...COMMIT of any database changes was successful.

SQL3149N  "6" rows were processed from the input file.  "6" rows were
successfully inserted into the table.  "0" rows were rejected.

Number of rows read		= 6
Number of rows skipped		= 0 
Number of rows inserted 	= 6 
Number of rows updated 		= 0
Number of rows rejected 	= 0
Number of rows committed 	= 6

Independent software vendors offer tools to help you convert Microsoft® Word, Acrobat PDF, and other document formats into XML for import into DB2. See Resources for more information about ISVs.


Validating your XML data

The INSERT and IMPORT examples just discussed can write any well-formed XML data to your tables. They don't validate that data. In other words, they don't verify that the data conforms to a particular XML schema and therefore adheres to a certain structure. It is possible to direct DB2 to do that, however. Here is one approach:

Step 1: Creating an XML schema

To validate XML data, you need to define an XML schema that specifies acceptable XML elements, their order and data types, and so on. XML schemas are a W3C industry standard and are written in XML. While it is beyond the scope of this article to explain the features of XML schemas, various tutorials are available (see Resources).

There are many ways to develop XML schemas, ranging from using your favorite text editor to manually create your schema to using tools to graphically design or generate a schema. Independent software vendors provide such XML tools, and IBM also offers XML schema generation support through Java™-integrated development environments.

For example, with IBM Rational® Application Developer or IBM Rational Software Architect, you can import an xml file into a Web project. The xml file used in this example was taken from the customer table in the sample database of DB2. Right-click the project, and select Generate > XML Schema. This generates a valid XML schema for your particular input file, as shown in Figure 2 (larger image). You can then modify the file (if necessary) and register it with DB2.

Figure 2. Using IBM Rational Software Architect to generate an XML schema from an XML file
shows generated schema file in center pane

Assume you need to make your XML schema rather flexible so that you can collect different types of contact information for different customers. For example, some customers might provide you with multiple phone numbers or email addresses, while others might not. The XML schema shown in Listing 11, which was derived from the schema that IBM Rational Software Architect generated, allows for this flexibility. It includes additional specifications about the minimum and maximum number of occurrences (minOccurs and maxOccurs) allowed for a given element. In this case, the customer isn't required to give you any of the contact information you'd like to collect. However, if a customer chooses to give you email information, this schema enables conforming documents to contain up to five email addresses (that is, five email element values).

Listing 11. Sample XML schema for client contact information
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmins:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="address">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="street" minOccurs="0"/>
                <xsd:element ref="apt" minOccurs="0"/>
                <xsd:element ref="city" minOccurs="0"/>
                <xsd:element ref="state" minOccurs="0"/>
                <xsd:element ref="zip" minOccurs="0"/>
            </xsd:sequence>
         </xsd:complexType>
    </xsd:element>
    <xsd:element name="Client">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Address" minOccurs="0" maxOccurs="5"/>
                <xsd:element ref="phone" minOccurs="0" maxOccurs="1"/>
                <xsd:element ref="fax" minOccurs="0" maxOccurs="1"/>
                <xsd:element ref="email" minOccurs="0" maxOccurs="5"/>
            </xsd:sequence>
         </xsd:complexType>
    </xsd:element>
    <xsd:element name="cell type="xsd:string"/>
    <xsd:element name="city type="xsd:string"/>
    <xsd:element name="email" type="xsd:string"/>
    <xsd:element name="fax" type="xsd:string"/>
    <xsd:element name="home" type="xsd:string"/>
    <xsd:element name="phone">
         <xsd:complexType>
           <xsd:sequence>
                <xsd:element ref="work" minOccurs="0"/>
                <xsd:element ref="home" minOccurs="0"/>
                <xsd:element ref="cell" minOccurs="0"/>			
            </xsd:sequence>
         </xsd:complexType>
    </xsd:element>
    . . .
</xsd:schema>

XML schemas also contain type information. The schema shown in Listing 11 specifies that all base elements are treated as strings. However, most production XML schemas make use of other data types as well, such as integer, decimal, date, and so on. If you validate XML documents against a given schema as part of your INSERT or IMPORT operation, DB2 automatically adds type annotations to your XML documents.

Step 2: Registering the XML schema

Once you have created an appropriate XML schema, you need to register the schema with DB2. IBM provides multiple ways to do this. You can launch graphical wizards from the DB2 Control Center to guide you through the process, invoke system-supplied stored procedures, or issue DB2 commands directly. For this example, use the latter method, because it might help you more readily understand what DB2 is doing behind the scenes on your behalf.

If your schema is very large, you may need to increase your application heap size before attempting to register it. For example, issue the following statements:

Listing 12. Increasing the application heap size
 connect to test;
 update db cfg using applheapsz 10000;

Next, register your XML schema. If your XML schema does not reference other XML schemas, you can register and complete the process with a single command. Otherwise, you need to issue individual commands to register your primary XML schema, add the other required schemas, and complete the registration process. When a schema document becomes very large, it's common to divide its content into multiple files to improve maintenance, readability, and reuse. This is akin to breaking up a complex application or component into multiple modules. For details on this topic, refer to the W3C XML Schema primer.

This article uses a simple, independent XML schema. You can register it with DB2 using the following command:

Listing 13. Registering an XML schema
register xmlschema 'http://mysample.org' from 'C:/XMLFiles/ClientInfo.xsd' as 
user1.mysample complete;

In this example, ClientInfo.xsd is the name of the XML schema file. It is located in the C:/XMLFiles directory. This XML schema will be registered in DB2's internal repository under the SQL schema user1 and the XML schema mysample. The http://mysample.org parameter is just a placeholder in this example. It specifies the uniform resource indicator (URI) referenced by XML instance documents. Many XML documents use namespaces, which are specified using a URI. Finally, the complete clause instructs DB2 to complete the XML schema registration process so that the schema can be used to validate XML data.

Note that the schema registration process does not involve specifying table columns to which the schema will be applied. In other words, schemas are not the equivalent of SQL column constraints. A given schema can validate data for a variety of XML columns in different tables. However, validation is not automatic. DB2 allows any well-formed XML document to be stored in an XML column. If you want to validate your data against a registered schema prior to storage, you need to instruct DB2 to do so.

Step 3: Importing XML data with validation

With an XML schema created and completely registered in DB2, you're now ready to have DB2 validate XML data when inserting or importing it into a table. Revisit the earlier IMPORT scenario with schema validation in mind.

If you've already populated your Clients table, you might find it convenient to delete its contents or drop and recreate the table. This is only necessary if you plan to add the same data to the table as you did previously. Recall that clients were defined with a primary key on the client ID column, so attempting to import duplicate rows will fail.

To validate the XML data while importing it into the Clients table, use the XMLVALIDATE clause of DB2 IMPORT. The statement in Listing 14 instructs DB2 to use your previously registered XML schema (user1.mysample) as the default XDS (XML Data Specifier) for validating the XML files specified in the clients.del file before inserting them into the Clients table.

Listing 14. Importing XML data with validation
import from clients.del of del xml from C:/XMLFILES xmlvalidate using xds default  
user1.mysample insert into user1.clients;

If DB2 determines that an XML document does not conform to the specified schema, the entire row associated with that document is rejected. Listing15 illustrates sample output from an IMPORT operation in which one row of six was rejected because its XML document did not conform to the specified schema.

Listing 15. Importing XML data with validation
SQL3149N  "6 rows were processed from the input file.  "5" rows were 
successfully inserted into the table.  "1" rows were rejected.

Number of rows read		= 6
Number of rows skipped		= 0
Number of rows inserted		= 5
Number of rows updated 		= 0
Number of rows rejected 	= 1
Number or rows committed 	= 6

Note that XMLVALIDATE can also be used with INSERT statements to instruct DB2 to validate XML data before inserting it. The syntax is similar to the IMPORT example just shown in that you specify a registered (and completed) XML schema when invoking the XMLVALIDATE clause. (See "A simple Java example" for more information.)


Summary

DB2 V9 provides significant new capabilities for supporting XML, including a new XML data type and underlying engine-level components that automatically store and process XML data in an efficient manner. To help you get up to speed quickly on these features, this article described how to create a test database and sample tables for storing XML documents. It also reviewed how you can populate your database with XML data. Finally, it summarized DB2's ability to validate XML data against user-supplied XML schemas and provided examples to show you how to get started.

Now that you've learned how to store XML data using DB2's native XML capabilities, you're ready to query that data. You'll see how to do that in subsequent articles, which will introduce you to DB2's XQuery support and to its XML extensions to SQL (sometimes called SQL/XML).

Acknowledgments

Thanks to Rav Ahuja, Matthias Nicola, and Gary Robinson for their comments on this paper.


Download

DescriptionNameSize
Sample codesamples.zip4KB

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, XML
ArticleID=104814
ArticleTitle=Get off to a fast start with DB2 9 pureXML, Part 2: Create and populate a DB2 XML database
publish-date=03182010