You've probably heard the buzz about DB2's V9 -- IBM's first database management system to support both tabular (SQL-based) and hierarchical (XML-based) data structures. If you're curious about DB2's new native support for XML and want to get off to a fast start, you've come to the right place.
To help you quickly get up to speed on DB2's native XML features, this article walks through several common tasks, such as:
- Creating database objects for managing XML data, including a test database, sample tables, and views
- Populating the database with XML data using
- Validating your XML data. Develop and register your XML schemas with
DB2, and use the
XMLVALIDATEoption when importing data.
Future articles will cover other topics, such as querying, updating, and deleting DB2 XML data with SQL, querying DB2 XML data with XQuery, and developing Java applications and Web components that access DB2 XML data.
Creating database objects
To get started, create a single DB2 Unicode database. (With DB2 V9.1, a Unicode database is required for XML. DB2 V9.5 and later no longer require a Unicode database.) Later, you'll create objects within this database to manage both XML and other types of data.
Creating a test database
To create a new DB2 Unicode test database, open a DB2 command window and issue a statement specifying a Unicode codeset and a supported territory, as shown in Listing 1.
Listing 1. Creating a database for storing XML data
create database test using codeset UTF-8 territory us
Once you create a database, you don't need to issue any special commands or take any further action to enable DB2 to store XML data in its native hierarchical format. Your DB2 system is ready to go.
Creating sample tables
To store XML data, you create tables that contain one or more XML columns. These tables serve as logical containers for collections of documents. Behind the scenes, DB2 actually uses a different storage scheme for XML and non-XML data. However, using tables as a logical object for managing all forms of supported data simplifies administration and application development issues, particularly when different forms of data need to be integrated in a single query.
You can define DB2 tables to contain only XML columns, only columns of traditional SQL types, or a combination of both. This article models the latter. The example in Listing 2 connects to the test database and creates two tables. The first is an Items table that tracks information about items for sale and comments that customers have made about them. The second table tracks information about Clients, including contact data. Note that Comments and Contactinfo are based on the new DB2 XML data type, while all other columns in the tables are based on traditional SQL data types.
Listing 2. Creating tables for XML data
connect to test; create table items ( id int primary key not null, brandname varchar(30), itemname varchar(30), sku int, srp decimal(7,2), comments xml ); create table clients( id int primary key not null, name varchar(50), status varchar(10), contactinfo xml );
If you look closely at these table definition examples, you'll notice that neither specified the internal structure of the XML documents to be stored in the Comments or Contactinfo columns. This is an important DB2 feature. Users do not need to pre-define an XML data structure (or, more accurately, an XML schema) in order to store their data. Indeed, DB2 can store any well-formed XML document in a single column, which means that XML documents of different schemas (or documents not associated with any registered schema) can be stored within the same DB2 column. This article discusses this feature more when it discusses how to store data in DB2.
The option to store smaller XML documents inline was introduced in V9.5. If the XML document is small enough to fit into the page size, it can be stored with the other SQL elements. If it is not small enough to fit into a page, it will be stored separately. Along with the inline keyword, you supply the maximum size of the XML to be inlined. Base this value on the page size and on the size of the other relational columns. Listing 3 shows the code snippet to do this:
Listing 3. Creating tables for XML data with the inline option
connect to test; create table items ( id int primary key not null, brandname varchar(30), itemname varchar(30), sku int, srp decimal(7,2), comments xml inline length 10240 );
Optionally, you can create views over tables containing XML data, just as you can create views over tables containing only traditional SQL data types. The example in Listing 4 creates a view of clients with a Gold status:
Listing 4. Creating a view that contains XML data
create view goldview as select id, name, contactinfo from clients where status='Gold';
A note about indexes
Finally, note that you can create specialized indexes on your XML columns to speed searches of your data. Because this is an introductory article and the sample data is small, this article will not be covering that topic. However, in production environments, defining appropriate indexes can be critical to achieving optimal performance. See Resources for help on how to learn more about DB2's new indexing technology.
Storing XML data
With your tables created, you can now populate them with data. Issue SQL
INSERT statements directly or
by invoking the DB2
IMPORT facility, which
INSERT statements behind the scenes.
With DB2 V9.5, the
LOAD facility also supports
Using INSERT statements
INSERT, you supply DB2 with the raw XML
data directly. That's perhaps easiest to do if you've written an
application and stored the XML data in a variable. But if you're just
getting started with DB2 and don't want to write an application, you can
INSERT statements interactively. (I
find it convenient to use the DB2 Command Editor, although you can also
use the command line processor, if you'd prefer.)
To use the DB2 Command Editor, launch the DB2 Control Center. From the Tools pull-down menu at the top, select Command Editor. A separate window appears, as shown in Figure 1.
Figure 1. DB2 Command Editor
Type the following statements into the upper pane:
Listing 5. Inserting XML data interactively
connect to test; insert into clients values (77, 'John Smith', 'Gold', '<addr>111 Main St., Dallas, TX, 00112</addr>')
Click the green arrow at left to execute the command.
In this case, the input document was quite simple. If the document was large or complex, it would be impractical to type the XML data into the INSERT statement as shown. In most cases, you'd write an application to insert the data using a host variable or a parameter marker. You'll find a brief Java coding example that accompanies this article. However, this introductory tutorial does not cover application development topics in detail. Instead, we'll discuss another option for populating DB2 XML columns with data—using the IMPORT facility.
Using DB2 IMPORT
If you already have your XML data in files, the DB2 IMPORT facility provides a simple way for you to populate your DB2 tables with this data. You don't need to write an application. You just need to create a delimited ASCII file containing the data you want to load into your table. For XML data stored in files, a parameter specifies the appropriate file names.
You can create the delimited ASCII file using the text editor of your
choice. (By convention, such files are usually of type .del.) Each line in
your file represents a row of data to be imported into your table. If your
line contains an XML Data Specifier (XDS),
IMPORT will read the data contained in the
referenced XML file and import that into DB2. For example, the first line
in Listing 6 contains information for Ella
Kimpton, including her ID, name, and customer status. Her contact
information is included in the Client3227.xml file.
Listing 6. clients.del file
3227,Ella Kimpton,Gold,<XDS FIL='Client3227.xml' /> 8877,Chris Bontempo,Gold,<XDS FIL='Client8877.xml' /> 9077,Lisa Hansen,Silver,*lt;XDS FIL='Client9077.xml' /> 9177,Rita Gomez,Standard,<XDS FIL='Client9177.xml' /> 5681,Paula Lipenski,Standard,<XDS FIL='Client5681.xml' /> 4309,Tina Wang,Standard,<XDS FIL='Client4309.xml' />
The content of the Client3227.xml file is shown in Listing 7. The file contains XML elements for Ella Kimpton's address, phone numbers, fax number, and email.
Listing 7. Client3227.xml file
<?xml version="1.0"?> <Client xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance xsi:noNamespaceSchemaLocation="http://bogus"> <Address> <street>5401 Julio Ave</street> <city>San Jose</city> <state>CA</state> <zip>95116</zip> </Address> <phone> <work>4084630000</work> <home>4081111111</home> <cell>4082222222</cell> </phone> <fax>4087776666</fax> <email>email@example.com</email> </Client>
Perhaps you're curious about importing data if you don't have XML files for all the rows you wish to insert. That's easy to do. Omit the XDS information from your input file. For example, the items.del file in Listing 8 omits the name of an XML file for Item 3641 (the Dress to Impress suit). As a result, the XML column for this row will not contain any data.
3926,NatureTrail,Walking boot, 38112233,64.26,<XDS FIL='Comment3926.xml' /> 4023,NatureTrail,Back pack,552238,34.99,<XDS FIL='Comment4023.xml' /> 3641,Dress to Impress,Syutm7811421,149.99, 4272,Classy,Cocktail dress,981140,156.99,<XDS FIL='Comment4272.xml' />
With your XML files and delimited ASCII files available, you're now ready
to use DB2
IMPORT. The statement in
Listing 9 imports the contents specified in
the clients.del file into the C:/XMLFILES directory into the clients
Listing 9. Importing data into the clients table
import from clients.del of del xml from C:/XMLFILES insert into user1.clients;
The clients.del file shown in Listing 6 contains
data for six rows, including references to six XML files. Successfully
IMPORT command results in
output similar to Listing 10.
Listing 10. Sample output of DB2 IMPORT
import from clients.del of del xml from C:/XMLFiles insert into saracco.clients SQL3109N The utility is beginning to load data from file "clients.del". SQL3110N The utility has completed processing. "6" rows were read from the input file. SQL3221W ...Begin COMMIT WORK. Input Record Count = "6". SQL3222W ...COMMIT of any database changes was successful. SQL3149N "6" rows were processed from the input file. "6" rows were successfully inserted into the table. "0" rows were rejected. Number of rows read = 6 Number of rows skipped = 0 Number of rows inserted = 6 Number of rows updated = 0 Number of rows rejected = 0 Number of rows committed = 6
Independent software vendors offer tools to help you convert Microsoft® Word, Acrobat PDF, and other document formats into XML for import into DB2. See Resources for more information about ISVs.
Validating your XML data
IMPORT examples just discussed can write any
well-formed XML data to your tables. They don't validate that data. In
other words, they don't verify that the data conforms to a particular XML schema
and therefore adheres to a certain structure. It is possible to direct DB2
to do that, however. Here is one approach:
Step 1: Creating an XML schema
To validate XML data, you need to define an XML schema that specifies acceptable XML elements, their order and data types, and so on. XML schemas are a W3C industry standard and are written in XML. While it is beyond the scope of this article to explain the features of XML schemas, various tutorials are available (see Resources).
There are many ways to develop XML schemas, ranging from using your favorite text editor to manually create your schema to using tools to graphically design or generate a schema. Independent software vendors provide such XML tools, and IBM also offers XML schema generation support through Java™-integrated development environments.
For example, with IBM Rational® Application Developer or IBM Rational Software Architect, you can import an xml file into a Web project. The xml file used in this example was taken from the customer table in the sample database of DB2. Right-click the project, and select Generate > XML Schema. This generates a valid XML schema for your particular input file, as shown in Figure 2 (larger image). You can then modify the file (if necessary) and register it with DB2.
Figure 2. Using IBM Rational Software Architect to generate an XML schema from an XML file
Assume you need to make your XML schema rather flexible so that you can collect different types of contact information for different customers. For example, some customers might provide you with multiple phone numbers or email addresses, while others might not. The XML schema shown in Listing 11, which was derived from the schema that IBM Rational Software Architect generated, allows for this flexibility. It includes additional specifications about the minimum and maximum number of occurrences (minOccurs and maxOccurs) allowed for a given element. In this case, the customer isn't required to give you any of the contact information you'd like to collect. However, if a customer chooses to give you email information, this schema enables conforming documents to contain up to five email addresses (that is, five email element values).
Listing 11. Sample XML schema for client contact information
<?xml version="1.0" encoding="UTF-8"?> <xsd:schema xmins:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="address"> <xsd:complexType> <xsd:sequence> <xsd:element ref="street" minOccurs="0"/> <xsd:element ref="apt" minOccurs="0"/> <xsd:element ref="city" minOccurs="0"/> <xsd:element ref="state" minOccurs="0"/> <xsd:element ref="zip" minOccurs="0"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="Client"> <xsd:complexType> <xsd:sequence> <xsd:element ref="Address" minOccurs="0" maxOccurs="5"/> <xsd:element ref="phone" minOccurs="0" maxOccurs="1"/> <xsd:element ref="fax" minOccurs="0" maxOccurs="1"/> <xsd:element ref="email" minOccurs="0" maxOccurs="5"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="cell type="xsd:string"/> <xsd:element name="city type="xsd:string"/> <xsd:element name="email" type="xsd:string"/> <xsd:element name="fax" type="xsd:string"/> <xsd:element name="home" type="xsd:string"/> <xsd:element name="phone"> <xsd:complexType> <xsd:sequence> <xsd:element ref="work" minOccurs="0"/> <xsd:element ref="home" minOccurs="0"/> <xsd:element ref="cell" minOccurs="0"/> </xsd:sequence> </xsd:complexType> </xsd:element> . . . </xsd:schema>
XML schemas also contain type information. The schema shown in Listing 11 specifies
that all base elements are treated as strings. However, most production XML
schemas make use of other data types as well, such as integer, decimal,
date, and so on. If you validate XML documents against a given schema as
part of your
IMPORT operation, DB2 automatically adds
type annotations to your XML documents.
Step 2: Registering the XML schema
Once you have created an appropriate XML schema, you need to register the schema with DB2. IBM provides multiple ways to do this. You can launch graphical wizards from the DB2 Control Center to guide you through the process, invoke system-supplied stored procedures, or issue DB2 commands directly. For this example, use the latter method, because it might help you more readily understand what DB2 is doing behind the scenes on your behalf.
If your schema is very large, you may need to increase your application heap size before attempting to register it. For example, issue the following statements:
Listing 12. Increasing the application heap size
connect to test; update db cfg using applheapsz 10000;
Next, register your XML schema. If your XML schema does not reference other XML schemas, you can register and complete the process with a single command. Otherwise, you need to issue individual commands to register your primary XML schema, add the other required schemas, and complete the registration process. When a schema document becomes very large, it's common to divide its content into multiple files to improve maintenance, readability, and reuse. This is akin to breaking up a complex application or component into multiple modules. For details on this topic, refer to the W3C XML Schema primer.
This article uses a simple, independent XML schema. You can register it with DB2 using the following command:
Listing 13. Registering an XML schema
register xmlschema 'http://mysample.org' from 'C:/XMLFiles/ClientInfo.xsd' as user1.mysample complete;
In this example, ClientInfo.xsd is the name of the XML schema file. It is
located in the
C:/XMLFiles directory. This XML schema will be registered
in DB2's internal repository under the SQL schema
user1 and the XML
mysample. The http://mysample.org parameter is just a placeholder
in this example. It specifies the uniform resource indicator (URI)
referenced by XML instance documents. Many XML documents use namespaces,
which are specified using a URI. Finally, the
complete clause instructs DB2 to complete the XML schema registration process so that the
schema can be used to validate XML data.
Note that the schema registration process does not involve specifying table columns to which the schema will be applied. In other words, schemas are not the equivalent of SQL column constraints. A given schema can validate data for a variety of XML columns in different tables. However, validation is not automatic. DB2 allows any well-formed XML document to be stored in an XML column. If you want to validate your data against a registered schema prior to storage, you need to instruct DB2 to do so.
Step 3: Importing XML data with validation
With an XML schema created and completely registered in DB2, you're now ready to have DB2 validate XML data when inserting or importing it into a table. Revisit the earlier IMPORT scenario with schema validation in mind.
If you've already populated your Clients table, you might find it convenient to delete its contents or drop and recreate the table. This is only necessary if you plan to add the same data to the table as you did previously. Recall that clients were defined with a primary key on the client ID column, so attempting to import duplicate rows will fail.
To validate the XML data while importing it into the Clients table, use
XMLVALIDATE clause of DB2
IMPORT. The statement in
Listing 14 instructs DB2 to use your previously
registered XML schema (user1.mysample) as the default XDS (XML Data
Specifier) for validating the XML files specified in the clients.del file
before inserting them into the Clients table.
Listing 14. Importing XML data with validation
import from clients.del of del xml from C:/XMLFILES xmlvalidate using xds default user1.mysample insert into user1.clients;
If DB2 determines that an XML document does not conform to the specified
schema, the entire row associated with that document is rejected.
Listing15 illustrates sample output from an
IMPORT operation in which one row of six was
rejected because its XML document did not conform to the specified schema.
Listing 15. Importing XML data with validation
SQL3149N "6 rows were processed from the input file. "5" rows were successfully inserted into the table. "1" rows were rejected. Number of rows read = 6 Number of rows skipped = 0 Number of rows inserted = 5 Number of rows updated = 0 Number of rows rejected = 1 Number or rows committed = 6
XMLVALIDATE can also be
INSERT statements to instruct DB2 to
validate XML data before inserting it. The syntax is similar to the
IMPORT example just shown in that you specify
a registered (and completed) XML schema when invoking the
XMLVALIDATE clause. (See
"A simple Java example" for more information.)
DB2 V9 provides significant new capabilities for supporting XML, including a new XML data type and underlying engine-level components that automatically store and process XML data in an efficient manner. To help you get up to speed quickly on these features, this article described how to create a test database and sample tables for storing XML documents. It also reviewed how you can populate your database with XML data. Finally, it summarized DB2's ability to validate XML data against user-supplied XML schemas and provided examples to show you how to get started.
Now that you've learned how to store XML data using DB2's native XML capabilities, you're ready to query that data. You'll see how to do that in subsequent articles, which will introduce you to DB2's XQuery support and to its XML extensions to SQL (sometimes called SQL/XML).
Thanks to Rav Ahuja, Matthias Nicola, and Gary Robinson for their comments on this paper.
- Explore all the pieces of thisseries.
- IBM DB2 e-kit for Database Professionals: Grow your skills, and quickly and easily become certified for DB2 for Linux, UNIX, and Windows.
- XML Database - DB2 pureXML Learn more about DB2's XML support.
- "What's new in DB2 Viper: XML to the Core" (developerWorks, February 2006): Get an overview of the new XML technologies.
- Exegenix offers tools that can help you convert Word, PDF, and other document formats into XML for import into DB2.
- XML schemas:
- Various tutorials are available on the Web that explain the features of XML schemas.
- W3C XML Schema primer provides an easily readable description of the XML Schema facilities and is oriented towards quickly understanding how to create schemas using the XML Schema language.
- "Firing up the Hybrid Engine" (DB2 Magazine, Quarter 3, 2005): Read more about IBM's hybrid database management system.
- System RX: One Part Relational, One Part XML (SIGMOD conference, 2005): Learn about the architecture and design aspects of building a hybrid relational and XML DBMS.
- "Native XML Support in DB2 Universal Database" (VLDB conference, 2005): Read more about DB2 XML support.
- "Managing XML for Maximum Return" (IBM, November 2005): This white paper explores the business benefits of DB2's XML support.
- "Use DB2 native XML with PHP" (developerWorks, October 2005): Compare and contrast DB2's new XML support with traditional relational database technology.
- Stay current with developerWorks wiki on periodic pureXML topics given by the experts .
- Learn about DB2 Express-C, the no-charge version of DB2 Express Edition for the community.
- Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
Get products and technologies
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.
- Participate in the discussion forum.
- Check out the developerWorks blogs and get involved in the developerWorks community.