You've probably heard the buzz about DB2's new Viper release -- IBM's first database management system to support both tabular (SQL-based) and hierarchical (XML-based) data structures. If you're curious about DB2's new "native" support for XML and want to get off to a fast start, you've come to the right place.
To help you quickly get up to speed on DB2's native XML features, walk through several common tasks, such as:
- Creating database objects for managing XML data, including a test database, sample tables, and views
-
Populating the database with XML data using
INSERTandIMPORTstatements -
Validating your XML data. Develop and register your XML schemas with DB2, and use the
XMLVALIDATEoption when importing data.
Future articles will cover other topics, such as querying, updating, and deleting DB2 XML data with SQL, querying DB2 XML data with XQuery, and developing Java applications and Web components that access DB2 XML data.
To get started, let's create a single DB2 Unicode database. In DB2 Viper, only Unicode databases can store both XML documents and more traditional forms of SQL data, such as integer, date/time, varying length character strings, and so on. Later, you'll create objects within this database to manage both XML and other types of data.
To create a new DB2 Unicode "test" database, open a DB2 command window and issue a statement specifying a Unicode codeset and a supported territory, as shown in Listing 1:
Listing 1. Creating a database for storing XML data
create database test using codeset UTF-8 territory us |
Once you create a Unicode database, you don't need to issue any special commands or take any further action to enable DB2 to store XML data in its native hierarchical format. Your DB2 system is ready to go.
To store XML data, create tables that contain one or more XML columns. These tables serve as logical containers for collections of documents; behind the scenes, DB2 actually uses a different storage scheme for XML and non-XML data. However, using tables as a logical object for managing all forms of supported data simplifies administration and application development issues, particularly when different forms of data need to be integrated in a single query.
You can define DB2 tables to contain only XML columns, only columns of traditional SQL types, or a combination of both. This article models the latter. The example in Listing 2 connects to the "test" database and creates two tables. The first is an "items" table that tracks information about items for sale and comments that customers have made about them. The second table tracks information about "clients," including contact data. Note that "comments" and "contactinfo" are based on the new DB2 XML data type, while all other columns in the tables are based on traditional SQL data types.
Listing 2. Creating tables for XML data
connect to test; create table items ( id int primary key not null, brandname varchar(30), itemname varchar(30), sku int, srp decimal(7,2), comments xml ); create table clients( id int primary key not null, name varchar(50), status varchar(10), contactinfo xml ); |
If you look closely at these table definition examples, you'll notice that neither specified the internal structure of the XML documents to be stored in the "comments" or "contactinfo" columns. This is an important DB2 feature. Users don't need to pre-define an XML data structure (or, more accurately, an XML schema) in order to store their data. Indeed, DB2 can store any well-formed XML document in a single column, meaning that XML documents of different schemas -- or documents not associated with any registered schema -- can be stored within the same DB2 column. This article discusses this feature further when we explore how to store data in DB2.
Optionally, you can create views over tables containing XML data, just as you can create views over tables containing only traditional SQL data types. The example in Listing 3 creates a view of clients with a "Gold" status:
Listing 3. Creating a view that contains XML data
create view goldview as select id, name, contactinfo from clients where status='Gold'; |
Finally, it's worth noting that you can create specialized indexes on your XML columns to speed searches of your data. Because this is an introductory article and the sample data is small, this article will not be covering that topic. However, in production environments, defining appropriate indexes can be critical to achieving optimal performance. See "Resources," at the end of this article for help on how to learn more about DB2's new indexing technology.
With your tables created, you can now populate them with data. You can do so by issuing SQL INSERT statements directly or by invoking the DB2 IMPORT facility, which issues INSERT statements behind the scenes.
With INSERT, you supply DB2 with the raw XML data directly. That's perhaps easiest to do if you've written an application and stored the XML data in a variable. But if you're just getting started with DB2 Viper and don't want to write an application, you can issue your INSERT statements interactively. (I find it convenient to use the DB2 Command Editor, although you can also use the command line processor, if you'd prefer.)
To use the DB2 Command Editor, launch the DB2 Control Center. From the "Tools" pull-down menu at the top, select the Command Editor. A separate window will appear, which should look like Figure 1.
Figure 1. DB2 Command Editor
Type the following statements into the upper pane:
Listing 4. Inserting XML data interactively
connect to test; insert into clients values (77, 'John Smith', 'Gold', '<addr>111 Main St., Dallas, TX, 00112</addr>') |
Click on the green arrow at left to execute the command.
In this case, the input document was quite simple. If the document was large or complex, it would be impractical to type the XML data into the INSERT statement as shown. In most cases, you'd write an application to insert the data using a host variable or a parameter marker. You'll find a brief Java coding example that accompanies this article. However, since this is an introductory tutorial, we won't be discussing application development topics in detail. Instead, we'll discuss another option for populating DB2 XML columns with data -- using the IMPORT facility.
If you already have your XML data in files, the DB2 IMPORT facility provides a simple way for you to populate your DB2 tables with this data. You don't need to write an application. You just need to create a delimited ASCII file containing the data you want to load into your table. For XML data stored in files, a parameter specifies the appropriate file names.
You can create the delimited ASCII file using the text editor of your choice. (By convention, such files are usually of type .del.) Each line in your file represents a row of data to be imported into your table. If your line contains an XML Data Specifier (XDS), IMPORT will read the data contained in the referenced XML file and import that into DB2. For example, the first line in Figure 2 contains information for Ella Kimpton, including her ID, name, and customer status. Her contact information is included in the Client3227.xml file.
Figure 2. Sample delimited ASCII file for input to DB2 IMPORT
The content of the Client3227.xml file is shown in Figure 3. As you can see, the file contains XML elements for Ella Kimpton's address, phone numbers, fax number, and email.
Figure 3. Sample client XML file
Perhaps you're curious about importing data if you don't have XML files for all the rows you wish to insert. That's easy to do. Omit the XDS information from your input file. For example, the items.del file in Figure 4 omits the name of an XML file for Item 3641 (the "Dress to Impress" suit). As a result, the XML column for this row will not contain any data.
Figure 4. Sample delimited ASCII file with no XML Data Specifier for one row
With your XML files and delimited ASCII files available, you're now ready to use DB2
IMPORT. The following statement in Listing 4 will import the contents specified in the clients.del file in the C:/XMLFILES directory into the "clients" table.
Listing 4. Importing data into the "clients" table
import from clients.del of del xml from C:/XMLFILES insert into user1.clients; |
The clients.del file shown in Figure 2 contains data for six rows, including references to six XML files. Successfully executing an IMPORT command will result in output similar to Figure 5.
Figure 5. Sample output of DB2 IMPORT
Independent software vendors, such as Exegenix, offer tools that can help you convert Word, PDF, and other document formats into XML for import into DB2. (See Resources for more information on Exegenix.)
The INSERT and IMPORT examples just discussed will write any well-formed XML data to your tables. They don't validate that data -- that is, they don't verify that the data conforms to a particular XML schema and therefore adheres to a certain structure. It's possible to direct DB2 to do that, however. Let's explore one way to do so:
Step 1: Creating an XML schema
To validate XML data, you need to define an XML schema that specifies acceptable XML elements, their order and data types, and so on. XML schemas are a W3C industry standard and are written in XML. While it's beyond the scope of this paper to explain the features of XML schemas, various tutorials are available on the Web. (See "Resources.")
There are many ways to develop XML schemas, ranging from using your favorite text editor to manually create your schema to using tools to graphically design or generate a schema. Independent software vendors, such as MDXSYS Limited, provide such XML tools, and IBM also offers XML schema generation support through its Java™-integrated development environment.
For example, with IBM WebSphere® Studio, you can import the Client3227.xml file, shown in Figure 3, into a Web project. Right-click, using your mouse, and elect to Generate -> XML Schema. This will generate a valid XML schema for your particular input file, as shown in Figure 6. You can then modify the file (if necessary) and register it with DB2.
Figure 6. Using WebSphere Studio to generate an XML schema from an XML file
Let's assume you need to make your XML schema rather flexible so that you can collect different types of contact information for different customers. For example, some customers may provide you with multiple phone numbers or email addresses, while others may not do so.
The XML schema shown in Figure 7, which was derived from the schema generated by WebSphere Studio, allows for this flexibility. It includes additional specifications about the minimum and maximum number of occurrences ("minOccurs" and "maxOccurs") allowed for a given element. In this case, the customer isn't required to give you any of the contact information you'd like to collect. However, if a customer chooses to give you email information, this schema will enable conforming documents to contain up to five email addresses (that is, five "email" element values).
Figure 7. Sample XML schema for client contact information
As you may have noted, XML schemas also contain type information. While the schema shown in Figure 7 simply specifies that all base elements are to be treated as strings, most production XML schemas make use of other data types as well, such as integer, decimal, date, and so on. If you validate XML documents against a given schema as part of your INSERT or IMPORT operation, DB2 will automatically add type annotations to your XML documents.
Step 2: Registering the XML schema
Once you've created an appropriate XML schema, you need to register the schema with DB2. IBM provides multiple ways to do this. You can launch graphical wizards from the DB2 Control Center to guide you through the process, invoke system-supplied stored procedures, or issue DB2 commands directly. Let's use the latter method here, as it may help you more readily understand what DB2 is doing behind the scenes on your behalf.
If your schema is very large, you may need to increase your application heap size before attempting to register it. For example, issue the following statements:
Listing 4. Increasing the application heap size
connect to test; update db cfg using applheapsz 10000; |
Next, register your XML schema. If your XML schema does not reference other XML schemas, you may register and complete the process with a single command. Otherwise, you will need to issue individual commands to register your primary XML schema, add the other required schemas, and complete the registration process. When a schema document becomes very large, it's common to divide its content into multiple files to improve maintenance, readability, and reuse. This is akin to breaking up a complex application or component into multiple modules. For details on this topic, refer to the W3C "XML Schema primer." (See "Resources.")
This article uses a simple, independent XML schema. You can register it with DB2 using the following command:
Listing 5. Registering an XML schema
register xmlschema 'http://mysample.org' from 'C:/XMLFiles/ClientInfo.xsd' as user1.mysample complete; |
In this example, ClientInfo.xsd is the name of the XML schema file. It's located in the C:/XMLFiles directory. This XML schema will be registered in DB2's internal repository under the SQL schema "user1" and the XML schema "mysample." The http://mysample.org parameter is just a placeholder in this example. It specifies the uniform resource indicator (URI) referenced by XML instance documents; many XML documents use namespaces, which are specified using a URI. Finally, the "complete" clause will instruct DB2 to complete the XML schema registration process so that the schema may be used for validating XML data.
It's worth noting that the schema registration process doesn't involve specifying table column(s) to which the schema will be applied. In other words, schemas aren't the equivalent of SQL column constraints. A given schema can validate data for a variety of XML columns in different tables. However, validation isn't automatic. DB2 allows any well-formed XML document to be stored in an XML column. If you want to validate your data against a registered schema prior to storage, you need to instruct DB2 to do so.
Step 3: Importing XML data with validation
With an XML schema created and completely registered in DB2, you're now able to have DB2 validate XML data when inserting or importing it into a table. Let's revisit the earlier IMPORT scenario with schema validation in mind.
If you've already populated your "clients" table, you might find it convenient to delete its contents or drop and recreate the table. This is only necessary if you plan to add the same data to the table as you did previously. Recall that "clients" was defined with a primary key on the client ID column, so attempting to import duplicate rows will fail.
To validate the XML data while importing it into the "clients" table, use the XMLVALIDATE clause of DB2 IMPORT. The following statement in Listing 6 instructs DB2 to use your previously registered XML schema (user1.mysample) as the default XDS (XML Data Specifier) for validating the XML files specified in the clients.del file prior to inserting them into the "clients" table.
Listing 6. Importing XML data with validation
import from clients.del of del xml from C:/XMLFILES xmlvalidate using xds default user1.mysample insert into user1.clients; |
If DB2 determines that an XML document doesn't conform to the specified schema, the entire row associated with that document will be rejected. Figure 8 illustrates sample output from an IMPORT operation in which one row of six was rejected because its XML document didn't conform to the specified schema.
Figure 8. Sample output from DB2 IMPORT, with one row rejected
It's worth noting that XMLVALIDATE can also be used with INSERT statements to instruct DB2 to validate XML data before inserting it. The syntax is similar to the IMPORT example just shown, in that you specify a registered (and completed) XML schema when invoking the XMLVALIDATE clause. (See "A simple Java example" for more information on this.)
DB2 Viper provides significant new capabilities for supporting XML, including a new XML data type and underlying engine-level components that automatically store and process XML data in an efficient manner. To help you get up to speed quickly on these features, this article described how to create a test database and sample tables for storing XML documents. It also reviewed how you can populate your database with XML data. Finally, it summarized DB2's ability to validate XML data against user-supplied XML schemas and provided examples to show you how to get started.
Now that you've learned how to store XML data using DB2's new "native" XML capabilities, you're ready to query that data. You'll see how to do that in subsequent articles, which will introduce you to DB2's new XQuery support as well as its XML extensions to SQL (sometimes called "SQL/XML").
Thanks to Rav Ahuja, Matthias Nicola, and Gary Robinson for their comments on this paper.
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample code | samples.zip | 4KB |
FTP
|
Information about download methods
Learn
-
DB2 Viper Web site:
Learn more about DB2's XML support.
-
"What's new in DB2 Viper: XML to the Core" (developerWorks, February 2006): Get an overview of the new XML technologies now in beta in DB2.
-
Exegenix offers tools that can help you convert Word, PDF, and other document formats into XML for import into DB2.
-
XML schemas:
- Various tutorials are available on the Web that explain the features of XML schemas.
-
W3C XML Schema primer provides an easily readable description of the XML Schema facilities and is oriented towards quickly understanding how to create schemas using the XML Schema language.
-
"Firing up the Hybrid Engine" (DB2 Magazine, Quarter 3, 2005): Read more about IBM's hybrid database management system.
-
System RX: One Part Relational, One Part XML (SIGMOD conference, 2005): Learn about the architecture and design aspects of building a hybrid relational and XML DBMS.
-
"Native XML Support in DB2 Universal Database" (VLDB conference, 2005): Read more about DB2 XML support.
-
"Managing XML for Maximum Return" (IBM, November 2005): This white paper explores the business benefits of DB2's XML support.
-
"Use DB2 native XML with PHP" (developerWorks, October 2005): Compare and contrast DB2's new XML support with traditional relational database technology.
-
Stay current with developerWorks
technical events and webcasts.
-
developerWorks Information Management zone: Learn more about DB2. Find technical documentation, how-to articles, education, downloads, product information, and more.
Get products and technologies
-
Participate in the DB2 Viper test drive. Download and try it out today.
-
Build your next development project with
IBM
trial software, available for download directly from developerWorks.
-
Participate in the DB2 Viper test drive. Download and try it out today.
Discuss
-
Participate in developerWorks blogs and get involved in the developerWorks community.
Comments (Undergoing maintenance)





