Skip to main content

Get off to a fast start with DB2 Viper

Cynthia M. Saracco (saracco@us.ibm.com), Senior Software Engineer, IBM
C. M. Saracco works at IBM's Silicon Valley Laboratory in the DB2 XML organization. She works on database management, XML, Web application development, and related topics.

Summary:  The DB2® Viper release, now in beta, features significant new support for storing, managing, and searching XML data. Master these new XML features quickly through several step-by-step articles that explain how to accomplish fundamental tasks. In this article, learn how to create database objects for managing your XML data and how to populate your DB2 database with XML data.

Date:  02 Mar 2006
Level:  Introductory
Activity:  1839 views

You've probably heard the buzz about DB2's new Viper release -- IBM's first database management system to support both tabular (SQL-based) and hierarchical (XML-based) data structures. If you're curious about DB2's new "native" support for XML and want to get off to a fast start, you've come to the right place.

To help you quickly get up to speed on DB2's native XML features, walk through several common tasks, such as:

  1. Creating database objects for managing XML data, including a test database, sample tables, and views
  2. Populating the database with XML data using INSERT and IMPORT statements
  3. Validating your XML data. Develop and register your XML schemas with DB2, and use the XMLVALIDATE option when importing data.

Future articles will cover other topics, such as querying, updating, and deleting DB2 XML data with SQL, querying DB2 XML data with XQuery, and developing Java applications and Web components that access DB2 XML data.

Creating database objects

To get started, let's create a single DB2 Unicode database. In DB2 Viper, only Unicode databases can store both XML documents and more traditional forms of SQL data, such as integer, date/time, varying length character strings, and so on. Later, you'll create objects within this database to manage both XML and other types of data.

Creating a test database

To create a new DB2 Unicode "test" database, open a DB2 command window and issue a statement specifying a Unicode codeset and a supported territory, as shown in Listing 1:


Listing 1. Creating a database for storing XML data
				
create database test using codeset UTF-8 territory us 

Once you create a Unicode database, you don't need to issue any special commands or take any further action to enable DB2 to store XML data in its native hierarchical format. Your DB2 system is ready to go.

Creating sample tables

To store XML data, create tables that contain one or more XML columns. These tables serve as logical containers for collections of documents; behind the scenes, DB2 actually uses a different storage scheme for XML and non-XML data. However, using tables as a logical object for managing all forms of supported data simplifies administration and application development issues, particularly when different forms of data need to be integrated in a single query.

You can define DB2 tables to contain only XML columns, only columns of traditional SQL types, or a combination of both. This article models the latter. The example in Listing 2 connects to the "test" database and creates two tables. The first is an "items" table that tracks information about items for sale and comments that customers have made about them. The second table tracks information about "clients," including contact data. Note that "comments" and "contactinfo" are based on the new DB2 XML data type, while all other columns in the tables are based on traditional SQL data types.


Listing 2. Creating tables for XML data
				
connect to test;

create table items (
  id          int primary key not null, 
  brandname   varchar(30), 
  itemname    varchar(30), 
  sku         int, 
  srp         decimal(7,2), 
  comments    xml
);

create table clients(
  id          int primary key not null, 
  name        varchar(50), 
  status      varchar(10), 
  contactinfo xml
);

If you look closely at these table definition examples, you'll notice that neither specified the internal structure of the XML documents to be stored in the "comments" or "contactinfo" columns. This is an important DB2 feature. Users don't need to pre-define an XML data structure (or, more accurately, an XML schema) in order to store their data. Indeed, DB2 can store any well-formed XML document in a single column, meaning that XML documents of different schemas -- or documents not associated with any registered schema -- can be stored within the same DB2 column. This article discusses this feature further when we explore how to store data in DB2.

Creating views

Optionally, you can create views over tables containing XML data, just as you can create views over tables containing only traditional SQL data types. The example in Listing 3 creates a view of clients with a "Gold" status:


Listing 3. Creating a view that contains XML data
				
create view goldview as 
select id, name, contactinfo 
from clients where status='Gold'; 

A note about indexes

Finally, it's worth noting that you can create specialized indexes on your XML columns to speed searches of your data. Because this is an introductory article and the sample data is small, this article will not be covering that topic. However, in production environments, defining appropriate indexes can be critical to achieving optimal performance. See "Resources," at the end of this article for help on how to learn more about DB2's new indexing technology.


Storing XML Data

With your tables created, you can now populate them with data. You can do so by issuing SQL INSERT statements directly or by invoking the DB2 IMPORT facility, which issues INSERT statements behind the scenes.

Using INSERT statements

With INSERT, you supply DB2 with the raw XML data directly. That's perhaps easiest to do if you've written an application and stored the XML data in a variable. But if you're just getting started with DB2 Viper and don't want to write an application, you can issue your INSERT statements interactively. (I find it convenient to use the DB2 Command Editor, although you can also use the command line processor, if you'd prefer.)

To use the DB2 Command Editor, launch the DB2 Control Center. From the "Tools" pull-down menu at the top, select the Command Editor. A separate window will appear, which should look like Figure 1.


Figure 1. DB2 Command Editor
DB2 Command Editor

Type the following statements into the upper pane:


Listing 4. Inserting XML data interactively
				
connect to test;

insert into clients values (77, 'John Smith', 'Gold', 
   '<addr>111 Main St., Dallas, TX, 00112</addr>') 

Click on the green arrow at left to execute the command.

In this case, the input document was quite simple. If the document was large or complex, it would be impractical to type the XML data into the INSERT statement as shown. In most cases, you'd write an application to insert the data using a host variable or a parameter marker. You'll find a brief Java coding example that accompanies this article. However, since this is an introductory tutorial, we won't be discussing application development topics in detail. Instead, we'll discuss another option for populating DB2 XML columns with data -- using the IMPORT facility.

Using DB2 IMPORT

If you already have your XML data in files, the DB2 IMPORT facility provides a simple way for you to populate your DB2 tables with this data. You don't need to write an application. You just need to create a delimited ASCII file containing the data you want to load into your table. For XML data stored in files, a parameter specifies the appropriate file names.

You can create the delimited ASCII file using the text editor of your choice. (By convention, such files are usually of type .del.) Each line in your file represents a row of data to be imported into your table. If your line contains an XML Data Specifier (XDS), IMPORT will read the data contained in the referenced XML file and import that into DB2. For example, the first line in Figure 2 contains information for Ella Kimpton, including her ID, name, and customer status. Her contact information is included in the Client3227.xml file.


Figure 2. Sample delimited ASCII file for input to DB2 IMPORT
ASCII file

The content of the Client3227.xml file is shown in Figure 3. As you can see, the file contains XML elements for Ella Kimpton's address, phone numbers, fax number, and email.


Figure 3. Sample client XML file
Client file

Perhaps you're curious about importing data if you don't have XML files for all the rows you wish to insert. That's easy to do. Omit the XDS information from your input file. For example, the items.del file in Figure 4 omits the name of an XML file for Item 3641 (the "Dress to Impress" suit). As a result, the XML column for this row will not contain any data.


Figure 4. Sample delimited ASCII file with no XML Data Specifier for one row
Import file

With your XML files and delimited ASCII files available, you're now ready to use DB2 IMPORT. The following statement in Listing 4 will import the contents specified in the clients.del file in the C:/XMLFILES directory into the "clients" table.


Listing 4. Importing data into the "clients" table
				
import from clients.del of del xml from C:/XMLFILES insert into user1.clients;

The clients.del file shown in Figure 2 contains data for six rows, including references to six XML files. Successfully executing an IMPORT command will result in output similar to Figure 5.


Figure 5. Sample output of DB2 IMPORT
Import output

Independent software vendors, such as Exegenix, offer tools that can help you convert Word, PDF, and other document formats into XML for import into DB2. (See Resources for more information on Exegenix.)


Validating your XML data

The INSERT and IMPORT examples just discussed will write any well-formed XML data to your tables. They don't validate that data -- that is, they don't verify that the data conforms to a particular XML schema and therefore adheres to a certain structure. It's possible to direct DB2 to do that, however. Let's explore one way to do so:

Step 1: Creating an XML schema

To validate XML data, you need to define an XML schema that specifies acceptable XML elements, their order and data types, and so on. XML schemas are a W3C industry standard and are written in XML. While it's beyond the scope of this paper to explain the features of XML schemas, various tutorials are available on the Web. (See "Resources.")

There are many ways to develop XML schemas, ranging from using your favorite text editor to manually create your schema to using tools to graphically design or generate a schema. Independent software vendors, such as MDXSYS Limited, provide such XML tools, and IBM also offers XML schema generation support through its Java™-integrated development environment.

For example, with IBM WebSphere® Studio, you can import the Client3227.xml file, shown in Figure 3, into a Web project. Right-click, using your mouse, and elect to Generate -> XML Schema. This will generate a valid XML schema for your particular input file, as shown in Figure 6. You can then modify the file (if necessary) and register it with DB2.


Figure 6. Using WebSphere Studio to generate an XML schema from an XML file
WSAD

Let's assume you need to make your XML schema rather flexible so that you can collect different types of contact information for different customers. For example, some customers may provide you with multiple phone numbers or email addresses, while others may not do so.

The XML schema shown in Figure 7, which was derived from the schema generated by WebSphere Studio, allows for this flexibility. It includes additional specifications about the minimum and maximum number of occurrences ("minOccurs" and "maxOccurs") allowed for a given element. In this case, the customer isn't required to give you any of the contact information you'd like to collect. However, if a customer chooses to give you email information, this schema will enable conforming documents to contain up to five email addresses (that is, five "email" element values).


Figure 7. Sample XML schema for client contact information
WSAD

As you may have noted, XML schemas also contain type information. While the schema shown in Figure 7 simply specifies that all base elements are to be treated as strings, most production XML schemas make use of other data types as well, such as integer, decimal, date, and so on. If you validate XML documents against a given schema as part of your INSERT or IMPORT operation, DB2 will automatically add type annotations to your XML documents.

Step 2: Registering the XML schema

Once you've created an appropriate XML schema, you need to register the schema with DB2. IBM provides multiple ways to do this. You can launch graphical wizards from the DB2 Control Center to guide you through the process, invoke system-supplied stored procedures, or issue DB2 commands directly. Let's use the latter method here, as it may help you more readily understand what DB2 is doing behind the scenes on your behalf.

If your schema is very large, you may need to increase your application heap size before attempting to register it. For example, issue the following statements:


Listing 4. Increasing the application heap size
				
 connect to test;
 update db cfg using applheapsz 10000; 

Next, register your XML schema. If your XML schema does not reference other XML schemas, you may register and complete the process with a single command. Otherwise, you will need to issue individual commands to register your primary XML schema, add the other required schemas, and complete the registration process. When a schema document becomes very large, it's common to divide its content into multiple files to improve maintenance, readability, and reuse. This is akin to breaking up a complex application or component into multiple modules. For details on this topic, refer to the W3C "XML Schema primer." (See "Resources.")

This article uses a simple, independent XML schema. You can register it with DB2 using the following command:


Listing 5. Registering an XML schema
				
register xmlschema 'http://mysample.org' from 'C:/XMLFiles/ClientInfo.xsd' as 
user1.mysample complete;

In this example, ClientInfo.xsd is the name of the XML schema file. It's located in the C:/XMLFiles directory. This XML schema will be registered in DB2's internal repository under the SQL schema "user1" and the XML schema "mysample." The http://mysample.org parameter is just a placeholder in this example. It specifies the uniform resource indicator (URI) referenced by XML instance documents; many XML documents use namespaces, which are specified using a URI. Finally, the "complete" clause will instruct DB2 to complete the XML schema registration process so that the schema may be used for validating XML data.

It's worth noting that the schema registration process doesn't involve specifying table column(s) to which the schema will be applied. In other words, schemas aren't the equivalent of SQL column constraints. A given schema can validate data for a variety of XML columns in different tables. However, validation isn't automatic. DB2 allows any well-formed XML document to be stored in an XML column. If you want to validate your data against a registered schema prior to storage, you need to instruct DB2 to do so.

Step 3: Importing XML data with validation

With an XML schema created and completely registered in DB2, you're now able to have DB2 validate XML data when inserting or importing it into a table. Let's revisit the earlier IMPORT scenario with schema validation in mind.

If you've already populated your "clients" table, you might find it convenient to delete its contents or drop and recreate the table. This is only necessary if you plan to add the same data to the table as you did previously. Recall that "clients" was defined with a primary key on the client ID column, so attempting to import duplicate rows will fail.

To validate the XML data while importing it into the "clients" table, use the XMLVALIDATE clause of DB2 IMPORT. The following statement in Listing 6 instructs DB2 to use your previously registered XML schema (user1.mysample) as the default XDS (XML Data Specifier) for validating the XML files specified in the clients.del file prior to inserting them into the "clients" table.


Listing 6. Importing XML data with validation
				
import from clients.del of del xml from C:/XMLFILES xmlvalidate using xds default  
user1.mysample insert into user1.clients; 

If DB2 determines that an XML document doesn't conform to the specified schema, the entire row associated with that document will be rejected. Figure 8 illustrates sample output from an IMPORT operation in which one row of six was rejected because its XML document didn't conform to the specified schema.


Figure 8. Sample output from DB2 IMPORT, with one row rejected
Import rejected rows

It's worth noting that XMLVALIDATE can also be used with INSERT statements to instruct DB2 to validate XML data before inserting it. The syntax is similar to the IMPORT example just shown, in that you specify a registered (and completed) XML schema when invoking the XMLVALIDATE clause. (See "A simple Java example" for more information on this.)


Summary

DB2 Viper provides significant new capabilities for supporting XML, including a new XML data type and underlying engine-level components that automatically store and process XML data in an efficient manner. To help you get up to speed quickly on these features, this article described how to create a test database and sample tables for storing XML documents. It also reviewed how you can populate your database with XML data. Finally, it summarized DB2's ability to validate XML data against user-supplied XML schemas and provided examples to show you how to get started.

Now that you've learned how to store XML data using DB2's new "native" XML capabilities, you're ready to query that data. You'll see how to do that in subsequent articles, which will introduce you to DB2's new XQuery support as well as its XML extensions to SQL (sometimes called "SQL/XML").

Acknowledgments

Thanks to Rav Ahuja, Matthias Nicola, and Gary Robinson for their comments on this paper.



Download

DescriptionNameSizeDownload method
Sample codesamples.zip4KB FTP | HTTP

Information about download methods


Resources

Learn

Get products and technologies

Discuss

About the author

C. M. Saracco works at IBM's Silicon Valley Laboratory in the DB2 XML organization. She works on database management, XML, Web application development, and related topics.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, XML
ArticleID=104814
ArticleTitle=Get off to a fast start with DB2 Viper
publish-date=03022006
author1-email=saracco@us.ibm.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers