© 2002 International Business Machines Corporation. All rights reserved.
DB2DD: Thanks so much for speaking with us, Susan, on this important topic. XML is such a pervasive technology. Can you explain why data management developers and DBAs need to be aware of what's happening with XML?
Susan: Data management developers need to be familiar with XML because it is widely used as a notation for exchanging data between computers. XML underpins Web services and Grid computing, which provide an infrastructure for communication between computer systems and applications.
Increasingly, it is becoming desirable for:
- Data held in databases to be annotated and published as XML for data exchange.
- Database processing to be driven by incoming XML documents.
- XML documents to be stored and searched.
DB2DD: I've noticed that XML people use the word "document" a lot. What is the specific meaning of that word in this context?
Susan: Basically, chunks of XML are called XML documents. Here is a sample XML document where name, last and first are element names (tag names) and salary and band are attribute names:
<personnelRec>
  <person salary="999000" band="A">
    <name>
      <last>Austen</last>
      <first>Jane</first>
    </name>
    <email>firstname.lastname@example.org</email>
  </person>
</personnelRec>
Notice how the tags are nested hierarchically. This is a requirement for XML documents.
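As an illustration of how software might read such a document, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` module. The helper names `summarize` and `is_well_formed` are hypothetical and chosen only for this example. The sketch reads element content and attribute values from the sample, and shows that a parser rejects tags that are not properly nested.

```python
import xml.etree.ElementTree as ET

# The sample document from the interview.
SAMPLE = """<personnelRec>
  <person salary="999000" band="A">
    <name>
      <last>Austen</last>
      <first>Jane</first>
    </name>
    <email>firstname.lastname@example.org</email>
  </person>
</personnelRec>"""


def summarize(xml_text):
    """Return one (first, last, salary, band) tuple per person element."""
    root = ET.fromstring(xml_text)
    rows = []
    for person in root.findall("person"):
        rows.append((
            person.findtext("name/first"),  # element content
            person.findtext("name/last"),
            person.get("salary"),           # attribute values
            person.get("band"),
        ))
    return rows


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(summarize(SAMPLE))
print(is_well_formed("<a><b></a></b>"))  # overlapping tags are rejected
```

Note that all values come back as strings; interpreting `salary` as a number is left to the application or to a schema.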
DB2DD: Can you explain why this relatively simple concept has become so popular?
Susan: Software running on diverse platforms and operating environments can produce data annotated as XML, which makes it easier for software running on the same or different systems to process the XML data without much planning. In addition, there are considerable bodies of general-purpose software and tools that manipulate XML, making it easier to develop XML-enabled applications. All these characteristics have made XML popular since it first became a recommendation at the World Wide Web Consortium (W3C) in 1998.
DB2DD: But if we consider the relational data case, why use XML? Why not exchange relational results between systems?
Susan: Two features make XML suitable for data exchange between computers that have not been introduced to each other before:
- The way characters are encoded in an XML document is defined within the XML itself through an encoding declaration. This declaration, part of the XML declaration at the start of the document, tells the processing software which version of the XML specification the document conforms to and what encoding the document is in, ideally a Unicode encoding.
- Additional constraints on the content of an XML document can be specified separately in either a Document Type Definition (DTD) or an XML Schema (which is itself an XML document).
DTDs and XML schemas can be placed locally or across the network, and can be found using the universal notation of URLs.
Software can process a piece of XML in accordance with the XML specification by consulting the encoding declaration in the XML document itself and optionally the appropriate DTDs or schemas, if rigorous checking or analysis is required. The processing software does not need to check with the XML producer.
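To make the role of the encoding declaration concrete, here is a small Python sketch. It assumes the standard library's `xml.etree.ElementTree` parser; the function name `parse_name` and the sample name are made up for the illustration. The same logical document is serialized in two different encodings, and the parser recovers the same text from each by consulting the declaration inside the document, without asking the producer. This parser does not validate against a DTD or schema; that would need a separate validating tool.

```python
import xml.etree.ElementTree as ET


def parse_name(raw_bytes):
    """Parse raw bytes and return the text of the <last> element.

    No encoding is passed in: the parser reads it from the XML
    declaration at the start of the document.
    """
    root = ET.fromstring(raw_bytes)
    return root.findtext("last")


# A document template containing one non-ASCII character (e with diaeresis).
TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><name><last>Bront\u00eb</last></name>'

# The same document, written out as two different byte sequences.
latin1_doc = TEMPLATE.format(enc="ISO-8859-1").encode("iso-8859-1")
utf8_doc = TEMPLATE.format(enc="UTF-8").encode("utf-8")

print(latin1_doc != utf8_doc)                             # the bytes differ
print(parse_name(latin1_doc) == parse_name(utf8_doc))     # the content does not
```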
Contrast this with relational result set processing: Only the database system that produces the result set knows the description of what has been produced, and it may be that the software processing the result set is not permitted to access the database system to make inquiries, for example when the database is owned by another company or organization. Increasingly, we see relational result sets being published as XML, with an accompanying schema, making it possible to process result sets independently of the producing database system. To make this easier, consortia are defining XML schemas for exchanging banking, insurance or medical information that is typically held in relational databases. You can find lists of such schemas at the OASIS consortium.
DB2DD: What are you saying? Are you implying that this is the end of relational databases?
Susan: No, indeed not. Rather, it is the beginning of integration between hierarchical data exchange formats and relational data storage formats. Relational databases have excellent characteristics for high-performance processing through data independence after normalization, and a very advanced query language. XML lacks both these properties, although XML query languages will become available through SQL/XML and XQuery. Instead, XML has mechanisms that make it possible to interpret the content of an XML document independently of its producer. You can think of XML documents as being constructed from various hierarchical arrangements or views on relational data, and then being processed on another system with a completely different set of underlying normalized tables. Because your system can be driven by XML documents created by another system, perhaps in another company, you will sometimes want to store and search the XML documents you receive, as well as use the XML documents as a data input source for your system. Hence the need for hierarchical and relational data integration.
DB2DD: You mentioned SQL/XML and XQuery. What are they?
Susan: These are topics of much interest to the data management community. SQL/XML consists of extensions being defined in the SQL language to support XML explicitly. One of the first steps in SQL/XML has been to map data types and names between XML and relational systems; for example, how to map between a relational column name and an XML element or attribute name. Mappings from relational tables to XML documents have also been defined. Another step has been to define SQL functions to annotate relational results from an SQL query as XML. The SQL/XML work is being done in the H2.3 subgroup of the SQL group (NCITS H2) of the US standards body. SQL/XML extensions appear in DB2 V8.1.
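The idea of mapping column names to element or attribute names, and of annotating a relational result as XML, can be sketched in Python. This is not SQL/XML itself and not DB2's implementation: it uses an in-memory SQLite database and the standard library's ElementTree purely as stand-ins, and the function `rows_to_xml`, the table layout and the choice of which columns become attributes are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(cursor, root_tag, row_tag, attribute_columns=()):
    """Publish the cursor's result set as an XML string.

    Columns named in attribute_columns become attributes of the row
    element; every other column becomes a child element named after
    the column. NULL values are simply omitted (one possible mapping).
    """
    columns = [d[0] for d in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        elem = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            if value is None:
                continue
            if col in attribute_columns:
                elem.set(col, str(value))
            else:
                ET.SubElement(elem, col).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical table holding the data from the earlier sample document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (last TEXT, first TEXT, salary INTEGER, band TEXT)")
conn.execute("INSERT INTO person VALUES ('Austen', 'Jane', 999000, 'A')")

cur = conn.execute("SELECT last, first, salary, band FROM person")
xml_text = rows_to_xml(cur, "personnelRec", "person",
                       attribute_columns=("salary", "band"))
print(xml_text)
```

The sketch assumes every column name is already a legal XML name; a real mapping also has to say what happens when it is not.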
XQuery is a language being defined at the W3C to query XML directly and to return XML results. This is an exciting activity. It is not often that a major new query language is invented.
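To give a feel for querying XML directly and getting XML back, here is a rough Python sketch. It does not use XQuery: Python's standard library has no XQuery engine, so the example relies on the limited XPath subset supported by `xml.etree.ElementTree` instead. The second person in the data and the function name `people_in_band` are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Sample data; the second person is hypothetical, added so the query
# has something to filter out.
DOC = """<personnelRec>
  <person salary="999000" band="A"><name><last>Austen</last><first>Jane</first></name></person>
  <person salary="450000" band="B"><name><last>Eliot</last><first>George</first></name></person>
</personnelRec>"""


def people_in_band(xml_text, band):
    """Return a <result> element holding the <name> of each person in a band."""
    root = ET.fromstring(xml_text)
    result = ET.Element("result")
    # Path expression with an attribute predicate (XPath subset).
    # The band value is interpolated directly, which is fine for a
    # sketch but not for untrusted input.
    for person in root.findall(f"./person[@band='{band}']"):
        result.append(person.find("name"))
    return result


print(ET.tostring(people_in_band(DOC, "A"), encoding="unicode"))
```

The input is XML and so is the result, which is the property the interview highlights for XQuery.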
DB2DD: You mentioned Web services and Grid computing. What are they? How are they related to XML? Are they relevant to database systems?
Susan: Web services provide a way of defining a software interface (method names, input parameters, output parameters, etc.) in a notation called Web Services Description Language (WSDL), standardized at the W3C. By examining WSDL, it is possible to deduce the request and reply messages that should flow across the network between clients and servers. For cross-language and cross-platform interoperability, WSDL messages are annotated as XML. Interoperability between diverse clients and a server that supports a particular WSDL interface is one of the goals of Web services. Further Web services related standards are being developed, such as WS-Security at OASIS.
DB2® supports Web services, making it possible to access DB2 data and stored procedures as Web services by generating the appropriate WSDL in a straightforward way. It will also be possible for DB2 applications to act as Web services clients. You can read more about Web services support in DB2 in the DB2 Web Services zone on the DB2 Developer Domain.
Grid computing is about defining standard interfaces between systems software components, making it possible, for example, for one computer to schedule work to run on another computer, or to access data from distributed sources using standardized interfaces. Grid computing interfaces are defined through WSDL and are being standardized through the Global Grid Forum (GGF). Web services based interfaces for files and databases are beginning to be defined at GGF in the Database Access and Integration Services Working Group (DAIS-WG). IBM is actively participating in GGF in the DAIS-WG. A paper presented at GGF5 is available at http://www.globalgridforum.org/Meetings/GGF5/pdf/dais/document2.pdf
Keep an eye on Grid computing. As you can see from the recent IBM splash about eBusiness on demand, IBM is working hard to make Grid computing more and more a business reality.
DB2DD: Obviously, a key component of the success of XML is that there are many XML-related standards, but it is difficult to keep track of them. Since XML has grown so fast, how does the standardization process work?
Susan: It's true that a number of consortia and other groups are working on standardization, because there is indeed so much work to do. Let's break the standards down into four categories as shown in Figure 1; it makes it much easier to think of them this way:
- First there are standards on which XML relies, such as Unicode and the URL notation.
- Next, there are standards for the basic definition and composition of XML. These standards make it possible to exchange XML across heterogeneous systems and applications, and to write basic XML applications. They include the XML specification itself (which includes DTDs), the XML Schema specification, XML fragments, XInclude, etc.
- Next, there are standards that are the building blocks for XML data and systems. These standards help with constructing advanced XML applications and formats. They include XML transformation technologies (such as stylesheets through XSL and XSLT), query languages (such as SQL/XML and XQuery), and XML navigational interfaces (such as DOM).
- Finally, there are the standards that help build applications that communicate through XML notations across networks in standard ways, such as Web services and Grid computing.
Figure 1. Some XML-related standards
DB2DD: Speaking of standards, we've heard about XML 1.1. What is it? Is it important?
Susan: Yes, XML 1.1 is important because it provides support for the latest version of Unicode. XML 1.1 also includes support for a character (NEL, U+0085) that is used on z/OS™ systems to denote line endings in documents (similar to the line feed or carriage return characters). Until XML 1.1 is supported in XML parsers, z/OS users may find that some of their XML documents are rejected as not well-formed or invalid, depending on where the character appears in their documents.
DB2DD: You've talked about how XML affects applications. How is the success of XML affecting data management systems, such as DB2?
Susan: The popularity of the XML notation as an exchange format is putting many requirements on database management systems, including:
- Storing and searching XML as larger quantities of XML are being exchanged.
- Querying and updating XML as larger quantities of XML are being stored.
- Transforming XML into other XML formats as many more ways of representing data as XML are being defined.
- Transforming XML into relational formats as XML is being used to drive many existing and new business applications.
- Transforming relational data into XML as more companies, systems and applications exchange data formatted as XML.
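The "XML into relational formats" requirement in the list above can be sketched as follows. This is an illustrative Python example only, assuming an in-memory SQLite database as a stand-in for a relational system; the function `shred_people` and the table layout are hypothetical and are not taken from DB2.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The sample document from earlier in the interview.
SAMPLE = """<personnelRec>
  <person salary="999000" band="A">
    <name>
      <last>Austen</last>
      <first>Jane</first>
    </name>
    <email>firstname.lastname@example.org</email>
  </person>
</personnelRec>"""


def shred_people(xml_text, conn):
    """Insert one row per <person> element and return the number of rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person "
        "(last TEXT, first TEXT, email TEXT, salary INTEGER, band TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (
            p.findtext("name/last"),
            p.findtext("name/first"),
            p.findtext("email"),
            int(p.get("salary")),  # attribute text converted to a number
            p.get("band"),
        )
        for p in root.iter("person")
    ]
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


db = sqlite3.connect(":memory:")
print(shred_people(SAMPLE, db))
print(db.execute("SELECT last, first, salary, band FROM person").fetchall())
```

The hierarchy of the document is flattened into columns here; documents with repeating or optional substructures would typically need more than one table.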
DB2DD: It sounds as if we should be talking to you about how these requirements may show up as changes to DB2.
Susan: I'd be happy to.
DB2DD: Are there any closing thoughts you'd like to add?
Susan: It is a wonderful time to be working in data management. There are so many new opportunities concerning data. Seize them!
All statements regarding IBM's future direction or intent are subject to change without notice, and represent goals and objectives only.
- W3C (The World Wide Web Consortium) at http://www.w3.org/
- IETF at http://www.ietf.org/
- OASIS Consortium at http://www.oasis-open.org/
- SQL Standards at http://www.ncits.org/tc_home/h2.htm
- XQuery at http://www.w3.org/TR/xquery/
- WSDL (Web Services Description Language) at http://www.w3.org/TR/wsdl
- GGF (Global Grid Forum) at http://www.gridforum.org/
- DAIS-WG at http://www.gridforum.org/6_DATA/dais.htm
- XML 1.1 at http://www.w3.org/TR/xml11/
- Tutorial using SQL/XML features: Efficient Ways to Publish Your DB2 Data as XML by Seeling Cheung