Introduction
This tutorial examines the validation of XML documents using either Document Type Definitions (DTDs) or XML Schema. It is aimed at developers who need to control the types and content of the data in their XML documents, and assumes that you are familiar with the basic concepts of XML. (You can get a basic grounding in XML itself through the Introduction to XML tutorial.) It also assumes a basic familiarity with XML Namespaces. (You can pick up the basics of namespaces in the Understanding DOM tutorial.)
This tutorial demonstrates validation using the Java language from the command line, but the principles and concepts of validation are the same for any programming environment, so experience with Java technology is not required to gain a thorough understanding. DTDs and XML Schema, in particular, are language- and platform-independent.
When you create a database, you can use a data model together with integrity constraints to ensure that the structure and content of the data meet your requirements. But how do you enforce that kind of control in XML, when your data is just text in hand-editable files? Fortunately, validation provides it: a validating parser checks a document against a set of declared constraints and reports any data that does not fit them. In this tutorial, you'll learn what validation is and how to check a document against a Document Type Definition (DTD) or an XML Schema document.
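To make the idea concrete, here is a minimal sketch of DTD validation in Java using the JAXP DOM API (`DocumentBuilderFactory.setValidating(true)`), which is part of Java 1.4 and later. The class name, the `order`/`item`/`quantity` vocabulary, and the inline DTD are hypothetical examples chosen for illustration; they are not the documents developed later in this tutorial.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdValidationSketch {

    // Hypothetical internal DTD subset: an <order> must contain exactly
    // one <item> followed by exactly one <quantity>.
    static final String DTD =
        "<!DOCTYPE order [\n"
      + "  <!ELEMENT order (item, quantity)>\n"
      + "  <!ELEMENT item (#PCDATA)>\n"
      + "  <!ELEMENT quantity (#PCDATA)>\n"
      + "]>\n";

    static final String VALID = DTD + "<order><item>Widget</item><quantity>3</quantity></order>";
    static final String INVALID = DTD + "<order><item>Widget</item></order>"; // <quantity> missing

    /** Parses xml with DTD validation switched on; returns false on the first error. */
    static boolean isValid(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Without a custom handler, validity errors are only reported, not thrown,
        // so this handler rethrows them to stop parsing.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            System.out.println("Invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("VALID accepted: " + isValid(VALID));
        System.out.println("INVALID accepted: " + isValid(INVALID));
    }
}
```

The second document is well-formed XML, but the parser rejects it because its `<order>` element lacks the `<quantity>` child that the DTD requires.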
DTDs are defined in the XML 1.0 Recommendation, but they are a carryover from the Standard Generalized Markup Language (SGML), the markup standard from which XML was derived and in which early versions of HTML were defined. One drawback to DTDs is that their syntax is not XML syntax, so a DTD cannot itself be processed as an XML document. They also have functional limitations, such as offering almost no data types and poor support for namespaces, which led developers to seek an alternative in the form of XML schemas. However, DTDs are still used in a significant number of environments, so it's important to understand them.
The primary alternative to DTDs is the XML Schema Recommendation, maintained by the World Wide Web Consortium (W3C). (Throughout this tutorial, "XML Schema" means "W3C XML Schema.") Because schemas are themselves XML documents, they provide a more familiar and more powerful way to define the constraints on the data an XML document can contain.
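One practical difference is that a schema can constrain data types, for example requiring that an element contain a positive integer, which a DTD cannot express. The minimal sketch below shows that kind of check; it assumes Java 5 or later, which added the `javax.xml.validation` API (on older versions such as Java 1.4, schema validation depends on the parser you install, so check its documentation). The class name, schema, and sample documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaValidationSketch {

    // Hypothetical schema: <quantity> must be a positive integer,
    // a data-type constraint that a DTD cannot express.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "  <xs:element name=\"order\">"
      + "    <xs:complexType>"
      + "      <xs:sequence>"
      + "        <xs:element name=\"item\" type=\"xs:string\"/>"
      + "        <xs:element name=\"quantity\" type=\"xs:positiveInteger\"/>"
      + "      </xs:sequence>"
      + "    </xs:complexType>"
      + "  </xs:element>"
      + "</xs:schema>";

    /** Compiles the W3C XML Schema held in XSD. */
    static Schema compileSchema() throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(XSD)));
    }

    /** Returns true if xml conforms to the schema. */
    static boolean isValid(Schema schema, String xml) throws Exception {
        // With no ErrorHandler set, a Validator throws on the first error.
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("Invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compileSchema();
        System.out.println(isValid(schema, "<order><item>Widget</item><quantity>3</quantity></order>"));
        System.out.println(isValid(schema, "<order><item>Widget</item><quantity>three</quantity></order>"));
    }
}
```

Compiling the schema once and creating a fresh `Validator` per document is a common design choice, because a compiled `Schema` can be reused while a `Validator` is not thread-safe.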
By the end of this tutorial, you will know how to create both a DTD and an XML Schema document, and you will understand how to use them to validate an XML document.
If you decide to try the examples in this tutorial, you'll need to install the following tools and make sure they work correctly. You don't need to run the examples to understand the material.
- A text editor: XML files, DTDs, and XML Schema documents are simply text. To create and read them, a text editor is all you need.
- A validating parser: You can manipulate and validate XML in any language for which a validating parser is available. Most of this tutorial deals with creating documents, but you will also see how to build an application that uses a validating parser. XML support is built into Java 1.4 and later (version 1.4.2 is available at http://java.sun.com/j2se/1.4.2/download.html), so you won't need to install any separate classes. If you're using an earlier version, such as Java 1.3.x, you'll also need an XML parser, such as the Apache project's Xerces-Java (available at http://xml.apache.org/xerces2-j/index.html) or Sun's Java API for XML Processing (JAXP), part of the Java Web Services Developer Pack (available at http://java.sun.com/webservices/downloads/webservicespack.html).
If you have a different set of tools installed, you can use them instead; just check their documentation for instructions on turning on validation. You can download C++ and Perl versions of Xerces from the Apache Project at http://xml.apache.org.
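Parsers that implement the SAX2 API share a standard feature URI for turning on DTD validation, `http://xml.org/sax/features/validation`, which is a useful term to look for in your parser's documentation. The minimal sketch below sets that feature on the JDK's built-in SAX parser; the class name and sample document type are hypothetical. With Xerces or another SAX2 parser, you would obtain the `XMLReader` from that parser instead, assuming it supports the feature.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxValidationSketch {

    // Standard SAX2 feature URI that switches on DTD validation.
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Hypothetical document type: <order> needs an <item> and then a <quantity>.
    static final String DTD = "<!DOCTYPE order [<!ELEMENT order (item, quantity)>"
        + "<!ELEMENT item (#PCDATA)><!ELEMENT quantity (#PCDATA)>]>";

    /** Returns true if the parser, with validation turned on, reports no errors. */
    static boolean isValid(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(VALIDATION, true);
        // DefaultHandler ignores recoverable (validity) errors by default,
        // so rethrow them to stop at the first one.
        reader.setErrorHandler(new DefaultHandler() {
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            System.out.println("Invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(DTD + "<order><item>Widget</item><quantity>3</quantity></order>"));
        System.out.println(isValid(DTD + "<order><item>Widget</item></order>"));
    }
}
```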

