When leveraging established patterns of object-oriented programming in constructing XML schemas, I use the three main principles of object-oriented design: encapsulation, inheritance, and polymorphism. To help discuss object-oriented frameworks in this context, I use an example of a fictitious company, Bond Publishing.
Bond Publishing, a supplier of books and magazines, publishes its products in factories in Detroit. When products are shipped, these factories send information electronically about products using XML and then forward this information to the distributing company, Distributor. At the end of each month, Distributor returns information regarding the sale of the products. Figure 1 outlines the flow of the process.
Figure 1. Bond Publishing workflow

Bond Publishing is constantly expanding and adding new products to its portfolio. The company wants its XML to be extensible so that new products can be added neatly without extensive rewriting of the schemas. Bond Publishing sends schemas to the factories and to the distributing company so that they understand the structure of the XML they are receiving. Once this protocol has been agreed upon, each entity in this workflow can validate the data received when it arrives at its destination.
Therefore, you need two XML documents:
- The XML that carries product information (Product.xml)
- The XML that carries sales information (Sales.xml)
Encapsulation refers to the concept of making an object a black box, so that when you use that object you do not know its internal workings. In the world of schemas, this translates into creating types that are predefined and easily accessible anywhere simply by referencing their type.
In the example, you create a Book type, which specifies that a book must have an Author, a Title, and an ISBN. The Book type, therefore, encapsulates all the information relevant to a book. You put this datatype into a schema of datatypes called DataTypes.xsd (see Listing 1). This schema contains all the generic datatypes that will be used by many different schemas. For instance, the schemas for Sales, Product, and Accounting all require the definition of Book.
Listing 1. DataTypes.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:complexType name="Book">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Author" type="xs:string"/>
      <xs:element name="ISBN" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Magazine">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Editor" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
Since this is just a datatypes library, it declares no root element. Instead, it is a collection of named complex types, each of which describes the details of one component. To be referenceable from other schemas, a type must be global, that is, a named type definition that is an immediate child of <xs:schema>. A local component is one that is nested within another component; for example, the Title element is a local element declaration within the global complex type Book.
The easiest way to access the datatypes library is to include it in your schema with xs:include, giving its location. Because DataTypes.xsd declares no target namespace of its own, the included schema adopts the namespace of its enclosing schema. In this case, the components of DataTypes.xsd take on the "http://www.Bond.com" namespace. This is known as the Chameleon Effect because the included schema can change its namespace according to its enveloping schema, like a chameleon changes its appearance. As you can see in Listing 2, the Product.xsd schema includes the DataTypes.xsd schema.
Product.xsd is the schema that defines Product.xml, which contains the information about the books sent from the book factory to Bond Publishing. When you want to use a Book component, you simply enter type="Book" and Product.xsd will use the definition from DataTypes.xsd. The magazine factory can similarly use the definition of the Magazine datatype.
Listing 2. Product.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.Bond.com"
           xmlns="http://www.Bond.com"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:include schemaLocation="DataTypes.xsd"/>
  <xs:element name="Product">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Books">
          <xs:complexType>
            <xs:sequence maxOccurs="unbounded">
              <xs:element name="Book" type="Book"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Product.xsd and DataTypes.xsd are shared by the factory and Bond Publishing so that they can create and validate the data they exchange. The snippet in Listing 3 is one example of the resulting XML document, Product.xml. Creating components like this has several advantages. You can reuse and easily update the components since the update can be done in one place (for example, in DataTypes.xsd). Another advantage is that components can be added to this library as the business expands. Thus, using encapsulation allows both flexibility and standardization in your system.
Listing 3. Product.xml
<?xml version="1.0" encoding="UTF-8"?>
<Product xmlns="http://www.Bond.com"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.Bond.com Product.xsd">
  <Books>
    <Book>
      <Title>Complete Works</Title>
      <Author>Shakespeare</Author>
      <ISBN>0517053616</ISBN>
    </Book>
    <Book>
      <Title>Being Rich is Cool</Title>
      <Author>Donald Trump</Author>
      <ISBN>05146553616</ISBN>
    </Book>
  </Books>
</Product>
Software reuse is another flagship of object-oriented design. You can achieve software reuse through inheritance. In programming languages, this capability is provided by subclasses. In XML schemas you can use abstract types, or simply extend or restrict existing base types.
Abstract types cannot be used directly in instance documents; they simply provide a placeholder for their derived types. In this example, you can make Magazine an abstract type that has a Title and an Editor. A women's magazine (WomensMagazine) is derived from the abstract Magazine type, and therefore inherits both Title and Editor. You achieve this by using extension base="Magazine" in your definition of WomensMagazine. In addition, a women's magazine carries cosmetics advertisements, so you add CosmeticsAdvert to the inherited base type.
Listing 4. DataTypes.xsd
<xs:complexType name="Magazine" abstract="true">
  <xs:sequence>
    <xs:element name="Title" type="xs:string"/>
    <xs:element name="Editor" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="WomensMagazine">
  <xs:complexContent>
    <xs:extension base="Magazine">
      <xs:sequence maxOccurs="unbounded">
        <xs:element name="CosmeticsAdvert" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
As you can see in Listing 5, the instance document contains all the base class characteristics and the added CosmeticsAdvert component.
Listing 5. Product.xml
<?xml version="1.0" encoding="UTF-8"?>
<Product xmlns="http://www.Bond.com"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.Bond.com Product.xsd">
  <Magazines>
    <WomensMagazine>
      <Title>Vogue</Title>
      <Editor>Anna Wintour</Editor>
      <CosmeticsAdvert>L'oreal</CosmeticsAdvert>
      <CosmeticsAdvert>Estee Lauder</CosmeticsAdvert>
    </WomensMagazine>
  </Magazines>
</Product>
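The placeholder role of an abstract type can also be seen when an element is declared with the abstract type itself. The fragment below is a sketch under an assumption: it supposes that Product.xsd declares a Magazine element of type Magazine inside Magazines, which the listings above do not show. In that case the instance has to name a concrete derived type through xsi:type.

```xml
<!-- Assumed (hypothetical) declaration inside the Magazines element of Product.xsd:
     <xs:element name="Magazine" type="Magazine" maxOccurs="unbounded"/>
     Because the Magazine type is abstract, each instance must select a
     concrete derived type with xsi:type. -->
<Magazines xmlns="http://www.Bond.com"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Magazine xsi:type="WomensMagazine">
    <Title>Vogue</Title>
    <Editor>Anna Wintour</Editor>
    <CosmeticsAdvert>L'oreal</CosmeticsAdvert>
  </Magazine>
</Magazines>
```

Under this assumption, a Magazine element without an xsi:type attribute should be reported as invalid, since an abstract type cannot itself be used to validate an element.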
Another example of derivation is when you use an extension without an abstract type. A BookSales type contains information about a book, and includes the number of books sold and the price at which they were sold. You can extend the Book type to create a BookSales type using extension base="Book". The following snippet of the DataTypes schema shows how this is done.
Listing 6. DataTypes.xsd
<xs:complexType name="Book">
  <xs:sequence>
    <xs:element name="Title" type="xs:string"/>
    <xs:element name="Author" type="xs:string"/>
    <xs:element name="ISBN" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="BookSales">
  <xs:complexContent>
    <xs:extension base="Book">
      <xs:sequence>
        <xs:element name="Number" type="xs:integer"/>
        <xs:element name="Price" type="xs:double"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
This means that whenever an element of type BookSales appears, the XML instance document includes number and price information along with the title, the author, and the ISBN (see Listing 7).
Listing 7. Sales.xml
<?xml version="1.0" encoding="UTF-8"?>
<Sales xmlns="http://www.Bond.com"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.Bond.com Sales.xsd">
  <Books>
    <BookSales>
      <Title>Complete Works</Title>
      <Author>Shakespeare</Author>
      <ISBN>0517053616</ISBN>
      <Number>234</Number>
      <Price>14.50</Price>
    </BookSales>
  </Books>
</Sales>
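Derivation by restriction is useful in cases where you want to create a subset of the base type. One example is restricting the range of values. In this case, you want to restrict the definition of a Pamphlet, which is similar to a Book in every way except that it has no author. You use restriction base="Book" when creating your pamphlet. Note that every instance of a restricted type must still be valid against the base type, so Author can be dropped only if the Book type declares it as optional (minOccurs="0"), and the remaining elements must keep types that are the same as, or derived from, those in Book.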
Listing 8. DataTypes.xsd
<xs:complexType name="Pamphlet">
  <xs:complexContent>
    <xs:restriction base="Book">
      <xs:sequence>
        <xs:element name="Title" type="xs:string"/>
        <xs:element name="ISBN" type="xs:string"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
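A restriction has to stay valid against its base type, so the Pamphlet definition in Listing 8 can omit Author only if Book allows a book without one. The following is a minimal sketch, assuming the Book type is revised so that Author is optional. The minOccurs="0" is an assumption added here; Listing 1 declares Author as required.

```xml
<xs:complexType name="Book">
  <xs:sequence>
    <xs:element name="Title" type="xs:string"/>
    <!-- Assumption: Author is made optional so that a restriction may drop it -->
    <xs:element name="Author" type="xs:string" minOccurs="0"/>
    <xs:element name="ISBN" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
```

The trade-off is that a Book instance without an Author would then validate as well, which may or may not be acceptable for the business.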
In programming, you can declare some interfaces and classes final so that they are not subclassed. By making some components final, you can achieve the same goal in schemas. For example, Bond Publishing has a very strict way of describing itself: name, headquarters, CTO. The company does not want anyone to be able to extend this definition. Therefore it declares the component BondDefinition final, as shown below. When the keyword #all is used, the component can be neither extended nor restricted. In the other two cases, final blocks only extension or only restriction, respectively.
<xs:complexType name="BondDefinition" final="#all">
<xs:complexType name="BondDefinition" final="extension">
<xs:complexType name="BondDefinition" final="restriction">
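To make the effect of final concrete, here is a hedged sketch of a complete definition. The child elements Name, Headquarters, and CTO are assumed from the prose description, and ExtendedBondDefinition and Founded are hypothetical names used only to illustrate a derivation that should be refused.

```xml
<!-- Hypothetical content model: the element names are assumed from the prose -->
<xs:complexType name="BondDefinition" final="#all">
  <xs:sequence>
    <xs:element name="Name" type="xs:string"/>
    <xs:element name="Headquarters" type="xs:string"/>
    <xs:element name="CTO" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<!-- A schema processor should reject a derivation such as the following,
     because final="#all" forbids both extension and restriction:

<xs:complexType name="ExtendedBondDefinition">
  <xs:complexContent>
    <xs:extension base="BondDefinition">
      <xs:sequence>
        <xs:element name="Founded" type="xs:gYear"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
-->
```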
Polymorphism means the ability to assign a different meaning or usage to something in different contexts -- specifically, to allow an object to have more than one form. In programming languages such as Java, polymorphism means a different reaction to the same input. More specifically, it is the ability of subclasses to respond differently to the same messages. Simple inheritance allows two different subclasses to add different methods to those inherited from the superclass, while polymorphism adds the idea of implementing the same method in different ways in different subclasses.
Since XML is not a behavioral language, polymorphism occurs at the level of element types rather than methods. Let's take the inheritance example above. A Pamphlet inherits the Title and ISBN elements from Book. However, now you want to specify that the ISBN of a Pamphlet has characteristics that are specific to it and different from those of Book. In other words, you know that the ISBN of a Pamphlet always has a limit of five digits. You want to enforce this rule in your schema so that there is a validation error if you get more than five digits.
You therefore need to build a new type of component called PamphletISBN. You derive this component from the type ISBN and restrict it to a maximum length of five (because ISBN is based on a string, maxLength counts characters rather than digits specifically). Then, you construct your Pamphlet component as you did in Listing 8. However, this time, you give the ISBN element the type PamphletISBN, thus ensuring that while the name of the element remains ISBN, the validator checks against the more restricted PamphletISBN. The ability to vary the implementation of derived components in subtypes demonstrates one flavor of polymorphism in XML schemas.
Listing 9. DataTypes.xsd
<xs:complexType name="Pamphlet">
  <xs:complexContent>
    <xs:restriction base="Book">
      <xs:sequence>
        <xs:element name="Title" type="xs:string"/>
        <xs:element name="ISBN" type="PamphletISBN"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
<xs:simpleType name="ISBN">
  <xs:restriction base="xs:string"/>
</xs:simpleType>
<xs:simpleType name="PamphletISBN">
  <xs:restriction base="ISBN">
    <xs:maxLength value="5"/>
  </xs:restriction>
</xs:simpleType>
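Because PamphletISBN derives from a string type, maxLength limits the number of characters, not specifically digits. If the rule is meant to be "at most five digits", one possible variation uses a pattern facet instead. This is an assumption about the intended rule, not part of the original listing.

```xml
<xs:simpleType name="PamphletISBN">
  <xs:restriction base="ISBN">
    <!-- Assumed rule: one to five decimal digits and nothing else -->
    <xs:pattern value="[0-9]{1,5}"/>
  </xs:restriction>
</xs:simpleType>
```

With this definition, a value such as 25274 should validate, while 252741 (six digits) or 25-27 (a non-digit character) should be reported as errors.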
Listing 10. Product.xml
<?xml version="1.0" encoding="UTF-8"?>
<Product xmlns="http://www.Bond.com"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.Bond.com Product.xsd">
  <Books>
    <Book>
      <Title>Complete Works</Title>
      <Author>Shakespeare</Author>
      <ISBN>0517053616</ISBN>
    </Book>
  </Books>
  <Magazines>
    <WomensMagazine>
      <Title>Vogue</Title>
      <Editor>Anna Wintour</Editor>
      <CosmeticsAdvert>L'oreal</CosmeticsAdvert>
      <CosmeticsAdvert>Estee Lauder</CosmeticsAdvert>
    </WomensMagazine>
  </Magazines>
  <Pamphlets>
    <Pamphlet>
      <Title>Guide to Yoga</Title>
      <ISBN>25274</ISBN>
    </Pamphlet>
  </Pamphlets>
</Product>
Design patterns for decoupling
Recently, some design patterns have emerged that address decoupling and cohesiveness in XML schemas. We have already discussed how to create reusable components. Now, you'll learn how to vary the granularity of datatypes. This is similar to trying to answer the question "How can I refactor my code and how much refactoring is appropriate for a given situation?" There are currently three design patterns that represent three levels of granularity when creating components:
- Level 1: Russian Doll
- Level 2: Salami Slice
- Level 3: Venetian Blind
Level 1: Russian Doll
Components contain all the relevant components within themselves (like a Russian doll). Observe in the example below that Book is composed of the components Title, Author, and ISBN. These components are defined locally within the Book component. The Russian Doll design is the least extensible since three of the four components (Title, Author, and ISBN) are locally defined and therefore inaccessible to anything else.
Listing 11. Russian Doll design
<xs:element name="Book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Author" type="xs:string"/>
      <xs:element name="ISBN" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Level 2: Salami Slice
Components are put together or aggregated by referencing globally declared elements. Thus, Book, Title, Author, and ISBN are each global elements. The Book element then references the other three as part of its definition, as shown in Listing 12. This design pattern is referred to as Salami Slice because each component represents one slice. There is more granularity at this level than at the Russian Doll level.
Listing 12. Salami Slice design
<xs:element name="Title" type="xs:string"/>
<xs:element name="Author" type="xs:string"/>
<xs:element name="ISBN" type="xs:integer"/>
<xs:element name="Book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Title"/>
      <xs:element ref="Author"/>
      <xs:element ref="ISBN"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Level 3: Venetian Blind
All components are defined as named types, and elements are declared locally with those types. This means that when you declare the element called "Title", you reference the type "Title", even though it is a simple type that merely restricts a string. This illustrates the highest level of factoring components into their most atomic parts.
Since the elements are declared locally inside global types, a single switch controls whether they are namespace-qualified in instance documents: they are qualified if elementFormDefault="qualified" and unqualified otherwise. The ability to show or hide namespaces, like opening or closing a Venetian blind, lends this design pattern its name.
Listing 13. Venetian Blind design
<xs:simpleType name="Title">
  <xs:restriction base="xs:string"/>
</xs:simpleType>
<xs:simpleType name="Name">
  <xs:restriction base="xs:string">
    <xs:minLength value="1"/>
  </xs:restriction>
</xs:simpleType>
<xs:complexType name="Book">
  <xs:sequence>
    <xs:element name="Title" type="Title"/>
    <xs:element name="Author" type="Name"/>
  </xs:sequence>
</xs:complexType>
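The effect of the elementFormDefault switch is easiest to see in instance documents. The two fragments below are a sketch that assumes a global element named Book of type Book is declared in the http://www.Bond.com namespace; that global declaration and the prefix b are assumptions and are not part of Listing 13.

```xml
<!-- With elementFormDefault="qualified": local elements carry the target namespace -->
<b:Book xmlns:b="http://www.Bond.com">
  <b:Title>Complete Works</b:Title>
  <b:Author>Shakespeare</b:Author>
</b:Book>

<!-- With elementFormDefault="unqualified": only the global root element is
     qualified, and the locally declared children are in no namespace -->
<b:Book xmlns:b="http://www.Bond.com">
  <Title>Complete Works</Title>
  <Author>Shakespeare</Author>
</b:Book>
```

Only the schema attribute changes between the two cases; the type definitions stay the same, which is what makes the namespace exposure easy to toggle in this pattern.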
Choosing between these three patterns depends on your business requirements. If schema size is an issue, then the Russian Doll is appropriate. However, if extensibility is the primary concern, then the Venetian Blind is the best design pattern to apply.
Simulating packages using namespaces
Object-oriented programming places a great deal of emphasis on packaging classes according to their services. The package structure organizes the code and facilitates modularity and maintenance. You can achieve similar benefits by organizing your XML schemas according to their functions. For instance, the accounting department creates its own schema, Accounting.xsd, whereas the marketing department has Marketing.xsd specify its datatypes and definitions.
To differentiate these schemas, you must do more than just save them as different files -- you need to assign them separate namespaces. So the accounting department organizes its schemas under the namespace http://www.Bond.com/Accounting. Figure 2 shows how schemas can be separated to simulate software packages in this way. You can easily see who is responsible for the schema by looking at the namespace of the .xsd file.
Figure 2. Schemas and software packages
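As a rough illustration of such a package, the accounting department's schema might begin as follows. Only the namespace URI comes from the text; the Invoice type and its child elements are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Accounting.xsd: the Invoice type and its elements are hypothetical -->
<xs:schema targetNamespace="http://www.Bond.com/Accounting"
           xmlns="http://www.Bond.com/Accounting"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:complexType name="Invoice">
    <xs:sequence>
      <xs:element name="Number" type="xs:string"/>
      <xs:element name="Amount" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

A schema in another namespace would then use xs:import, rather than xs:include, to reference these definitions.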

All of the schemas that we have discussed so far lie in the http://www.Bond.com namespace. This means that Bond Publishing has uniquely defined all the components such as Book, Magazine, and BookSales. Distributor, however, has many clients (including Bond Publishing) and needs to extend the BookSales schema described above to include its own information such as distributor fees. Since the characteristics of distributor fees are defined by Distributor, these will be defined in its namespace http://www.Distributor.com, and the schema is called Fees.xsd as shown below.
Both companies want the XML instance document to indicate the namespace from which each component derives its definition. Indicating the namespace assists in parsing and querying the data, and in keeping the definitions of the two companies separate. This approach involves qualifying the elements with their namespace prefixes.
Listing 14. Fees.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.Distributor.com"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.Distributor.com"
           elementFormDefault="qualified">
  <xs:element name="Fees">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Date" type="xs:date"/>
        <xs:element name="Amount" type="xs:double"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Next, Distributor will import the Fees schema into the Sales schema, effectively extending the latter. Both Distributor and Bond Publishing will now share the new Sales and Fees schema. Just as it's essential to share protocols when designing software systems, sharing the format in which you send the data is imperative to inter-firm communication. Both companies will now have an updated Sales.xsd for validation and querying.
Listing 15. Sales.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.Bond.com"
           xmlns:dist="http://www.Distributor.com"
           xmlns="http://www.Bond.com"
           xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:include schemaLocation="DataTypes.xsd"/>
  <xs:import namespace="http://www.Distributor.com"
             schemaLocation="Fees.xsd"/>
  <xs:element name="Sales">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Books">
          <xs:complexType>
            <xs:sequence maxOccurs="unbounded">
              <xs:element name="BookSales" type="BookSales"/>
              <xs:element ref="dist:Fees"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
The resulting XML that reaches Bond Publishing will qualify all the information that is defined by the Distributor namespace with the prefix dist. Therefore, Sales.xml will have this prefix before any element specific to Distributor.
Listing 16. Sales.xml
<?xml version="1.0" encoding="UTF-8"?>
<Sales xmlns="http://www.Bond.com"
       xmlns:dist="http://www.Distributor.com"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.Bond.com Sales.xsd">
  <Books>
    <BookSales>
      <Title>Complete Works</Title>
      <Author>Shakespeare</Author>
      <ISBN>0517053616</ISBN>
      <Number>98</Number>
      <Price>31</Price>
    </BookSales>
    <dist:Fees>
      <dist:Date>2002-10-13</dist:Date>
      <dist:Amount>300</dist:Amount>
    </dist:Fees>
  </Books>
</Sales>
Extensions and industry standards
This technique is the easiest way to extend schemas. It is useful not only when companies want to keep their definitions separate, but also when companies want to add on to an industry standard for internal use. For example, firms use the schema specified by the industry standard ebXML to exchange business information. Using ebXML, firms can communicate with many counterparties without needing to define and explain their schemas; each firm follows the industry schema as defined by ebXML. If a company wants to use the industry standard but also wants to add proprietary information for internal communication and storage, it can extend the industry standard using the method described above.
Industry standards are becoming the norm as firms realize that standards are necessary for consistency, scalability, and integration. Understanding how to work with industry standards that fall short of one's information needs is going to become an important topic in years to come. Using extensions as described, companies can extend industry schema architectures for their own internal use and strip off the additional information any time they want to use the XML externally for inter-firm communication. In this manner, XML documents can be easily stripped to their industry standard level again and re-used for industry communication.
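One way to strip such additions is a small XSLT stylesheet. The article does not prescribe a tool, so this is only a sketch of one option; it uses the Distributor namespace from the Sales example to stand in for a company's proprietary extension namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dist="http://www.Distributor.com">
  <!-- Identity template: copy every attribute and node unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Drop every element in the extension namespace, with its content -->
  <xsl:template match="dist:*"/>
</xsl:stylesheet>
```

Applied to a document like Sales.xml, this should remove the dist:Fees element and its children while leaving the BookSales data untouched. The unused xmlns:dist declaration may still be copied onto the root element, which is harmless for validation.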
If your system is going to use XML to transport data information, either internally or externally, then you should seriously consider how to properly design your XML schemas. In this article, you have seen how to create schemas that use inheritance, encapsulation and polymorphism, and even had a glimpse of emerging design patterns in XML schema design. Leveraging these object-oriented frameworks helps you design XML schemas that are modular and extensible, maintain data integrity, and can be easily integrated with other XML protocols.
- Read W3C XML Schema Design Patterns: Dealing With Change by Dare Obasanjo.
- Take a look at XML Schemas: Best Practices for an excellent discussion on schema design which is updated frequently.
- Read the book XML Schema by Eric van der Vlist.
- For more on ebXML, take a look at Nicholas Chase's tutorial "Introduction to ebXML" here on developerWorks. (June 2002)

Ayesha Malik is a Senior Consultant of Object Machines, a software engineering firm providing Java technology and XML solutions to businesses. Ayesha has worked extensively on large XML and messaging systems for companies such as Deutsche Bank and American International Group (AIG). Most recently, she has been researching new ways to make schemas extensible and object-oriented. She also serves on the Architecture Working Group of Financial Products Markup Language (FpML), a data-interchange standard set forth by International Swaps and Derivatives Association (ISDA). Ayesha holds a BA with honors from Harvard University and an MS from Columbia University where she studied operations research, applied mathematics, and computer science. She will be speaking on "Best Practices for XML Schemas" at O'Reilly's Bioinformatics Conference in February 2003. You can contact Ayesha at ayesha.malik@objectmachines.com.