XML for Data: Reuse it or lose it, Part 2

Understanding reusable components

Level: Introductory

Kevin Williams (kevin@blueoxide.com), CEO, Blue Oxide Technologies, LLC

25 Apr 2003

In his last column, Kevin Williams explained how component-level reuse in XML designs can decrease code complexity and shorten maintenance cycles. In this article, the second in a series of three, he describes the types of components that can be reused in XML designs and provides examples of each in XML and XML Schema.

Now that I've explained the why of component-level reuse in XML document design, I'll delve into the what. Understanding the types of components that you can reuse in XML documents -- and the advantages of doing so -- will help you identify good reuse opportunities and take advantage of them.

One note: Because the XML world is full of jargon overload, the terms used in this column are intentionally chosen not to map directly to XML Schema constructs, DTD constructs, or anything else in the XML world. When I need to make reference to the XML world, I'll explicitly say so -- for example, "XML Schema complexType" rather than "complexType" -- just to avoid any ambiguity.

Datatypes

The most atomic design construct you can reuse is the datatype. A datatype usually takes one of the built-in datatypes from XML Schema (or another user-defined datatype) and restricts it in some way. For example, I might have a datatype for a part number that looks like this:


Listing 1. Datatype for a part number
		
<xsd:simpleType name='PartNumberType'>
  <xsd:restriction base='xsd:string'>
    <xsd:pattern value='[0-9]{2}-[0-9]{4}' />
  </xsd:restriction>
</xsd:simpleType>

Text-only elements or attributes that have this datatype must be in the format I have specified for the part number -- in other words, two digits, then a hyphen, then four digits.
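
As a minimal sketch of how this datatype might be put to work, the fragment below attaches PartNumberType to an element and shows two instance values. The element name partNumber and the sample values are assumptions made for illustration; they are not part of the original design.

```xml
<!-- Hypothetical element declaration that reuses PartNumberType -->
<xsd:element name='partNumber' type='PartNumberType' />

<!-- Instance fragments (illustrative values only) -->
<partNumber>12-3456</partNumber>   <!-- valid: two digits, a hyphen, four digits -->
<partNumber>123-456</partNumber>   <!-- invalid: does not match the pattern -->
```

XML Schema patterns must match the entire value, so no explicit anchors are needed to reject the second value.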

Another example might be a rate datatype like this:


Listing 2. Rate datatype
	
<xsd:simpleType name='rateType'>
  <xsd:restriction base='xsd:decimal'>
    <xsd:totalDigits value='7' />
    <xsd:fractionDigits value='4' />
  </xsd:restriction>
</xsd:simpleType>

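In this case, text-only elements or attributes with the rateType type can have no more than seven digits in total, of which no more than four can fall to the right of the decimal point. A value that uses all four fraction digits therefore has room for at most three digits to the left of the decimal point.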
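
To make the two facets concrete, here is a small sketch of values checked against rateType. The element name interestRate and the values themselves are hypothetical and exist only for illustration.

```xml
<!-- Hypothetical element declaration that reuses rateType -->
<xsd:element name='interestRate' type='rateType' />

<!-- Instance fragments (illustrative values only) -->
<interestRate>123.4567</interestRate>   <!-- valid: seven digits in total, four of them fraction digits -->
<interestRate>0.05</interestRate>       <!-- valid -->
<interestRate>1.23456</interestRate>    <!-- invalid: five fraction digits -->
<interestRate>1234.5678</interestRate>  <!-- invalid: eight digits in total -->
```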

It's important to note that datatypes can overlap. For example, we might have the following two declarations:


Listing 3. Overlapping datatypes
			
<xsd:simpleType name='cityType'>
  <xsd:restriction base='xsd:string'>
    <xsd:maxLength value='30' />
  </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name='lastNameType'>
  <xsd:restriction base='xsd:string'>
    <xsd:maxLength value='30' />
  </xsd:restriction>
</xsd:simpleType>

An example last name declaration that leverages the last name datatype would look like this:

<xsd:element name='customerLastName' type='lastNameType' />

While at first this approach might seem redundant -- two datatypes with the exact same base and restrictions -- it actually provides a couple of benefits. First, this approach gives the flexibility to modify one type independent of the other. So if 30 characters isn't enough for a last name, but is still sufficient for a city, you can change the last name datatype without affecting the city datatype. Second, there is semantic information implied by the name itself, which a clever programmer can take advantage of to distinguish between the two types when writing code. For example, you might write an XSLT generator that consumes the schema and produces a stylesheet to render instances of that schema to XML. In that case, the generator can distinguish between cities and last names just by the datatype, allowing each to be rendered a different way. You'll see an example of this technique in my next column.
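
The first benefit can be sketched as follows. The new limit of 50 characters is an arbitrary value chosen for illustration; the point is only that cityType is untouched by the change.

```xml
<!-- cityType keeps its original limit -->
<xsd:simpleType name='cityType'>
  <xsd:restriction base='xsd:string'>
    <xsd:maxLength value='30' />
  </xsd:restriction>
</xsd:simpleType>

<!-- lastNameType is widened independently (50 is an assumed value) -->
<xsd:simpleType name='lastNameType'>
  <xsd:restriction base='xsd:string'>
    <xsd:maxLength value='50' />
  </xsd:restriction>
</xsd:simpleType>
```

Every element or attribute declared with type='lastNameType' picks up the new limit, while anything declared with type='cityType' still validates against 30 characters.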



Enumerations

Enumerations are lumped together with datatypes in XML Schema, but they are different enough from other datatypes that they deserve their own discussion. An enumeration is a list of allowable values for a particular text-only element or attribute. The classic example is the list of all the states in the United States. The following example defines an enumeration for these (I've omitted most of the states for brevity):


Listing 4. Partial enumeration for list of U.S. states
	
<xsd:simpleType name='stateEnum'>
  <xsd:restriction base='xsd:string'>
    <xsd:enumeration value='AK' />
    <xsd:enumeration value='AR' />
    <xsd:enumeration value='WV' />
    <xsd:enumeration value='WY' />
  </xsd:restriction>
</xsd:simpleType>

Text-only elements or attributes with this type must have one of the values explicitly defined. The benefit here is obvious: if the list of states should change at some point, it only needs to be changed in one place. The change is then reflected in every text-only element or attribute with this type.
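
A brief sketch of the enumeration in use follows. The names state and shipToState are hypothetical, and both declarations are assumed to sit at the top level of the schema.

```xml
<!-- Hypothetical top-level declarations that share the same enumeration -->
<xsd:element name='state' type='stateEnum' />
<xsd:attribute name='shipToState' type='stateEnum' />

<!-- Instance fragments (illustrative values only) -->
<state>WV</state>   <!-- valid: listed in stateEnum -->
<state>ZZ</state>   <!-- invalid: not one of the enumerated values -->
```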



Datapoints

Datapoints are either text-only elements or attributes. They are values in the XML document. Because both text-only elements and attributes have the same kinds of allowable values, I lump them together here into one datapoint bucket.

As an aside, in the world of data (remember the name of this column!), text-only elements and attributes are effectively interchangeable -- the differences are more important when working with text-based XML, such as a marked-up piece of text. You can leverage datapoint reuse across both kinds of document designs.
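
The interchangeability can be sketched with two instance fragments that carry the same datapoint. The customer element and the sample value are assumptions made for illustration.

```xml
<!-- The datapoint carried as a text-only element -->
<customer>
  <lastName>Williams</lastName>
</customer>

<!-- The same datapoint carried as an attribute -->
<customer lastName='Williams' />
```

Either form holds a single string value with the same allowable content, which is why both fall into the datapoint bucket.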

Reusable datapoint declarations look pretty much like reusable datatype declarations:


Listing 5. Reusable datapoint declaration
		
<xsd:element name='lastName'>
  <xsd:simpleType>
    <xsd:restriction base='xsd:string'>
      <xsd:maxLength value='30' />
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

The difference is that a reusable datapoint is declared as a global element, and in an XML Schema it is then referenced using the ref attribute, rather than the name and type attributes:

<xsd:element ref='lastName' />

Why make the distinction? Well, it's a distinction that gets made all the time in data design efforts for other platforms, such as relational databases. There are also some semantic differences: A reused datapoint asserts that values for the two elements have the same semantic meaning, whereas a reused datatype only asserts that their value spaces have the same semantics. I'll talk more about this in the next column.
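
The contrast between the two kinds of reuse can be sketched in one small schema. The Customer and Employee structures and the maidenName element are hypothetical names introduced only for this example.

```xml
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>

  <!-- Reusable datatype: describes a value space only -->
  <xsd:simpleType name='lastNameType'>
    <xsd:restriction base='xsd:string'>
      <xsd:maxLength value='30' />
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Reusable datapoint: a global element with one agreed meaning -->
  <xsd:element name='lastName' type='lastNameType' />

  <xsd:complexType name='Customer'>
    <xsd:sequence>
      <!-- Datapoint reuse: the same lastName as everywhere else -->
      <xsd:element ref='lastName' />
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name='Employee'>
    <xsd:sequence>
      <!-- Datapoint reuse again: same element, same meaning -->
      <xsd:element ref='lastName' />
      <!-- Datatype reuse only: a different datapoint that shares the value space -->
      <xsd:element name='maidenName' type='lastNameType' minOccurs='0' />
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

Here lastName means the same thing in both structures, whereas maidenName merely accepts the same kind of value.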



Structures

Structures are aggregations of other structures and datapoints. I don't use the term element here because in an XML instance an element can also be a datapoint. A reusable structure is declared as an XML Schema complex type:


Listing 6. A reusable structure
<xsd:complexType name='Address'>
  <xsd:sequence>
    <xsd:element name='address1' type='xsd:string' />
    <xsd:element name='address2' type='xsd:string' minOccurs='0' />
    <xsd:element name='city' type='xsd:string' />
    <xsd:element name='state' type='stateType' />
    <xsd:element name='postalCode' type='postalCodeType' />
  </xsd:sequence>
</xsd:complexType>

An element that reuses this structure refers to the complex type through the type attribute (the element name here is only an example):

<xsd:element name='billingAddress' type='Address' />

Structure reuse is really where you get the best impact on coding -- if you want addresses to be rendered the same throughout the system, you write one stylesheet fragment to handle them (because you know they are going to be the same everywhere). Then, changes to the appearance of the address only need to be made in one place.
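
A stylesheet fragment of that kind might look like the following XSLT 1.0 sketch. It assumes two hypothetical elements, billingAddress and shippingAddress, that both use the Address structure, and it assumes HTML output; neither assumption comes from the original design.

```xml
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

  <!-- One template renders every element that uses the Address structure -->
  <xsl:template match='billingAddress | shippingAddress'>
    <div class='address'>
      <xsl:value-of select='address1' /><br />
      <!-- address2 is optional in the structure, so emit it only when present -->
      <xsl:if test='address2'>
        <xsl:value-of select='address2' /><br />
      </xsl:if>
      <xsl:value-of select='city' />, <xsl:value-of select='state' />
      <xsl:text> </xsl:text>
      <xsl:value-of select='postalCode' />
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Because every address has the same children in the same order, a change to this one template changes how addresses appear everywhere they occur.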



Higher-order reuse: operations, services, flows

As you move higher up the reuse food chain, you start to exit the world of XML Schema and enter the world of Web services. It's a bit out of scope for this column, but I'll just mention that higher-order artifacts -- operations, services, and even orchestrated flows -- can be built up out of these reusable building blocks. For example, you might define a getQuote operation that takes a tickerSymbol datapoint (reused) as its input and returns a stockInfo structure (reused) as its output. A service might then implement the getQuote operation (reused) at a particular endpoint and with particular bindings.
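
For readers who want to picture the getQuote example, here is a rough sketch written as a WSDL 1.1 description. The choice of WSDL 1.1, the namespace http://example.com/quotes, the message and port type names, and the simplified contents of tickerSymbol and stockInfo are all assumptions made for illustration. The binding and service elements that would tie the operation to an endpoint are left out.

```xml
<wsdl:definitions name='QuoteService'
    targetNamespace='http://example.com/quotes'
    xmlns:tns='http://example.com/quotes'
    xmlns:wsdl='http://schemas.xmlsoap.org/wsdl/'
    xmlns:xsd='http://www.w3.org/2001/XMLSchema'>

  <wsdl:types>
    <xsd:schema targetNamespace='http://example.com/quotes'>
      <!-- Reused datapoint (simplified) -->
      <xsd:element name='tickerSymbol' type='xsd:string' />
      <!-- Reused structure (contents assumed for illustration) -->
      <xsd:element name='stockInfo'>
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name='price' type='xsd:decimal' />
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>
  </wsdl:types>

  <wsdl:message name='getQuoteRequest'>
    <wsdl:part name='body' element='tns:tickerSymbol' />
  </wsdl:message>
  <wsdl:message name='getQuoteResponse'>
    <wsdl:part name='body' element='tns:stockInfo' />
  </wsdl:message>

  <!-- The operation is assembled from the reusable pieces above -->
  <wsdl:portType name='QuotePortType'>
    <wsdl:operation name='getQuote'>
      <wsdl:input message='tns:getQuoteRequest' />
      <wsdl:output message='tns:getQuoteResponse' />
    </wsdl:operation>
  </wsdl:portType>

</wsdl:definitions>
```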



Conclusion

XML design can be complicated and cumbersome. With the rapid proliferation of XML designs throughout the enterprise, you're in for significant headaches if you don't take the time to see how you can reuse these design artifacts up front. While a good approach to reuse isn't necessarily going to address design requirements for an entire application, most design efforts can probably leverage some existing structures, datapoints, datatypes, and enumerations, and then add just those components that are unique to a particular task. In the next column, I'll show you some approaches to implementing these reusable components in the enterprise.



About the author

Kevin Williams is the CEO of Blue Oxide Technologies, LLC, a company that designs XML and Web service creation software. Visit their Web site at http://www.blueoxide.com. Kevin can be reached for comment at kevin@blueoxide.com.



