Meet CAM: A new XML validation technology

Take semantic and structural validation to the next level

XML documents are frequently validated against either a DTD (less likely) or an XML schema (more likely). Recently, a new technology called Content Assembly Mechanism (CAM) has emerged. It is endorsed by the Organization for the Advancement of Structured Information Standards (OASIS). CAM represents a step up from XML schema because it provides even more flexibility in defining both the semantics of an XML document and the business rules associated with the actual data content. Take a broad overview of CAM, including its benefits over the alternatives, in this article.

Brian M. Carey, Information Systems Consultant, Carey Development Corporation

Brian Carey is an information systems consultant specializing in Java, Java Enterprise, PHP, Ajax, and related technologies. You can follow Brian Carey on Twitter at http://twitter.com/brianmcarey.



22 September 2009


Precursors to CAM

As stated in the summary, CAM represents the latest technology in validating XML documents. To see what it adds, it helps to look first at the validation technologies that preceded it.

Frequently used acronyms

  • CCTS: Core Components Technical Specification
  • DTD: Document Type Definition
  • IT: information technology
  • OWL: Web Ontology Language
  • XPath: XML Path Language
  • XML: Extensible Markup Language
  • XSD: XML Schema Definition

The oldest is known by the acronym DTD, which stands for Document Type Definition. As with most entry points in emerging technologies, it was limited. It facilitated validation of XML document structure, but not much in the way of semantics. It also used somewhat awkward syntax to define the valid XML structure.

DTD was later largely superseded by XSD, which stands for XML Schema Definition. This was a much more powerful means of validating XML documents. First, an XSD is itself written in XML syntax. Next, it offered improved support for semantics. For the last several years, bleeding-edge technologists have opted to validate their XML documents with XSD as opposed to DTD.

Enter CAM

The history of technology has shown repeatedly that there is always a better way to build the proverbial mousetrap. XML validation is no exception to that principle. CAM represents the latest and most sophisticated entry in the family of technologies used to validate XML documents.

CAM is offered by the standards body known as OASIS. This organization has provided a number of specifications, most notably regarding Web services and electronic business Extensible Markup Language (ebXML).

CAM is more powerful and flexible than its predecessors. Unlike XSD, it doesn't tightly couple the data structure to the business rules. It also provides for context-driven validation, something which is lacking in both XSD and DTD.

For most people who are familiar with XML, CAM is also much easier to learn than XSD or DTD. This is because, in defining structure, the format of a CAM document is strikingly similar to an XML instance. And, in defining business rules, CAM uses the well-known XPath.

The structure of a CAM template

In Listing 1, you can see that the structure of a CAM template is not complicated.

Listing 1. The structure of a CAM template
<as:CAM xmlns:as="http://www.oasis-open.org/committees/cam" 
CAMlevel="1" 
version="1.0"> 
<as:Header /> 
<as:AssemblyStructure /> 
<as:BusinessUseContext /> 
</as:CAM>

The root element, CAM, defines the namespace used throughout the template itself as well as the level and version of CAM.

The Header element provides specific information about the validation document. Many of the child elements (not shown) are self-explanatory: Description, Owner, Version, and DateTime.

The AssemblyStructure element defines the actual structure of the XML document instance. This is where CAM and XSD part company. The AssemblyStructure element provides validation against the structure of the XML document but does not contain any information about semantics.

And, finally, the BusinessUseContext element provides the business rules that were lacking in the previous element. How are these business rules enforced? That is an excellent question, but first you should be familiar with how CAM defines structure.

How CAM defines structure

Listing 2 shows how CAM defines the structure for a simple purchase order.

Listing 2. A CAM structure for a simple purchase order
<as:AssemblyStructure>
 <as:Structure ID="myPO" taxonomy="XML">
  <PurchaseOrder>
   <ShippingAddress>
    <Name>%string%</Name>
    <Street>%string%</Street>
    <City>%string%</City>
    <State>%string%</State>
    <Zip>%string%</Zip>
   </ShippingAddress>
   <ShipDate>%DD-MM-YYYY%</ShipDate>
   <comment>%string%</comment>
   <LineItems>
    <LineItem>
     <ItemName>%string%</ItemName>
     <Quantity>%1%</Quantity>
     <Price>%54321.00%</Price>
     <Comment>%string%</Comment>
    </LineItem>
   </LineItems>
   <TotalPrice>%54321.00%</TotalPrice>
   <ShippingMethod>%string%</ShippingMethod>
  </PurchaseOrder>
 </as:Structure>
</as:AssemblyStructure>

In looking at Listing 2, note that the structure of the XML document is defined almost exactly as though it were an XML instance. In this respect, CAM is likely to be far more readable than XSD for people who already understand XML syntax. In fact, the structure really is written as an XML instance, just with placeholder content, which is explained shortly.

The Structure element is the parent of the actual structure definition. It has an ID attribute that identifies this particular structure. The only currently recognized value for the taxonomy attribute is XML.

Notice that most elements include values demarcated by percent signs (%). These are simply placeholders for the actual content that will appear in the XML instance. They make the document easier for a human reader to understand; they do not provide any validation logic. Some people, when constructing CAM templates, place example values inside the elements instead of the more generic values used in Listing 2. How best to write placeholders is up to the individual developer.
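As a sketch of that second style, the ShippingAddress portion of Listing 2 could carry sample values instead of generic type names. The values here are made up for illustration, and keeping the percent signs around them is an assumption that follows the %1% and %54321.00% placeholders already used in Listing 2.

```xml
<!-- Illustrative only: the same structure as Listing 2, with example values as placeholders -->
<ShippingAddress>
 <Name>%Jane Smith%</Name>
 <Street>%123 Main Street%</Street>
 <City>%Raleigh%</City>
 <State>%NC%</State>
 <Zip>%27601%</Zip>
</ShippingAddress>
```

Either style describes the same structure; the difference is only in how much the template hints at typical content.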

Now that you understand how structure is defined in CAM, it's time to learn a little more about how business rules are enforced.

How CAM enforces business rules

It's really this simple: XPath.

Yes, that's right. XPath.

And now you have yet another advantage of CAM versus older validation technologies. It uses syntax that most XML technologists already understand to enforce business rules. For these people, there is no need to learn another language to implement CAM validation within their applications.

Listing 3 has an example of the BusinessUseContext element.

Listing 3. Enforcing business rules with CAM
<as:BusinessUseContext> 
 <as:Rules> 
  <as:default> 
   <as:context> 
    <as:constraint action="makeRepeatable(//PurchaseOrder/LineItems/LineItem)"/> 
    <as:constraint action="makeOptional(//LineItem/Comment)"/> 
    <as:constraint action="setLength(//ShippingAddress/State,2)"/> 
    <as:constraint action="setDateMask(//PurchaseOrder/ShipDate,DD-MM-YYYY)"/> 
    <as:constraint action="setNumberMask(//LineItem/Quantity,###)"/> 
    <as:constraint action="setNumberMask(//LineItem/Price,###.##)"/>
    <as:constraint action="setNumberMask(//PurchaseOrder/TotalPrice,###.##)"/> 
    <as:constraint condition="//PurchaseOrder/TotalPrice > 100" 
     action="makeOptional(//PurchaseOrder/ShippingMethod)"/> 
   </as:context> 
  </as:default> 
 </as:Rules> 
</as:BusinessUseContext>

To the experienced XML developer, this structure should be fairly easy to interpret. This is not only because the constraints use XPath, but also because the validation rules are named in standard English. Again, this is what makes CAM so attractive.

The rules themselves are defined within the context element. Each rule is expressed in the action attribute of a constraint child element.

Note the first rule: makeRepeatable(//PurchaseOrder/LineItems/LineItem). As the name implies, this is telling the validator that the LineItem child element of the LineItems element is repeatable. This means that there can be many of them, which makes perfect sense because a typical purchase order may contain many different items.

The next rule applies to the Comment element inside each LineItem and states that it is optional. In other words, the XML document can be valid with an empty Comment element in a line item.

The next rule enforces the maximum length, in characters, of the State element. In this case, that maximum length is 2, which matches the two-letter postal abbreviation for a state in the United States.

The next rule enforces the format of the date. Here, the format DD-MM-YYYY is used, although you can certainly use other formats as well. In this case, a valid date would be something like 03-03-2009, meaning March 3, 2009.

The next rule enforces the format of the Quantity element. In this case, the contents of that element must be a number conforming to the ### mask. In other words, a purchase order containing a line item with a four-digit number in the Quantity element would be considered invalid. With this rule, a purchase order cannot contain a line item that orders a quantity of more than 999 of any one product.

The next two rules, Price and TotalPrice, are similar to the previous rule. Like the Quantity rule, they enforce a number mask. The difference is that the number mask allows for decimal points. This is because these two elements are dollar values that can contain fractional values representing cents.

And, finally, there is a particularly interesting rule. It is interesting because it introduces a context-driven constraint: a constraint that validates an XML document based on the content of certain elements. In this case, if the total price of the purchase order exceeds $100, then the ShippingMethod element of the XML document can be empty. Otherwise, it cannot be empty. The business rule being applied here is that orders totaling more than $100 automatically get free standard shipping. For orders of $100 or less, the document must specify a shipping method.
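To make the rules concrete, here is a hypothetical purchase order instance written to fit the structure in Listing 2 and the rules in Listing 3 as they are described above. All of the values are invented for illustration, and the instance has not been run through a CAM processor. It follows this article's reading of makeOptional, under which an optional element may be left empty.

```xml
<?xml version="1.0"?>
<!-- Hypothetical instance; values are illustrative only -->
<PurchaseOrder>
 <ShippingAddress>
  <Name>Jane Smith</Name>
  <Street>123 Main Street</Street>
  <City>Raleigh</City>
  <!-- setLength: at most 2 characters -->
  <State>NC</State>
  <Zip>27601</Zip>
 </ShippingAddress>
 <!-- setDateMask: DD-MM-YYYY, here March 3, 2009 -->
 <ShipDate>03-03-2009</ShipDate>
 <comment>Please deliver to the front desk</comment>
 <LineItems>
  <!-- makeRepeatable: more than one LineItem is allowed -->
  <LineItem>
   <ItemName>Desk lamp</ItemName>
   <!-- setNumberMask ###: at most three digits -->
   <Quantity>2</Quantity>
   <!-- setNumberMask ###.##: digits with a decimal part -->
   <Price>45.00</Price>
   <Comment>Gift wrap</Comment>
  </LineItem>
  <LineItem>
   <ItemName>Notebook</ItemName>
   <Quantity>1</Quantity>
   <Price>25.50</Price>
   <!-- makeOptional: the line-item comment is left empty -->
   <Comment/>
  </LineItem>
 </LineItems>
 <TotalPrice>115.50</TotalPrice>
 <!-- TotalPrice is greater than 100, so the conditional rule lets this stay empty -->
 <ShippingMethod/>
</PurchaseOrder>
```

If TotalPrice were 100 or less, the condition on the last constraint would not hold, and the same document would need a value in ShippingMethod.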

Putting it all together

Listing 4 shows an entire CAM document assembled from the fragments provided earlier.

Listing 4. All together now
<?xml version='1.0'?> 
<as:CAM CAMlevel="1" version="1.0" 
	xmlns:as="http://www.oasis-open.org/committees/cam" > 
<as:Header> 
 <as:Description>Simple Purchase Order</as:Description> 
 <as:Owner>developerWorks</as:Owner> 
 <as:Version>0.1</as:Version> 
 <as:DateTime>2009-07-07T12:00:00</as:DateTime> 
</as:Header> 
<as:AssemblyStructure>
 <as:Structure ID="myPO" taxonomy="XML">
  <PurchaseOrder>
   <ShippingAddress>
    <Name>%string%</Name>
    <Street>%string%</Street>
    <City>%string%</City>
    <State>%string%</State>
    <Zip>%string%</Zip>
   </ShippingAddress>
   <ShipDate>%DD-MM-YYYY%</ShipDate>
   <comment>%string%</comment>
   <LineItems>
    <LineItem>
     <ItemName>%string%</ItemName>
     <Quantity>%1%</Quantity>
     <Price>%54321.00%</Price>
     <Comment>%string%</Comment>
    </LineItem>
   </LineItems>
   <TotalPrice>%54321.00%</TotalPrice>
   <ShippingMethod>%string%</ShippingMethod>
  </PurchaseOrder>
 </as:Structure>
</as:AssemblyStructure>
<as:BusinessUseContext> 
 <as:Rules> 
  <as:default> 
   <as:context> 
    <as:constraint action="makeRepeatable(//PurchaseOrder/LineItems/LineItem)"/> 
    <as:constraint action="makeOptional(//LineItem/Comment)"/> 
    <as:constraint action="setLength(//ShippingAddress/State,2)"/> 
    <as:constraint action="setDateMask(//PurchaseOrder/ShipDate,DD-MM-YYYY)"/> 
    <as:constraint action="setNumberMask(//LineItem/Quantity,###)"/> 
    <as:constraint action="setNumberMask(//LineItem/Price,###.##)"/>
    <as:constraint action="setNumberMask(//PurchaseOrder/TotalPrice,###.##)"/> 
    <as:constraint condition="//PurchaseOrder/TotalPrice > 100" 
     action="makeOptional(//PurchaseOrder/ShippingMethod)"/> 
   </as:context> 
  </as:default> 
 </as:Rules> 
</as:BusinessUseContext> 
</as:CAM>

As you can see, Listing 4 is little more than a concatenation of Listings 2 and 3. A Header element is added, which simply identifies information about this particular validation file. In this case, a simple description, an owner, a version, and a document date are added.

Although it is not shown in Listing 4, the Header element can also contain parameters. The validation of the XML document can vary based on the value of the parameters. For example, if a parameter named noMoreThan10LineItems is set to true, the CAM document enforces a business rule that there can be no more than 10 LineItem elements in the entire order. The benefit here is that you can simply change that parameter to false to switch that rule off, without touching the structure. This is an example of how powerful and flexible CAM can be when it comes to validation.

The benefits of CAM versus its competition

Obviously, just because a certain technology is new does not mean that it is useful or provides a higher return on investment than its predecessors. CAM, however, has several distinct advantages compared to its competition.

First, CAM separates structure from business rules. Separation of concerns is a recurring pattern throughout software development and is not at all limited to CAM. For example, the Model-View-Controller (MVC) pattern keeps the model, the view, and the controller apart from one another. Unlike CAM, XSD tightly couples the structure and the business rules, resulting in higher maintenance overhead.
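For comparison, here is a minimal sketch of how the two-character limit on State might be written in XSD. The fragment is illustrative and is not taken from a schema in this article; it assumes the xs prefix is bound to the W3C XML Schema namespace. Notice that the length rule sits inside the element declaration, whereas in Listing 3 it is a separate setLength constraint.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="ShippingAddress">
  <xs:complexType>
   <xs:sequence>
    <xs:element name="Name" type="xs:string"/>
    <xs:element name="Street" type="xs:string"/>
    <xs:element name="City" type="xs:string"/>
    <!-- Structure (the element) and rule (its maximum length) are declared together -->
    <xs:element name="State">
     <xs:simpleType>
      <xs:restriction base="xs:string">
       <xs:maxLength value="2"/>
      </xs:restriction>
     </xs:simpleType>
    </xs:element>
    <xs:element name="Zip" type="xs:string"/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>
```

Changing the rule here means editing the structural definition itself, which is the coupling the paragraph above describes.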

CAM also enables context-driven validation. In other words, CAM recognizes a dynamic structure based on the content of certain elements or attributes. So, if element X contains a certain value, a business rule is applied to element Y. If it contains another value, that business rule can instead be applied to element Z. This was demonstrated in Listing 3 with the final rule. In that case, purchase order documents with a total price of more than $100 do not need to specify a shipping method because the standard shipping is free for those orders. CAM's predecessors do not facilitate such complex validation.

Analyzing rule sets and structure is much easier with CAM. The structure is represented as an XML instance in the usual tree format, so humans as well as computers can read it more easily. The grammar used to enforce business rules is likewise intuitive: makeRepeatable, makeOptional, setLength, and so on are not terribly difficult to decipher. And, although the rules and the structure are separated, they are in the same document, making it easy to get a bird's-eye view of the overall validation requirements. XSD, on the other hand, requires an understanding of a whole new set of non-intuitive definitions, such as complexType (What does that mean?), and is not so easily analyzed.

Sticking with the "you don't have to learn anything new" theme, CAM uses XPath. As shown previously, this is the language that enforces business rules on certain elements. Not only is XPath intuitive and easy to learn, it is already understood by most XML technologists. This makes the transition to CAM much smoother because the business logic validation does not require XML developers to learn something totally new. The XSD grammar is not anything like XPath.

Another advantage of CAM over XSD is that localization needs are more easily enforced with CAM. With XSD, enumerations are static and, therefore, cannot be made context-aware. However, with CAM, you can apply particular enumerations based on context values. In the emerging global marketplace, the need for such streamlined validation should be self-explanatory.

CAM templates also provide next-generation Service-Oriented Architecture (SOA) support. CAM supports business processing technologies such as Business Process Execution Language (BPEL), Business Process Specification Schema (BPSS), and Business Process Modeling Notation (BPMN) modeling tools. To quote from the Wiki: "Completing the SOA picture CAM has extension mechanisms that can be used to support semantic registry referencing (such as ebXML-regrep) and metadata definitions (such as CCTS and OWL) external to the templates that are key to next generation SOA exchanges." Also, CAM was developed by OASIS, so you can be sure that the organization will ensure that CAM is compliant, if not compatible, with its other standards.

Conclusion

CAM represents the latest generation of XML validation technologies. It provides numerous benefits over its predecessors. Those benefits include a separation of concerns regarding structure and business logic, dynamic validation based on context, interoperability with cutting-edge technologies, lower maintenance overhead, and a gentler learning curve. CAM is also endorsed by a well-respected standards organization, OASIS.

CAM is an emerging technology. As such, it is not as well documented and does not enjoy the benefit of mass experience. However, it certainly is robust in its initial release and promises to be a much more efficient means of XML validation.

CAM is almost certainly here to stay and supplant its predecessors.

Resources

Learn

  • The OASIS CAM Wiki: Learn more about CAM.
  • On XML Schema Tutorial: Explore how to create XML Schemas, why XML Schemas are more powerful than DTDs, and how to use XML Schema in your application.
  • DTD Tutorial: Learn how to use DTDs.
  • Introduction to XML (Doug Tidwell, developerWorks, August 2002): XML, the Extensible Markup Language, has gone from the latest buzzword to an entrenched eBusiness technology in record time. Learn what XML is, why it was developed, and how it's shaping the future of electronic commerce.
  • Validating XML (Nicholas Chase, developerWorks, August 2003): Validate files and documents to make sure that data fits integrity constraints. Learn what validation is and how to check a document against a Document Type Definition (DTD) or XML Schema document.
  • Design XML schemas for enterprise data (Bilal Siddiqui, developerWorks, October 2006): Learn to use W3C XML Schema features to design data formats for production management.
  • IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
  • XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
  • developerWorks technical events and webcasts: Stay current with technology in these sessions.
  • The technology bookstore: Browse for books on these and other technical topics.
  • developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
