XML Validation Framework using OASIS CAM (CAMV)

Use a declarative programming approach to write your XML data validation rules

In this article, we present an approach for XML Validation using OASIS Content Assembly Mechanism (CAM) templates to support a wide array of complex message exchanges with business partners using B2B or B2C business patterns. The CAM templates simplify and externalize the validation rules while allowing the gateway to act as a pass-through on information that is not directly relevant. We also cover our experiences using an open source component built using Eclipse and Java™ technology to deliver the needed validation services. Follow the application development process as it happened along with sample code snippets and an XML example using the STAR (Standards for Technology in Automotive Retail) Automotive Business Object Document (BOD) schema and associated CAM XML template.

Puneet Kathuria (puneet.kathuria@in.ibm.com), Advisory IT Architect, IBM

Puneet Kathuria is an Integration Architect working with IBM India Ltd. He has more than 13 years of experience, mainly in application and integration architectures, and has been with IBM for the past four years.



David Webber (drrwebber@acm.org), Senior Architect, INTEGRITYOne Partners

David is currently consulting on NIEM IEPD development for the US government and is based in Washington DC, USA. David is Chair of the OASIS CAM technical committee and co-developer of the CAM Studio Eclipse editor, where he is responsible for the majority of the XSLT processing scripts. David has over 30 years of experience in the industry and in 2007 was recognized as a Senior Member of the ACM for his industry work in XML. David has authored many articles on the topic of XML and information exchange optimization, written standards specifications for OASIS, and presented widely on XML in North America, Europe, and Asia.



Martin Roberts (martin.me.roberts@googlemail.com), CAM Open Source Project Developer/Designer, Ontology Systems

Martin is a consultant based in Suffolk, England, specializing in XML, ontologies, Java, Eclipse, and Web solutions, with over 20 years of experience. Martin authored both the original OASIS CAM specifications and the CAM Studio Eclipse editor and CAMV validation engine implementations. Martin was also previously active in XML-based message exchange and standards work for the telecommunications industry in Europe. He has presented at a number of industry events, including OASIS-sponsored technology expositions, particularly in Europe.



11 May 2010


Business and technical challenges

In today's complex information exchanges, with XML and large associated XSD schemas coupled with an array of trading partners, it becomes a significant challenge to support and maintain accurate handling of all incoming transactions. Currently, XML schemas and DTDs provide the ability to validate, or verify, the structural content of an XML document. Certain validation rules can also be accommodated as part of XML schemas, but not all kinds of transaction validations can be performed using XML schemas or DTDs.

Frequently used acronyms

  • API: Application Programming Interface
  • B2B: Business-to-business
  • B2C: Business-to-consumer
  • DTD: Document Type Definition
  • HTTP: Hypertext Transfer Protocol
  • JAX-RPC: Java API for XML-based Remote Procedure Call
  • JDOM: Java-based Document Object Model
  • J2EE: Java 2 Platform, Enterprise Edition
  • OASIS: Organization for the Advancement of Structured Information Standards
  • UI: User Interface
  • WSDL: Web Services Description Language
  • XML: Extensible Markup Language
  • XPath: XML Path Language
  • XSD: XML Schema Definition
  • XSLT: Extensible Stylesheet Language Transformations

With the advent of industry specific standards such as the Standards for Technology in Automotive Retail (STAR), whole collections of standard XML message exchange formats are provided in the form of XML Schemas. Both the consumers and providers of Web services must comply with these schemas to be certified by their industry standards body. However, such industry specific schemas are loosely bound with minimal validations and can be used for only structural validation of the incoming XML. Additional code is required to implement required validations that augment the schema checks. These validations prevent errors when the data is received by applications or components that expect the data to be in a particular structure and comply with business content validation rules.

The most common way to implement the needed validation logic in a Web service and its associated XML applications is to write custom code; as a result, the validation rules are buried inside the applications and cannot be easily adapted, documented, or shared. Depending upon the number and nature of the validations required, the validation code can be complex and lengthy and its maintenance can become a significant burden as more partners are added. Add to that the time, effort, and risk associated with recompiling and redeploying that code to a production server every time the validation logic changes.

In addition to standalone applications, validations are also required when exposing services through an Enterprise Service Bus (ESB). Figure 1 illustrates the typical architecture of an ESB centered on a messaging bus. The bus provides message delivery services based on standards such as SOAP, HTTP, and Java Message Service (JMS). The ESB enables services to interact with each other based on the quality of service requirements of the individual transactions. It also supports different standards such as SOAP, XML, WSDL, JMS, J2EE, JAX-RPC, and so on.

Figure 1. How to perform validations in an ESB architecture
Diagram of an ESB messaging layer located between message consumers and message providers

One of the major challenges facing developers is how to perform message validations at the message provider and message consumer end points while interacting across the ESB. For example, as in Figure 1, a Web services component might require information from an existing application. The Web service (consumer) sends a message requesting information to the existing application (provider) through the ESB. The application component requires a request in a certain format with correct information, so it will validate the request message before processing it. The Web services component has its own set of requirements and will validate the response message. If the two endpoints use different protocols or standards, the ESB can transform each message and will perform validations before transforming the messages.

Each provider and consumer has its own requirements; hence, depending on the number of transaction types and validations, this can result in a long development cycle to define, create, and test all the validations. This stabilization phase proceeds until each validation component is able to provide correct feedback about message validation to its invoking component.


Solution description

The solution approach we describe here is to implement the XML validation services based on the OASIS Content Assembly Mechanism (CAM) specification. The OASIS CAM template approach offers a simple way to handle and validate XML content, and allows businesses to create common interchange models for their exchanges in XML. CAM templates support context-based rules, code-lists, and cross-field validations. Many cross-field validations cannot be implemented in an XSD schema alone; in other cases, the published industry schemas cannot accommodate all the validation variations.

The solution includes CAM Studio (an Eclipse-based UI template editor), which is used to define the CAM template. The CAMV validation engine then provides a set of open source Java APIs that are used to validate the XML against the specific compiled CAM templates at run time. The CAM Studio template editor supports adding custom XPath expressions to its generated templates, but most rules can be defined in the UI without writing any custom expressions.

Figure 2 shows the Model, Author and Test, Deploy, and Monitor stages in the life-cycle of developing the validation rules:

Figure 2. Validation rules life cycle
Diagram of life cycle (Model, Author and test, Deploy, Monitor) for developing validation rules

Model stage

In this step the data entities and their data elements are identified along with their corresponding validation rules. The required XML exchange schema is designed; alternatively, the required elements are mapped to an existing industry standard schema such as one from STAR (Standards for Technology in Automotive Retail).

Author and Test stage

CAM Templates are assembled or authored using the CAM Studio editor. These are the three possible editor options provided to create a CAM template:

  1. Create from scratch or hand-crafted
  2. Use an existing XML Schema
  3. Use an existing XML instance

Once you create the CAM template, the next step is to review each and every element and attribute and specify the validation rules as applicable. A panel in the editor displays the rules for each template node. Figure 3 displays a screen capture of the template structure in the CAM Template Editor:

Figure 3. CAM template in the CAM Template Editor
Screen capture of the CAM Template Editor showing an outline of the template structure

Because not all validation rules need to be binary in nature (that is, either pass or fail), CAM supports classifying validation failures as Warnings. This feature comes in handy for scenarios where corrective action can be taken at the service provider end, modifying the payload to make the message usable rather than rejecting the complete message. For example, a rule might require the length of a particular comment field to be within 255 characters; a request message that exceeds this maximum should not be rejected, but a warning should be sent to the consumer specifying that only the first 255 characters of the comment will be used.

You will see the details of how to set up a validation message classification as a Warning in the Tips and tricks section of this article.

Deploy stage

The CAM templates are compiled using the CAM Studio Editor before you use them with the application run-time CAMV engine. The compiled format is the condensed XML version of the original CAM template itself and is designed to optimize performance of the CAMV validation engine. To compile the CAM Template, select the menu option Tools > Compile Template. This will generate the .cxx file format of the template which will be used at run time.

The CAMV validation engine offers a simple, open-source Java API which can be used in any Java application to validate an input XML with the applicable CAM template. The code snippets in Listing 1 illustrate the usage of CAMV:

Listing 1. Usage of CAMV API
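// templateDocument: the compiled CAM template, already loaded by the caller
TemplateValidator tv = new TemplateValidator(templateDocument);
tv.setErrHandler(new ElementErrorHandler(tv));

// ioReader: the XML message to validate
boolean tvResult = tv.validate(ioReader);

// validate() returns true when there are no errors, even if warnings were raised
if (tvResult) {
    System.out.println("No errors, might be warnings.....");
}

// Retrieve the individual error and warning messages
List errList = tv.getErrors();
List warnList = tv.getWarnings();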

The error and warning messages are formatted as follows:

<error classification>: <XPATH> => <error or warning message> => Node: <node name> => attribute: <attribute name>

For example, an error message would look like this:

/p:ProcessRepairOrder[1]/p:ApplicationArea[1]/p:CreationDateTime[1]=>Content does not conform to the mask:YYYY-MM-DD'T'HH:MI:SSZ =>Node: CreationDateTime

A warning message would look like this:

Warning: /p:ProcessRepairOrder[1]/p:ProcessRepairOrderDataArea[1]/p:RepairOrder[1]/p:RepairOrderHeader[1]/p:OwnerParty[1]/p:SpecifiedPerson[1]/p:ResidenceAddress[1]/p:LineOne[1]=> length should be less than 80 =>Node: LineOne
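
To act on these messages programmatically, a client can split them into their parts. The following is a minimal sketch that assumes the layout shown above: fields separated by => and an optional classification prefix before the XPath. The CamvMessage class and its field names are hypothetical and not part of the CAMV API, and the sketch assumes that a message without a prefix is an error, as in the error example above.

```java
/** Hypothetical holder for one parsed CAMV message; not part of the CAMV API. */
final class CamvMessage {
    final String classification; // "Warning", or "Error" when no prefix is present (assumption)
    final String xpath;
    final String text;
    final String node;           // null when the message has no "Node:" part
    final String attribute;      // null when the message has no "attribute:" part

    private CamvMessage(String classification, String xpath, String text,
                        String node, String attribute) {
        this.classification = classification;
        this.xpath = xpath;
        this.text = text;
        this.node = node;
        this.attribute = attribute;
    }

    /** Splits a raw message on "=>" and separates the optional prefix from the XPath. */
    static CamvMessage parse(String raw) {
        String[] parts = raw.split("=>");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected message layout: " + raw);
        }
        String head = parts[0].trim();
        // The XPath contains colons (namespace prefixes), so locate it by its leading slash.
        int slash = head.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("No XPath found in: " + raw);
        }
        String prefix = head.substring(0, slash).trim();
        if (prefix.endsWith(":")) {
            prefix = prefix.substring(0, prefix.length() - 1).trim();
        }
        String classification = prefix.isEmpty() ? "Error" : prefix;
        String node = null;
        String attribute = null;
        for (int i = 2; i < parts.length; i++) {
            String part = parts[i].trim();
            if (startsWithIgnoreCase(part, "Node:")) {
                node = part.substring("Node:".length()).trim();
            } else if (startsWithIgnoreCase(part, "attribute:")) {
                attribute = part.substring("attribute:".length()).trim();
            }
        }
        return new CamvMessage(classification, head.substring(slash).trim(),
                parts[1].trim(), node, attribute);
    }

    private static boolean startsWithIgnoreCase(String s, String prefix) {
        return s.regionMatches(true, 0, prefix, 0, prefix.length());
    }
}
```

A caller could pass each entry of the error and warning lists through CamvMessage.parse(...) and then route on the classification or the node name.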

Monitor stage

By virtue of using CAMV, you can now externalize all the validation checks and need not embed them inside code or implement using custom coding. During the monitoring cycle, you can meet the need for additional validations by simply updating the validation templates. To add additional validations or remove existing ones, redistribute the compiled CAM templates (.cxx files). You do not need to recompile and redeploy any Java code in the event of a change in validation logic.


New features in the latest CAMV release

Some of the key features added to the latest (December 2009) release of CAMV are:

  1. A backward compatible release download for Java 1.5 has been created in addition to the default Java 1.6.
  2. CAMV is thread-safe; hence, it can be deployed in any J2EE container such as WebSphere® Application Server.
  3. CAMV can now accept XML input as StringReader in addition to JDOM documents, reducing the possible instances of serialization and de-serialization during message handling.
  4. Multiple conditions can be now defined on a single XML element or attribute.

Tips and tricks

The following are tips and tricks that we identified from a recent project where we used CAMV to create a validation framework for a B2B Gateway that exposes STAR-based Web services for a leading automotive industry organization.

Validation classifications

CAMV supports creating validation rules that produce Warning messages in addition to Errors. To define the validation for a Warning message, specify a conditional XPath expression on the XML element.

For example, consider a business scenario where the Web service request need not be rejected if the length of a particular field exceeds the specified limit of 255 characters. The business decision is to truncate the field to 255 characters when it exceeds the limit, as required by the backend system; however, a warning must be issued to the invoking component.

Such scenarios can be handled by specifying a printmessage() expression in the CAM template rules.

The message text must begin with the prefix Warning: followed by the required warning message, such as length should be less than 255. The complete message text will appear as Warning: length should be less than 255.

Because the warning is returned only if the length of the specific element exceeds the specified length, this rule is specified as conditional, and an XPath expression is created to perform the length check, as shown in Figure 4, a screen capture of the CAM Studio Editor expression entry wizard:

Figure 4. How to configure a warning rule
Screen capture of a rule for a warning message that checks for a 255-character length
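
The condition behind such a rule is an ordinary XPath length test. As a rough illustration outside of CAM Studio, the sketch below evaluates a string-length() condition with the standard javax.xml.xpath API and builds the warning text with the required Warning: prefix. The LengthWarningCheck class, the /order/comment path used to call it, and the truncate helper are assumptions for illustration only; in the framework itself, the rule is configured in the template as in Figure 4 and evaluated by CAMV.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical stand-alone check mirroring a conditional length rule. */
final class LengthWarningCheck {
    static final int MAX_LENGTH = 255;

    /** Returns the warning text when the element content is too long, or null otherwise. */
    static String check(String xml, String elementPath) {
        // Same kind of condition a conditional rule would carry: a length comparison.
        String condition = "string-length(" + elementPath + ") > " + MAX_LENGTH;
        try {
            Boolean tooLong = (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(condition, new InputSource(new StringReader(xml)),
                            XPathConstants.BOOLEAN);
            return tooLong ? "Warning: length should be less than " + MAX_LENGTH : null;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not evaluate " + condition, ex);
        }
    }

    /** Corrective action on the provider side: keep only the first MAX_LENGTH characters. */
    static String truncate(String value) {
        return value.length() > MAX_LENGTH ? value.substring(0, MAX_LENGTH) : value;
    }
}
```

The message is accepted in both cases; the warning only tells the consumer that the backend will use the truncated value.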

Cache the CAMV template

You can cache CAMV templates in memory to perform repeated validations, rather than reading the templates from disk for each validation. This reduces disk I/O and can significantly improve performance and throughput.
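
One way to do this is a small in-memory cache keyed by template location. The sketch below is deliberately generic because we do not reproduce the CAMV template-loading calls here: the TemplateCache class and the loader function you pass in (which would read and parse a compiled .cxx template) are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache; T stands for whatever object represents a loaded template. */
final class TemplateCache<T> {
    private final ConcurrentMap<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> loader;

    /** loader reads and parses the template at the given location (supplied by the caller). */
    TemplateCache(Function<String, T> loader) {
        this.loader = loader;
    }

    /** Loads the template on first use; later calls return the cached instance. */
    T get(String templatePath) {
        return cache.computeIfAbsent(templatePath, loader);
    }

    /** Drops a cached entry so that a redistributed template is read again on next use. */
    void evict(String templatePath) {
        cache.remove(templatePath);
    }
}
```

The evict method matters for the Monitor stage: when an updated template is redistributed, evicting its entry picks up the new rules without restarting the application.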

Checking for validation errors

The CAMV Java method TemplateValidator.validate(..) returns true even if warnings are returned. It returns false only when errors are returned. Hence, when validate(..) returns true, use the getWarnings() method to check for and retrieve any warning messages.
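
The resulting decision has three outcomes rather than two. The following sketch shows one way to express it; the ValidationOutcome class and its Status names are hypothetical, and the two arguments stand for the value returned by validate(..) and the list returned by getWarnings().

```java
import java.util.List;

/** Hypothetical helper that turns the validate() result and warning list into one status. */
final class ValidationOutcome {
    enum Status { REJECTED, ACCEPTED_WITH_WARNINGS, ACCEPTED }

    static Status classify(boolean validateResult, List<?> warnings) {
        if (!validateResult) {
            return Status.REJECTED;            // errors were returned
        }
        if (warnings == null || warnings.isEmpty()) {
            return Status.ACCEPTED;            // clean message
        }
        return Status.ACCEPTED_WITH_WARNINGS;  // usable, but the consumer should be told
    }
}
```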

Validation messages

If the returned messages (which contain the XPath path, a validation message, and a node name) are not sufficient for the business scenario and more information is required, the client application can use custom code to extract it. CAMV returns the same input XML after adding CAMERROR and CAMWARN attributes to the affected elements of the exchange message, as depicted in Listing 2.

Listing 2. Modified XML after performing validation
<p:ApplicationArea>
<p:Sender>
<p:CreatorNameCode>CNV</p:CreatorNameCode>
<p:SenderNameCode>SNC</p:SenderNameCode>
</p:Sender>
<p:CreationDateTime CAMERROR="CreationDateTime | Content does not conform to the
mask:YYYY-MM-DD'T'HH:MI:SSZ">2001-12-31T12:00:00</p:CreationDateTime>
<p:Destination/>
</p:ApplicationArea>

<p:ResidenceAddress>
<p:LineOne CAMWARN="WARNING:LineOne |  length should be less than 80">100 Moon Drive 
100 Moon Drive 100 Moon Drive 100 Moon Drive 100 Moon Drive 100 Moon Drive</p:LineOne>
<p:LineTwo>APT # 100</p:LineTwo>
<p:CityName>MALIBU</p:CityName>
<p:CountryID>US</p:CountryID>
<p:Postcode>99999</p:Postcode>
<p:StateOrProvinceCountrySub-DivisionID>CA</p:StateOrProvinceCountrySub-DivisionID>
</p:ResidenceAddress>
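
Custom code of this kind can be as simple as walking the returned XML and collecting the annotated elements. The sketch below uses only the standard JDK DOM parser and assumes the annotated XML is available as a string; the CamAnnotationReader class is hypothetical, and how you obtain the annotated document from CAMV is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for the CAMERROR and CAMWARN attributes shown in Listing 2. */
final class CamAnnotationReader {

    /** Returns "elementName: attributeValue" for every element carrying the attribute. */
    static List<String> collect(String xml, String attributeName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<String> found = new ArrayList<>();
            NodeList all = doc.getElementsByTagName("*"); // every element, in document order
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                if (element.hasAttribute(attributeName)) {
                    found.add(element.getLocalName() + ": "
                            + element.getAttribute(attributeName));
                }
            }
            return found;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not read annotated XML", ex);
        }
    }
}
```

Calling collect(xml, "CAMERROR") and collect(xml, "CAMWARN") separately keeps errors and warnings apart, and the Element objects give access to the actual values and surrounding context if a richer message is needed.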

Wildcard expressions

When entering rules into the template, the XPath validation expressions are specified (by default) using the wildcard expression of two slashes (//), which selects all nodes in the document that match the selection, no matter where they are.

Figure 5. How to specify wildcard expressions while defining rules
Screen capture of a rule with an XPath expression using two slashes (a wildcard value)

This results in rules being applied to all such instances of a particular element. (Note: The rules might not be visible immediately at all other instances of a particular element but become visible once the template is refreshed in the CAM template editor view.)

However, if you need to apply the check to one particular instance of an XML element, it is advisable to select the Full for Rule XPath check box.

Figure 6. How to specify explicit expression while defining rules
Screen capture of a rule with an explicit XPath expression
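
The difference between the two forms can be seen with any XPath processor. The sketch below uses the standard javax.xml.xpath API, not CAMV, to count how many nodes each kind of expression selects; the XPathScopeDemo class and the simplified, namespace-free element names in the usage note are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical demo comparing a // expression with a full path. */
final class XPathScopeDemo {

    /** Returns the number of nodes that the expression selects in the given document. */
    static int countMatches(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)),
                            XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not evaluate " + expression, ex);
        }
    }
}
```

For a document in which LineOne appears under both an OwnerParty address and a DealerParty address, //LineOne selects both elements, while /RepairOrder/OwnerParty/ResidenceAddress/LineOne selects only the first. A rule attached to the // form therefore applies to every LineOne, which is what you want for a shared constraint but not for a partner-specific one.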

Summary

Using CAMV, you can enforce the validation checks consistently and then rapidly change the rules to fine-tune message handling to match particular partner exchanges and content. By externalizing the validation rules, which conventionally have been embedded deep inside the backend application code, you have much better control and management along with more predictable message handling. These standards-based rules templates can optionally be shared with partners to facilitate better content handling alignment across systems.

With a more adaptive and fault-tolerant process, the application is able to handle a wider variation in content and, hence, more easily support a broad set of interaction partners with reduced support and maintenance costs, which is the opposite of the usual experience.

The use of open source greatly facilitated collaboration on developing the solution and integrating the CAMV engine into the deployment environment.

Overall, this project demonstrated that innovative use of XML and dynamically configurable XML rule templates can provide a better, more stable, faster, and capable customer application experience than relying on static compiled code resources alone.


Download

Description: Sample Java project that uses CAMV Java APIs
Name: ValidationFrameworkSample.zip
Size: 2032KB
