Enforce basic document structure with XML constraint checking

Evaluate two approaches based on the healthcare environment

Susan Malaika (malaika@us.ibm.com), Senior Technical Staff Member, IBM
Susan Malaika is a Senior Technical Staff Member in the IBM Information Management Group (part of IBM Software Group). Her specialties include XML, the Web, and databases. She has developed standards that support data for grid environments at the Global Grid Forum. In addition to working as an IBM product software developer, she has also worked as an Internet specialist, a data analyst, and an application designer and developer. She has also co-authored a book on the Web and published articles on transaction processing and XML. She is a member of the IBM Academy of Technology.
Christian Pichler (cpichler@researchstudio.at), Researcher, Research Studios Austria
Christian Pichler is a Researcher at Research Studios Austria where he is working on different projects around eGovernment and eBusiness. He holds a master's degree in Medicine and Computer Science as well as in Computer Science Management. Currently, he is also a candidate for a PhD degree in Computer Science at the Technical University of Vienna in Austria focusing on Inter-Organisational Systems.

Summary:  The ability to specify, check and act upon constraints is vital to ensuring the overall quality of healthcare information. The Health Level 7 (HL7) Clinical Document Architecture (CDA), described through XML Schema, allows the specification of constraints through HL7 Templates, which can be implemented in Schematron. Schematron can be applied through XSLT. This article illustrates software and hardware solutions for constraint checking in the HL7 CDA. The two solutions are demonstrated in an SOA that includes both successful and failing XML Schema and Schematron constraint checks. The article evaluates the application of constraints in the HL7 CDA and identifies some categories of constraints that require further investigation. The outcome of this evaluation shows that the ability to specify, check, and act upon constraints through Schematron complements XML Schema processing. The two constraint approaches are very useful and practical, and should therefore be pursued further.

Date:  15 Sep 2009
Level:  Intermediate

Introduction

Frequently used acronyms

  • SOA: Service Oriented Architecture
  • XML: Extensible Markup Language
  • XSLT: Extensible Stylesheet Language Transformations

The exchange of industry-specific information, in sectors such as banking, insurance, or finance, between two or more parties plays an important role in today's business. To enable information exchange, the parties usually agree on an information format defined by an industry consortium (see Industry Formats and Services with pureXML). The same is true for the healthcare industry. The structure of clinical information, as defined in healthcare standards and specifications such as the Health Level 7 (HL7) Clinical Document Architecture (CDA) (see The Clinical Document Architecture and the Continuity of Care Record: A Critical Analysis), is important for enabling the exchange of healthcare information. HL7 CDA specifies the structure and semantics of clinical documents (see HL7 Clinical Document Architecture, Release 2). The work presented in this paper is based on the HL7 CDA. However, the pilot study described here could lead to further applications of the constraint concepts in other industry domains.

In addition to an agreement on a structure to represent information, it is also important to provide a mechanism to ensure the completeness, correctness, and integrity of information. An example in the financial industry is the Financial Product Markup Language (FpML) for describing financial derivatives, which provides additional validation rules, beyond those specified in the FpML XML schema, that should be applied to ensure the quality of the data (see Consistency Checking of Financial Derivatives Transactions). The HL7 CDA allows the use of HL7 Templates to constrain the CDA for specific clinical situations and to provide a validating rule set (see Layered Constraints: The Proposal for HL7 Healthcare Templates). The purpose of these rule sets is to improve the quality of healthcare information exchanged through HL7 CDA documents by applying constraints.

The work presented in this paper classifies the types of constraints in the context of the HL7 CDA, and evaluates the ability to specify, check, and act upon possible violations of constraints. The application of constraints to healthcare information is illustrated in two different solutions which were implemented as part of the work presented in this paper. The first approach is entirely software based, whereas the second approach takes advantage of XML-processing hardware. Both of the solutions presented are demonstrated in an SOA environment. The Web services aspects of the solutions could potentially make validated data accessible to health care providers who might not have the staff or equipment to access the data in other ways. Moreover, the work describes ways to produce user-friendly messages when constraint failures occur.

Constraints

The classification of constraints described in this section is motivated by the Financial Derivatives Consistency Constraint Classification (see Consistency Checking of Financial Derivatives Transactions). Figure 1 illustrates processes in the context of HL7 CDA documents where inconsistencies in healthcare information can occur: (1) in the enforcement of local requirements (Type I); (2) in HL7 CDA document cross-checking (Type II); (3) in reference data checking in XML format (Type III); and (4) in reference data checking in non-XML format (Type IV).


Figure 1. Classification of HL7 CDA constraints based on the Consistency Checking of Financial Derivatives Transactions.
Diagram of classification of HL7 CDA constraints by type

The following section describes the different sources of inconsistent information, provides examples, and illustrates sample implementations for constraints that are used to avoid inconsistency. The implementation of the constraints identified is based on the fact that HL7 CDA documents are encoded in Extensible Markup Language (XML). If needed, you'll find links to an introduction to XML Schema or Schematron in Resources.

Type I (Structural requirements). These constraints ensure the structure of healthcare information. For the HL7 CDA, the structure is specified through an XML schema provided by HL7. You can validate HL7 CDA XML document instances against this XML schema to determine whether they contain the required information in the appropriate hierarchical structure. Sometimes the structure defined by an XML schema is not sufficient to convey specific requirements, such as those of an information system at a local institution. Suppose a given institution requires that certain information be present in an instance document, although the HL7 CDA XML schema considers it optional. You can approach this problem in two ways. The first option is to modify the existing XML schema. Listing 1 shows the original declaration.


Listing 1. Excerpt of original HL7 CDA XML schema

<xs:element name="code" type="CD" minOccurs="0" maxOccurs="1" />

Modify the existing XML schema to define the element as mandatory instead of optional (see Listing 2).


Listing 2. Excerpt of modified HL7 CDA XML schema

<xs:element name="code" type="CD" minOccurs="1" maxOccurs="1" />

The second option is to complement the XML schemas with Schematron rules (see Resources). Schematron allows the specification of assertions on the content of HL7 CDA documents and is typically applied through XSLT. In this case, you use Schematron to determine the presence of certain elements in an XML document, without modifying the original XML schema (see Listing 3).


Listing 3. Excerpt Schematron rule

<assert test="hl7:code">
  The necessary information on the code system to 
  identify the blood parameter is missing.
</assert>
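
To make the effect of the assertion in Listing 3 concrete, the following minimal Python sketch performs an equivalent presence check using only the standard library. It is not a Schematron processor. The sample documents, the choice of `observation` as the rule context, and the function name `missing_code_messages` are assumptions made for illustration; the namespace is the one that appears in Listing 7.

```python
import xml.etree.ElementTree as ET

HL7 = {"hl7": "urn:hl7-org:v3"}  # HL7 v3 namespace, as seen in Listing 7

MESSAGE = ("The necessary information on the code system to identify "
           "the blood parameter is missing.")

def missing_code_messages(xml_text):
    """Return one message per observation that lacks an hl7:code child."""
    root = ET.fromstring(xml_text)
    # Assumed rule context: every hl7:observation element in the document.
    return [MESSAGE
            for obs in root.iter("{urn:hl7-org:v3}observation")
            if obs.find("hl7:code", HL7) is None]

# Hypothetical, heavily simplified document fragments.
valid = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <observation><code code="718-7"/><value value="14.0"/></observation>
</ClinicalDocument>"""
invalid = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <observation><value value="14.0"/></observation>
</ClinicalDocument>"""

print(missing_code_messages(valid))    # no failures expected
print(missing_code_messages(invalid))  # one failure message expected
```

As with the Schematron rule, the schema itself stays untouched: the local requirement lives in a separate check that can be dropped or replaced independently.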

Type II (Internal consistency). Constraints of type II evaluate the content of HL7 CDA XML document instances and therefore avoid inconsistency within HL7 CDA XML documents. The content evaluated includes both isolated values and values which depend on other information within the same XML document. An example might be a constraint which determines whether blood parameter values encoded in an HL7 CDA document are within the allowed reference ranges or not. For blood parameters, some reference values can be the same for male and female patients, others can differ by gender. These constraints can be specified the same way as type I constraints, either through modifying the HL7 CDA XML schema or through the implementation of Schematron rules (see Listing 4).


Listing 4. Excerpt of Schematron rule to check a co-occurrence constraint

<let name="gender" value="administrativeGenderCode/@code" />

<report test="hl7:code[@code = '718-7'] 
  and (($gender='M' and hl7:value[@value > 17.2 or @value &lt; 13.8]) or 
       ($gender='F' and hl7:value[@value > 15.1 or @value &lt; 12.1]))">
  Hemoglobin is not within the reference range, which could imply 
  illness of the patient, that needs to be clarified.
</report>
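
The logic of the co-occurrence rule in Listing 4 can also be sketched in plain Python. The sketch below is only an illustration of the rule's logic, not of how a Schematron processor evaluates it. The simplified document layout, the helper `sample`, and the function name `hemoglobin_reports` are hypothetical; the code 718-7 and the reference ranges are taken from Listing 4.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}

# Reference ranges from Listing 4: gender -> (lower bound, upper bound).
HEMOGLOBIN_RANGE = {"M": (13.8, 17.2), "F": (12.1, 15.1)}

MESSAGE = "Hemoglobin is not within the reference range."

def hemoglobin_reports(xml_text):
    """Return one message per hemoglobin value outside the gender-specific range."""
    root = ET.fromstring(xml_text)
    gender_el = root.find(".//hl7:administrativeGenderCode", NS)
    gender = gender_el.get("code") if gender_el is not None else None
    if gender not in HEMOGLOBIN_RANGE:
        return []  # like Listing 4, only 'M' and 'F' are evaluated
    low, high = HEMOGLOBIN_RANGE[gender]
    reports = []
    for obs in root.iter("{urn:hl7-org:v3}observation"):
        code = obs.find("hl7:code", NS)
        value = obs.find("hl7:value", NS)
        if code is None or value is None or code.get("code") != "718-7":
            continue
        # Assumes the value attribute is present and numeric.
        if not low <= float(value.get("value")) <= high:
            reports.append(MESSAGE)
    return reports

def sample(gender, value):
    """Build a hypothetical, heavily simplified document for the demo."""
    return ('<ClinicalDocument xmlns="urn:hl7-org:v3">'
            '<patient><administrativeGenderCode code="%s"/></patient>'
            '<observation><code code="718-7"/><value value="%s"/></observation>'
            '</ClinicalDocument>' % (gender, value))

print(hemoglobin_reports(sample("F", "11.5")))  # below the female range
print(hemoglobin_reports(sample("M", "15.0")))  # within the male range
```

The check depends on two values in different parts of the same document (gender and measurement), which is what makes it a co-occurrence constraint.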

An example for XML Schema is not included, since XML Schema in its current version 1.0 does not support the evaluation of co-occurrence constraints. Its successor, XML Schema version 1.1 (see Resources), provides more flexibility. With XML Schema 1.1, it is possible to define assertions on the content of the XML document, including co-occurrence constraints, but only within a single XML data type. Unlike Schematron (see Listing 3), XML Schema 1.1 assertions offer no agreed way to provide customized messages for non-technical people.

Type III (Reference data in XML format). Constraints of type III evaluate dependencies between content stored in different HL7 CDA XML document instances. An example is the ability to define relationships between separate HL7 CDA XML document instances in a workflow. The theoretical background for these relationships is not within the scope of this article, but one of them, "append", is used in this example. "Append" specifies that an HL7 CDA XML document instance complements another existing HL7 CDA XML document instance, which means that neither document is valid by itself; the two are valid only if available together. In this case, a constraint ensures that the relationship is not violated and that both HL7 CDA XML documents are available. An example implementation for XML Schema and Schematron is not included, since these technologies do not allow the specification of type III constraints in their current versions. Consistency Checking of Financial Derivatives Transactions gives an approach for handling type III constraints through the creation of links across documents. Service Modeling Language from W3C (see Resources) also includes a cross-document constraint facility through the deref() extension function.
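
Although the article gives no implementation for type III constraints, the idea of the "append" check can be sketched in application code. The following Python sketch is an assumption-laden illustration, not part of the solutions described here: the function name `unresolved_appends`, the document identifiers, and the simplified use of `relatedDocument`, `parentDocument`, and the `APND` type code to express the relationship are all chosen for the example.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}

# Assumed, simplified encoding of an "append" link to a parent document.
PARENT_ID = "hl7:relatedDocument[@typeCode='APND']/hl7:parentDocument/hl7:id"

def unresolved_appends(documents):
    """Return (document id, missing parent id) pairs for broken 'append' links."""
    roots = [ET.fromstring(text) for text in documents]
    # Identifiers of all documents that are available together.
    available = {root.find("hl7:id", NS).get("extension") for root in roots}
    broken = []
    for root in roots:
        own_id = root.find("hl7:id", NS).get("extension")
        for parent in root.findall(PARENT_ID, NS):
            if parent.get("extension") not in available:
                broken.append((own_id, parent.get("extension")))
    return broken

# Hypothetical documents: doc-2 appends to doc-1.
original = ('<ClinicalDocument xmlns="urn:hl7-org:v3">'
            '<id extension="doc-1"/></ClinicalDocument>')
addendum = ('<ClinicalDocument xmlns="urn:hl7-org:v3"><id extension="doc-2"/>'
            '<relatedDocument typeCode="APND"><parentDocument>'
            '<id extension="doc-1"/></parentDocument></relatedDocument>'
            '</ClinicalDocument>')

print(unresolved_appends([original, addendum]))  # both documents available
print(unresolved_appends([addendum]))            # parent document missing
```

The check needs the whole set of documents at once, which is why it cannot be expressed as a rule over a single document instance.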

Type IV (Reference data in non-XML format). The fourth type of constraint evaluates dependencies between values stored in an HL7 CDA XML document instance and values stored in a non-XML data source, such as a relational table. For example, reference values for blood parameters might be stored in a relational database table, and the blood parameter values in the document validated against the reference values stored in that table. Neither XML Schema nor Schematron allows the specification of type IV constraints in its current version.
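
A type IV check of this kind has to be written in application code. The sketch below illustrates the blood parameter example with an in-memory SQLite table standing in for the relational data source. The table name `reference_range`, its columns, the simplified document layout, and the function names are hypothetical; the range values are borrowed from Listing 4.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}

def make_reference_db():
    """Create a hypothetical reference table holding the ranges from Listing 4."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reference_range "
                 "(code TEXT, gender TEXT, low REAL, high REAL)")
    conn.executemany("INSERT INTO reference_range VALUES (?, ?, ?, ?)",
                     [("718-7", "M", 13.8, 17.2), ("718-7", "F", 12.1, 15.1)])
    return conn

def out_of_range(xml_text, conn):
    """Return (code, value) pairs that violate the range stored in the table."""
    root = ET.fromstring(xml_text)
    gender = root.find(".//hl7:administrativeGenderCode", NS).get("code")
    violations = []
    for obs in root.iter("{urn:hl7-org:v3}observation"):
        code = obs.find("hl7:code", NS).get("code")
        value = float(obs.find("hl7:value", NS).get("value"))
        row = conn.execute(
            "SELECT low, high FROM reference_range WHERE code = ? AND gender = ?",
            (code, gender)).fetchone()
        # Parameters without a reference row are not checked.
        if row is not None and not row[0] <= value <= row[1]:
            violations.append((code, value))
    return violations

def sample(gender, value):
    """Build a hypothetical, heavily simplified document for the demo."""
    return ('<ClinicalDocument xmlns="urn:hl7-org:v3">'
            '<patient><administrativeGenderCode code="%s"/></patient>'
            '<observation><code code="718-7"/><value value="%s"/></observation>'
            '</ClinicalDocument>' % (gender, value))

conn = make_reference_db()
print(out_of_range(sample("M", "18.0"), conn))  # above the male range
print(out_of_range(sample("M", "15.0"), conn))  # within the male range
```

Keeping the ranges in a table means they can change without touching the rule itself, at the cost of moving the constraint outside the XML validation layer.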

Implementation

In this section, the constraints using XML Schema and Schematron described in the previous section are applied to healthcare information to ensure correctness, completeness, and consistency. Two different approaches are described: one software-based solution and one hardware-based solution.

Software approach. The first approach takes advantage of pureXML®, the native XML support of the IBM DB2® database system. The database is capable of storing XML in a native format and provides a variety of XML-related capabilities such as XML Schema validation and the ability to query XML documents through XQuery. The approach (see Figure 2) consists of three components: the database system, Web services, and the client. The database system supports constraint checking through the ability to register the HL7 CDA XML schema, to validate HL7 CDA XML document instances against the registered schema, and to perform XSL transformations to apply Schematron constraints. The scenario allows the submission of HL7 CDA XML document instances to the database, where the constraints are applied to them. Each document is then inserted into the database, where it is stored in a native XML format together with the validation result.


Figure 2. Overview of software approach
Diagram of software approach, showing database system, Web services, and the client

Listings 5 and 6 show sample constraint failures returned from the database system. As an alternative to using a database system, the constraints could be applied directly on the client using XForms, which can apply Schematron rules. (See Resources for more information about XForms.)


Listing 5. Excerpt from an XML schema validation failure

XML document contains an element "x" that is not correctly specified. 
Reason code = "37". SQLCODE=-16196, SQLSTATE=2200M, DRIVER=3.50.152


Listing 6. Schematron validation failure

<?xml version="1.0" encoding="UTF-8"?>
The necessary information on the code system to identify the 
blood parameter is missing.

Hardware approach. The second approach takes advantage of IBM WebSphere® DataPower®, hardware capable of processing XML data. The capabilities used as part of this solution include the validation of HL7 CDA XML document instances against the HL7 CDA XML schema and the application of Schematron rules through XSL transformations. The approach (see Figure 3) consists of four components: the database system; the Web services that allow the insertion and storage of HL7 CDA XML document instances in the database; the DataPower appliance, which performs constraint checking; and the client, which submits HL7 CDA XML document instances to the hardware appliance. The client can be any component that issues Web service requests, such as an XForms-based client or even another information system. The scenario allows the submission of HL7 CDA XML documents to the DataPower appliance, where all of the constraints are applied. After validation against the HL7 CDA XML schema or Schematron rules, the DataPower appliance forwards the XML document, along with the validation result, to a Web service, which inserts both into the database, where the document is stored in a native XML format.


Figure 3. Overview of hardware approach
Diagram of hardware approach, showing database system, Web services, DataPower, and the client

Listings 7 and 8 show sample constraint failures produced from the hardware appliance.


Listing 7. XML schema validation failure

<?xml version="1.0" encoding="UTF-8"?>
<error>
  http://dp:2/: cvc-wildcard 2: unrecognized element {urn:hl7-org:v3}x
</error>


Listing 8. Schematron validation failure

<?xml version="1.0" encoding="UTF-8"?>
<error>
  Hemoglobin is not within the reference range, which could imply 
  illness of the patient, that needs to be clarified.
</error>

Observations

Specifying constraints using XML Schema and Schematron, and checking these constraints, led to several observations during implementation. The most visible difference between XML Schema and Schematron is the quality of the error messages generated during the checking of type I (structural) constraints. In both approaches illustrated, the validation of HL7 CDA XML document instances against the HL7 CDA XML schema is done through an XML parser, and the resulting error messages are also generated by the XML parser. See Listings 5 and 7 for examples. These error messages are not very useful for non-technical people. Yet the people who create XML documents, most likely through an industry-specific editor, and the people who investigate why XML documents failed constraint checking are not always technical people.

The user-unfriendliness of error messages generated by XML parsers is a known problem. An alternative approach is described in Customized Document Validation to Support a Flexible XML-based Knowledge Management Framework. It describes a validation technique that complements accepted XML schema validation and provides flexibility in the definition of error messages through special software error handlers for XML schema validation failures. Another way to improve message quality is for XML parsers to collect XML schema annotation information, which often appears in industry standard schemas, and include it in the error messages. Schematron, on the other hand, allows the definition of customized error messages. See Listings 6 and 8 for examples.
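
The idea of a software error handler for schema validation failures can be sketched as a small translation step between the parser and the user. The following Python sketch is only an illustration under stated assumptions: the function name `friendly_message`, the wording of the translated messages, and the regular expression (modeled on the single parser message in Listing 7) are hypothetical, and a real handler would need to cover many more message shapes.

```python
import re

# Pattern modeled on the "unrecognized element" text shown in Listing 7.
UNRECOGNIZED = re.compile(
    r"unrecognized element \{(?P<ns>[^}]*)\}(?P<name>\S+)")

FALLBACK = "The document does not conform to the required structure."

def friendly_message(parser_error):
    """Translate a technical parser error into a message for non-technical users."""
    match = UNRECOGNIZED.search(parser_error)
    if match:
        return ("The document contains an element named '%s' "
                "that the schema does not allow." % match.group("name"))
    return FALLBACK  # unknown message shape: fall back to a generic text

raw = "http://dp:2/: cvc-wildcard 2: unrecognized element {urn:hl7-org:v3}x"
print(friendly_message(raw))
print(friendly_message("some other parser error"))
```

The design trade-off is that such a handler depends on the exact wording a given parser produces, whereas Schematron messages are authored alongside the rules.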

Moreover, Schematron can define constraints for a certain context within an XML document instance, which allows the design of very concrete and specific error messages for specific types of violations and for specific users and tools. Modifying an XML schema to incorporate additional local constraints might not always be straightforward or possible. As XML schemas evolve, you need to re-apply the additional local constraints to each new XML schema version. With Schematron, adding rules to an existing rule set is easier than modifying an XML schema. You can also apply various Schematron rule sets individually, as required by processes and by organizations.

Another difference observed is the flexibility in applying the constraints defined in XML Schema or Schematron. When you apply constraints defined using XML Schema, you must follow the "all or nothing" principle, whereas Schematron allows partial application of the rules, so that constraints can be applied according to current needs. For example, you might check only a certain type of constraint, such as structural constraints, or check only certain sections of an HL7 CDA XML document.
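
The partial application of rules described above can be illustrated with a small Python sketch in which rules are grouped into named sets and only the selected sets are evaluated. This mirrors the idea only; it is not how a Schematron processor is implemented. The rule set names, predicates, messages, and the simplified document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}
OBSERVATION = "{urn:hl7-org:v3}observation"

def has_code(obs):
    return obs.find("hl7:code", NS) is not None

def has_value(obs):
    return obs.find("hl7:value", NS) is not None

def value_is_numeric(obs):
    value = obs.find("hl7:value", NS)
    if value is None:
        return True  # presence is left to the structural rule set
    try:
        float(value.get("value", ""))
        return True
    except ValueError:
        return False

# Hypothetical rule sets that can be applied individually or together.
RULE_SETS = {
    "structure": [(has_code, "code is missing"),
                  (has_value, "value is missing")],
    "content": [(value_is_numeric, "value is not numeric")],
}

def check(xml_text, selected):
    """Apply only the selected rule sets; return (rule set, message) failures."""
    root = ET.fromstring(xml_text)
    failures = []
    for name in selected:
        for predicate, message in RULE_SETS[name]:
            for obs in root.iter(OBSERVATION):
                if not predicate(obs):
                    failures.append((name, message))
    return failures

doc = ('<ClinicalDocument xmlns="urn:hl7-org:v3">'
       '<observation><code code="718-7"/><value value="high"/></observation>'
       '</ClinicalDocument>')

print(check(doc, ["structure"]))             # structural rules only
print(check(doc, ["structure", "content"]))  # all rules
```

The same document passes or fails depending on which rule sets a given process selects, which is the flexibility that "all or nothing" schema validation does not offer.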

Another observation described in this article is that Schematron and XML Schema are not sufficient to specify all of the constraints identified. Neither XML Schema nor Schematron allows the specification of constraints across different XML documents (type III) or between an XML document and a non-XML data source (type IV).

Conclusion

This article introduces a classification for constraints. It then shows that XML Schema and Schematron are complementary and can be used to specify structural constraints and constraints on the content of HL7 CDA XML document instances. The implementation section demonstrates that XML Schema is a good and established way to define the structure of XML documents, although the quality of error messages produced during validation by an XML Schema parser can be improved significantly. Schematron complements XML schema validation, offering message customization and a simple approach to extensibility and rule set composition when schemas evolve. XML Schema 1.1 brings the notion of constraints into XML Schema through assertions within data types, but without message customization. The article illustrates that commercial software and hardware solutions exist that can handle both Schematron and XML Schema processing. Areas for further work include applying the various constraint checking approaches to large volumes of data and exploring the applicability of XML constraint checking methodologies in other industry domains.

In conclusion, the specification of constraints using XML Schema is suitable for enforcing basic document structures, such as the HL7 CDA. Schematron is suitable for partial structure and HL7 CDA content constraints, which makes it highly appropriate for HL7 Templates. The article describes some constraints (type III and type IV) that are not enforced by XML Schema and Schematron. Schematron fulfills two main requirements with regard to HL7 Templates. It can be used to specify structural constraints in addition to the HL7 CDA XML schema, and it allows the specification of constraints to evaluate the content of HL7 CDA XML documents, which can be constraints for specific clinical situations. Schematron could be the basis for a reference implementation for the validating rule sets for HL7 Templates.


Resources

Learn

  • Layered Constraints: The Proposal for HL7 Healthcare Templates (Liora Alschuler, Robert H. Dolin, Sandy Boyer, Charlie Mead, and Peter Elkin). In XML 2002, 8-13 December 2002, Baltimore, MD, USA, 2002.

  • HL7 Clinical Document Architecture, Release 2 (Robert H. Dolin, Liora Alschuler, Sandy Boyer, Calvin Beebe, Fred M. Behlen, Paul V. Biron, and Amnon Shabo). Journal of the American Medical Informatics Association, 13:30-39, 2006.

  • Consistency Checking of Financial Derivatives Transactions (Daniel Dui, Wolfgang Emmerich, Christian Nentwich, and Bryan Thal). In Objects, Components, Architectures, Services, and Applications for a Networked World (Mehmet Aksit, Mira Mezini, and Rainer Unland, editors), International Conference NetObjectDays 2002, 7-10 October 2002, Erfurt, Germany, Revised Papers, volume 2591 of Lecture Notes in Computer Science. Springer, 2003.

  • The Clinical Document Architecture and the Continuity of Care Record: A Critical Analysis (Jeffrey M. Ferranti, R. Clayton Musser, Kensaku Kawamoto, and W. Ed Hammond). Journal of the American Medical Informatics Association, 13:242-252, 2006.

  • Customized Document Validation to Support a Flexible XML-based Knowledge Management Framework (Timothy P. Hanna, Roberto A. Rocha, Nathan C. Hulse, Guilherme Del Fiol, Richard L. Bradshaw, and Lorrie K. Roemer). In Proceedings of the AMIA Annual Symposium 2005, 22-26 October 2005, Washington, DC, USA, pages 291-295, 2005.

  • Universal Services for pureXML using Data Web Services (Susan Malaika and Christian Pichler, developerWorks, August 2008): Easily enable your pureXML column to be accessed through Web service operations.

  • Industry Formats and Services with pureXML (alphaWorks): Review this demonstration of end-to-end XML data exchange with a DB2 9 pureXML database, retrieval through RESTful generic Web services, and user interaction provided through Atom feeds and XForms-capable browsers.

  • Clinical Document Architecture (Health Level Seven): Read more on this document markup standard that specifies the structure and semantics of clinical documents for the purpose of exchange.

  • ISO Schematron (Schematron): Learn about this language for making assertions about the presence or absence of patterns in XML documents.

  • XML Schema 1.1 (World Wide Web Consortium): Explore how the XML Schema can define the structure, content, and semantics of XML documents.

  • XForms 1.0 (World Wide Web Consortium): Create device-independent forms with XForms, which splits traditional XHTML forms into three parts (XForms model, instance data, and user interface). With presentation separated from content, you reduce server trips and scripting.

  • SML (Service Modeling Language) (World Wide Web Consortium): Combine SML with XML Schema and Schematron to model complex services and systems, including structure, constraints, policies, and best practices.

  • IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.

  • XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.

  • developerWorks technical events and webcasts: Stay current with technology in these sessions.

  • The technology bookstore: Browse for books on these and other technical topics.

  • developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
