For a decade, academia, government, and industry groups have used XML for document storage and data integration. Database and software providers, including IBM, recognized the potential of XML early on: they helped define standards and industry-specific markup languages based on XML, while working to integrate XML with SQL in their database platforms. By first releasing the IBM DB2 XML Extender, Informix Web DataBlade, and Informix XSLT DataBlade, then more deeply integrating XML support into the RDBMS servers, IBM and Informix advanced the integration of XML and SQL database processing.
XML is an important enabling technology for multiple industries, particularly industries with complex integration, archiving, and compliance requirements. In particular, the healthcare industry can exploit service-oriented architectures (SOAs) and SQL/XML-powered databases for building new applications while addressing compliance and standards issues, and for supporting electronic medical records systems.
The rise of XML in healthcare
The rise of XML in the healthcare industry has been driven partly by legislation intended to protect patients' security and privacy, including the Health Insurance Portability and Accountability Act (HIPAA). Enacted by the U.S. Congress to protect insurance coverage, HIPAA includes standards for electronic transactions and provisions for the privacy and security of data and applies to claim, payment, benefit inquiry, claim status, and other transactions. HIPAA also requires the U.S. Department of Health and Human Services to define rules for the dissemination of healthcare information.
Translating those requirements into usable standards is often the work of standards-development organizations; one of the most prominent in the healthcare industry is Health Level Seven (HL7). HL7 produces standards for operations involving the exchange of administrative and clinical data in healthcare domains, including claims processing, imaging, and pharmacy (see sidebar at the bottom of this article, "HIPAA and HL7"). Through the Clinical Data Interchange Standards Consortium (CDISC), the healthcare industry has also developed specifications for an operational data model and a study data tabulation model. These standards apply to communication between internal systems and external entities, such as the U.S. Food and Drug Administration.
New standards in various industries, including healthcare, have prompted development of XML-enabled applications. This new wave of technology means that we can build applications (often composite applications using SOA plumbing) to access medical data with a combination of interoperable services and rich database support for XML.
But efficiently managing large amounts of XML data can be a challenge. With DB2 9, IBM introduced the pureXML solution, which permits storage, indexing, and querying of documents in their native XML format. Several leading institutions have taken advantage of the native XML capabilities of IBM DB2 to build systems that not only exploit healthcare industry standards, but also improve data access and performance.
The challenge of electronic medical records
With sizeable amounts of rapidly and frequently changing data, large healthcare organizations rely heavily on XML. The UCLA Health System is one such organization: a multi-hospital healthcare provider with a diversity of clinical and healthcare applications. It includes the Ronald Reagan UCLA Medical Center, Santa Monica UCLA Medical Center and Orthopaedic Hospital, Mattel Children's Hospital UCLA, and Resnick Neuropsychiatric Hospital at UCLA, as well as the UCLA Medical Group of primary-care and specialty-care offices. The staff of more than 2,000 physicians handles more than 1 million clinic visits and 80,000 hospital visits per year.
A healthcare system of that size must process large amounts of data on a daily basis, including medical records updates; lab results; MRI, computed tomography (CT), and electron beam CT angiography images; admission, discharge, and transfer data; and pharmacy orders. All of this data must be stored securely and reliably (and eventually archived intelligently) while remaining accessible on demand. In addition, the data must be easily searched, transmitted, and organized by a wide variety of employees spread across multiple locations. Add the need to easily and quickly input and update information, and you're looking at a complex piece of IT infrastructure.
The UCLA Health System met those demands with its patient-oriented documentation system (PODS), an electronic medical records (EMR) repository that provides storage and retrieval capabilities for more than 20 million documents. In the bigger picture, PODS is a critical source of patient information for the UCLA document management system extended SOA (xSOA). Along with PODS, the xSOA provides viewer interfaces to a GE BDM pharmacy information system, a CliniComp Essentris acute care system, an Orion Soprano clinical data system, and a forms portal. The xSOA Central Document Bus connects to a GE picture archiving and communication system (PACS), clinical applications and images, and the PODS repository. The Image Bus provides access to patient diagnostic images, whereas a Forms Bus handles the large variety of electronic forms used by UCLA Health Services. An HL7 Message Bus provides HL7-compliant communications.
Combining an SOA with DB2 databases, PODS provides 2,000 doctors and 3,000 nurses with access to patient records. The system supports more than 400 electronic forms for data entry; these forms replace the 1,000 paper forms used previously, helping to eliminate errors due to handwriting misinterpretation or blank fields. The database holds information for about 2 million patients and grows by 12,000 documents per day as new test results, doctors' notes, and other patient-related data are added. PODS pairs a document repository with a metadata repository: medical record files, including images, PDFs, and text, reside on file servers backed by a network attached storage array, while an IBM DB2 database stores the corresponding metadata.
When a document enters the system, it is stored in the file server and indexed in the DB2 database. Documents are not deleted because the PODS repository also serves as an archive of patient data. To ensure survivability and high availability for 24x7 operations, the PODS architecture includes redundant servers and databases, with data replication to synchronize between database servers.
The DB2 database metadata store currently contains 30 million rows of information. The metadata is stored as XML, using the DB2 9 native XML engine. According to Dr. Charles Wang, architect manager at UCLA Medical Center Computing Services, the more than 400 schemas used for PODS comply with the W3C XML Schema language. The PODS software maps those different schemas into a single, virtual schema for the entire system. The system uses a four-key composition to create a unique identifier for a document paired with its metadata.
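The article does not say which four keys make up the identifier, so the following Python sketch uses hypothetical ones (patient, form type, encounter, and version) purely to show how a composite key can pair a stored file with its metadata. The class, field names, and paths are all assumptions made for illustration, not the PODS design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentKey:
    """Composite identifier built from four keys (all four names are hypothetical)."""
    patient_id: str
    form_type: str
    encounter_id: str
    version: int

    def as_string(self) -> str:
        # A flat form of the key, for example for a log line.
        return f"{self.patient_id}/{self.form_type}/{self.encounter_id}/v{self.version}"


# Two stores that share the same composite identifier: one holds the file
# location, the other holds the XML metadata describing that file.
file_index = {}
metadata_index = {}


def register(key: DocumentKey, file_path: str, metadata_xml: str) -> None:
    """Record a document and its metadata under one key; refuse duplicates."""
    if key in file_index:
        raise KeyError(f"duplicate document key: {key.as_string()}")
    file_index[key] = file_path
    metadata_index[key] = metadata_xml


key = DocumentKey("P001", "lab-result", "E42", 1)
register(key, "/archive/P001/E42/lab-result-v1.pdf", "<meta><form>lab-result</form></meta>")
```

Because the dataclass is frozen, two keys with the same four values compare equal and hash identically, so either store can be looked up from any copy of the key.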
To safeguard patient privacy, the PODS design uses a multilevel security model. Besides built-in DB2 security capabilities, the software architecture includes a document and metadata handler that is integrated with a security service. It also features role-based security and a single sign-on capability. Security, concurrency control, parallel processing, and versioning are but a few of the advantages of using DB2 for storage of XML schemas and documents, as opposed to dealing with those problems on an ad hoc basis when using a file system to manage XML documents and schemas.
UCLA Health System builds on SOA
The PODS implementation is a good example of how an SOA enables disparate applications to use essential services; in this case, services for accessing patient information. Documents enter the system via a document service interface and are placed in queues. The PODS architecture includes IBM WebSphere MQ for asynchronous messaging and queuing. For HL7 messaging, the UCLA Health System uses Sun SeeBeyond eGate Integrator, which, according to Dr. Wang, is "the enterprise-wide standard for all application interfaces."
The PODS queue manager operates with an input queue, exception queue, and replication queue. The standard services for managing metadata and image files provide application programming interfaces (APIs) for uploading, downloading, querying, and updating documents. The system supports auditing by generating a report of all activity against the database except uploads.
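The article names the three queues but does not describe how documents move between them. The Python sketch below shows one plausible routing, offered as an assumption: a document that is stored successfully moves on to the replication queue, and one that fails is set aside on the exception queue. In-memory deques stand in for WebSphere MQ, and the `store` function is a hypothetical placeholder for the real store-and-index step.

```python
from collections import deque

input_queue = deque()        # documents waiting to be processed
exception_queue = deque()    # documents that failed, kept with the reason
replication_queue = deque()  # stored documents awaiting replication


def store(document: dict) -> None:
    """Hypothetical stand-in for storing the file and indexing its metadata."""
    if not document.get("metadata"):
        raise ValueError("document has no metadata")


def drain_input() -> None:
    """Process every queued document and route it to the next queue."""
    while input_queue:
        document = input_queue.popleft()
        try:
            store(document)
        except ValueError as error:
            # Keep the failed document and the reason so it can be reviewed.
            exception_queue.append((document, str(error)))
        else:
            replication_queue.append(document)


input_queue.append({"id": 1, "metadata": "<meta/>"})
input_queue.append({"id": 2, "metadata": ""})
drain_input()
```

After the call, document 1 sits on the replication queue and document 2 on the exception queue; nothing is dropped, which fits a repository that also serves as an archive.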
The evolution of PODS
The UCLA Health System PODS implementation supports access to patient documentation with DB2 pureXML capabilities for loading, querying, and updating data. It provides a set of Web services interfaces that enable clinical systems to upload and query data.
The most recent versions of the system are PODS3 and PODS4. Both were built on an SOA, but use different DB2 capabilities for XML processing. For XML messaging in the form of SOAP-based Web services, PODS3 and PODS4 use a combination of Systinet and IBM WebSphere software. However, the PODS4 implementation marked a transition from the DB2 XML Extender to the pureXML capabilities of DB2 9. For example, DB2 9 introduced support for a feature defined by the SQL:2003 standard, an XML column type treated as a first-class data type. You can use the XML type in Data Definition Language (DDL) statements, functions, and stored procedures.
Another benefit of the move to DB2 9 is the hybrid storage engine and a query optimizer that "understands" XML (mapping to relational algebra for queries involving XML). The migration to PODS4 did not change the PODS functional requirements, but pureXML processing simplified metadata processing while meeting the system's response time and scalability goals.
The upload process illustrates the difference between PODS3 and PODS4. When a PDF document was uploaded, for example, the PODS3 upload process stored the document and an XML metadata file on the EMC file server. It decomposed the metadata for use by the DB2 SQL storage engine as an XCollection, a type implemented by the DB2 XML Extender. The upload also validated the schema using the IBM WebSphere Application Server parser and indexed the path to the PDF and XML files in the DB2 database. The DB2 transaction associated with an upload included generating a unique document ID, logging the upload in the activity history table, and executing a SQL INSERT into 18 tables.
The PODS4 upload process treats the XML document metadata differently. Instead of spreading the metadata across 18 tables, it saves the metadata in DB2 columns of type XML and executes a SQL INSERT into only four tables.
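The PODS4 approach of keeping each metadata document whole in a single column, rather than decomposing it across many tables, can be sketched in Python. This is a simplified illustration under stated assumptions, not the PODS code: SQLite from the standard library stands in for DB2, a TEXT column stands in for the DB2 XML type, one table stands in for four, and the table, column, and element names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# SQLite has no XML type, so TEXT stands in here; in DB2 9 the metadata
# column would be declared with the XML data type instead.
conn.execute(
    "CREATE TABLE document (doc_id INTEGER PRIMARY KEY, file_path TEXT, metadata TEXT)"
)


def upload(file_path: str, metadata_xml: str) -> int:
    """Store the file path and the whole metadata document in one row."""
    ET.fromstring(metadata_xml)  # raises ParseError if not well-formed
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO document (file_path, metadata) VALUES (?, ?)",
            (file_path, metadata_xml),
        )
    return cursor.lastrowid


def find_by_form(form_name: str) -> list:
    """Return doc_ids whose metadata has a <form> child with the given text."""
    matches = []
    for doc_id, metadata in conn.execute("SELECT doc_id, metadata FROM document"):
        if ET.fromstring(metadata).findtext("form") == form_name:
            matches.append(doc_id)
    return matches


first = upload("/archive/a.pdf", "<meta><form>lab-result</form><patient>P001</patient></meta>")
second = upload("/archive/b.pdf", "<meta><form>consent</form><patient>P002</patient></meta>")
```

The sketch parses every document on the client to answer a query. With a native XML column, that filtering would instead run inside the database through its XML query and indexing support, which is where the simplification described above comes from.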
Simplified processing and standards compliance
Moving from PODS3 to PODS4 with DB2 9 greatly simplified the UCLA Health System's database administration and replication tasks. The PODS3 architecture used DB2 8 with the XML Extender, and a database consisting of 28 tables. The PODS4 database design, using XML columns, required only 10 tables and eliminated 20 stored procedures used by PODS3.
Because the UCLA Health System uses XML for patient metadata, supporting a new electronic form in the PODS collection involves creating the data definition or schema for form content. DBAs dealing with a constant stream of new forms want efficient solutions for defining new types of data, such as test results. Moving to DB2 9 reduced the amount of time required to add new forms and schemas to the system: supporting a new form in PODS4 typically takes two hours, compared with two weeks in PODS3.
In any industry, standards are essential for interoperability and efficient data interchange. XML has become a powerful tool for healthcare providers in part because it offers an effective means of markup and of defining vocabularies for data interchange and archiving. However, robust applications require a reliable data management infrastructure. PODS illustrates how one healthcare provider is addressing the challenge of creating, storing, and exchanging electronic medical records. With PODS, UCLA Health System, like other healthcare institutions, has embraced XML technology and started down a path that leads to sophisticated electronic medical records, compliance with HIPAA guidelines and HL7 standards, and increased productivity.
HIPAA and HL7
HL7 has been developing standards designed to address HIPAA legislation since 1996, when it formed a Claims Attachment working group to standardize the information needed to process insurance claims. In that same year, HL7 began actively working with XML through its SGML/XML special interest group. The initial deliverable was six recommended attachments for claims processing.
Since then, HL7 has developed a specification for messaging, a Clinical Document Architecture (CDA), and a Reference Information Model (RIM). HL7 has also worked on a standard for electronic submission for CDA Public Health Case Reporting (PHCR) to state and local public health departments. As the standards matured, XML became an increasingly important technical component. For example, the first version of the CDA defined an XML architecture for exchange of clinical documents. That architecture was based on XML Document Type Definitions (DTDs) included in the specification, with semantics defined using the HL7 RIM and HL7 registered coded vocabularies. The upcoming version 3 release of the CDA is expected to use only XML encoding.
"IBM DB2 native support of XML allows it to store content in the healthcare industry-standard HL7 CDA format," says Karla Norsworthy, vice president, software standards at IBM. "IBM is committed to healthcare interoperability and innovation through open standards. We have seen the benefits of flexibility, time to market, and innovation that come from widely adopted, open standards such as Java, XML, and healthcare standards developed in organizations such as HL7."