Get started with Industry Formats and Services with pureXML

A fastpath to storing your industry XML content in DB2

The industry bundles for pureXML™, published on alphaWorks, illustrate access to content stored on IBM® DB2® in XML, through small script or Java™-based applications. They are focused on populating, validating, and querying XML content that is pertinent to a specific industry. A pureXML industry bundle is typically a downloadable zip file that contains sample XML messages (XML instance documents), and scripts or Java code that populates one or more DB2 pureXML tables with the XML messages. The scripts also issue validation and query requests on the XML messages, and transform portions of the XML messages into relational tables. In this article, you'll learn about the industry-specific XML exchange formats, and learn how you can easily get started with implementing these solutions by downloading the industry bundles and viewing the interactive demonstrations.

To download the pureXML industry bundles from the Internet, please go to the IBM alphaWorks site: http://www.alphaworks.ibm.com/tech/purexml. Select the Download Now option to retrieve the bundles. There are accompanying interactive demonstrations that illustrate access to stored XML content through a variety of interfaces such as regular Web browsers, feed readers, and XForms viewers. In addition, a selection of RESTful and SOAP-based Web services are exposed that enable access to the XML content.

Susan Malaika (malaika@us.ibm.com), Senior Technical Staff Member, IBM

Susan Malaika is a Senior Technical Staff Member in IBM's Information Management Group (part of IBM Software Group). Her specialties include XML, the Web, and databases. She has developed standards that support data for grid environments at the Global Grid Forum. In addition to working as an IBM product software developer, she has worked as an Internet specialist, a data analyst, and an application designer and developer. She has also co-authored a book on the Web and published articles on transaction processing and XML. She is a member of the IBM Academy of Technology.



24 May 2007

Introduction

Industry formats provide agreed-upon ways to exchange information between and within companies. Typically, industry consortia and governments define the structure of industry-specific XML exchange messages and any constraints that apply to them. Consortia often provide XML schemas to describe the structure of the messages.

As of this writing, IBM is the only major software vendor to offer pre-tested, industry-specific software bundles for its DBMS. These free packages are designed to help database administrators and application programmers quickly get started using DB2 pureXML technology to store, manage, and query XML data that conforms to popular industry formats. Scripts are provided to allow for easy customization and enhancement.

The following are some examples of industry formats, with their associated consortia or institutions, that are illustrated in the pureXML industry bundles or demonstrations:

  • ACORD: The Association for Cooperative Operations Research and Development (ACORD) develops and maintains various electronic formats for the insurance, reinsurance, and related financial services industries. ACORD formats encompass the Life & Annuity, Property & Casualty/Surety, and Reinsurance industry segments.
  • CDISC: The Clinical Data Interchange Standards Consortium (CDISC) develops and supports XML formats that enable information system interoperability to improve medical research and related areas of healthcare.
  • FIX: The Financial Information eXchange (FIX) protocol is a messaging format developed specifically for the real-time electronic exchange of securities transactions.
  • FpML: The Financial products Markup Language (FpML) protocol is the XML format for electronic dealing and processing of over-the-counter derivatives.
  • GJXDM: The Global Justice XML Data Model (GJXDM) is an XML format for criminal justice information exchanges, providing law enforcement, public safety agencies, prosecutors, public defenders, and the judicial branch with a tool to share data and information effectively and in a timely manner.
  • HL7: Health Level 7 (HL7 Edition 2006 v3) is a format for healthcare and is the interface standard for communication between various systems employed in the medical community.
  • HR-XML: The Human Resources (HR-XML) Benefits Enrollment schema supports enrollment and maintenance of human resources in tier-based coverage (such as medical, dental and vision), spending accounts (more commonly known as flexible spending accounts (FSA)), rate-based coverage (such as life, short term disability, and long term disability) and stock purchase plan coverage.
  • NewsML: The News Markup Language is an agreed way to describe news information content so that it can be distributed and reused widely on web sites and other media.
  • NIEM-MCJE: The National Information Exchange Model Minnesota Criminal Justice Event (NIEM-MCJE) format provides a common syntax for information exchanged about criminal justice events, from an initial call for service to the filing of charges with the courts.
  • MISMO: The Mortgage Industry Standards Maintenance Organization (MISMO) develops, promotes, and maintains voluntary electronic commerce standards for the mortgage industry.
  • MusicXML: MusicXML is an XML-based music notation file format designed for the interchange of scores, particularly between different score writers. The format is suitable for common western musical notation from the 17th century onwards, and is designed as an interchange format for notation, analysis, retrieval, and performance applications.
  • MDDL: Market Data Definition Language (MDDL) is an XML-based interchange format and common data dictionary for the fields that describe financial instruments, corporate events affecting value and tradability, and market-related, economic, and industrial indicators.
  • Tax Form 1120: The U.S. Internal Revenue Service (IRS) e-File Form 1120 (the electronic version of Tax Form 1120) is based on XML. The form is used by corporations filing their taxes. States in the U.S. and the IRS process these forms.
  • SVG: Scalable Vector Graphics (SVG) is a language for describing two-dimensional graphics and graphical applications in XML. For example, architectural diagrams can be represented in SVG.

Industry format structures (and their schemas) often evolve, typically about every six months and sometimes more often. Many organizations that use the industry formats, or even their own internal XML formats, devote considerable programming effort to mapping these formats into relational data for storage in databases. Each time the format changes, new mappings must be devised and additional programming is required. When the exchange data is stored and manipulated as XML, the programming is simpler, and fewer modifications are needed each time the structure changes.

DB2 pureXML provides the ability to store, update, delete, query, and index well-formed XML. Users can retrieve entire XML documents or document fragments by incorporating XPath, XQuery, and SQL into queries. Users can also register XML schemas and instruct DB2 to validate XML documents against these schemas. The pureXML capability is part of DB2 9, including DB2 Express-C, and is also available in DB2 9 for z/OS®.
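As a rough illustration of the store-then-query pattern described above, the following Python sketch uses only the standard library and does not touch DB2. The `XmlStore` class, its method names, and the sample trade messages are hypothetical, and `xml.etree.ElementTree` supports only a limited XPath subset, so this is a stand-in for the idea and not for the pureXML engine itself.

```python
import xml.etree.ElementTree as ET


class XmlStore:
    """Toy in-memory stand-in for a table with one XML column (not DB2)."""

    def __init__(self):
        self._docs = {}  # document id -> parsed XML root element

    def insert(self, doc_id, xml_text):
        # ET.fromstring raises ParseError if the text is not well-formed XML,
        # so only well-formed documents are stored.
        self._docs[doc_id] = ET.fromstring(xml_text)

    def fragments(self, path):
        # Return (doc_id, serialized fragment) for every element matching path.
        results = []
        for doc_id, root in self._docs.items():
            for element in root.findall(path):
                results.append((doc_id, ET.tostring(element, encoding="unicode")))
        return results

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)


store = XmlStore()
store.insert(1, "<trade id='T1'><party>Bank A</party><notional>1000</notional></trade>")
store.insert(2, "<trade id='T2'><party>Bank B</party><notional>2500</notional></trade>")

# Retrieve a fragment (the <party> element) from every stored document.
parties = store.fragments("party")

# A malformed message is rejected before it reaches the store.
try:
    store.insert(3, "<trade><party>Bank C</trade>")
    rejected = False
except ET.ParseError:
    rejected = True
```

The documents are kept in their original shape, and the query names only the path it needs. No mapping to columns is defined up front.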

To download the pureXML industry bundles, go to the link in the Resources section.

There are accompanying interactive demonstrations that illustrate access to stored XML content through a variety of interfaces such as regular Web browsers, feed readers, and XForms viewers. In addition, a selection of RESTful and SOAP-based Web services are exposed that enable access to the XML content. The interactive demonstrations can also be accessed from the Resources section of this article.


The industry bundles with pureXML

The industry bundles make it easy to store and query industry-defined XML exchange messages as a first step towards demonstrating the benefits of pureXML. Those benefits include auditing the messages, querying them quickly, exposing them through Web applications and feeds, and exchanging them across organizations.

An industry bundle is composed of test scripts and XML messages to illustrate how to create, index, and populate an XML table, how to query the stored XML using XQuery or SQL/XML and return portions of XML, how to create views on the XML messages, and how to shred the XML into relational format. There are industry bundles for Windows®, Linux® and z/OS platforms.
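To make the shredding step concrete, here is a minimal Python sketch that flattens XML messages into a relational table. It is not taken from the bundles: SQLite stands in for DB2, and the claim messages, table names, and the `shred` helper are all invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical insurance claim messages (not an actual ACORD format).
MESSAGES = [
    "<claim id='C1'><insured>Ann Lee</insured><amount>1200.50</amount></claim>",
    "<claim id='C2'><insured>Raj Patel</insured><amount>310.00</amount></claim>",
]


def shred(xml_text):
    """Flatten one claim message into a relational row."""
    root = ET.fromstring(xml_text)
    return (root.get("id"), root.findtext("insured"), float(root.findtext("amount")))


conn = sqlite3.connect(":memory:")  # SQLite stands in for DB2 here
conn.execute("CREATE TABLE claim_xml (id TEXT PRIMARY KEY, doc TEXT)")
conn.execute("CREATE TABLE claim_rel (id TEXT PRIMARY KEY, insured TEXT, amount REAL)")

for message in MESSAGES:
    row = shred(message)
    # Keep the original message intact and also store the shredded row.
    conn.execute("INSERT INTO claim_xml VALUES (?, ?)", (row[0], message))
    conn.execute("INSERT INTO claim_rel VALUES (?, ?, ?)", row)

# Ordinary SQL now works on the shredded data.
large_claims = conn.execute(
    "SELECT id, insured FROM claim_rel WHERE amount > 1000 ORDER BY id"
).fetchall()
```

Keeping both tables mirrors the approach the bundles illustrate: the XML remains the stored form, and only the portions that existing relational tools need are shredded.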

Once the industry bundle is installed, it should be relatively easy to embark on a proof-of-concept project with pureXML, to make some initial decisions about how to use pureXML in an architectural initiative, or to take the first steps with a pureXML development project.

Figure 1. A pureXML industry bundle

An industry bundle has the following benefits:

  • Helps software developers, systems programmers, and database administrators get started with DB2 9 in the context of an industry using the XML messages that are often used as an exchange format in that industry
  • Enables technical IT staff to quickly demonstrate to their colleagues the capabilities of the pureXML support in DB2, including the ability:
    • To store, index, and query XML easily without needing to convert the XML (shred) to relational format
    • To store, index, and query well-formed XML conforming to specific XML schemas in a uniform way
    • To query stored XML and relational data together in a straightforward way
    • To process stored XML as though it were relational so that existing tools and software can still be used
    • To shred XML messages into relational form where required

In summary, the industry bundle shows how an XML exchange format can also be the storage format for the XML data. Benefits of storing the XML data as it is exchanged include being able:

  • To find out what's happening in the system as soon as the XML messages arrive without waiting until the messages have been re-structured and reach other systems
  • To process the XML messages without re-structuring and re-mapping in the face of XML schema changes

Scenarios where storing XML is helpful

Figure 2. pureXML industry format demos approach

The industry formats and services demonstrations illustrate how well-formed XML messages can be stored and queried in a DB2 pureXML database (item 1 in Figure 2). They also show how a general-purpose services layer (item 2 in Figure 2) can be created to enable access to the stored messages in a variety of ways by exposing a simple set of CRUD (create, read, update, delete) and query services. Both RESTful and SOAP-based Web services are provided in the demonstration. REST (Representational State Transfer) is a style of building Web applications. In the diagram, these services are referred to as Universal "Quick" Services.

Other ways of accessing the data, such as through Atom feeds and XForms, are illustrated (item 3 in Figure 2). The XForms use the exposed generic CRUD and query services to access the stored messages.
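The idea of one generic services layer shared by every format can be sketched as follows. This is not the demonstration's implementation: the `QuickService` class, the mapping of HTTP-style methods to operations, and the sample messages are hypothetical, and the sketch dispatches in memory without running a Web server.

```python
import xml.etree.ElementTree as ET


class QuickService:
    """Hypothetical generic service: the same operations for every format."""

    def __init__(self):
        self._tables = {}  # format name -> {message id -> XML text}

    def handle(self, method, fmt, msg_id=None, body=None, path=None):
        table = self._tables.setdefault(fmt, {})
        if method == "POST":  # create
            ET.fromstring(body)  # reject text that is not well-formed
            table[msg_id] = body
            return 201, body
        if method == "GET" and path is not None:  # query across all messages
            hits = []
            for text in table.values():
                hits += [e.text for e in ET.fromstring(text).findall(path)]
            return 200, hits
        if msg_id not in table:
            return 404, None
        if method == "GET":  # read one message
            return 200, table[msg_id]
        if method == "PUT":  # update by replacing the whole message
            ET.fromstring(body)
            table[msg_id] = body
            return 200, body
        if method == "DELETE":  # delete
            return 200, table.pop(msg_id)
        return 405, None


service = QuickService()
service.handle("POST", "mismo", "L1", "<loan><amount>250000</amount></loan>")
service.handle("POST", "hl7", "P1", "<patient><name>J. Doe</name></patient>")
service.handle("PUT", "mismo", "L1", "<loan><amount>260000</amount></loan>")
status, amounts = service.handle("GET", "mismo", path="amount")
deleted_status, _ = service.handle("DELETE", "hl7", "P1")
missing_status, _ = service.handle("GET", "hl7", "P1")
```

Nothing in `handle` is specific to a format. A new format needs only a new key in the table map, which is the property that lets clients such as forms and feed generators reuse the same services.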

Occasional reports scenarios

All the demonstrations expose simple services, accessible through a Web browser, to insert, retrieve, query, and delete the stored XML messages. There is a restore option too, in case someone deletes all the data. Moreover, there is an "Own Data" demonstration that you can use as your sandbox, for example, to try out XQueries on your XML or to produce Atom feeds on your XML. Note that the "Own Data" demonstration is everyone else's sandbox too, so other people may occasionally interfere with your activities. You can try these simple services by selecting the Data Management option in any of the demonstrations. Being able to issue arbitrary XQuery (or SQL/XML) requests on stored XML is helpful if, for example, you are storing XML for audit purposes.

Service-oriented architecture scenarios

All the demonstrations expose the same SOAP and RESTful Web services for all the industry formats. You can view these services by selecting the HTTP binding option in any of the demonstrations. Web services provide a way of describing and publishing a general-purpose, agreed-upon interface for accessing data and applications, through the WSDL (Web Services Description Language) notation. The Web services approach provides loose coupling between clients and the data or applications being accessed, and is an important enabler for SOA. RESTful (non-SOAP-based) services are popular for simple Web 2.0-based applications.

Web 2.0, mash-ups, and dashboard scenarios

All the demonstrations expose the stored XML messages through configurable Atom feeds. You can view the Atom feeds for the industry format you are interested in by selecting the Atom Feeds option in any of the demonstrations. Atom feeds provide an agreed-upon way for publishing summaries of changes to data and for interested parties to easily locate these summaries. Atom also makes it possible for general-purpose software readers to offer a human or programmatic interface to subscribe to changes, to be notified when the changes happen, and to review the changes. RSS is like Atom except it has not been standardized and thus has many variants. Feeds are often used in mash-ups and dashboards. You can watch how one of the demonstrations is used in a mash-up here: http://www.youtube.com/watch?v=ckGfhlZW0BY.
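For readers unfamiliar with the shape of an Atom feed, the sketch below builds a small one with Python's standard library. The demonstration generates its feeds with DB2 stored procedures, so this is only an illustration of the output format. The `build_feed` function, the `urn:example:` identifiers, and the entry content are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def build_feed(feed_id, title, updated, author_name, changes):
    """Build an Atom feed; `changes` is a list of (id, title, updated, summary)."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    for tag, text in (("id", feed_id), ("title", title), ("updated", updated)):
        ET.SubElement(feed, f"{{{ATOM_NS}}}{tag}").text = text
    author = ET.SubElement(feed, f"{{{ATOM_NS}}}author")
    ET.SubElement(author, f"{{{ATOM_NS}}}name").text = author_name
    for entry_id, entry_title, entry_updated, summary in changes:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        for tag, text in (("id", entry_id), ("title", entry_title),
                          ("updated", entry_updated), ("summary", summary)):
            ET.SubElement(entry, f"{{{ATOM_NS}}}{tag}").text = text
    return ET.tostring(feed, encoding="unicode")


feed_xml = build_feed(
    "urn:example:feeds:fpml",
    "Recent FpML messages",
    "2007-05-24T12:00:00Z",
    "Demo feed generator",
    [("urn:example:fpml:T2", "Trade T2 inserted", "2007-05-24T12:00:00Z",
      "Notional 2500")],
)

# Any Atom-aware reader can now locate the entries by their standard names.
parsed = ET.fromstring(feed_xml)
entry_titles = [e.findtext(f"{{{ATOM_NS}}}title")
                for e in parsed.findall(f"{{{ATOM_NS}}}entry")]
```

Each entry summarizes one change to the stored data, which is what allows a general-purpose feed reader or a mash-up to consume the changes without knowing anything about the underlying industry format.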

Web browser scenarios

Some of the demonstrations, such as SVG and MusicXML, can be viewed through third-party viewers that support the format. Other demonstrations, for example, HL7, ACORD, MISMO, and Tax 1120, include a format-specific customized user interface, built with XForms and with the services exposed in all the demonstrations. XForms is an agreed-upon way to enable a Web forms interface. An XForms form can load external XML documents, such as documents stored in pureXML, as initial data in the browser, and can submit the results to the server as XML.

By including the browser in the XML pipeline through XForms, you can have end-to-end XML, right up to the user's desktop. End-to-end XML eliminates data conversions, which reduces processing overhead and makes modifications easier when the XML structures change.

To find out more about how to navigate through the demonstrations, please see the Getting Started with the Demonstration link in the Resources section.

A small note on how the demonstration is built

The demonstration uses DB2 pureXML columns (collections of XML) to store XML messages. Access to the stored XML is enabled through generic servlets, stored procedures, and a WebSphere® Application Server. The DB2 Web Service Runtime supports the deployment of Web services and provides dedicated methods to access the data and the stored procedures of the database. A series of stored procedures generates the Atom feeds. Feeds can be predefined or created on the fly, and a DB2 table stores the information about the predefined feeds. The XForms are stored in the appropriate Web server directory.

The demonstration currently includes several industry formats, each stored in a DB2 pureXML column in a separate database table. XML validation is not enforced at insert time to allow for flexibility, but some of the demonstrations do provide a schema validation option. All the industry formats are manipulated in similar ways, such as through XForms, Web services, or Atom feeds. These types of manipulation illustrate the ease with which additional XML formats can be introduced into a system. If the structure of the XML format evolves, it can continue to be stored in the same table without any significant modifications, such as re-mapping.
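The point about evolving structures can be illustrated with a small Python sketch. The two order messages below are invented, and a Python list stands in for the XML column; the sketch only shows why a path-based query can survive a structural change that would force a relational mapping to be redone.

```python
import xml.etree.ElementTree as ET

# Two versions of a hypothetical message format: version 2 adds <currency>
# and moves the amount inside a new <pricing> element.
V1 = "<order id='A1'><customer>Acme</customer><amount>100</amount></order>"
V2 = ("<order id='A2' version='2'><customer>Globex</customer>"
      "<pricing><amount>250</amount><currency>EUR</currency></pricing></order>")

# One "column" holds both versions side by side.
table = [ET.fromstring(V1), ET.fromstring(V2)]

# A descendant path (.//amount) finds the value wherever it sits, so the
# same query serves both versions without a new mapping.
amounts = [doc.findtext(".//amount") for doc in table]

# New elements are queryable as soon as they appear; older documents
# simply produce no match.
currencies = [doc.findtext(".//currency") for doc in table]
```

In a shredded design, the same change would typically require a new column for the currency and a revised mapping for the relocated amount before any version 2 message could be stored.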

Figure 3. End-to-end XML

Summary

This article provides an overview of the industry bundles and the demonstrations that are part of the "IBM Industry Formats and Services with pureXML" technology on alphaWorks. The industry bundles and demonstrations help architects, developers and database administrators get started with pureXML in the context of an industry.


Acknowledgment

Many thanks to Ronny Bartsch, Vijay R Bommireddipalli, Donald Buddenbaum, Anke Diderich, Kevin E Kelly, Jan Kratky, Henning Masuch, Jan-Eike Michels, Demai Ni, Mallarswami R Nonvinkere, Christian Pichler, Jeffrey Rodriguez, Vitor Rodrigues, Stefan Rybacki, Manoj Sardana, Andy B. Smith, Keith Wells, and others, who contributed to or helped make the industry formats and services with pureXML happen.

Resources

Get products and technologies

  • Download a free trial version of DB2 Enterprise 9.
  • DB2 Express-C download: DB2 Express-C is a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and includes the pureXML function.
  • Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2, Lotus®, Rational®, Tivoli®, and WebSphere.
