Handling the spectrum of unstructured data - the missing pieces
One of the key goals for many companies in using XML has been to enable the processing of a wide variety of unstructured data that cannot be effectively stored in a conventional relational database (RDB). This goal is an important one, since most of the data in an enterprise is not structured and cannot be stored in a normal RDB. In fact, one survey found that only 10% of the total information in an enterprise is managed in an RDB. This fact implies that 90% of the information in an enterprise is not being systematically managed or utilized. Most organizations cannot use such unstructured data efficiently.
Many XML proponents expected that XML would soon help achieve the goal of converting this unstructured data into a form that could be managed and utilized. Until now, however, two important pieces needed to achieve this goal were missing:
- A database that could store XML data and manipulate it as XML data
- A comprehensive tool for developing native XML applications
In the summer of 2006, however, the two missing pieces became available in the form of the third-generation database IBM DB2 9 (code name "DB2 Viper") and the native XML application platform "xfy" from Justsystems.
The DB2 9 hybrid database for XML data
DB2 9 is a hybrid database. In addition to having the functionality of an RDB, DB2 9 also has the functionality of an XML database (XMLDB). DB2 9 can store XML data as actual XML data, without change. This is in contrast to previous RDBs, which usually stored XML data either by fitting it into a specified table structure or by simply handling the data as a large character string.
DB2 9 can use SQL and XQuery to search relational data and XML data. In addition to being able to use XQuery to search relational data and SQL to search XML data, users can issue a single query to search both relational data and XML data at the same time.
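To make the idea of one query spanning relational and XML data concrete, here is a minimal sketch. It does not use DB2 9 or XQuery: it assumes SQLite from the Python standard library as a stand-in, and the `customer` table, its columns, and the `xml_text` helper are all hypothetical names invented for this illustration. In DB2 9 the XML predicate would be handled natively by the database rather than by a user-defined function.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_text(document: str, path: str):
    """Return the text of the first element matching `path`, or None."""
    node = ET.fromstring(document).find(path)
    return node.text if node is not None else None


conn = sqlite3.connect(":memory:")
# Expose the (hypothetical) helper to SQL so one statement can test XML content.
conn.create_function("xml_text", 2, xml_text)

# One relational column (region) next to one column holding the XML unchanged.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT, info TEXT)")
conn.executemany(
    "INSERT INTO customer (id, region, info) VALUES (?, ?, ?)",
    [
        (1, "EU", "<customerinfo><name>Ada</name><city>Berlin</city></customerinfo>"),
        (2, "EU", "<customerinfo><name>Bob</name><city>Paris</city></customerinfo>"),
        (3, "US", "<customerinfo><name>Cy</name><city>Berlin</city></customerinfo>"),
    ],
)

# A single query that filters on relational data and XML data together.
rows = conn.execute(
    "SELECT id, xml_text(info, 'name') FROM customer "
    "WHERE region = ? AND xml_text(info, 'city') = ? ORDER BY id",
    ("EU", "Berlin"),
).fetchall()
print(rows)
```

Only the customer whose relational `region` is EU and whose XML document names Berlin as the city is returned, which is the combined filtering that the paragraph above describes.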
In contrast to single-schema RDBs, DB2 9 can handle schemas very flexibly. For example, DB2 9 can store XML data from multiple schemas, or store schema-less XML data. This schema flexibility enables one search query to search through different XML data, such as XML data with different schemas. Such multi-source searches are often referred to as "federated searches".
One search - multiple data sources
DB2 9 enables federated searches on data with different schemas. This greatly expands an enterprise's ability to use data not effectively storable in traditional RDBs.
For example, consider the various types of data used by an enterprise, such as email, orders, or proposals. In the past, this kind of qualitatively different data was often not even stored in the same RDB and, if stored, the RDB could only provide low-level search capabilities. Now, however, data in XML format can be stored in a DB2 9 database, and all the data can be queried. This enables users to search multiple types of data in a single search. For example, all the data relating to a single customer can be extracted, even though the data was structured with different schemas.
Figure 1. Accessing multiple schemas in a single query
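The single-customer example can be sketched in a few lines. This is only an in-memory illustration using Python's standard `xml.etree.ElementTree`, not a DB2 9 query; the three vocabularies, the sample addresses, and the function names `mentions` and `find_related` are assumptions made up for the sketch.

```python
import xml.etree.ElementTree as ET

# Three documents in three unrelated, made-up vocabularies.
DOCUMENTS = [
    "<email><from>ada@example.com</from><subject>Quote request</subject></email>",
    "<order id='A-17'><customer><contact>ada@example.com</contact></customer></order>",
    "<proposal><client email='bob@example.com'/><title>Upgrade plan</title></proposal>",
]


def mentions(root: ET.Element, value: str) -> bool:
    """True if any element text or attribute value in the tree equals `value`."""
    for element in root.iter():
        if (element.text or "").strip() == value:
            return True
        if value in element.attrib.values():
            return True
    return False


def find_related(documents, value):
    """Return the root tag of every document that mentions `value`."""
    roots = (ET.fromstring(doc) for doc in documents)
    return [root.tag for root in roots if mentions(root, value)]


print(find_related(DOCUMENTS, "ada@example.com"))
```

The search never consults a schema: it walks whatever structure each document has. A real database would use indexes and a query language instead of a full scan, but the result has the same shape, namely documents from different vocabularies collected by one search.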
Observant readers might suspect that the XML data returned from a federated search will be in various unpredictable XML vocabularies. This is correct. And this leads us to the second missing piece: xfy, a platform aimed at processing unpredictable, unknown XML vocabularies.
The xfy application platform: Extracting the value inherent in XML
xfy is a platform for building and executing native XML applications. It handles all types of XML data, even XML compound documents, regardless of how little is known about the XML vocabularies. xfy is implemented in Java™, so it can run in a variety of environments such as Windows, Linux, and Mac OS X.
xfy is able to analyze unknown XML vocabularies returned from a database, and to automatically generate appropriate views for the results. Unlike previous database applications, xfy can display unknown XML vocabularies without requiring any programming by the user. Also, xfy has a rich set of components used for data visualization (for example, scatter charts and spreadsheets). These components provide powerful tools for displaying XML data in useful visual formats.
Figure 2. Data visualization in xfy
You can use xfy to create compound XML documents in which XML data returned from a database is embedded in an XHTML document. xfy can build XML documents that contain the actual search query issued against a database, as well as the search results returned from the database. In effect, this enables xfy to make live XML documents. The search results in the XML document will be updated whenever the database is updated, and the XML document itself will be regenerated when the search query is edited.
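The notion of a live document, one that carries both its query and the results, can be illustrated with a small sketch. This is not xfy's actual document format or API: the `live-query` element, its `sql` attribute, the `row` elements, and the `refresh` function are hypothetical, and SQLite stands in for the database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A compound document: XHTML-like markup plus a hypothetical <live-query>
# element that stores the query text and, after a refresh, its result rows.
TEMPLATE = (
    "<html><body><h1>Open orders</h1>"
    "<live-query sql='SELECT item FROM orders ORDER BY item'/>"
    "</body></html>"
)


def refresh(document: ET.Element, conn: sqlite3.Connection) -> None:
    """Re-run every embedded query and replace its stored results."""
    for query in list(document.iter("live-query")):
        for old in list(query):
            query.remove(old)
        for (item,) in conn.execute(query.get("sql")):
            ET.SubElement(query, "row").text = item


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (item TEXT)")
conn.execute("INSERT INTO orders VALUES ('lamp')")

doc = ET.fromstring(TEMPLATE)
refresh(doc, conn)
first = [row.text for row in doc.iter("row")]

# The database changes; refreshing the document picks up the new row.
conn.execute("INSERT INTO orders VALUES ('desk')")
refresh(doc, conn)
second = [row.text for row in doc.iter("row")]
print(first, second)
```

Because the query travels inside the document, editing the `sql` attribute and refreshing again would regenerate the results in the same way.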
The right information to the right person at the right time
In addition to displaying search results, xfy can also generate and issue search queries without requiring programming by users. Users can use a GUI to generate and issue search queries (regardless of whether the target is relational data or XML data), which enables the right information to be delivered to the right person at the right time.
In addition to generating and issuing search queries without programming, xfy also enables views to be defined without programming. Using a GUI to define a view for an existing XML schema, a developer can develop programs and make modifications repeatedly in short development cycles.
Figure 3. Creating an XML DB query
Figure 4. XML DB query
xfy can use XML data as is, without converting it to some other type of data. Using the combination of DB2 9 and xfy greatly simplifies application development, both on the server side and the client side.
Figure 5. DB2 9 and the xfy platform
Synergistic benefits from pureXML with native XML
The arrival of DB2 9 breathes new life into the 90% of enterprise data that is unstructured and not storable in an RDB. Up until now, this type of data has been effectively dead in the enterprise: neither managed nor utilized. From now on, however, we can expect major changes in the value of such information and in how it is used. Also, the arrival of xfy will dramatically simplify XML application development. Developers will be able to quickly develop information structures to match requested changes to specifications, and will be able to provide really easy-to-use systems that match user needs.
The two products pureXML DB2 9 and native XML xfy are the ideal combination for creating true XML solutions and providing users with the full value inherent in XML. The next article in this series will help you jump in and create an XML document application using xfy and DB2 9.
Learn
- Refer to "Enable C++ applications for Web service using XML-RPC," a step-by-step guide to exposing C++ methods as services.
- In the Architecture area on developerWorks, get the resources you need to advance your skills in the architecture arena.
- Browse the technology bookstore for books on these and other technical topics.
Get products and technologies
- Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- Participate in the discussion forum.
- Check out developerWorks blogs and get involved in the developerWorks community.
Hideki Hiura is chief scientist at Justsystems, Inc. He is a founder and chairperson of OpenI18N.org/Free Standards Group, an independent, nonprofit organization dedicated to accelerating the use of free and open source software by developing and promoting standards. He is also a founding member of W3C I18N WG. Prior to Justsystems, Inc., he was an architect at Sun Microsystems, where he was involved with a variety of standards and standards organizations, including ISO, W3C, OMG, The Open Group, OSF, Unix International, X Consortium, and Unicode.





