Introduction: DB2 pureXML and XML data
Businesses are increasingly making use of the pureXML capabilities in DB2 for storing, managing, and querying XML data alongside the traditional text and numerical relational database data. DB2 pureXML allows storage of XML with or without schema validation, creation of indexes on individual elements or attributes in XML documents, and retrieval of all or portions of XML documents based on the values within the elements and attributes. These pureXML capabilities have allowed companies to reduce the amount of application code necessary for processing XML, eliminating significant parsing and conversion of XML into other formats to fit relational structures. Applications can now receive, transmit, store, and retrieve XML without intermediate conversions. This streamlined flow is often referred to as end-to-end XML processing(illustrated in Figure 1).
Figure 1. End-to-end XML processing
The use of this powerful XML capability within the database creates a need for tools to assist database users with creating, viewing, and editing raw XML for development and test purposes, for constructing queries and indexes against XML data, and for dealing with XML schemas and other related objects. There are many tools from IBM, as well as outside IBM, that have XML capabilities included in their function list. This article outlines some of the common tasks database users may need to perform with XML data and identifies IBM tools that can assist with these tasks. The tasks are grouped by typical database roles found in today's IT organizations, and the tools that best match the related XML needs are identified.
IBM tooling with XML functionality for DB2 users
IBM offers end-user tools to manage data across the life cycle, including such tasks as database operation and maintenance, tuning, data architecture/design, and application development. Let's take a look at the IBM tools referenced along with a brief description of their primary functionality and the XML-related capabilities:
- DB2 Control Center (free with the DB2 database)
- DB2 system and database object management; provides an interactive query interface using the integrated Command Editor.
- XML: Create and manage XML database objects, and query and view XML data.
- Note: DB2 Control Center has been deprecated, which means it will not be further enhanced and will be removed in a future release, replaced by functionality in other tools.
Figure 2. DB2 Control Center and Command Center
- IBM Optim® Development Studio (ODS) and Data
- Data development,including the creation and execution of queries (SQL and XQuery) and stored procedures, development and deployment of data web services, and support for Java™ development for data access.
- XML: XML/Schema editor, XML mapper, web services, Database Explorer, Schema registration, generation of XML schema from XML.
Figure 3. Optim Development Studio
- IBM InfoSphere™ Data Architect (IDA)
- Logical and physical data modeling - design databases; discover, relate, integrate, and standardize diverse data assets; some model transformations (UML/LDM/PDM).
- XML: Data Modeling, XML schema transformations.
Figure 4. InfoSphere Data Architect
- IBM Rational® Application Developer (RAD) for
- Application development - provides tools related to Java development (EJBs, JPA, JSF, and so on), testing, and deployment, including web services, portal and SOA applications. Includes an integrated WebSphere test environment.
- XML: XML/Schema/XSL editor, XML Mapping editor, Database Explorer; generate relational table definition from XML Schema.
Figure 5. Rational Application Developer
- Rational Software Architect (RSA) for WebSphere
- Model-driven design and development; includes all RAD features, along with modeling and architecture tooling.
- XML: Contains the XML tooling of RAD, and modeling and architecture functions (transformations, modeling, UML, and so on).
Figure 6. Rational Software Architect
- InfoSphere Warehouse Design Studio
- Graphical tool for creating physical data models, DB2 SQL-based warehouse construction data flows, and OLAP cube models.
- XML: Mapping XML source data into warehouse definitions.
Figure 7. InfoSphere Warehouse Design Studio
IBM tools and XML summary
These IBM tools have some XML capabilities in common, and, in some cases, they share component modules. For instance, in each of these tools, you can accomplish basic editing of XML instances or schemas. In addition, all of the tools, with the exception of the DB2 Control Center, are Eclipse-based (an extensible open source development platform), so they all have a similar appearance and feel. The rest of this article helps clarify where one tool may be more appropriate for a given task. Many times, a tool recommendation is driven not by the single task, but by the broader context of the user's role and what else needs to be done. There are other tools from IBM that may include XML capabilities, but for brevity, this article focuses on those listed above.
There are also IBM products that complement DB2 pureXML for XML applications, such as IBM WebSphere Application Server's new Feature Pack for XML, IBM WebSphere Integration Developer, and IBM WebSphere DataPower. This integration is beyond the scope of this article, but they do contribute to integrating end-to-end XML applications. Users of these products may have similar needs for XML-oriented tooling. (More information about these topics is included in the Resources section.)
DB2 users and the impact of XML data
Information technology professionals in separate roles within companies have different, although related and sometimes overlapping, responsibilities. To a great extent, an individual's role determines what tasks he will need to perform with XML data that is in the database. For instance, some professionals will need to create or modify XML data, while others will simply need to understand its structure to be able to effectively query it. Some of these XML tasks will need to be performed by many professionals (for example, view XML data), while others will be specific to a particular role. In some cases, a single tool may provide all the functionality to meet the needs of a given set of responsibilities and may even meet the needs of multiple roles. In other cases, it may require multiple tools to meet the varied needs of a particular role. This section outlines a set of information-processing professional roles and their XML data-related needs. (Note: Although corporations may define or categorize their roles differently, each of these activities will need to be covered by one or more employee groups.) In a subsequent section, we identify the IBM tools that have capabilities to meet those needs.
Database administrator (DBA)
A DBA is responsible for the operation of an organization's database. Tasks include planning and administering the database, tuning for performance and capacity, creating and maintaining indexes and optimizing queries, monitoring, and managing backup and recovery.
XML tasks: Create and maintain tables and views with XML data; create and tune XML indexes; query and view XML documents; store and manage XML schema documents.
Database developer (database programmer)
A database developer creates and optimizes database access, including queries, stored procedures, data web services, and reports, sometimes covering both transaction (OLTP) and warehouse-oriented data (OLAP). In some organizations, this role is performed by a DBA or a software developer.
XML tasks: XML query (SQL/XML, XQuery) and stored procedure development; XML warehouse mapping and population.
Software developer (application programmer)
A software developer is responsible for designing and developing applications or web services to store, access, and retrieve data from databases, and connect to other business applications. Those in this role often use Java, but may also use other high-level or scripting languages.
XML tasks: Write application programs or web services that efficiently store and retrieve XML data from the DB2 database; consider XML, opposed to relational structures.
Data architect (database analyst)
A data architect is responsible for determining and documenting data flows and the structure of data processed and stored, often using data modeling techniques for logical and physical data structures.
XML tasks: Design database structures, including XML columns or relational columns with data extracted from XML (map XML to relational).
Software architect (application architect)
A software architect is responsible for the high-level business modeling and determining architecture for applications that implement the business models. They design application flows and architectures, representing business processes in application design.
XML tasks: Author and maintain XML schemas, XML documents, or both that represent the business, and determine any needed data mapping or transformations.
This article discusses more about these roles and outlines the tools that can help meet their XML needs in the "Database user roles and XML needs" section.
Examining key XML tasks and tool capabilities
Before delving further into the database roles and the related XML needs, let's pause to first describe a few of the common key XML responsibilities that surface when XML data is part of the database. These responsibilities are grouped into database-related, XML native, and application-related categories. This is not intended to be an exhaustive list (there are certainly many other XML activities database users might engage in), but a foundational explanation of the most common XML-related tasks for database users.
Database object-related XML tasks
These types of tasks are related to creating and managing objects in the database, such as tables and indexes.
- Define the database structure to manage XML data
This task involves defining the database schema (tables, columns) to store XML data in a relational database. Fortunately for DB2 users, creating tables to contain XML data is a very straightforward process since XML has become a full-fledged data type in DB2 9 (LUW and z/OS). This means that the same commands and tools that are used to create tables with other data types can also be used to create those containing XML columns. For example,
create table mytable (idcol int xmldoc xml)would create a table with one integer column (
idcol) and one XML column (
xmldoc). Other considerations for tables and databases, such as tablespace definition and bufferpool allocation, are also managed the same for XML as other relational data types, so no new tool requirements are created. It is essential, however, that current tools that database professionals use to perform these tasks have been expanded to include the same functionality for XML data. For DB2 LUW and DB2 for z/OS, XML support has been incorporated in these tools, including DB2 Control Center, Command Line Processor (CLP), or, for z/OS, SPUFI (SQL processor using file input). Finally, if modeling has been done, IDA contains tooling to map logical or physical data models to XML schemas, which can be represented as an XML column in DB2.
- XML-to-relational and relational-to-XML mapping and
There are times when you may need to convert data from one format to another for various reasons. For instance, a business may receive XML from a partner but needs to integrate it into applications that access individual element values, so a tool is needed to map from XML to relational. Or, XML documents stored in the database may need to be made accessible to reporting tools that only recognize columns of discrete data and cannot process or format XML, so a conversion mechanism is needed. Conversely, an application might need to transmit XML format from data sourced in relational columns, so a relational-to-XML map is needed.
ODS contains a specialized XML-to-relational visual mapper for building annotated schemas that identify how a given XML structure will be mapped to relational tables and columns. This annotated schema is then loaded into the database and used to transform incoming XML into the defined relational columns. Other mapping, such as XML-to-relational or relational-to-XML, may be done using the powerful SQL/XML syntax, but these are code-level constructs, not visual tools. Tools like Altova's MapForce are created explicitly for visual mapping from one data format to another, including XML and database formats.
- XML query development and guidance
For many database and application users, SQL is a familiar query language, but the SQL/XML extensions and XQuery are both new to them. For these users, tools that provide a development environment or guidance for building SQL/XML or XQuery queries are needed. This need can be extended to supporting the development of stored procedures to operate on XML data. A related need is for a tool to identify the full XPath for a given element or attribute in an XML document (these documents can get to be large, and it is very easy to incorrectly type a part of the path by mistake, or type lower case when upper is expected, and not realize it until debugging when queries don't produce the expected results).
DB2 Control Center's Command Editor provides a facility to construct and execute XML queries against a database, and view XML results in a viewer. ODS and RAD/RSA tools also provide a query development environment that supports SQL/XML and XQuery, including basic syntax checking and query outlining, as well as the ability to execute against a database, display XML results, and build stored procedures (see Figure 8). These tools also have a basic XML path discovery capability, but it is associated with the XSLT editor, so it is not really useful for query development. Beyond IBM, XML-oriented tools, such as XML IDE products, provide this capability in their editor products.
Figure 8. Creating an XQuery statement in ODS
- Create or modify an index on XML data
Once XML documents are stored in an XML column in DB2, and applications begin to determine common usage patterns, you may want to create indexes on frequently accessed elements or attributes in the XML documents to improve query response time (much as with relational data). DB2 pureXML allows indexes on any element or attribute of XML documents.
The most straightforward way to create indexes is using a command line interface that allows queries to be run. You can use the DB2 Control Center (Command Center) and Optim Development Studio command interfaces for this purpose. There is a wizard in DB2 Control Center that guides users through the steps to create an index on an XML column in a DB2 table. Index creation wizards in other tools aren't yet able to handle the definition of the XPath for XML index creation.
- Store XML schema documents in DB2's XML Schema
DB2 pureXML provides the ability to validate XML documents against an XML schema. To make this possible, the schemas must be loaded (registered) into DB2's XML Schema Repository (XSR). While the commands to load a schema into the XSR are straightforward, unfortunately schemas themselves are often complex, consisting of many files linked together, in worst cases, reaching hundreds. Loading such multi-file schemas can be tedious and error-prone (did I already load schema file 33?), so users look to tools to assist with identifying all of the component files and automating or scripting the load process.
IBM Data Studio and ODS (as well as RAD and RSA) provide a facility to aid in registering XML schemas in DB2's XSR. These tools automatically discover one level of imports or includes, but files part of further nesting will need to be manually loaded. The tool can either register directly or create the script to be run later. More sophisticated XML schema tools can be found in XML IDE products, some of which connect to DB2 and guide the registration process, such as XMLSpy or oXygen.
Native XML-related tasks
These tasks are oriented towards creating and editing XML objects, somewhat independent of the ultimate database usage.
- Author and view XML documents
XML data needs to originate somewhere. The responsibility may fall to a database user to create XML documents (instances) or to create the XML schemas that define the structure of the XML documents. Furthermore, once the XML schemas or instance data are stored in DB2 pureXML, users executing queries against this XML data may return (formatted) XML documents that will need to be viewed in a clear (formatted) manner.
The tools that view or edit XML documents are commonly referred to as XML editors (or viewers, if read-only). Often, these tools have color coding for syntax highlighting and provide automatic completion of text or formatting to ease reading. A schema editor is a special variation of an XML editor that recognizes XML schema (XSD) formats. Other special "smart editor" variants may include XSLT, DTD, and WSDL. Some editors may even provide wizards to guide users through the document creation steps.
DB2 Control Center provides an XML viewer and simple editor, whereas Optim Development Studio and Rational Application Developer provide more sophisticated XML and schema editors.
- Ensure well-formed or valid XML
DB2 pureXML is very flexible in being able to store XML documents of varying sizes and structures in any XML column. The only requirement to store an XML document in an XML column in DB2 is that it be well-formed. DB2 itself will verify the well-formedness of XML documents that are inserted, rejecting those that don't meet the requirement. DB2 can also optionally verify that an XML document is valid with respect to a given XML schema. But there may be times when a database or application person wants to check a given XML document or documents for well-formedness or validation prior to inserting into DB2.
There are a few alternatives for tooling to help with these XML checks. One option is simply to create a test database table and let DB2 do the checking with its built-in parser. Another alternative to check if a document is well-formed is to simply open the XML document in a browser (Internet Explorer, Firefox), and it will show an error if problems exist. A better choice is to open with an XML editor, including those in the IBM tools, which can show the error and allow you to fix the problem. XML editors can also validate an XML document against a schema if you have it available.
- XML schema development
At times, a business receives XML from a partner or customer as part of an agreed-to exchange format or industry standard. Other times, someone within the company needs to create an XML format that will model the important data that represents the business transaction or activity. This data, when XML, is normally captured by an XML schema document (XSD), which defines the required and optional content, structure, and controls or limits of the data in each conforming XML document. Tools to assist the creation of valid XSD documents and to create sample XML instances from the XSD will make this job easier. XML editors, enhanced to recognize and guide XSD syntax, are examples of such tools.
- Map and transform XML
There are times that you need to create a sample XML document instance from a given XML schema (XSD), a schema from an XML instance, or both. Other transform examples involving XML are XML-to-XML and XSD-to-XSD.
Fortunately, most XML editors, including those in the IBM tools referred to in this paper, have this capability built-in. ODS also has capabilities for Java database application development and XML query development.
- Generate test data from instance or schema
Once a business has decided to store XML data, they know that DB2 can process and store any XML structure as long as it is well-formed. Often, the database or application team has an XML document example (instance) or a schema but wants to create a varied set of data for query and application testing (for example, create 1000 rows of varying XML content based on a single XML document or schema). What is needed is a "smart" XML editor or wizard that allows creation of many XML instances from a single instance (or schema) by enabling identification of specific elements and attributes that should be supplied with random/varying data or may be present or absent.
One could use the XML editors that are part of IBM tools to create test data by editing individual instances. This could be quite tedious if the desire is to have hundreds, or more, test documents. The best source for large-volume XML test data generation tooling would be tools that specialize in XML (rather than database or applications). A few of the XML IDEs mentioned below have recently come out with test data capabilities.
These tasks involve developing applications that transmit and access XML data that is stored in a database.
- Design and develop applications to store and query XML in
The ability to store and query XML in DB2 enables new or enhanced application opportunities. Developers need to create applications that utilize SQL/XML or XQuery to store, manipulate, and query XML data in DB2. This can involve design and development tooling specific to XML data handling or conversion.
RAD is the tool for developers, and it includes the ability to review and edit XML, and build and test XML-based queries for DB2. ODS provides tooling to help Java developers interact with the database, including queries and stored procedures against XML data.
- Design and develop web services to process, store, and query XML in
Similar to the prior section, this time developing Web services, which utilize SQL/XML or XQuery to store or query XML data in a DB2 database. These specialized web services are sometimes referred to as Data web services (DWS). Specific needs within this area would be to provide web services interfaces, such as in the form of a Web Services Description Language (WSDL) or to verify web services endpoints for message connectivity.
Both ODS and RAD have web service creation tooling. For example, ODS enables the creation and deployment of a web service from a query or stored procedure using drag-and-drop features.
Summary of XML Tasks and Tools
The table below provides a summary view of the XML capabilities of the various IBM tools reviewed above. The table shows that some capabilities are provided by all or most of the tools while others are specific to one or two tools.
Table 1. XML capabilities mapped to IBM tools
|XML capability||DB2 Control Center 9.7||Data Studio IDE 2.2||ODS 2.2||IDA 7.5.2||RAD 7.5||RSA 7.5||ISW Design Studio 9.7|
|XML editors||XML Editor (text, syntax format)|
|Schema (XSD) Editor|
|Annotated Schema Mapper|
|Database||Connect to DB2|
|DB Explorer (db objects)|
|Create database, tables (XML)|
|Create index (XML)|
|Export and import XML|
|Extract data - query XML|
|Query Editor (SQL, XQuery)|
|Stored Procedure Builder|
|Data Web Services Developer|
|Correlate SQL to Java code|
|Register XML schema to DB2|
|Visual Explain on XML queries|
1 Tool can manually create index (not with wizard), but will not show in Database Explorer.
Database user roles and XML needs
Often individuals and sets of XML responsibilities are assigned fall into groups according to individuals' roles within the company. For that reason it is often best to first look at the IBM product oriented to the individual's overall responsibility when searching for XML tooling assistance. The table below summarizes the XML task responsibilities of five data management roles, and identifies the IBM product tooling aligned most closely to each role.
Table 2. XML tasks mapped to database user roles
|XML tasks||Database administrator (DBA)||Data developer||Software developer||Database architect||Software architect|
|XML tables, views|
|Manage XML schema in DB2|
|XML-to-relational||XML-to-relational, or other||Data-model-to-XSD transforms||Data-model-to-XSD transforms|
|Role-aligned product tooling||DB2 Control Center (or ODS)||Optim Development Studio||Rational Application Developer||InfoSphere Data Architect||Rational Software Architect|
As stated earlier, there are a variety of responsibilities surrounding the creation, use, and maintenance of XML data, and each come with a corresponding set of tool needs. The next section examines how to choose a tool to meet XML task needs and relates the guidance to the identified database roles.
Which tool to choose
With so many tools and overlapping capabilities, it may be challenging to figure out which tool to choose for a given task when more than one has the needed functionality. There probably is no definitive "right" tool, but there are two factors that should play a big part in which tool you choose for a specific task:
- Are you already familiar with one of the tools that provides this functionality and perhaps have one already installed? When in doubt, stay with what you know. Even though a new tool might provide better or easier-to-use capability, it is unlikely to overcome the time needed to purchase, install, and learn the new tool.
- What other tasks are you going to do as part of your broader role? For instance, if you need to view or edit XML documents and beyond that will primarily develop Java applications, you might consider RAD. On the other hand, if you have the same need, but you primarily deal with database objects (tables, queries, and so on), then you should consider Optim Development Studio or simply use Control Center.
The above guidance works quite well if you have an occasional need for an XML tool capability in the context of a broader set of responsibilities.
If, however, a set of XML tasks becomes your primary responsibility for some period, then it makes sense to prioritize the specific functionality over the other capabilities that a tool provides. In those cases, you should devote some time to research the most important functionality you need and determine which tools will make that job easiest. In those cases, it might make sense to consider a tool specifically built for handling XML, if not for that explicit task.
XML tool recommendation based on database user roles
Now that we have reviewed what is involved in tasks that surface when database users deal with XML data, let's return to look at the various roles and their XML activities, and highlight the tools that best fit their XML needs.
Database administrator (DBA) - database operations
- XML tasks: Creating and maintaining tables and views with XML data; creating and tuning XML indexes; querying and viewing XML documents; storing and managing XML schema documents.
- General tool guidance: DB2 Control Center, Optim Database Administrator.
Database developer (database programmer) - database access development
- XML tasks: Develop XML query and stored procedures; design XML warehouse.
- General tool guidance: Optim Development Studio.
Software developer - application development to store and retrieve XML (and other) data
- XML tasks: Write application code that efficiently stores and retrieves XML data from the DB2 database.
- General tool guidance: Rational Application Developer.
Database architect - database modeling and physical layout
- XML tasks: Design database schemas, including XML columns or relational columns extracted from XML (map XML to relational).
- General tool guidance: InfoSphere Data Architect
Software architect - design business models and application architectures
- XML tasks: Develop and maintain XML schemas and/or XML documents.
- General tool guidance: Rational Software Architect.
Non-IBM XML tooling
While this article outlined the XML capabilities in various offerings of IBM tooling, there are numerous products available outside of IBM that provide XML capabilities as well. These include both freeware and priced software. This article does not attempt to list features and functions of these non-IBM products, but it does summarize a few categories of tools and gives some examples of products offered in each category. This list is intended for awareness only. Inclusion or omission of specific products is not intended as an indication of a recommendation regarding a particular product.
- XML integrated development environment (IDE): These tools are built to provide an integrated suite of capabilities for authoring, maintaining, and converting XML data, including both instances and schemas. They often include, or have extensions for, XQuery and SQL/XML development. Examples of popular XML IDEs are Altova XMLSpy, SyncRO Soft <oXygen/> XML editor, and Progress Software Stylus Studio. There are also several freeware XML IDEs available. Some of these, such as XMLSpy and Oxygen, also provide direct connection to DB2 pureXML and other databases to extract XML data, run queries, and store schemas.
- XML editors: Similar to IDEs, but with more limited functionality focused around creating and editing XML documents, schemas, or both. Many XML editors, of varying complexity, are available for free. Examples of XML editors are XMLPad, XML Marker, XML Viewer, and XML Buddy. For a comprehensive (if not overwhelming) list of XML editors, see the list on XML.com (see Resources).
Database users who are being introduced to XML data at times need a tool to assist with tasks related to XML creation, viewing, editing, or transformation. These capabilities exist within existing IBM tools that have been enhanced to handle aspects of XML. This article has outlined a variety of such tasks and listed the IBM products that provide assistance with these tasks. When a database user needs to consider a variety of tasks related to XML, the choice of products to support those tasks becomes a bit more challenging. Often, no one tool provides all of the capabilities needed, so multiple tools might be required. It is usually best to choose the main tool by considering the primary role of the database user and by taking advantage of its XML capabilities as far as possible. Finally, if XML becomes a major part of an individual's tasks, this article has outlined some non-IBM products, such as XML editors and integrated development environments, which are built around providing XML capabilities for editing, viewing, and transforming XML data, and might be a good complement to the database-specific products reviewed here.
Glossary of terms
- IDE: Integrated development environment; a set of tightly integrated tools for development.
- Schema: In XML, a document describing a set of rules that XML documents must conform to.
- XML: eXtensible Markup Language; a language for flexibly encoding documents.
- XQuery: A query language for XML.
- XSD: XML Schema Definition language for describing XML schemas.
- IBM tools with XML:
- IBM XML solution products:
- More information about the non-IBM tools mentioned:
- XML integration scenarios with DB2 pureXML:
- "Exposing DB2 9 pureXML using WebSphere Integration Developer" (developerWorks, September 2007)
- "WebSphere DataPower and DB2 pureXML, Part 1: XML schema and content validation using DataPower and DB2 pureXML (developerWorks, June 2008)
- "Build an intelligent eForms solution based on DB2 pureXML, Lotus Forms, and Web services" (developerWorks, April 2009)
- "Programming XML Across the multiple tiers: Use XML in the middle tier for performance, fidelity, and development ease: Develop an all-XML solution using JDBC 4.0, SQLXML, and the WebSphere Server XML Feature Pack (developerWorks, March 2009)
- developerWorks Information Management zone: Learn more about Information Management. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
Get products and technologies
- DB2 Express-C: Download DB2 Express-C, a free community edition of DB2 that includes pureXML capability.
- DB2 for Linux, UNIX, and Windows: Download an evaluation copy of DB2 LUW.
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Participate in the discussion forum.
- Participate in developerWorks blogs and get involved in the My developerWorks community; with your personal profile and custom home page, you can tailor developerWorks to your interests and interact with other developerWorks users.