When you design your databases, you must
decide whether your data is better suited to the XML model or the
relational model. Take
advantage of the hybrid nature of DB2® databases
that supports both relational and XML data in a single database.
While this discussion explains some of the main differences between
the models and the factors that apply to each, there are numerous
factors that can determine the most suitable choice for your implementation.
Use this discussion as a guideline to assess the factors that can
impact your specific implementation.
Major differences between XML data and relational
data
- XML data is hierarchical; relational data is represented in a
model of logical relationships
- An XML document contains information about the relationship of
data items to each other in the form of the hierarchy. With the relational
model, the only types of relationships that can be defined are parent
table and dependent table relationships.
- XML data is self-describing; relational data is not
- An XML document contains not only the data, but also tagging for
the data that explains what it is. A single document can have different
types of data. With the relational model, the content of the data
is defined by its column definition. All data in a column must have
the same type of data.
- XML data has inherent ordering; relational data does not
- For an XML document, the order in which data items are specified
is assumed to be the order of the data in the document. There is often
no other way to specify order within the document. For relational
data, the order of the rows is not guaranteed unless you specify an
ORDER BY clause on one or more columns.
Factors influencing data model choice
What
kind of data you store can help you determine how you store it. For
example, if the data is naturally hierarchical and self-describing,
you might store it as XML data. However, there are other factors that
might influence your decision about which model to use:
- When you need maximum flexibility
- Relational tables follow a fairly rigid model. For example, normalizing
one table into many or denormalizing many tables into one can be very
difficult. If the data design changes often, representing it as XML
data is a better choice. XML schemas can be evolved over time, for
example.
- When you need maximum performance for data retrieval
- Some expense is associated with serializing and interpreting XML
data. If performance is more of an issue than flexibility, relational
data might be the better choice.
- When data is processed later as relational data
- If subsequent processing of the data depends on the data being
stored in a relational database, it might be appropriate to store
parts of the data as relational, using decomposition. An example of
this situation is when online analytical processing (OLAP) is applied
to the data in a data warehouse. Also, if other processing is required
on the XML document as a whole, then storing some of the data as relational
as well as storing the entire XML document might be a suitable approach
in this case.
- When data components have meaning outside a hierarchy
- Data might be inherently hierarchical in nature, but the child
components do not need the parents to provide value. For example,
a purchase order might contain part numbers. The purchase orders with
the part numbers might be best represented as XML documents. However,
each part number has a part description associated with it. It might
be better to include the part descriptions in a relational table,
because the relationship between the part numbers and the part descriptions
is logically independent of the purchase orders in which the part
numbers are used.
- When data attributes apply to all data, or to only a small subset
of the data
- Some sets of data have a large number of possible attributes,
but only a small number of those attributes apply to any particular
data value. For example, in a retail catalog, there are many possible
data attributes, such as size, color, weight, material, style, weave,
power requirements, or fuel requirements. For any given item in the
catalog, only a subset of those attributes is relevant: power requirements
are meaningful for a table saw, but not for a coat. This type of data
is difficult to represent and search with a relational model, but
relatively easy to represent and search with an XML model.
- When the ratio of data complexity to volume is high
- Many situations involve highly structured information in very
small quantities. Representation of that data with a relational model
can involve complex star schemas in which each dimension table is
joined to many more dimension tables, and most of the tables have
only a few rows. A better way to represent this data is to use a single
table with an XML column, and to create views on that table, where
each view represents a dimension.
- When referential integrity is required
- XML columns cannot be defined as part of referential constraints.
Therefore, if values in XML documents need to participate in referential
constraints, you should store the data as relational data.
- When the data needs to be updated often
- You update XML data in an XML column only by replacing full documents.
If you need to frequently update small fragments of very large documents
for a large number of rows, it can be more efficient to store the
data in non-XML columns. If, however, you are updating small documents
and only a few documents at a time, storing as XML can be efficient
as well.