Â© Copyright International Business Machines Corporation 2001. All Rights Reserved.
In this two-part article, I will describe the object-relational technology that is one of DB2's best kept secrets. Object-relational support provides the necessary "plumbing" that provides extensibility to relational databases. All members of the DB2 family support large object types, user-defined functions, and user-defined distinct types. The focus of this paper is on the newest technology based on user-defined structured types, which provides for the persistent storage of objects and includes other object-oriented features such as inheritance, the specification of behavior using methods, object identifiers, navigation by reference, and dynamic dispatch of methods (polymorphism). In addition, index extensions allow user-defined indexing specifications to be created on structured and distinct type data for improved performance. JDBC 2.0 and SQLJ Part 2 support provide a tighter integration between the programming environment and the database server. DB2's long term commitment to object-relational functionality is illustrated by the fact that many IBM technologies rely on that support, including content management, spatial data, and XML document search.
In Part 1, I introduce the object-relational features in DB2 with a special focus on user-defined structured types and type hierarchies, and on using tables to store instances of those types (such as typed tables). I also describe how you can use "object views" on legacy relational data. Part 2 will describe how structured type objects can be stored in columns, how methods can be invoked on objects, and on the SQLJ and JDBC extensions that will be useful for integrating these database objects into your Java applications.
A little history...
In 1995, DB2 delivered its first object-relational extensions, ahead of any other major database vendor. These extensions were provided to address the pressing needs of many applications to move beyond what a "normal" relational database could handle. Whereas the majority of transactional data could be mapped to the usual data types (money to decimal, names to character strings, dates to date strings), the arrival of imaging systems and text management systems meant that data values were very large. Video and audio data was even larger. It was very difficult to handle this type of data using the existing technology.
To address this application requirement DB2 Universal Database" (UDB), in its Version 2 release, delivered new data types that could store very large amounts of data in a single data value. This relieved the applications from the burden of piecing together character strings to form a document. Instead, the entire document or image could be stored in one of the new large object types: character large object (CLOB), binary large object (BLOB), and double-byte character large object (DBCLOB). To make it easier to handle LOBs in an application, there are built-in functions for manipulating LOBs and technology for handling LOB data in client applications as files or through locators. Operations on LOBs are highly optimized by deferring LOB materialization as much as possible.
Figure 1: Table definition includes types for multimedia objects.
In addition to the new types, it also became possible to make the database a lot smarter in understanding the purpose of the data that was stored in it. Users could define distinct types, types that are based on existing types but which are distinct by name and semantics from all other types defined in the system. This was an important step toward defining data in business terms and toward encapsulating business semantics into the database.
Because distinct types are based on predefined types, they are tightly integrated into the database manager for performance. They share the same efficient code used to implement built-in functions, comparison operators, indexes, etc. for predefined types. The DB2 optimizer can optimize effectively because it has a deep understanding of the underlying data type representation.
The behavior for distinct types are defined by user-defined functions. User-defined functions are just application logic (or a DB2 built-in scalar function) that operate on data values and return a result. Thus, with a combination of user-defined functions and distinct types, simple or large object data types become meaningful for a particular application domain. The database engine can enforce business rules because it understands that it cannot allow comparison of Dollars and Yen without doing the necessary conversions (via a user-defined function) first. DB2 also supports overloading of function names, and chooses the appropriate function based on an efficient "best fit" algorithm.
Another feature introduced in DB2 is the ability to return tables (record sets) from functions instead of scalar values. This function is powerful because it can make any data source appear like a DB2 table, which can be then used in normal SQL queries, join operations, grouping, and so on.
DB2 also shipped types and functions for audio, image, video, and text. These prepackaged types and behaviors are called DB2 Extenders". Of course, these types are a small sample of what is possible to define in DB2, but these are typical of what is used in a content-based world. Indeed, the extensibility of DB2 is part of what powers IBM's Content Manager products.
The addition of large object types let applications handle large, complex types in familiar ways; that is, by using the relational database to store and retrieve data values. In addition, being able to name those types and to define behaviors for those types relieves the application from supporting semantic checking and business logic.
Nevertheless, all data that is stored in a large object type is essentially flat -a string of characters or bits-that may very well bear no relation to the business object that is stored there. This can put the burden on the application to read the large object into the application's work area, to "parse" the object to the place that needs to be changed, and then to store the object back in the database.
For some types of data, it may make more sense to treat the object as a named collection of attributes. This particular breed of user-defined type is called a structured type and was introduced into DB2 UDB Version 5.2 in 1998. The initial functionality provides not only the ability to create types with structure, but allows the definition of type hierarchies and the equivalent storage mechanism of table hierarchies. The use of "references" to define relationships between objects and to navigate among objects was also introduced in that release.
Structured types provide a way to create a schema that is more natural for those who are comfortable with OO programming languages-they include the properties of inheritance and the establishment of "is-a" relationships (value substitutability). Structured types are a more natural fit for populating Java objects in the application program.
Figure 2: Structured types reused in various places.
To allow you to define behavior for the structured type, you can use user-defined functions. However, DB2 Version 7 delivers the ability to define methods for defining object behavior. Methods are very much like user-defined functions in that they can be written in an external language like Java or C, or in SQL. Methods, however, are always associated with a particular structured type and are invoked using different syntax. Methods are especially suitable for implementing behavior that really can be seen as "owned" or attributed to a specific type. The behaviors for the type are carried with the type in whatever context it is used. In contrast, functions support a more symmetric view on its parameters.
Any of DB2's object-relational features can be used with traditional SQL. One application might need just LOB support, another might want to make full use of structured types, methods, and inheritance. In addition, it is even possible to use object-oriented modeling on relational data by using the object view support described in this paper.
In this section, we focus on the basics of using typed tables. By creating an object model, it is much easier to think about mapping the model to the available constructs in DB2.
Figure 3: UML-like model of relations among people and departments.
Think about the following:
Will it benefit you to take advantage of type hierarchies and inheritance? With DB2's structured types, you can define type hierarchies. Just as with class hierarchies in object-oriented programming languages, objects that are modelled using DB2 structured types inherit attributes from their supertypes (types that are above it in the hierarchy). In the model above, we are using inheritance for the various types of people objects that we represent.
In Version 7, you have the choice of storing structured type objects as rows of typed tables or in individual columns of regular or typed tables. How do you choose? Generally, you would store your objects as rows in a table when those objects are "primary" entities in the object model, such as our people and departments.
Other objects that are good candidates for being stored in typed tables are those that must be referenced and shared by other objects (a business address that gets used by employees, shipping, and so on) or those whose existence is independent of another object (deleting a department does not mean that the manager goes away). If the relationship is one of containment and dependency, then the object can be stored as a column (Note: this is described in further detail in Part 2).
When objects are stored as rows in a table, each column of the table contains one attribute of the object and you can access the attributes of the object much as you would any column in a DB2 table.
How should you map the attributes to DB2 data types? Not only will you need to think about how much space an attribute will use, but you need to consider the types of information that you are trying to get back from any attribute or combination of attributes.
The general mapping of attributes to data types is generally straightforward in cases where you have things like names and salaries, which map easily to character and numeric data types. For very large objects, such as an image or very long text description, you might need to choose one of the DB2 data types that are designed specifically for storing very large objects. Those data types are CLOB (character large object), BLOB (binary large object), or DBCLOB (double-byte character large object).
You can even use structured types to define attributes. This is called nesting. You might want to do this in those cases where the attribute has structure, like an address, or when your columns can hold different subtypes, such as geometries (shape, lines, points, polygons, circles, rectangles, etc). We will discuss nesting more in Part 2.
When you create a user-defined structured type, the syntax is very similar to that of creating a table; you specify the name of the type, its attribute names and the data types of the attributes. If you create subtypes under a structured type, those subtypes will automatically inherit the attributes of the type above it in the hierarchy (the supertype).
Figure 4: Mapping model to type hierarchies.
Table 1. SQL definitional statements
|SQL definition statement||Action|
Create the root type of the hierarchy first.
Create the subtype employee UNDER the Person type. All "person" attributes are inherited by employees. All you need to specify is additional attributes that are unique to employees (or to types created UNDER employees).
Executives get bonuses.
The department type has no subtypes or supertypes. REF(employee) indicates that the manager is an employee, and information about employees is in the employee type.
Method specifications are included with the type definition (or can be added later with ALTER TYPE).
As specified in the SQL99 standard, the method body is specified in a separate CREATE METHOD statement.
Now that department is defined, create a cyclic reference (departments referencing employees (i.e. as managers) and employees working in departments)).
Students are the other leg of our person hierarchy. They share attributes of person, but also have an additional attribute to indicate the major field of study.
Table hierarchies are used to store objects and to allow SQL to change those objects. Conceptually, table hierarchies mirror the associated type hierarchies. However, types are like class libraries that can be used in different contexts, so it may be that not all of your types in a type hierarchy will be in the same table hierarchy. The only requirement is that the table hierarchy is a subset of the type hierarchy.
When you create a table, you specify an additional column for the table that stores the object identifer values for each object (each row) in the table. This object identifier (OID) is created as the first column in the table, followed by the attribute columns. The OID is used in dereference operations; that is, it is used to "grab" onto a row and turn it into a structured type instance. This is described in more detail in Part 2, where we describe how to invoke methods on objects stored in rows.
SQL definitional statements
Table 2. Table hierarchies
|SQL definition statement||Action|
The CREATE TABLE statement is pretty simple. For the root table in the hierarchy, we have to give the name of the column that stores the OID values. In this case, we have given it the creative name of 'oid'.
For any subtables, the clause INHERIT SELECT PRIVILEGES
indicates that all users who hold select privileges on the
root table when that table is created are also granted those
privileges on the subtables.
Here we create the department table. The SCOPE option on the column options for manager indicates that employee objects referred to from the manager column of the department table will reside in the employee table or any subtable of the employee table.
Here we create the scope of the reference going in the other direction. All departments referred to in the employee table will reside in the department table (or in any subtables subsequently created).
Inserting into a typed table applies to the specified target table (unlike SELECT, DELETE, and UPDATE, which apply to all tables below the target table in the hierarchy). The INSERT statement is unchanged for typed tables, other than you must remember to include the object identifier and to cast it to the appropriate REF type.
Table 3. Inserting into a typed table
This department has an object identifier of '1'. The character constant is cast to the REF(Department) type.
The dept attribute could be added as a constant (cast to REF(department)). In this case, a subquery is used to populate the column value.
After the employee table is populated, we can assign Harry Potts as the manager of department 100 as follows:
SET Manager=(SELECT oid from employee_table where name='Harry Potts')
The addition of type hierarchies and references bring with them several enhancements to query, including the ability to easily traverse references among objects and support for the implicit consideration of subtables
An executive is an employee is a person....
A table hierarchy has the property of value substitutability; that is, it establishes an "is-a" relationship. A query on a table is really a query on the collection of objects in the table and all of its subtables. SELECT, DELETE, and UPDATE statements all apply on the selected table and on any of its subtables. However, only attributes that are defined for the selected table are returned. When you select all columns from Person_table, all objects in Person_table and any of its subtables are retrieved, but only those attributes (columns) that belong to person (including those inherited from its supertypes) are returned.
SELECT P.* FROM
Figure 5: Only attributes that are found in the person type are returned.
If the query was instead SELECT * FROM Employee_table, all of the Employee attributes for only employees and executives would be returned. This formulation is certainly a lot simpler than forcing all different types of people into the same table and trying to apply search predicates for the different type values.
OUTER union of a table hierarchy: selecting all attributes
If you want to see all attributes from the selected table and its subtables, use an OUTER union on the table hierarchy: For example, SELECT * FROM OUTER(Employee_table) returns a result with all attributes of employees and also includes the bonuses that only executives have. Nulls are returned for attributes that are not applicable to the returned row.
Figure 6: The OUTER keyword returns all attributes.
SELECT * FROM
Use ONLY in the FROM clause to restrict the retrieval to objects that are in the specified table, not in any subtables.
SELECT * FROM ONLY (Person_table);
The dereference operator (->) is used in path expressions. Path expressions provide a much easier syntax than the traditional method of using joins to select attributes from related objects. Instead of formulating the correct join conditions, the path expression allows you to traverse the reference-based relationship between one object and another. For example, employees and departments have a referential relationship. Departments have managers (who happen to be employees) and employees have departments. To find the employee name and salary, as well as the corresponding department ID and manager, for those departments whose budgets are over 200000 dollars, you could issue the following query:
SELECT E.name, E.salary, E.dept->ID, E.dept->manager->name AS MANAGER FROM Employee_table E WHERE E.dept->budget >200000
Even more complex relationships can easily be queried using path expressions. For example, assume that you want to find the names of all employees whose manager's manager is Lou Gerstner. You can do it this way and avoid having to write a 5-way join:
SELECT E.name FROM Employee_table E WHERE E.dept->manager->dept->manager->name='Lou Gerstner';
You can even use path expressions between typed and regular (untyped) tables if the untyped table includes REF(data type) as a column.
A relationship that is represented by a reference is not the same as one that is represented by referential integrity (although it is certainly possible to use referential integrity on references, too). Internally, DB2 treats the path expression like a left outer join, which means that the path expressions take advantage of DB2's sophisticated join optimization. DB2 also checks to make sure that the ID executing the query has read access on all the referenced tables.
In relational systems, views play an important role in keeping the data independent from the application and in controlling access to sensitive data. These uses are so critical that it is important that the same capabilities are available on the object-relational infrastructure while at the same time preserving an object-relational nature. In other words, method invocation, substitutability, path expressions should all work. In addition, there is the requirement to turn "legacy" data into object-relational table/view hierarchies.
Object views fulfill these requirements.
Figure 7: Relationship between table hierarchy view hierarchies.
Object views are "virtual typed tables" that can have a structure and behavior that is different than the underlying base tables. An object view definition uses structured type definitions as their basis, just as typed tables do. These view definitions can be used to restrict access to a subset of rows of a table hierarchy or hide some of the columns of the table hierarchy by allowing only access through the view hierarchy.
If only a subset of the rows or tables are selected for the view, then the already existing structured type definitions may be used as the basis of the view definition. However, if either the set of attributes/columns or the behavior provided through the view needs to change, then a new type hierarchy needs to be defined as the basis for the view hierarchy. For example, assume that we want to create a set of views that contains no salary, wage or address information, which excludes executives and students, and which includes only those departments that have large budgets. Our type definitions to create such a type hierarchy would merely be a simpler (fewer attributes) version of those used in our existing person type hierarchy. The body of the view definitions tells how to populate those views from the base tables.
Notice the use of the ONLY keyword to ensure that each view is populated only from the appropriate subtable of the original hierarchy. Without ONLY, each view would be populated with all objects at or below it in the hierarchy, which is not what we want in this case.
Also note that since the OID column is a reference type and reference types are strongly typed, the cast from the OID of the table to the OID of the view cannot be done directly; the cast must first be to the base type and then from the base type (VARCHAR) to the type of the view (vperson_t).
Table 4. Type and view definitions
|Type definitions||View definitions|
Now, because the reference-based relationship in the original typed tables (between employee_table and dept_table) are carried over to the corresponding object views queries like the following are possible:
SELECT name, dept->ID
Because two of the employees belong to departments with budgets under 100000, null is returned for the department ID for those employees.
DB2 has an exceptional ability to render views updatable. This capability is important for making views truly useful in avoiding base table access. This capability extends as well to object views, even object views that are created from multiple base tables.
Object views provide a lot of functionality: they are updatable, the view bodies (the semantics of the view) are separate from the type definition, and objects seen through those views can be manipulated and methods can be invoked on them via their object identities and references, just as can objects in a base table. Because object views provide such rich functionality, they can also be used to create an "OO look and feel" to business data that has always resided and been manipulated in a strictly relational manner. Old applications can continue to work on the base tables (or on untyped views), while new applications can run against the object views.
Figure 8: Showing the relational schema of suppliers and parts.
Figure 9: Representing parts and suppliers using object views.
The following steps can be used to create object views on legacy data based on the schema shown above:
- Create structured types to match the tables.CREATE TYPE
REF USING INTEGER
MODE DB2SQL; CREATE TYPE Parts_t AS
Price DECIMAL (5,2),
REF USING VARCHAR (10);
Create the supplier type to map to the Supplier relational table. Note that we will be using the SuppID column as the object identifier in the object view; therefore, it cannot be specified as part of the type definition. Because the SuppID column is defined as an INTEGER, use the REF USING clause to specify INTEGER as the underlying type of the object identifier.
Similarly, for the parts type, the object identifier column is mapped to the data type of the primary key of the Parts table (column Partno of data type VARCHAR(10)).
- Now we can create the typed views. CREATE VIEW v_suppliers of
(REF IS Supp_ID USER GENERATED)
AS SELECT supplier_t(Supp_ID), Name
FROM Suppliers; CREATE VIEW v_parts OF parts_t
(REF IS Partno USER GENERATED,
Supplier WITH OPTIONS SCOPE
AS SELECT parts_t(Partno), Name,
Price, Supplier_t(supplier), qty
The SuppID column of the base table is the object identifier column of the view but must be cast to the REF type in the view definition.
Here is the creation of the parts view. Additional predicates can be added to the SELECT to further restrict which rows populate the view.
Now, path expressions can be used:
SELECT Partno, Supplier->Name AS Supplier_Name FROM v_parts WHERE Price > .05;
Methods can be invoked, too, as will be shown in Part 2 in the next edition of DB2DD.
The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:
DB2 Universal Database
The following terms are trademarks of other companies:
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States and/or other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Â©Copyright International Business Machines Corporation 2001. All rights reserved.
- DB2 Universal Database V7, Application Development Guide. See
in particular the chapter entitled "Object-Relational
- DB2 Universal Database V7, SQL Reference.
- Michael J. Carey, Serge Rielau, Bennet Vance. "Object View Hierarchies
in DB2 UDB". EDBT 2000: 478-492,1999
- Michael J. Carey, Donald D. Chamberlin, Srinivasa Narayanan, Bennet
Vance, Doug Doole, Serge Rielau, Richard Swagerman, Nelson Mendonca
Mattos. "O-O, What's Happening to DB2?" SIGMOD Conference 1999:
- Michael J. Carey, Donald D. Chamberlin, Srinivasa Narayanan, Bennet
Vance, Doug Doole, Serge Rielau, Richard Swagerman, Nelson Mendonca
Mattos. "O-O, What Have They Done to DB2?" VLDB 1999:
- Chen, Weidong et al. "High Level Indexing of User-Defined Types."
Proceedings of the 25th VLDB Conference, Edinburgh,
- N. M. Mattos, J. Kleewein, M. Tork Roth, K. Zeidenstein, "From
Object-Relational to Federated Databases," Datenbanksysteme in
Buro, Technik und Wissenschaft. March 1-3, 1999: 185-209.
Kathy Zeidenstein has worked at IBM for a bazillion years. Currently, she is working as a technology evangelist for InfoSphere Guardium data activity monitoring, based out of the Silicon Valley Lab. Previously, she was an Information Development Manager for InfoSphere Optim data lifecycle tools. She has had roles in technical enablement, product management and product marketing within the Information Management and ECM organizations at IBM.