Skip to main content

Meet the Experts: Matt Huras and Version 8 Enhancements for DB2 on Linux, UNIX, and Windows

Matt Huras, Distinguished Engineer, IBM Toronto Laboratory
Photo: Matt Huras
Matt Huras is a lead architect and development manager of the data management group in DB2(r) Universal Database(tm) development at the IBM Toronto Laboratory. His focus is on the DB2 UDB storage engine, including indexing, data management, buffering, OS services, utilities, and high availability. Matt joined IBM in 1984 and has been a lead designer with the DB2 development team since 1991. He holds a BaSc in Computer Engineering from the University of Waterloo ('84) and an M.Eng, also in Computer Engineering, from the University of Toronto ('92). He can be reached at huras@ca.ibm.com.

Summary:  Meet this IBM Distinguished Engineer, who gives a sneak preview of his Data Management Technical Conference talk on DB2 Version 8 enhancements, with a focus on new features like multidimensional clustering.

Date:  23 Jul 2002
Level:  Intermediate
Activity:  567 views

Q&A

DB2DD: Matt, thanks very much for talking with us and giving us a little sneak preview of what you will be talking about at the Data Management Technical Conference in Anaheim. Before we get into details, can you give us a little more background of how you ended up working on DB2 and your technical experience with working the product?
Matt: I have a long history with DB2: I led the original design of the DB2/6000 V1 work to revamp the processing model of the engine, splitting out the logic of the OS/2(r) product (which ran completely in the user process' address space) into a thin user portion and a larger server portion. Later, I led the work to convert the engine to use a multithreading design, and later still, I led the effort to merge the PE V1 and DB2 Common Server code base to form DB2 Universal Database.

DB2DD: The title of your talk is "A Technical Sneak Preview of DB2." What highlights of Version 8 will you be covering?
Matt: There's a lot going into the release, more than I can cover. The areas that I will focus on can be grouped into three major categories:

  • The first category covers high-availability features. This is an extremely important focal point for our customers as DB2 becomes the enterprise server of choice. Customers need data to be available almost constantly, so we made some enhancements to make that possible.
  • The second group of enhancements I'll talk about are focused on the area of making the administration task easier and includes improved administration tools, performance advice, and other usability and serviceability enhancements. Again, this is of critical importance to our customers who need to get more productivity out of their administrators by allowing them to focus on expanding the business and less on day-to-day tasks that can more easily be automated.

DB2DD: Is this related to the SMART stuff we hear about?
Matt: Some of the upcoming enhancements do reflect advancements in SMART, but there are also a lot of other usability enhancements, such as graphical tools that help administrators do their jobs. By making administrative tasks much easier, we hope that more people will take advantage of advanced technical features, including partitioning.

DB2DD: And what is the third category of enhancements our readers can expect to see in the next release?
Matt: The third category of enhancements include continuous improvement in the bread-and-butter areas of performance and scalability. Included in this category are such things as the DB2 Common Client, key infrastructure improvements to improve scalability and memory usage, null and default value compression, a new index type, and something that I think will please a lot of people, multidimensional clustering.

DB2DD: Let's start with that one, then. What is multidimensional clustering?
Matt: Multidimensional clustering is a unique new feature that allows data to be clustered according to several different "dimensions" simultaneously. Although I'm using the term "dimension" in the data warehousing sense (as a way to view data from different dimensions such as by region or by year) this feature is not limited to data warehousing. It's kind of like being able to define multiple different clustering indexes on the same table, so that range scans on those all those indexes all get optimal disk access patterns. As you know, this is not possible today, but with Version 8, this will be possible through multidimensional clustering.

DB2DD: It sounds interesting, but I'm not sure I completely understand. Could you expand on that a bit?
Matt: It's really a lot easier to explain this with pictures. If you look at Figure 1, you can see how clustering works today. All indexes are record-based, which means that clustering can only occur in one dimension, such as only by year, or only by region, not by both year and region. Even with the one-dimension clustering, there is no guarantee that you will retain clustering, because it tends to degrade after the free space is exhausted (at least until you do the next reorganization of the data). These effects are shown in the diagram. The Region index is defined as the clustering index; you can see that most of its record IDs are clustered, but some aren't. The Year index cannot be clustered, and this is certainly reflected in its record IDs.


Figure 1. Today, clustering can degrade over time
clustering can degrade over time

As a result, a query such as SELECT * FROM T1 WHERE YEAR=99 may result in a significant amount of random and/or sequential I/O, and would typically pull in pages that can contain a mix of qualifying and unqualifying rows.

DB2DD: I see. And how does multidimensional clustering change this picture?
Matt: OK, look at Figure 2, which shows multidimensional clustering. With multidimensional clustering, tables are managed by block according to defined clustering dimensions. You won't run into problems with degraded clustering either. Clustering is guaranteed because each INSERT transparently places the row in an existing block that satisfies all dimensions, or else a new block is created to store the row. Queries in clustering dimensions only do I/Os absolutely necessary for selected data.


Figure 2. With multidimensional clustering, clustering is maintained
multidimensional clustering

Dimension indexes are also block-based. Each key in these indexes points to a consecutive set of pages (i.e., a block) and each row in these pages qualify to that key. Note that these block indexes will be typically much smaller than record-based indexes, as a single key represents lots of rows. By the way, record-based indexes can still be created on tables that are clustered multidimensonally. This can be useful, for example, for random single-row access.

DB2DD: Can you give some examples of the types of query that benefit from multidimensional clustering?
Matt: Sure. Let's create a table called SALES and use the new ORGANIZE BY clause to tell DB2 how we want the data to be clustered.

 
CREATE TABLE sales 
   (Customer VARCHAR(80), 
    Region     CHAR(5) 
    Year        INT) 
ORGANIZE BY DIMENSIONS (Region, Year) 

Now, assume that we want to issue the following query:

 
SELECT * FROM sales 
   WHERE 
    Region = 'West' AND Year = 00 

This query can be satisfied with multidimensional clustering by performing an ANDing operation on the region and year block indexes. For example, let's say the Region index indicates that all the rows that have Region='West' are in the pages of blocks 5,6 and 11. Let's also say that the Year index indicates that all the rows that have Year=00 are in the pages of blocks 11, 99 and 1025. ANDing these block lists together yields a block list consisting of one block, block 11. A single big-block I/O of block 11 is then done, and a mini-relational scan of that block returns all the qualifying rows (no predicates are required as all rows are guaranteed to qualify).

At the conference, I'll talk a lot more about the internals and about other situations in which multidimensional clustering can really improve query processing.

DB2DD: Can you briefly touch on some of the high-availability enhancements that are in the next release?
Matt: There are a lot of fantastic features coming, including online table reorg, online index reorg, online index create, dynamic configuration parameters, the ability to drop containers online, and that's really just the tip of the iceberg. At the conference, I'll spend most of my time talking about the ability to reorganize and load online; that is, without having to disrupt online transactions for the operations.

DB2DD: Can we get back to SMART for a minute? What are DB2's focus areas for SMART in the next release?
Matt: I can touch on this, but everyone should try to see Jessica Escott's talk at the conference about this. She will go into much more detail. The main focus areas for DB2 include the ability to be self-healing, the ability to provide expert design advice, the ability to recover automatically, and the ability to do self-maintenance.

One of the things we are doing is adding an automated health management framework, which will allow people to define criteria for particular "alarm" conditions and to provide the action to occur when that condition is met. For example, if a table space reaches a threshold of 75% full, it can trigger an e-mail to be sent or a script to be run, or a page to made, etc. Other features in this category include an auto-configuration capability. This capability uses advanced algorithms, and knowledge of the machine resources and the workload to set dozens of parameters for the user, including nontrivial settings like memory configuration for buffer pools, sort, lock list, log buffer size, prefetchers and page cleaners, etc. At a couple of customer sites, we ran Beta tests with which we improved the performance of customer production workloads that had been hand-tuned by experienced DBAs.

This is only the beginning. In summary, there will be other features that will:

  • Help you get up and running with a good configuration in seconds
  • Help you make the most of your schema
  • Notify you when the system needs your help
  • Keep DB2 up and running in the event of failure
  • Make problem resolution and recovery an easier process

DB2DD: Janet Perna mentioned that DB2 was going to have a lot of good stuff targeted for developers, especially in terms of development environments. What can you tell us about that?
Matt: Connie Nelin and Abdul Al-Azzawe are going to cover this area in detail at the conference. Abdul will talk about the new DB2 Development Center, which makes it easier to develop stored procedures, user-defined functions (UDFs), including the special MQSeries®, XML, and OLE DB2 table UDFs. There will be support for structured data types, an integrated SQL debugger, and import, export, and deployment of projects, among other things. It will also integrate well with Microsoft® Visual Studio tools including Visual Basic, Visual C++, and Visual InterDev.

And Connie will talk about our integration into WebSphere® Studio tools for developing Web services and messaging applications.

DB2DD: I guess the DB2 development team has been busy.
Matt: I've barely touched on all of the innovative new features that are coming. This release is designed to provide significant advances in availability, ease of use and cost of ownership, application development, and performance and scalability. I hope everyone can make it to the conference this year, where they can feel free to ask me any questions they have. I'm really looking forward to it.


About the author

Photo: Matt Huras

Matt Huras is a lead architect and development manager of the data management group in DB2(r) Universal Database(tm) development at the IBM Toronto Laboratory. His focus is on the DB2 UDB storage engine, including indexing, data management, buffering, OS services, utilities, and high availability. Matt joined IBM in 1984 and has been a lead designer with the DB2 development team since 1991. He holds a BaSc in Computer Engineering from the University of Waterloo ('84) and an M.Eng, also in Computer Engineering, from the University of Toronto ('92). He can be reached at huras@ca.ibm.com.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=13775
ArticleTitle=Meet the Experts: Matt Huras and Version 8 Enhancements for DB2 on Linux, UNIX, and Windows
publish-date=07232002
author1-email=
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers