Indexes are an integral part of getting the optimal performance from
your database. An index is an access method for a table. When an SQL
statement is run against a table, the database server must access the
table to find the rows that match the criteria of the SQL statement.
In general, the database server has two ways it can access the table:
a sequential scan or an index scan. The access method used has a lot
of determining factors, including the selectivity of the criteria in
the SQL statement (filters in the
clause). The selectivity tells the database server if the SQL
statement wants .1% of the table returned or 100% of the table
A sequential scan accessing a table is just like you reading a book. If you want to know the whole story, you have to start on page 1 and read until you get to the end. If you start anywhere in the middle, or stop before reaching the end, you could miss something along the way. So a sequential scan needs to start on the first page of the table and read every page till it hits the end of the table. Because of the cost of sequential scans, unless the selectivity of the SQL statement needs it, sequential scans are avoided. This is especially true in OLTP applications. OLTP applications usually get small subsets or data from larger tables.
Since OLTP environments avoid sequential scans, a faster access method is needed instead. That access method is known as an index. Using the book analogy again, books have indexes in the back of them. Why? For faster lookup. If you were writing an SQL statement and couldn't remember the syntax, you could pull out the Guide to SQL: Syntax and start looking through it for the particular SQL statement you were writing. If you wanted to make sure you didn't miss it, you would start on page 1 and look at every page until you found what you were looking for. Or you could go to the index of the book, look up your topic, and the index would point you directly to the page that the topic is discussed on. Which would be faster? If the topic just so happened to be on page 3, then a sequential scan would be faster; but if the topic was on page 695, then the index lookup would be faster. Say that the index in the back of the book was three pages long. During an index lookup, at most you would have to look at four pages, three index pages, and the actual page where the topic is discussed. So as long as the topic isn't discussed within the first five pages of the book, then the index lookup would be faster then the sequential scan method. However, remember that a sequential scan cannot stop when it first finds the topic in the book because that same topic might be talked about on another page in the book as well. If the sequential scan stopped at the first one, then it would miss the rest. Whereas in an index, you can reference all the pages where the topic exists in the same index entry.
Many people have asked me, "If indexes are so great, why don't we index everything?"
The answer is: every advantage has a disadvantage. We've talked about the advantage of indexes. Let's take a look at the disadvantages of indexes:
- Disk space – just like an index in a book takes pages of the book, so does the index of a table
- Performance - an index slows down modifications to the table (INSERTS, UPDATES, and DELETES).
Because of the price of disk space these days, the first disadvantage is almost a moot point. But anything that slows down performance, like the second disadvantage, is something that should be looked into further.
The reason why modifications to the table are slowed down is the extra overhead that has to be done. Back to the book analogy, if you add another topic to the book, you had better update the index with information from that topic. If you don't update the index, then anybody who is interested in learning more about that new topic will either have to do a sequential scan of the book to find it, or just think that the book doesn't cover it and go buy somebody else's book.
So when an SQL statement makes a change to the table (new row inserted, existing row updated, or get rid of a row by deleting it), you need to update the index immediately so that if another user tries to use the index to find the rows they are looking for, their result set will be accurate based upon the rows in the table. The index must completely and accurately represent all the rows in the table at all times. If it gets out of sync with the table, then it becomes a liability and can cause integrity problems for the users.
So when a new row is inserted into the table, if that table has one index on it, you have effectively doubled the amount of work that needs to be done before the insert can complete. You have to insert the row into its data page, and you have to update the appropriate index page with that row's information. Add another index, and you have tripled the work (one data page insert, one index page insert for index #1, and one index page insert for index #2). Add a third index on the table, and you have four times the amount of work to complete, as opposed to inserting a row into a table that has no indexes. The same thing happens for both updates and deletes.
So the more indexes a table has, the more work that has to be done while modifying the data, slowing down performance of those operations.
Now that you have seen what indexes are used for and their advantages and disadvantages, let's look at how to create indexes on your tables.
CREATE INDEX SQL statement is used to
create a new index:
CREATE INDEX SSN_idx ON customer(SSN);
By default, an index allows duplicate values. Just like a topic in a book could be discussed on more than one page, for which all page numbers are listed in the index, an indexed value in a table could exist in more than one row of the table, so the index has to point to all the rows where that value exists.
You can put a unique constraint on the table by specifying that each index value has to be unique (can only exist once in the table). This is effectively the same as creating a UNIQUE constraint on the table, as described in the "Constraints" section.
CREATE UNIQUE INDEX SSN_idx ON customer(SSN);
An index can exist on more than one column.
CREATE INDEX cust_idx ON customer (LName, Birthday);
When an index is on more than one column, the first column listed in the index is the main sort order of the index, followed by the rest of the columns.
Let's use the phone book to explain this idea. The phone book is ordered by last name (main sort order), but there are multiple listings for Jones in the phone book. So when you look at the pages of Joneses, you see that the phone book goes to a secondary sort order with the Jones values, and that sort order is the first name. Well, there are many John Joneses, so then the phone book goes to another sort level and orders by middle initial. If there are duplicate John J Joneses, then it goes to another sort level, and so on.
This is the same way that multi-column indexes work in IDS. If there are duplicates in the main (first column) sort level, then those duplicates are ordered by the second column listed, and so on. The maximum key size of the index (the sum of all indexed column data type sizes) is 390 bytes.
Just like the
CREATE TABLE SQL statement,
CREATE INDEX SQL statement has a
CREATE INDEX cust_idx ON customer (SSN) IN dbspace1;
An index can be created in ascending or descending order, with ascending order being the default. Since IDS can traverse the index in a bidirectional fashion, specifying the order of the index is not that important.
Listing 24. Example with a single column index
CREATE INDEX cust_idx ON customer (SSN DESC);
Listing 25. Example with a multi-column index
CREATE INDEX cust_idx ON customer (birthday DESC, age);
The example in Listing 25 will create the index with birthday in descending order. When duplicate birthdays are found, they would be ordered by the age value in ascending order, since ascending is the default and no order was specified for age.
Clustered indexes are a way of ordering the table the same as the
index. An index always has an order to it. The order of the index is
decided by the
ASC keywords in the
CREATE INDEX statement. However, by
default, a table has no order, and the rows get stored into the pages
of the table in whatever order they are inserted. As rows are deleted
from the table, new rows being inserted fill in the gaps left by the
deletions. This can lead to a very unorganized (unordered) table. The
disadvantage of an unorganized table comes with duplicate or
For example, say that you want to get all the customer information out of your customer table where the last name of the customer is equal to 'Jones'. If you have more than one customer with the last name of 'Jones', you have to find all the rows that match the last name of 'Jones' no matter what page they exist on. If you have 10 rows with last name of Jones, worst case scenario, you could actually end up reading 10 different pages to get the data (one row per page). What would happen if you would "cluster" all the like values together? If you could fit 10 rows per page, you could end up reading only one page to get all 10 rows with the value of Jones. Think of a page as an I/O. Which is better for performance? 10 I/Os or 1 I/O? Since I/O is one of the slowest pieces and can take valuable time, the less I/O, the better. So 1 I/O is much better for performance than 10 I/Os.
The same thing can go for range searches.
For example, say that you want to get all the customers that have ordered from you within a certain time frame. So you want to return all the rows where the order date is between two dates. If the table is not ordered by date, then you could end up searching all over the table for the rows that fit between those two dates. But if you order the table by date, then all of the rows with the same date will be "clustered" together in a set of pages. This allows for better performance because only a set of pages have to be read, rather than what could amount to the whole table.
The table is clustered (ordered) based upon an index. To change the
order of the table you have to cluster an index. The clustering of an
index is done through the
CREATE INDEX statement:
CREATE CLUSTER INDEX cust_idx ON customer(SSN);
Note: A table can only be clustered (ordered) in one way, based upon one index. So each table can only have one clustered index.
Indexes are also allowed to be fragmented. When creating an index on a fragmented table, by default, the index fragmentation will be the same as the table fragmentation. If the table has three fragments, then the index will have three fragments. Each index fragment will only index data in its associated data fragment.
Indexes can also be given an explicit fragmentation scheme that doesn't
have to match the table's fragmentation scheme. This can only be done
FRAGMENT BY EXPRESSION clause. The
FRAGMENT BY ROUND ROBIN clause is not valid
CREATE INDEX SQL statement.
FRAGMENT BY EXPRESSIONclause
CREATE TABLE customer ( SSN char(11), LName char(50), Age integer, Birthday date ) FRAGMENT BY EXPRESSION (Age < 50) IN dbs1, (Age > 51) IN dbs2; CREATE INDEX cust_idx1 on customer(Age); CREATE INDEX cust_idx2 on customer (SSN) in dbspace5; CREATE INDEX cust_idx3 ON customer(Birthday) FRAGMENT BY EXPRESSION (Birthday < '01/01/1980') in dbspace1, (Birthday BETWEEN '01/01/1980' AND '01/01/2000') IN dbspace2, (Birthday > '01/01/2000') in dbspace3;
In Listing 26, the indexes will differ as such:
- The cust_idx1 index will be fragmented in the same way as the table. Since the table has two fragments, the index will have two fragments, and they will be stored in the same dbspaces as the data.
- The cust_idx2 index will not be fragmented, and the whole index will be stored in dbspace5.
- The cust_idx3 index will have three fragments based upon the expression putting each index fragment into its appropriate dbspace.
Just like on tables, the expression used for fragmentation of indexes
can be very complex. However, the general rule of thumb is to keep it
as simple as possible while still meeting performance requirements.
Indexes can also use the
to allow for multiple partitions in a single dbspace.
Putting a clustered index on a fragmented table will only cause each fragment of the table to become clustered, not the whole table to become clustered.