Create tables and indexes more efficiently in IDS 9.4

If you are an Informix® DBA, do you want to know how to create tables and indexes more efficiently? In order to achieve better performance, you need to think of the physical aspect of your database before you create tables and indexes. The author elaborates on each of those considerations and illustrates with real-life examples.

Share:

Jianing Fan (cjf035@motorola.com), Senior Software Engineer, Motorola

Jianing Fan is a software engineer at Motorola specializing in relational database management systems. He is an Informix Certified Professional, Oracle Certified Professional, and has over 10 years of database and system experience as a developer, system administrator, and DBA. Jianing can be reached at cjf035@email.mot.com.



20 January 2005

Introduction

Anyone who knows a little SQL (Structured Query Language) knows how to create tables and indexes in a database. But how do you create tables and indexes more efficiently in terms of system and database performance? Here are few things to consider before creating tables and indexes:

  • Dbspace
  • Extent size
  • Lock mode
  • Constraint key words

This article explores in detail each of those considerations and illustrates with real-life examples collected in a working environment.


Dbspace

Dbspace is the link between logical database objects and physical disk location. When you create logical database objects such as tables and indexes, you need physical disk space to hold those objects and that is called dbspace. Dbspace is composed of one or more chunks of disk space. For Informix versions before 9.40, there was a limitation on chunk size; each chunk could not exceed 2 GB. This is no longer a restriction. When database sever is first initialized, we have two options; we can either configure the database server to use smaller chunks (chunks less than or equal to 2 GB), or to use larger chunks (chunks larger than 2 GB).

Dbspace is created and administered by the Informix onspaces utility. Dbspace can be created either on cooked file systems or on a raw disk device. Cooked file systems refer to disk devices that have been formatted and mounted and raw disk device refers to those that are not mounted. For better performance and easier database administration for UNIX® systems, Informix recommends using raw disk device for dbspace, since a raw disk device can avoid operation system buffering and creating symbolic links to physical disk devices. This onspaces utility can only be used by users with database administrator (DBA) privileges. Also, do not forget to do a full backup on your Informix instance after you have created new dbspaces. For details on how to create and administer dbspaces, refer to IBM Informix Dynamic Server Administrator's Guide, Version 9.4.

When the Informix database server first initializes, it will automatically create one dbspace to store all database objects. That is usually called rootdbs. The following list shows major Informix database objects:

  • Logical log data. Data generated by logical logs
  • Physical log data. Data generated by physical logs
  • Table data. Data in database tables
  • Index data. Data generated by indexes on tables
  • Large binary objects data, such as large text and internal graphics
  • Temporary data. Data generated by sorting and altering tables and building indexes

To achieve better database performance, Informix documentation recommends creating a different dbspace for each database object. For example, you should create one for logical logs, one for physical logs, one for tables, one for indexes, and so on. Informix documentation also recommends spreading dbspace across disks to get balanced disk I/O. One way to achieve this is to apply modern RAID technology to stripe and concatenate slices of disks. Another way is to configure chunks on different disks. Refer to IBM Informix Dynamic Server Administrator's Guide, Version 9.4 for details.

As soon as those dbspaces are properly created, you need to know which one is for tables and which one is for indexes so that you can follow the recommendation not to put different database objects in the same dbspace. You can designate tables and indexes to the appropriate dbspace using the IN clause in the CREATE TABLE and CREATE INDEX. Here is an example:

CREATE TABLE element 
  (
    workspace_id integer,
    element_id integer,
    cpbsc_instance integer,
    acgid_bsc integer,
    dap_cluster_id integer,
    acgid_dap integer,
    acg_class smallint,
    preconfig_flag smallint,
    cptimr_inst integer,
    acgglob_inst integer,
    tdapfr_primary integer,
    tdapfr_secondary integer,
    barred_state_tmr integer
  ) IN data1;

CREATE INDEX elmentidx1 on element (workspace_id,cpbsc_instance)
    IN idx1 ;

The above example designates the element table to data1 dbspace and its index to idx1 dbspace.

The IN clause can also be used for moving tables and indexes from one dbspace to another if you identified one dbspace has much more disk I/O than others and identified tables that are the sources of disk I/O, in other words, tables that are accessed heavily by users. The procedure is as follows:

  • Save the schema of the table and indexes you plan to move using Informix dbschema utility
  • Unload data from the table you plan to move
  • Drop the table
  • Recreate the table with the same schema you saved
  • Reload the table data
  • Recreate indexes with the schema you saved
  • Execute update statistics high for that table to activate the indexes

Before you drop the table, you need to study its relationship to other tables and pay special attention to its foreign key constraints. You need to be very careful in handling its foreign key constraints; if not properly handled, you will mess up the integrity of the whole database.


Extent size

An extent is a block of physically contiguous pages on disk used to store database objects. The default extent size when you create tables is 16 KB. This is usually too small for most database tables. When one extent is filled up, the Informix server will automatically allocate more extents until it has enough extents to store data for the whole table. For the sake of performance, the Informix documentation recommends putting all data for a table into one or two big extents rather than a number of small extents. There are two reasons for this recommendation:

  • If a table has more than one extent, there is no guarantee that those extents are contiguous; extents may be scattered throughout the dbspace where the table resides. Contiguity of physical disk pages is important to performance. When pages of data are contiguous, the time used to access data on disk is minimized and the database can read rows sequentially. If tables have too many extents, it is very likely that those extents are interleaved. This harms performance a great deal, because when you retrieve data for a certain table, the disk head needs to seek multiple, noncontiguous extents belonging to this table instead of one large extent with contiguous physical pages. This slows down disk-seeking speed rather significantly.
  • Another reason is to avoid an automatic allocation of extents for tables, which is a very expensive operation and will use a lot of system resources, such as CPU and memory.

To implement the Informix documentation recommendations, you need to specify extent size when you create tables so that the Informix server won't use the default extent size, which is 16 KB. The EXTENT SIZE and NEXT SIZE clause in the CREATE TABLE statement allows you to specify the first and second extent size that the Informix server will allocate for every table created.

But how do you estimate extent sizes? This is a difficult task, especially for OLTP databases where tables are constantly updated and inserted. Basicaly, you need to know how many rows the table will contain, the row size and the page size of your system. And then you must do some mathematical operations on those numbers. Informix documentation provides us detailed guidelines on how to achieve this:

  • Determine each index length
  • Determine total index length
  • Account for index overhead
  • Determine initial table size
  • Size index space needed
  • Convert index space to kilobytes
  • Determine page length in bytes; subtract 28 to obtain usable page space
  • Determine row length
  • Determine how many whole rows can fit on a page
  • Determine the number of data pages for the initial table
  • Size the space need for data pages in the initial table
  • Convert data size to kilobytes
  • Determine size of the initial extent in kilobytes
  • Determine table growth
  • Size next extents

Let's now walk through an example. Follow this guideline step by step to estimate first and next extent sizes for the item table. Here is the structure of the item table:

Click to see code listing

Column			Data type 		Length in bytes*

Item_num		smallint		2
Order_num		integer			4	
Stock_num		smallint		2
Manu_code		char(3)			3
Quantity		smallint		2
Total_price		money(8)		5

* you may get length for each data type in IBM Informix Dynamic Server Administrator's Reference, Version 9.4.

Let's now assume this table has a single index on order_num and a composite index on stock_num and manu_code. This table will initially have 20,000 rows and will increase to 35,000 rows in a few months.

Here is the result of each caculation:

step	description	                	calculation (in bytes)

1	Determine each index length		index on order_num = 4+9 = 13     
                                        	index on stock_num and manu_code  = 2+3+9 = 14
2       Determine total index length    	13 + 14 = 27 
3       Account for index overhead      	27 * 1.25 = 33.75
4       Determine initial table size            20,000 rows
5       Total index space needed        	20,000 * 33.75 = 675,000
6       Convert index space to kilobytes        67,5000/1,024 = 660 (Kbytes)
7       Determine page length in bytes
          subtract 28 for page overheads        2,048 minus 28 = 2,020 
8       Determine row length                    2 + 4 + 2 + 3 + 2 + 5 + 4 = 22
9       Determine how many whole rows per page  2020/22 = 91
10      Determine the number of data pages 
          for the initial table                 20,000/91 = 220 
11      Total space needed for data 	        220 * 2,048 = 450,560
12      Convert data size to kilobytes          450,560/1,024 = 440 
13      Determine size of the initial extent    440 + 660 = 1,100 
14      Determine table growth                  35,000 more rows
15      Estimate next extent size               Index space needed = 35,000 * 33.75 = 1,181,250
                                                Convert to Kbytes = 1,181,250/1024 = 1,154
                                                Additional data pages needed = 35,000/91 = 385
                                                Convert to bytes = 385 * 2,048 = 788,480
                                                Convert to Kbytes = 788,480/1,024 = 770
                                                Total space needed = 770 + 1,154 = 1,924

Based on our calculation, you identified the first and next extent sizes needed for this table and now you are able to specify first and next extent sizes in CREATE TABLE statement:

CREATE TABLE item(
Item_num		smallint,
Order_num		integer	,	
Stock_num		smallint,
Manu_code		char(3)	,
Quantity		smallint,
Total_price		money(8)) 
EXTENT SIZE 1100 NEXT SIZE 1924;

The Informix server will automatically allocate the first two extents for the table when it is created and if your estimation is right, those two extents would hold most of if not all of the table's data. The calculations above are very mechanical and can be easily implemented in a Microsoft® Excel spreadsheet to automate the whole process. You can also see from the above calculation that the basis for your estimation is the number of rows a table will contain. This is of course based on business rules, but also depends heavily on relationship between tables in the database. For instance, you have two tables, customer and address, and you know that there are 100,000 customers and each customer may have at most four addresses. So you can easily say that the address table may have 100,000 *4 rows at most.

How do you then verify your estimate and see if your estimate is close to the reality? You can use the Informix oncheck utility to get a tblspace report for the table after it is created and loaded with data. Here is the command:

oncheck -pt prod1:item

And here is the sample output:

TBLspace Report for prod1:item

    Physical Address               9:652153
    Creation date                  08/31/2004 11:41:05
    TBLspace Flags                 802        Row Locking
                                              TBLspace use 4 bit bit-maps
    Maximum row size               220
    Number of special columns      0
    Number of keys                 0
    Number of extents              1
    Current serial value           1
    First extent size              1100
    Next extent size               1924
    Number of pages allocated      4107
    Number of pages used           886
    Number of data pages           885
    Number of rows                 7960
    Partition partnum              7340289
    Partition lockid               7340289

    Extents
         Logical Page     Physical Page        Size
                    0          9:299243        4107

The above output tells you that item table has only one extent. You can use this statistic to optimize your estimation algorithm if the tblspace report shows too many extents for a table and then you can make your estimate match the reality more closely.

The tblspace report actually has two parts, the first part is extent information about the table and the second part contains the extent information on indexes. You cannot specify extent size for index direct as you do for tables; when indexes are created, the Informix server will automatically allocate extents for indexes based on the extent sizes you specified for the tables. You can, however, use the oncheck utility to see how many extents you got for the index. Here is the sample output:

                  Index  item_idx1 fragment in DBspace prod1_idx

    Physical Address               9:907200
    Creation date                  08/31/2004 12:48:45
    TBLspace Flags                 802        Row Locking
                                              TBLspace use 4 bit bit-maps
    Maximum row size               220
    Number of special columns      0
    Number of keys                 1
    Number of extents              1
    Current serial value           1
    First extent size              224
    Next extent size               32
    Number of pages allocated      224
    Number of pages used           36
    Number of data pages           0
    Number of rows                 0
    Partition partnum              7340624
    Partition lockid               7340289

    Extents
         Logical Page     Physical Page        Size
                    0          9:831443         224

The above output showed there is only one extent for the index item_idx1. For details how to use oncheck, refer to IBM Informix Dynamic Server Administrator's Guide, Version 9.4.


Lock mode

Lock mode is of vital importance in concurrency control; it is the mechanism that IDS uses to control Informix sessions access database tables simultaneously. There are three lock modes, namely page lock, table lock, and row lock. Page lock locks the whole physical disk page (2 KB in size) and all rows or records contained on the page, table lock locks the whole database table and all its rows and row lock locks only one row or record at one time. It is obvious that table lock allows least concurrency and if you lock the table in exclusive mode, practically no other person can access the table when you use it; not even to read or query the table.

Which lock mode should you use? The answer to this question depends heavily on your business environment. Let's say, for example, your business is a bank and you want to avoid using table lock; you want to give more concurrency to the system users to allow more people to update their accounts and balance information at the same time.

There are some tradeoffs in selecting lock mode. Table lock allows the least concurrency, but saves system and database resources. Row lock, on the other hand, allows the most concurrency. Users can update different rows on the same table at the same time, but it also uses a lot more locks. Say the table contains two million rows, and if the table is in row lock, Informix may possibly use up to two million locks for this table. The maximum number of locks Informix may use is specified by the LOCKS parameter in the Informix configuration file and is theoretically limited to eight million. Locks are pre-allocated when Informix is initialized, and if Informix reaches its limit, Informix will automatically increase 100,000 locks a maximum of 14 times. In other words it will automatically allocate 100,000 * 14 = 1.4 million more locks, and if this limit has been exceeded, Informix will prompt the error messages in Informix message log and abort the transaction. In some serious cases, Informix will crash.

How do you specify lock mode? In the CREATE TABLE statement. Here is an example:

CREATE TABLE item(
Item_num		smallint,
Order_num		integer	,	
Stock_num		smallint,
Manu_code		char(3)	,
Quantity		smallint,
Total_price		money(8)) 
LOCK MODE ROW;

Page lock is the default lock mode when creating tables, so you don't have to specify anything. Table lock is used only inside transactions; so even if your table is created with row lock mode, it can be changed during the transactions. However, if your database is created without logging, then table lock mode cannot be used. For details on how to select the proper lock mode, refer to IBM Informix Dynamic Server Administrator's Guide, Version 9.4.


Constraint key words

Many people like to use constraint key words in the CREATE TABLE statement to create primary keys, foreign keys, unique and other constraints. They think it is an easy and convinent way to create constraints. Following is an example:

CREATE TABLE sub_accounts (
sub_acc INTEGER PRIMARY KEY,
ref_num INTEGER NOT NULL,
ref_type INTEGER NOT NULL,
sub_descr CHAR(20),
FOREIGN KEY (ref_num, ref_type) REFERENCES accounts
(acc_num, acc_type));

Unfortunately, this is not good for database performance and administration. When Informix reads the above CREATE TABLE statement, it will automatically create two indexes, one for primary key on sub_acc and one for foreign key on ref_num and ref_type. It will then assign each with a system number as its index name, which has no meaning to developers and designers. To make matters worse, those indexes are created inside data dbspace and that is not what Informix doccumentation recommends because of the negative effects on general database performance. In addition, when you drop the primary key or foreign key constraints for this table, those indexes will be dropped automatically; so you will need to recreate the index if you want to use them in your queries. If the table has millions of rows, this could be a time-comsuming process.

So the proper procedure to create or add constraints to a table is as follows:

  • Create table without constraint key words
  • Create index on the columns that will be used by the constraints
  • Alter table to add constraints

Here is a good example:

CREATE TABLE item(
Item_num		smallint,
Order_num		integer	,	
Stock_num		smallint,
Manu_code		char(3)	,
Quantity		smallint,
Total_price		money(8)
) 
IN data1; 

CREATE UNIQUE INDEX item_idx1 ON item (item_num) IN idx1; 

ALTER TABLE item ADD CONSTRIANT PRIMARY KEY (item_idx1);

In this case, you can put index and data in different dbspace and avoid dropping primary key constraints together with index.


Conclusion

As discussed above, before you create tables and indexes, you need to think a little about database administration and performance so that you can create tables and indexes more efficiently. You need to ask yourself a few questions:

  • Where should I put new tables and indexes?
  • How large would the tables grow and how should I specify first and next extent size?
  • Which lock mode should the new tables use?
  • How should I create various constraints for the table?

The good CREATE TABLE statement should specify answers to all those questions:

CREATE TABLE item(
Item_num		smallint,
Order_num		integer	,	
Stock_num		smallint,
Manu_code		char(3)	,
Quantity		smallint,
Total_price		money(8)
) IN data1 EXTENT SIZE 6700 NEXT SIZE 1100 LOCK MODE ROW; 

CREATE UNIQUE INDEX item_idx1 ON item (item_num) IN idx1; 

ALTER TABLE item ADD CONSTRIANT PRIMARY KEY (item_idx1).

Resources

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=33319
ArticleTitle=Create tables and indexes more efficiently in IDS 9.4
publish-date=01202005