Create tables and indexes more efficiently in IDS 9.4
Anyone who knows a little SQL (Structured Query Language) knows how to create tables and indexes in a database. But how do you create tables and indexes more efficiently in terms of system and database performance? Here are few things to consider before creating tables and indexes:
- Extent size
- Lock mode
- Constraint key words
This article explores in detail each of those considerations and illustrates with real-life examples collected in a working environment.
Dbspace is the link between logical database objects and physical disk location. When you create logical database objects such as tables and indexes, you need physical disk space to hold those objects and that is called dbspace. Dbspace is composed of one or more chunks of disk space. For Informix versions before 9.40, there was a limitation on chunk size; each chunk could not exceed 2 GB. This is no longer a restriction. When database sever is first initialized, we have two options; we can either configure the database server to use smaller chunks (chunks less than or equal to 2 GB), or to use larger chunks (chunks larger than 2 GB).
Dbspace is created and administered by the Informix onspaces utility. Dbspace can be created either on cooked file systems or on a raw disk device. Cooked file systems refer to disk devices that have been formatted and mounted and raw disk device refers to those that are not mounted. For better performance and easier database administration for UNIX® systems, Informix recommends using raw disk device for dbspace, since a raw disk device can avoid operation system buffering and creating symbolic links to physical disk devices. This onspaces utility can only be used by users with database administrator (DBA) privileges. Also, do not forget to do a full backup on your Informix instance after you have created new dbspaces. For details on how to create and administer dbspaces, refer to IBM Informix Dynamic Server Administrator's Guide, Version 9.4.
When the Informix database server first initializes, it will automatically create one dbspace to store all database objects. That is usually called rootdbs. The following list shows major Informix database objects:
- Logical log data. Data generated by logical logs
- Physical log data. Data generated by physical logs
- Table data. Data in database tables
- Index data. Data generated by indexes on tables
- Large binary objects data, such as large text and internal graphics
- Temporary data. Data generated by sorting and altering tables and building indexes
To achieve better database performance, Informix documentation recommends creating a different dbspace for each database object. For example, you should create one for logical logs, one for physical logs, one for tables, one for indexes, and so on. Informix documentation also recommends spreading dbspace across disks to get balanced disk I/O. One way to achieve this is to apply modern RAID technology to stripe and concatenate slices of disks. Another way is to configure chunks on different disks. Refer to IBM Informix Dynamic Server Administrator's Guide, Version 9.4 for details.
As soon as those dbspaces are properly created, you need to know which one is for tables and which one is for indexes so that you can follow the recommendation not to put different database objects in the same dbspace. You can designate tables and indexes to the appropriate dbspace using the IN clause in the CREATE TABLE and CREATE INDEX. Here is an example:
CREATE TABLE element ( workspace_id integer, element_id integer, cpbsc_instance integer, acgid_bsc integer, dap_cluster_id integer, acgid_dap integer, acg_class smallint, preconfig_flag smallint, cptimr_inst integer, acgglob_inst integer, tdapfr_primary integer, tdapfr_secondary integer, barred_state_tmr integer ) IN data1; CREATE INDEX elmentidx1 on element (workspace_id,cpbsc_instance) IN idx1 ;
The above example designates the element table to data1 dbspace and its index to idx1 dbspace.
The IN clause can also be used for moving tables and indexes from one dbspace to another if you identified one dbspace has much more disk I/O than others and identified tables that are the sources of disk I/O, in other words, tables that are accessed heavily by users. The procedure is as follows:
- Save the schema of the table and indexes you plan to move using Informix dbschema utility
- Unload data from the table you plan to move
- Drop the table
- Recreate the table with the same schema you saved
- Reload the table data
- Recreate indexes with the schema you saved
- Execute update statistics high for that table to activate the indexes
Before you drop the table, you need to study its relationship to other tables and pay special attention to its foreign key constraints. You need to be very careful in handling its foreign key constraints; if not properly handled, you will mess up the integrity of the whole database.
An extent is a block of physically contiguous pages on disk used to store database objects. The default extent size when you create tables is 16 KB. This is usually too small for most database tables. When one extent is filled up, the Informix server will automatically allocate more extents until it has enough extents to store data for the whole table. For the sake of performance, the Informix documentation recommends putting all data for a table into one or two big extents rather than a number of small extents. There are two reasons for this recommendation:
- If a table has more than one extent, there is no guarantee that those extents are contiguous; extents may be scattered throughout the dbspace where the table resides. Contiguity of physical disk pages is important to performance. When pages of data are contiguous, the time used to access data on disk is minimized and the database can read rows sequentially. If tables have too many extents, it is very likely that those extents are interleaved. This harms performance a great deal, because when you retrieve data for a certain table, the disk head needs to seek multiple, noncontiguous extents belonging to this table instead of one large extent with contiguous physical pages. This slows down disk-seeking speed rather significantly.
- Another reason is to avoid an automatic allocation of extents for tables, which is a very expensive operation and will use a lot of system resources, such as CPU and memory.
To implement the Informix documentation recommendations, you need to specify extent size when you create tables so that the Informix server won't use the default extent size, which is 16 KB. The EXTENT SIZE and NEXT SIZE clause in the CREATE TABLE statement allows you to specify the first and second extent size that the Informix server will allocate for every table created.
But how do you estimate extent sizes? This is a difficult task, especially for OLTP databases where tables are constantly updated and inserted. Basicaly, you need to know how many rows the table will contain, the row size and the page size of your system. And then you must do some mathematical operations on those numbers. Informix documentation provides us detailed guidelines on how to achieve this:
- Determine each index length
- Determine total index length
- Account for index overhead
- Determine initial table size
- Size index space needed
- Convert index space to kilobytes
- Determine page length in bytes; subtract 28 to obtain usable page space
- Determine row length
- Determine how many whole rows can fit on a page
- Determine the number of data pages for the initial table
- Size the space need for data pages in the initial table
- Convert data size to kilobytes
- Determine size of the initial extent in kilobytes
- Determine table growth
- Size next extents
Let's now walk through an example. Follow this guideline step by step to estimate first and next extent sizes for the item table. Here is the structure of the item table:
Column Data type Length in bytes* Item_num smallint 2 Order_num integer 4 Stock_num smallint 2 Manu_code char(3) 3 Quantity smallint 2 Total_price money(8) 5 * you may get length for each data type in IBM Informix Dynamic Server Administrator's Reference, Version 9.4.
Let's now assume this table has a single index on order_num and a composite index on stock_num and manu_code. This table will initially have 20,000 rows and will increase to 35,000 rows in a few months.
Here is the result of each caculation:
step description calculation (in bytes) 1 Determine each index length index on order_num = 4+9 = 13 index on stock_num and manu_code = 2+3+9 = 14 2 Determine total index length 13 + 14 = 27 3 Account for index overhead 27 * 1.25 = 33.75 4 Determine initial table size 20,000 rows 5 Total index space needed 20,000 * 33.75 = 675,000 6 Convert index space to kilobytes 67,5000/1,024 = 660 (Kbytes) 7 Determine page length in bytes subtract 28 for page overheads 2,048 minus 28 = 2,020 8 Determine row length 2 + 4 + 2 + 3 + 2 + 5 + 4 = 22 9 Determine how many whole rows per page 2020/22 = 91 10 Determine the number of data pages for the initial table 20,000/91 = 220 11 Total space needed for data 220 * 2,048 = 450,560 12 Convert data size to kilobytes 450,560/1,024 = 440 13 Determine size of the initial extent 440 + 660 = 1,100 14 Determine table growth 35,000 more rows 15 Estimate next extent size Index space needed = 35,000 * 33.75 = 1,181,250 Convert to Kbytes = 1,181,250/1024 = 1,154 Additional data pages needed = 35,000/91 = 385 Convert to bytes = 385 * 2,048 = 788,480 Convert to Kbytes = 788,480/1,024 = 770 Total space needed = 770 + 1,154 = 1,924
Based on our calculation, you identified the first and next extent sizes needed for this table and now you are able to specify first and next extent sizes in CREATE TABLE statement:
CREATE TABLE item( Item_num smallint, Order_num integer , Stock_num smallint, Manu_code char(3) , Quantity smallint, Total_price money(8)) EXTENT SIZE 1100 NEXT SIZE 1924;
The Informix server will automatically allocate the first two extents for the table when it is created and if your estimation is right, those two extents would hold most of if not all of the table's data. The calculations above are very mechanical and can be easily implemented in a Microsoft® Excel spreadsheet to automate the whole process. You can also see from the above calculation that the basis for your estimation is the number of rows a table will contain. This is of course based on business rules, but also depends heavily on relationship between tables in the database. For instance, you have two tables, customer and address, and you know that there are 100,000 customers and each customer may have at most four addresses. So you can easily say that the address table may have 100,000 *4 rows at most.
How do you then verify your estimate and see if your estimate is close to the reality? You can use the Informix oncheck utility to get a tblspace report for the table after it is created and loaded with data. Here is the command:
oncheck -pt prod1:item
And here is the sample output:
TBLspace Report for prod1:item Physical Address 9:652153 Creation date 08/31/2004 11:41:05 TBLspace Flags 802 Row Locking TBLspace use 4 bit bit-maps Maximum row size 220 Number of special columns 0 Number of keys 0 Number of extents 1 Current serial value 1 First extent size 1100 Next extent size 1924 Number of pages allocated 4107 Number of pages used 886 Number of data pages 885 Number of rows 7960 Partition partnum 7340289 Partition lockid 7340289 Extents Logical Page Physical Page Size 0 9:299243 4107
The above output tells you that item table has only one extent. You can use this statistic to optimize your estimation algorithm if the tblspace report shows too many extents for a table and then you can make your estimate match the reality more closely.
The tblspace report actually has two parts, the first part is extent information about the table and the second part contains the extent information on indexes. You cannot specify extent size for index direct as you do for tables; when indexes are created, the Informix server will automatically allocate extents for indexes based on the extent sizes you specified for the tables. You can, however, use the oncheck utility to see how many extents you got for the index. Here is the sample output:
Index item_idx1 fragment in DBspace prod1_idx Physical Address 9:907200 Creation date 08/31/2004 12:48:45 TBLspace Flags 802 Row Locking TBLspace use 4 bit bit-maps Maximum row size 220 Number of special columns 0 Number of keys 1 Number of extents 1 Current serial value 1 First extent size 224 Next extent size 32 Number of pages allocated 224 Number of pages used 36 Number of data pages 0 Number of rows 0 Partition partnum 7340624 Partition lockid 7340289 Extents Logical Page Physical Page Size 0 9:831443 224
The above output showed there is only one extent for the index item_idx1. For details how to use oncheck, refer to IBM Informix Dynamic Server Administrator's Guide, Version 9.4.
Lock mode is of vital importance in concurrency control; it is the mechanism that IDS uses to control Informix sessions access database tables simultaneously. There are three lock modes, namely page lock, table lock, and row lock. Page lock locks the whole physical disk page (2 KB in size) and all rows or records contained on the page, table lock locks the whole database table and all its rows and row lock locks only one row or record at one time. It is obvious that table lock allows least concurrency and if you lock the table in exclusive mode, practically no other person can access the table when you use it; not even to read or query the table.
Which lock mode should you use? The answer to this question depends heavily on your business environment. Let's say, for example, your business is a bank and you want to avoid using table lock; you want to give more concurrency to the system users to allow more people to update their accounts and balance information at the same time.
There are some tradeoffs in selecting lock mode. Table lock allows the least concurrency, but saves system and database resources. Row lock, on the other hand, allows the most concurrency. Users can update different rows on the same table at the same time, but it also uses a lot more locks. Say the table contains two million rows, and if the table is in row lock, Informix may possibly use up to two million locks for this table. The maximum number of locks Informix may use is specified by the LOCKS parameter in the Informix configuration file and is theoretically limited to eight million. Locks are pre-allocated when Informix is initialized, and if Informix reaches its limit, Informix will automatically increase 100,000 locks a maximum of 14 times. In other words it will automatically allocate 100,000 * 14 = 1.4 million more locks, and if this limit has been exceeded, Informix will prompt the error messages in Informix message log and abort the transaction. In some serious cases, Informix will crash.
How do you specify lock mode? In the CREATE TABLE statement. Here is an example:
CREATE TABLE item( Item_num smallint, Order_num integer , Stock_num smallint, Manu_code char(3) , Quantity smallint, Total_price money(8)) LOCK MODE ROW;
Page lock is the default lock mode when creating tables, so you don't have to specify anything. Table lock is used only inside transactions; so even if your table is created with row lock mode, it can be changed during the transactions. However, if your database is created without logging, then table lock mode cannot be used. For details on how to select the proper lock mode, refer to IBM Informix Dynamic Server Administrator's Guide, Version 9.4.
Constraint key words
Many people like to use constraint key words in the CREATE TABLE statement to create primary keys, foreign keys, unique and other constraints. They think it is an easy and convinent way to create constraints. Following is an example:
CREATE TABLE sub_accounts ( sub_acc INTEGER PRIMARY KEY, ref_num INTEGER NOT NULL, ref_type INTEGER NOT NULL, sub_descr CHAR(20), FOREIGN KEY (ref_num, ref_type) REFERENCES accounts (acc_num, acc_type));
Unfortunately, this is not good for database performance and administration. When Informix reads the above CREATE TABLE statement, it will automatically create two indexes, one for primary key on sub_acc and one for foreign key on ref_num and ref_type. It will then assign each with a system number as its index name, which has no meaning to developers and designers. To make matters worse, those indexes are created inside data dbspace and that is not what Informix doccumentation recommends because of the negative effects on general database performance. In addition, when you drop the primary key or foreign key constraints for this table, those indexes will be dropped automatically; so you will need to recreate the index if you want to use them in your queries. If the table has millions of rows, this could be a time-comsuming process.
So the proper procedure to create or add constraints to a table is as follows:
- Create table without constraint key words
- Create index on the columns that will be used by the constraints
- Alter table to add constraints
Here is a good example:
CREATE TABLE item( Item_num smallint, Order_num integer , Stock_num smallint, Manu_code char(3) , Quantity smallint, Total_price money(8) ) IN data1; CREATE UNIQUE INDEX item_idx1 ON item (item_num) IN idx1; ALTER TABLE item ADD CONSTRIANT PRIMARY KEY (item_idx1);
In this case, you can put index and data in different dbspace and avoid dropping primary key constraints together with index.
As discussed above, before you create tables and indexes, you need to think a little about database administration and performance so that you can create tables and indexes more efficiently. You need to ask yourself a few questions:
- Where should I put new tables and indexes?
- How large would the tables grow and how should I specify first and next extent size?
- Which lock mode should the new tables use?
- How should I create various constraints for the table?
The good CREATE TABLE statement should specify answers to all those questions:
CREATE TABLE item( Item_num smallint, Order_num integer , Stock_num smallint, Manu_code char(3) , Quantity smallint, Total_price money(8) ) IN data1 EXTENT SIZE 6700 NEXT SIZE 1100 LOCK MODE ROW; CREATE UNIQUE INDEX item_idx1 ON item (item_num) IN idx1; ALTER TABLE item ADD CONSTRIANT PRIMARY KEY (item_idx1).
- IBM Informix Dynamic Server Administrator's Guide, Version 9.4
- IBM Informix Dynamic Server Getting Started Guide, Version 9.4
- IBM Informix Dynamic Server Performance Guide, Version 9.4
- IBM Informix Dynamic Server Administrator's Reference, Version 9.4
- IBM Informix Dynamic Server Backup and Restore Guide, Version 9.4