IBM Support

Performance impact after increasing columns varchar size

Technical Blog Post


Abstract

Performance impact after increasing columns varchar size

Body

We have an issue, where one of our larger tables has a column which is too short. Some of the data that is coming is up to 1024 characters. To fix this we would have to alter the column from 192 to 1024 via something like: ALTER TABLE MYTAB ALTER COLUMN MYCOL SET DATA TYPE VARCHAR(1024) . The question are:\

 

 

1) given this table can be quite large, such as 50 million rows, do you think increasing the size of this column will have any noticeable performance impact? I do not think this particular column is valued in many cases.

 

So there will be performance impact - imagine simple situation when DB is stopped. Once you start database, bufferpools are empty. So in order to do select on that table we need to read data from the disk. In DB2 we store data in pages (ex 16K), and read via extents (ex 32 pages at one singe IO).

So before alter on that extent you have given number of rows (lets assume that 100 rows per extent). So in order to do full select you need to read all extents for that table. However, once you alter, the space consumed by single row will increase (so lets assume that single extent will contain only 80 rows). As you can see, we then need more extents (pages) to store entire data. Consequence is obvious - more time to read.

Side note:

  • above is true not only for full select, but all selects. Even if you do some positioned select (like SELECT ... WHERE ROWID = '')... still we need to read entire extent which contains this row.

  • of course your data is varchar, so we can still have small length of actual values (and increase will not happen as "varchar" contains only needed amount of characters which is opposite to "char" which is always fixed length). However in boundary condition when your values are indeed around 1024 characters, increase will be as explained

  • please note that above is true for all operations. In SELECT we read bigger data, but INSERT or UPDATE need to store bigger data.

Side note 2:

  • additional consideration is that in DB2 columnar storage we have slightly different.
    • once you are row-organized, no matter what you select we read entire row (so even if you select columns other that extended one, we still need to read full row)

    • however if your tables are column-organized, the additional overhead will happen only on statements which operates on extended column

While above is in DB2 language, please note simple explanation: now your table is at given size (lets say 100GB). Once you alter, it will be 110GB. So your questions is as simple as this: how much more time I need in my storage to provide more data?

 

 

2) if we alter the table column size, do we need to rebuild/drop/recreate the index on this column? See current definitions below

 

Recommendation here is to REORG table along with indexes. Note that REORG of indexes will effectively rebuild those. So you need to execute something like:

db2 reorg table ... db2 reorg indexes all for table ...

See there: http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0001966.html

[{"Business Unit":{"code":"BU029","label":"Data and AI"}, "Product":{"code":"SSEPGG","label":"DB2 for Linux- UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":""}]

UID

ibm11140094