Without much fuss Large Object (LOB) support has gotten a big overhaul in DB2 9.5 and DB2 9.7.
In this article I'll try to bring all the various improvements together.
- Cursor blocking
When DB2 does cursor blocking that means that instead of shipping one row at a time upon every fetch DB2 will send a number of rows in one chunk. Subsequent fetches by the application are then served from this cache of resultset rows until the set is exhausted and the next block is delivered.
The benefit is obvious. In most cases the number of network trips can be reduced by an order of magnitude which has a major impact on the performance of a loop through the application.
Prior to DB2 9.5 cursor blocking was disabled if a LOB column was fetched.
- LOB inlining within the result set
Prior to DB2 9.5 a resultset containing a LOB column would always only ship the so-called LOB-locator.
That is instead of retrieving the LOB the client would merely receive a ticket.
When the LOB is actually being looked at another trip across the network was needed to retrieve the actual LOB data.
In DB2 9.7 any LOB that is "small", that is it fits into a VARCHAR, is passed back to the client inline within the row of the result set.
- LOB inlining in the data page.
Prior to DB2 9.7 any LOB other than a NULL was always stored outside of the row.
This had to three effects:
Especially the inability to use the bufferpool can have severe impact on the performance compared to using e.g. a VARCHAR column.
- LOBs were never buffered in the buffer pool
- Extra storage is consumed for the "pointer" which tells the row where the LOB value resides.
- LOBs were never compressed
In DB2 9.7 an individual LOB value is inlined into the row if it isn't bigger than the user defined inline length.
The assumption here is that while a LOB maybe defined to be megabyte sized in reality many actual LOB values are very small and could easily fit into a VARCHAR.
When moving an existing application from DB2 9.1 up to 9.5 or 9.7 you can expect a very significant uplift in speed just from the way result sets are processed. This is automatic.
The inline LOBs in the datapage however only kick in automatically for very small LOB and only over time.
That is a LOB will only be inlined for new or updated rows and then only if it is so small that it is smaller than the pointer would be which is often only a few dozen bytes depending on the defined size of the LOB.
So in order to benefit from inline LOBs you have to manually specify the appropriate length.
Here is how that is done.
CREATE BUFFERPOOL bp32 PAGESIZE 32 K;
CREATE TABLESPACE tb32 PAGESIZE 32 K BUFFERPOOL bp32;
CREATE TABLE bio(id INTEGER NOT NULL PRIMARY KEY, doc CLOB(2M) INLINE LENGTH 1004)
With this definition all biographies that are at most 1000 bytes long will be inlined in the datapage, benefit from the buffer pool and compression if enabled. The extra 4 bytes are to store the length of the LOB.
But how do you know which value is right here? Should you aim to inline all bios? Certainly that would be ideal, right?
It is important to keep in mind that there is a reason why LOBs are not normally in the bufferpool. They can be big!
If make the inline length too big then two things will happen:
- Table scans get slower because less rows fit on a page
- Buffer pool hit ration goes down unless you have enough room to increase the buffer pool.
In fact, tests have shown that given a fixed bufferpool the sweetspot for inlining LObs can be as low as 20% inlined depending on the LOB sizes. That is with more LOBs inlined in this test the transaction rate went down because the bufferpool hit ration dropped from 98% towards 90% as more LOBs got inlined.
It was still higher than without inlining, but the highest throughput was not anywhere near 100% inlining.
Personally, and given some leeway with the bufferpool hit ratio I am a believer in the 80/20 rule.
So I try to inline 80% of the LOBs.
Now, the obvious problem is? How do I find which ratio matches with which inline length?
Here is a little routine that can, for an existing table tell us the correlation.
--#SET TERMINATOR @
CREATE OR REPLACE PROCEDURE profile_lob
DECLARE txt CLOB(128K) DEFAULT '';
DECLARE res CURSOR WITH RETURN TO CLIENT FOR STMT;
SET txt =
'SELECT MIN(len) AS "LENGTH(""' || colname || '"")", ' ||
' percentile ' ||
' FROM ' ||
'(SELECT SMALLINT(TRUNCATE(DECFLOAT(ROW_NUMBER() OVER (ORDER BY LENGTH("' || colname || '") ' ||
' ASC NULLS FIRST)) ' ||
' * 200 / COUNT(*) OVER(), -1) / 2) AS percentile, ' ||
' LENGTH("' || colname || '") AS len ' ||
' FROM "' || tabschema || '"."' || tabname || '")' ||
'GROUP BY percentile ORDER BY percentile';
PREPARE stmt FROM txt;
--#SET TERMINATOR ;
Putting the procedure to the test on SYSIBM.SYSTABLES.PACKED_DESC:
db2 => CALL profile_lob('SYSIBM', 'SYSTABLES', 'PACKED_DESC');
Result set 1
21 record(s) selected.
Return Status = 0
So if we were allowed to inline columns in a system table and we chose 80% inlining we would need to set it to 4376 + 4 bytes.
between 10% and 15% of the values are actually NULL, so there is no need to do anything.
To increase the inline length for a LOB column use the ALTER TABLE statement:
ALTER TABLE bio ALTER COLUMN doc SET INLINE LENGTH 20000;
Note that the inline length can only ever be increased, but never be decreased.
Also increasing the inline length will not change any data already stored, only row will inline LOBs using the new length or updates to the LOB column itself.
To increase inlining for the existing rows you can either update all the eligible rows:
UPDATE bio SET doc = doc WHERE LENGTH(doc) + 4 <= 20000;
Ensure you have enough log space to execute the operation.
Also performing such an update may spill many rows out of the page begging for a REORG to be done.
Another option is to be pre-emptive about the reorg and do that instead of running the update statement
REORG TABLE bio LONGLOBDATA;