IBM Support

VARCHAR Data Type and How the Allocate Parameter Impacts Performance and Storage

Troubleshooting


Problem

This document contains a series of questions and answers related to the VARCHAR data type and how the allocate parameter impacts input/output operations and storage.

Resolving The Problem

This document contains a series of questions and answers related to the VARCHAR data type and how the allocate parameter impacts input/output operations and storage.

Q1: If I insert only 50, 30, or 80 bytes into a VARCHAR column (defined as 254 with ALLOCATE(0) ), will I incur two I/Os?

A1: Yes, it will be two I/0s because ALLOCATE(0) was specified. To avoid an extra I/0, use ALLOCATE(30) or (50). We suggest allocating enough so most of your input will fit and only the long ones will overflow the allocated part.

The Information Center has some documentation on this topic, such as the following:

In many applications, performance must be considered. If you use the default ALLOCATE(0), it will double the disk unit traffic because ALLOCATE(0) requires two reads. The first one reads the fixed-length portion of the row and the second one reads the overflow space. The variable-length implementation, with the carefully chosen ALLOCATE, minimizes overflow and space and maximizes performance. The size of the table is 28% smaller than the fixed-length implementation. Because 1% of rows are in the overflow area, the access requiring two reads is minimized. The variable-length implementation performs about the same as the fixed-length.

Q2: Why does the system not use the Length to determine that it is 50 bytes and at least insert the 50 bytes into the 'fixed' area and not put anything into the overflow?

A2: You must define the file to ALLOCATE(50) before it will use the fixed part. When it is defined as ALLOCATE(0), then all inserts must go in the overflow piece.

Q3: If I Index the VARCHAR column, will the Index entry be 50 bytes or 254 bytes?

A3: The key will be about 50 bytes (with some extra for encoding) and not 254 because we will not put trailing padding in keys for VARCHAR fields.

Q4: If I use ALLOCATE(30), will the index entry be 30 bytes or 254 bytes?

A4: The key size will be whatever size the data is. If 30 bytes key, it will be about 30 bytes; however, the allocate length does not affect key size.

Q5: It sounds like the SQE is 'More-Aggressive' with Asynchronous operations and might offset (improve) performance when it comes to VARCHAR. In this case, it might negate the need for ALLOCATE greater than zero. Is this true?

A5: This is hard to measure because there are too many factors (including how much other data is being touched as part of a particular query). In any case, we do not think it is ever optimal to put in ALLOCATE(0) because it requires two I/0s no matter what.

Q6: What happens in journal entries on the table with a column that has:

a. ALLOCATE(0) VARCHAR(254)
b. ALLOCATE(30) VARCHAR(254)

A6: Journal always uses the actual length of the field to journal the entry. So ALLOCATE(0) and ALLOCATE(30) VARCHAR(254) are the same. It would use the actual length of the data in the entry and the ALLOCATE length would not make a difference.

Q7: How are the following SMAPP journal entries handled from a storage perspective?

a. ALLOCATE(0) VARCHAR(254)
b. ALLOCATE(30) VARCHAR(254)

A7: SMAPP works the same as regular journaling, where the actual length is what is journaled.
Q8: How to I see if my allocate is doing an adequate job?

A8: You can use the SQL statement:
select table_name, COLUMN_NAME, AVERAGE_COLUMN_LENGTH,
MAXIMUM_COLUMN_LENGTH,                                
       LENGTH_AT_90TH_PERCENTILE, OVERFLOW_ROWS       
  from qsys2.syscolumnstat                            
  where table_schema = ucase('xxx') and                    
        LENGTH_AT_90TH_PERCENTILE is not null         
  order by 2 desc                                     
Where xxx  is the library you want to check on
This only works on R740 on DB fixpack 23 or higher, and R750 on DB Fixpack 3 or higher.
For current information, you may need to refresh statistics through ACS via: 
  1. From Schemas and your Schema, go to Tables
  2. Rick click on your Table
  3. Go to Statistic Data
  4. Add in the columns to refresh statistic data, and choose a collection option
Q9: Is there a tool to help me find a good allocate value?

[{"Type":"MASTER","Line of Business":{"code":"LOB68","label":"Power HW"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SWG60","label":"IBM i"},"ARM Category":[{"code":"a8m0z0000001i3CAAQ","label":"IBM i Db2-\u003EDDS - Data Definition Specifications"}],"ARM Case Number":"","Platform":[{"code":"PF012","label":"IBM i"}],"Version":"7.1.0;and future releases"}]

Historical Number

392625078

Document Information

Modified date:
05 December 2024

UID

nas8N1015325