Enabling data caching for star schema queries

You can enable data caching to improve the performance of queries on star schemas.

About this task

Begin program-specific programming interface information. When data caching is enabled for star schema queries, Db2 caches data from work files that are used by star schema queries. Data caching provides the following advantages:
Immediate data availability
During a star join operation, work files might be scanned many times. If the work file data is cached in the dedicated virtual memory pool, that data is immediately available for join operations.
Reduced buffer pool contention
Because the virtual memory space for data caching is separated from the work file buffer pool, contention with the buffer pool is reduced. Reduced contention improves performance particularly when sort operations are performed concurrently.
The default virtual memory pool size is 20 MB. To set the pool size, use the SJMX, or DXPOOL parameter on the DSNTIP8 installation panel.

Procedure

To determine the best setting of the MAX DATA CACHING parameter for star schema queries:

  1. Determine the value of A. Estimate the average number of work files that a star schema query uses.
    In typical cases, with highly normalized star schemas, the average number is about three to six work files.
  2. Determine the value of B. Estimate the number of work file rows, the maximum length of the key, and the total of the maximum length of the relevant columns. Multiply these three values together to find the size of the data caching space for the work file, or the value of B.
  3. Multiply (A) × (B) to determine the size of the pool in MB.

Example

The following example shows how to determine the size of the virtual memory for data caching. Suppose that you issue the following star join query, where SALES is the fact table:
SELECT C.COUNTRY, P.PRDNAME, SUM(F.SPRICE)
  FROM SALES F, TIME T, PROD P, LOC L, SCOUN C
  WHERE F.TID = T.TID AND
        F.PID = P.PID AND
        F.LID = L.LID AND
        L.CID = C.CID AND 
        P.PCODE IN (4, 7, 21, 22, 53)
  GROUP BY .COUNTRY, P.PRDNAME; 

The following table shows the EXPLAIN output for this example query.

Table 1. EXPLAIN output for a star schema query
QBLOCK
NO
PLAN
NO
TNAME METHOD
JOIN_
TYPE
ACCESS
TYPE
ACCESS
NAME
PRIMARY
ACCESS
TYPE
1 1 TIME 0 S R    
1 2 PROD 1 S R   T
1 3 SALES 1 S I XSALES  
1 4 DSN_DIM_TBLX(02) 1   R   T
1 5   3        
2 1 LOC 0   R    
2 2 SCOUN 4   I XSCOUN  

For this query, two work files can be cached in memory. These work files, PROD and DSN_DIM_TBLX(02), are indicated by PRIMARY_ACCESSTYPE=T.

  1. In this example, the star join query uses two work files, PROD and DSN_DIM_TBLX(02). Therefore B = 2.
  2. Both PROD and DSN_DIM_TBLX(02) are used to determine the value of B.
    Recommendation: Average the values for a representative sample of work files, and round the value up to determine an estimate for a value of C:
    • The number of work file rows depends on the number of rows that match the predicate. For PROD, 87 rows are stored in the work file because 87 rows match the IN-list predicate. No selective predicate is used for DSN_DIM_TBLX(02), so the entire result of the join is stored in the work file. The work file for DSN_DIM_TBLX(02) holds 2800 rows.
    • The maximum length of the key depends on the data type definition of the table's key column. For PID, the key column for PROD, the maximum length is 4. DSN_DIM_TBLX(02) is a work file that results from the join of LOC and SCOUN. The key column that is used in the join is LID from the LOC table. The maximum length of LID is 4.
    • The maximum data length depends on the maximum length of the key column and the maximum length of the column that is selected as part of the star join. Add to the maximum data length 1 byte for nullable columns, 2 bytes for varying length columns, and 3 bytes for nullable and varying length columns.

      For the PROD work file, the maximum data length is the maximum length of PID, which is 4, plus the maximum length of PRDNAME, which is 24. Therefore, the maximum data length for the PROD work file is 28. For the DSN_DIM_TBLX(02) work file, the maximum data length is the maximum length of LID, which is 4, plus the maximum length of COUNTRY, which is 36. Therefore, the maximum data length for the DSN_DIM_TBLX(02) work file is 40.

    Consequently, for PROD, B = (87) ×(4 + 28) = 2784 bytes. For DSN_DIM_TBLX(02), B = (2800) × (4 + 40) = 123200 bytes.

    The average of these two estimated values for B is approximately 62 KB. Because the number of rows in each work file can vary depending on the selection criteria in the predicate, the value of B should be rounded up to the nearest multiple of 100 KB. Therefore B = 100 KB.

  3. The size of the pool is determined by multiplying (B) × (C) or, in this example, (2) × (100 KB) = 0.2 MB.

End program-specific programming interface information.