Enabling data caching for star schema queries
You can enable data caching to improve the performance of queries on star schemas.
About this task
- Immediate data availability
- During a star join operation, work files might be scanned many times. If the work file data is cached in the dedicated virtual memory pool, that data is immediately available for join operations.
- Reduced buffer pool contention
- Because the virtual memory space for data caching is separated from the work file buffer pool, contention with the buffer pool is reduced. Reduced contention improves performance particularly when sort operations are performed concurrently.
Procedure
To determine the best setting of the MAX DATA CACHING parameter for star schema queries:
Example
SELECT C.COUNTRY, P.PRDNAME, SUM(F.SPRICE)
FROM SALES F, TIME T, PROD P, LOC L, SCOUN C
WHERE F.TID = T.TID AND
F.PID = P.PID AND
F.LID = L.LID AND
L.CID = C.CID AND
P.PCODE IN (4, 7, 21, 22, 53)
GROUP BY .COUNTRY, P.PRDNAME;
The following table shows the EXPLAIN output for this example query.
QBLOCK
NO |
PLAN
NO |
TNAME | METHOD | JOIN_
TYPE |
ACCESS
TYPE |
ACCESS
NAME |
PRIMARY
ACCESS TYPE |
---|---|---|---|---|---|---|---|
1 | 1 | TIME | 0 | S | R | ||
1 | 2 | PROD | 1 | S | R | T | |
1 | 3 | SALES | 1 | S | I | XSALES | |
1 | 4 | DSN_DIM_TBLX(02) | 1 | R | T | ||
1 | 5 | 3 | |||||
2 | 1 | LOC | 0 | R | |||
2 | 2 | SCOUN | 4 | I | XSCOUN |
For this query, two work files can be cached in memory. These work files, PROD and DSN_DIM_TBLX(02), are indicated by PRIMARY_ACCESSTYPE=T.
- In this example, the star join query uses two work files, PROD and DSN_DIM_TBLX(02). Therefore B = 2.
- Both PROD and DSN_DIM_TBLX(02) are used to determine the value
of B.Recommendation: Average the values for a representative sample of work files, and round the value up to determine an estimate for a value of C:
- The number of work file rows depends on the number of rows that match the predicate. For PROD, 87 rows are stored in the work file because 87 rows match the IN-list predicate. No selective predicate is used for DSN_DIM_TBLX(02), so the entire result of the join is stored in the work file. The work file for DSN_DIM_TBLX(02) holds 2800 rows.
- The maximum length of the key depends on the data type definition of the table's key column. For PID, the key column for PROD, the maximum length is 4. DSN_DIM_TBLX(02) is a work file that results from the join of LOC and SCOUN. The key column that is used in the join is LID from the LOC table. The maximum length of LID is 4.
- The maximum data length depends on the maximum length of the key
column and the maximum length of the column that is selected as part
of the star join. Add to the maximum data length 1 byte for nullable
columns, 2 bytes for varying length columns, and 3 bytes for nullable
and varying length columns.
For the PROD work file, the maximum data length is the maximum length of PID, which is 4, plus the maximum length of PRDNAME, which is 24. Therefore, the maximum data length for the PROD work file is 28. For the DSN_DIM_TBLX(02) work file, the maximum data length is the maximum length of LID, which is 4, plus the maximum length of COUNTRY, which is 36. Therefore, the maximum data length for the DSN_DIM_TBLX(02) work file is 40.
The average of these two estimated values for B is approximately 62 KB. Because the number of rows in each work file can vary depending on the selection criteria in the predicate, the value of B should be rounded up to the nearest multiple of 100 KB. Therefore B = 100 KB.
- The size of the pool is determined by multiplying (B) × (C) or, in this example, (2) × (100 KB) = 0.2 MB.