To achieve optimal performance of queries in a partitioned
database environment, the table rows that are needed to satisfy the
join condition must be co-located. That is, these joins must use table
rows that exist on one node without having to access rows from a table,
or parts of a table, on another node.
About this task
Spatial
joins are often used in applications that find customers within particular
polygons or determine the flood risk of customers. Here is an example
of a query to determine flood risk:Select
c.name,
c.address,
f.risk
from customers as c,
floodpoly as f
where db2gse.st_within(c.location, f.polygeom) = 1
In
this example, the customers table is quite large
and distributed across multiple partitions. The flood polygon information
is small. To implement the query efficiently, ensure that the entire
flood polygon table is available on each partition that contains customer
data.
Procedure
- Import the polygon data onto a single partition.
- Create a materialized query table (MQT) that is replicated
across the partitions that contain the customer data.
For
example, after the table floodpoly has been created
and loaded, you can create the replicated MQT with a statement like
this:
CREATE TABLE floodpoly
AS ( SELECT * FROM floodpoly)
DATA INITIALLY DEFERRED
REFRESH DEFERRED
ENABLE QUERY OPTIMIZATION
MAINTAINED BY SYSTEM
DISTRIBUTE BY REPLICATION IN nodestbs
REFRESH TABLE floodpoly;
The replicated
MQT with a spatial column can be either user-maintained or system-maintained,
but it must use the REFRESH DEFERRED option, not REFRESH IMMEDIATE.
-
To have the Db2® query compiler choose
the MQT instead of the base table, you can set the following Db2 parameters:
UPDATE DB CFG USING DFT_REFRESH_AGE ANY;
UPDATE DB CFG USING DFT_MTTB_TYPES ALL;
What to do next
Use the Db2
explain tools to determine how efficiently the query will run.