Improving query performance on spatial data in a partitioned environment

To achieve optimal performance of queries in a partitioned database environment, the table rows that are needed to satisfy the join condition must be co-located. That is, these joins must use table rows that exist on one node without having to access rows from a table, or parts of a table, on another node.

About this task

Spatial joins are often used in applications that find customers within particular polygons or determine the flood risk of customers. Here is an example of a query to determine flood risk:
Select 
  c.name,
  c.address,
  f.risk
from customers as c,
         floodpoly as f
where db2gse.st_within(c.location, f.polygeom) = 1

In this example, the customers table is quite large and distributed across multiple partitions. The flood polygon information is small. To implement the query efficiently, ensure that the entire flood polygon table is available on each partition that contains customer data.

Procedure

  1. Import the polygon data onto a single partition.
  2. Create a materialized query table (MQT) that is replicated across the partitions that contain the customer data.
    For example, after the table floodpoly has been created and loaded, you can create the replicated MQT with a statement like this:
    CREATE TABLE floodpoly 
    AS ( SELECT * FROM  floodpoly) 
    DATA INITIALLY DEFERRED 
    REFRESH DEFERRED 
    ENABLE QUERY OPTIMIZATION 
    MAINTAINED BY SYSTEM 
    DISTRIBUTE BY REPLICATION IN nodestbs
    
    REFRESH TABLE floodpoly;
    The replicated MQT with a spatial column can be either user-maintained or system-maintained, but it must use the REFRESH DEFERRED option, not REFRESH IMMEDIATE.
  3. To have the Db2® query compiler choose the MQT instead of the base table, you can set the following Db2 parameters:
    UPDATE DB CFG USING DFT_REFRESH_AGE ANY;
    UPDATE DB CFG USING DFT_MTTB_TYPES ALL;
    For more information about these parameters, see UPDATE DATABASE CONFIGURATION command.UPDATE DATABASE CONFIGURATION command in the Command Reference.

What to do next

Use the Db2 explain tools to determine how efficiently the query will run.