Examples of filtering using a geohash
These examples demonstrate how to improve query performance by filtering using geohashs.
Example 1
For this example, geohash filtering will be added to a query to improve performance.
The tables used in the example are the US_STATES table and the WALKING_PATHS table. The US_STATES table contains a row for each state in the United States. The WALKING_PATHS table contains a row for every walking path in a national park in the United States. These tables are defined as follows:
CREATE TABLE US_STATES (STATE_ID CHAR(2) PRIMARY KEY,
STATE_FULL_NAME VARCHAR(50),
STATE_GEO QSYS2.ST_POLYGON);
CREATE TABLE WALKING_PATHS (WALKING_ID VARCHAR(10) PRIMARY KEY,
WALKING_GEO QSYS2.ST_LINESTRING);
SELECT WALKING_ID,
STATE_FULL_NAME,
QSYS2.ST_ASTEXT(WALKING_GEO),
QSYS2.ST_ASTEXT(STATE_GEO)
FROM US_STATES S, WALKING_PATHS W
WHERE S.STATE_ID = 'MN'
AND QSYS2.ST_CONTAINS(STATE_GEO, WALKING_GEO) = 1;
- First, note the primary key and geometry column of each of the base tables.
- In US_STATES, the primary key is STATE_ID and the geometry column is STATE_GEO which contains a ST_POLYGON that represent the geometry of the state.
- In WALKING_PATHS, the primary key is WALKING_ID and the geometry column is WALKING_GEO which contains a ST_LINESTRING that represents the geometry of the path.
- Determine the size of the geometries to be covered, and use this information to decide which depth to use. The geohash covers of the two geometries that are compared must be computed using the same depth. For this example, a depth of 13 is selected. This is the recommended depth when working with a very large region, like a state.
- Create a geohash filter table that corresponds to each geospatial column in a base table. Define
a foreign key to enforce the dependency of the rows in the geohash table to the corresponding row in
the base table.
Use the ST_GEOHASHCOVER table function to populate each of the geohash filter tables with the geohash values that correspond to the geohash covers of the geometries. Multiple geohash values are generally returned for each geohash cover that is generated. This means that multiple rows will be inserted into the table for each row containing a geometry value in the base table.
-- Create and populate the geohash values that correspond to the US_STATES table CREATE TABLE US_STATES_HASH (STATE_ID CHAR(2), GEOHASH BIGINT, FOREIGN KEY (STATE_ID) REFERENCES US_STATES(STATE_ID)); INSERT INTO US_STATES_HASH (SELECT S.STATE_ID, T.GEOHASH FROM US_STATES S, TABLE(QSYS2.ST_GEOHASHCOVER(S.STATE_GEO, 13)) T); -- Create and populate the geohash values that correspond to the WALKING_PATHS table CREATE TABLE WALKING_PATHS_HASH (WALKING_ID VARCHAR(10), GEOHASH BIGINT, FOREIGN KEY (WALKING_ID) REFERENCES WALKING_PATHS(WALKING_ID)); INSERT INTO WALKING_PATHS_HASH (SELECT P.WALKING_ID, T.GEOHASH FROM WALKING_PATHS P, TABLE(QSYS2.ST_GEOHASHCOVER(P.WALKING_GEO, 13)) T);
- Rewrite the original query to use the filter tables to reduce the amount of data that needs to
be compared by the ST_CONTAINS scalar function. The tables containing the geohash values are joined
to the corresponding base tables. An additional predicate is added to the WHERE clause so only rows
where there is a state geohash value equal to a walking path geohash are compared by the ST_CONTAINS
function.
SELECT W.WALKING_ID, STATE_FULL_NAME, QSYS2.ST_ASTEXT(WALKING_GEO) AS PATH_GEOMETRY, QSYS2.ST_ASTEXT(STATE_GEO) AS STATE_GEOMETRY FROM US_STATES S JOIN US_STATES_HASH SH ON S.STATE_ID = SH.STATE_ID, WALKING_PATHS W JOIN WALKING_PATHS_HASH WH ON W.WALKING_ID = WH.WALKING_ID WHERE S.STATE_ID = 'MN' AND SH.GEOHASH = WH.GEOHASH AND QSYS2.ST_CONTAINS(STATE_GEO, WALKING_GEO) = 1;
Example 2
In this example, we want to find every city (a point) that is less than 10 kilometers in distance from a certain trail (a linestring).
CREATE TABLE US_CITIES (CITY_ID VARCHAR(10),
CITY_NAME VARCHAR(50),
STATE_ID CHAR(2),
CITY_GEO QSYS2.ST_POINT,
CITY_GEOHASH BIGINT);
There are several ways to populate the CITY_GEOHASH column. For this example, the geohash will use a depth of 23, approximately the size of a forest or lake. Here are a few ways to generate the geohash value.
- To set geohash values for every row in a table that has a point value defined for a city but
does not yet have a geohash value calculated, use an UPDATE statement to set the CITY_GEOHASH
column.
UPDATE US_CITIES SET CITY_GEOHASH = (SELECT GEOHASH FROM TABLE(QSYS2.ST_GEOHASH(CITY_GEO,23))) WHERE CITY_GEOHASH IS NULL;
- A trigger can be used to populate the CITY_GEOHASH column when a new city is inserted into the
US_CITIES table. The following create trigger statement uses the value assigned to the CITY_GEO
column to generate the corresponding geohash value. Once this trigger has been created, every
future insert into the US_CITIES table will fire this trigger, assigning a value to the CITY_GEOHASH
column as part of the insert operation.
CREATE TRIGGER GEOHASH_CITIES BEFORE INSERT ON US_CITIES REFERENCING NEW AS N FOR EACH ROW BEGIN SET N.CITY_GEOHASH = (SELECT GEOHASH FROM TABLE(QSYS2.ST_GEOHASH(N.CITY_GEO,23))); END;
CREATE VARIABLE TRAIL QSYS2.ST_LINESTRING;
SET TRAIL = QSYS2.ST_LINESTRING('linestring (-92.51864276919596 44.05903689443092,
-92.57146126853864 44.10851586848976,
-92.64386916754349 44.20339334725808)');
For this example, filtering will be added to the following query to improve performance. The query returns a row for every city that is less than 10 kilometers from the trail that is contained in the TRAIL global variable.
SELECT * FROM US_CITIES C
WHERE QSYS2.ST_DISTANCE(C.CITY_GEO, TRAIL) < 10000;
- A geohash value is available in the US_CITIES table for each city. We know it was calculated with a depth of 23.
- The global variable is a linestring and needs a geohash cover to compare with the geohash value for each city. To have a meaningful compare, the depth for the geohash cover must also use a depth of 23. The geohash cover values for the linestring will be generated when the query is run.
- Rewrite the original query to use the CITY_GEOHASH column to reduce the amount of data that
needs to be considered for the ST_DISTANCE calculation. For this example, an exact geohash value for
the linestring is not needed, so the faster ST_FUZZYGEOHASHCOVEREXTEND table function is used to
generate the geohash minimum bounding rectangle to use as the filter. The EXTEND version of the
function is used to get the geohash regions that form a 10 kilometer buffer around the linestring
for the trail. The extended geohash values are needed since the query wants to find all cities
within a 10 kilometer distance of the trail. The rewrite of the query is as follows:
WITH TRAIL_GEOHASH AS (SELECT * FROM TABLE(QSYS2.ST_FUZZYGEOHASHCOVEREXTEND(TRAIL, 23, 10000))) SELECT * FROM US_CITIES C, TRAIL_GEOHASH T WHERE C.CITY_GEOHASH = T.GEOHASH AND QSYS2.ST_DISTANCE(C.CITY_GEO, TRAIL) < 10000;