Examples of filtering using a geohash

These examples demonstrate how to improve query performance by filtering using geohashs.

Example 1

For this example, geohash filtering will be added to a query to improve performance.

The tables used in the example are the US_STATES table and the WALKING_PATHS table. The US_STATES table contains a row for each state in the United States. The WALKING_PATHS table contains a row for every walking path in a national park in the United States. These tables are defined as follows:

CREATE TABLE US_STATES (STATE_ID CHAR(2) PRIMARY KEY, 
                        STATE_FULL_NAME VARCHAR(50), 
                        STATE_GEO QSYS2.ST_POLYGON);

CREATE TABLE WALKING_PATHS (WALKING_ID VARCHAR(10) PRIMARY KEY, 
                            WALKING_GEO QSYS2.ST_LINESTRING);
The following query is being enhanced to use geohashes. It returns a row for every walking path that is completely contained inside the state of Minnesota (MN) in the United States.
SELECT WALKING_ID, 
       STATE_FULL_NAME, 
       QSYS2.ST_ASTEXT(WALKING_GEO), 
       QSYS2.ST_ASTEXT(STATE_GEO) 
  FROM US_STATES S, WALKING_PATHS W
  WHERE S.STATE_ID = 'MN'
    AND QSYS2.ST_CONTAINS(STATE_GEO, WALKING_GEO) = 1;
  1. First, note the primary key and geometry column of each of the base tables.
    • In US_STATES, the primary key is STATE_ID and the geometry column is STATE_GEO which contains a ST_POLYGON that represent the geometry of the state.
    • In WALKING_PATHS, the primary key is WALKING_ID and the geometry column is WALKING_GEO which contains a ST_LINESTRING that represents the geometry of the path.
    The primary key will be used to enforce a one to many relationship between the base table and the geohash table we are about to create. This guarantees all the entries in the geohash table have a parent row in the base table. While a primary key is not required, it is a best practice for a relational data model.
  2. Determine the size of the geometries to be covered, and use this information to decide which depth to use. The geohash covers of the two geometries that are compared must be computed using the same depth. For this example, a depth of 13 is selected. This is the recommended depth when working with a very large region, like a state.
  3. Create a geohash filter table that corresponds to each geospatial column in a base table. Define a foreign key to enforce the dependency of the rows in the geohash table to the corresponding row in the base table.

    Use the ST_GEOHASHCOVER table function to populate each of the geohash filter tables with the geohash values that correspond to the geohash covers of the geometries. Multiple geohash values are generally returned for each geohash cover that is generated. This means that multiple rows will be inserted into the table for each row containing a geometry value in the base table.

    -- Create and populate the geohash values that correspond to the US_STATES table
    CREATE TABLE US_STATES_HASH (STATE_ID CHAR(2), 
                                 GEOHASH BIGINT, 
                                 FOREIGN KEY (STATE_ID) REFERENCES US_STATES(STATE_ID));
    INSERT INTO US_STATES_HASH (SELECT S.STATE_ID, T.GEOHASH 
                                  FROM US_STATES S,
                                       TABLE(QSYS2.ST_GEOHASHCOVER(S.STATE_GEO, 13)) T);
                                              
    -- Create and populate the geohash values that correspond to the WALKING_PATHS table
    CREATE TABLE WALKING_PATHS_HASH (WALKING_ID VARCHAR(10), 
                                    GEOHASH BIGINT, 
                                    FOREIGN KEY (WALKING_ID) REFERENCES WALKING_PATHS(WALKING_ID));
    INSERT INTO WALKING_PATHS_HASH (SELECT P.WALKING_ID, T.GEOHASH 
                                     FROM WALKING_PATHS P,
                                          TABLE(QSYS2.ST_GEOHASHCOVER(P.WALKING_GEO, 13)) T);
  4. Rewrite the original query to use the filter tables to reduce the amount of data that needs to be compared by the ST_CONTAINS scalar function. The tables containing the geohash values are joined to the corresponding base tables. An additional predicate is added to the WHERE clause so only rows where there is a state geohash value equal to a walking path geohash are compared by the ST_CONTAINS function.
    SELECT W.WALKING_ID, 
           STATE_FULL_NAME, 
           QSYS2.ST_ASTEXT(WALKING_GEO) AS PATH_GEOMETRY, 
           QSYS2.ST_ASTEXT(STATE_GEO) AS STATE_GEOMETRY  
      FROM US_STATES S JOIN US_STATES_HASH SH ON S.STATE_ID = SH.STATE_ID, 
           WALKING_PATHS W JOIN WALKING_PATHS_HASH WH ON W.WALKING_ID = WH.WALKING_ID
      WHERE S.STATE_ID = 'MN' AND
            SH.GEOHASH = WH.GEOHASH AND
            QSYS2.ST_CONTAINS(STATE_GEO, WALKING_GEO) = 1;

Example 2

In this example, we want to find every city (a point) that is less than 10 kilometers in distance from a certain trail (a linestring).

The table used in the example is the US_CITIES table, which contains a row for each city in the United States. The geometry used to represent a city is an ST_POINT. Since a point is represented by a single geohash value, the geohash value can be stored in an additional column in the table. The create table statement for the US_CITIES table is:
CREATE TABLE US_CITIES (CITY_ID VARCHAR(10), 
                        CITY_NAME VARCHAR(50), 
                        STATE_ID CHAR(2), 
                        CITY_GEO QSYS2.ST_POINT, 
                        CITY_GEOHASH BIGINT);

There are several ways to populate the CITY_GEOHASH column. For this example, the geohash will use a depth of 23, approximately the size of a forest or lake. Here are a few ways to generate the geohash value.

  1. To set geohash values for every row in a table that has a point value defined for a city but does not yet have a geohash value calculated, use an UPDATE statement to set the CITY_GEOHASH column.
    UPDATE US_CITIES SET CITY_GEOHASH = (SELECT GEOHASH FROM TABLE(QSYS2.ST_GEOHASH(CITY_GEO,23))) 
                                WHERE CITY_GEOHASH IS NULL;
  2. A trigger can be used to populate the CITY_GEOHASH column when a new city is inserted into the US_CITIES table. The following create trigger statement uses the value assigned to the CITY_GEO column to generate the corresponding geohash value. Once this trigger has been created, every future insert into the US_CITIES table will fire this trigger, assigning a value to the CITY_GEOHASH column as part of the insert operation.
    CREATE TRIGGER GEOHASH_CITIES
      BEFORE INSERT ON US_CITIES
      REFERENCING NEW AS N FOR EACH ROW
      BEGIN
        SET N.CITY_GEOHASH = (SELECT GEOHASH FROM TABLE(QSYS2.ST_GEOHASH(N.CITY_GEO,23)));
      END;
The other construct that is used in this example is an ST_LINESTRING global variable which will contain the trail value. The variable is defined like this:
CREATE VARIABLE TRAIL QSYS2.ST_LINESTRING;
At some point in the application, the global variable is assigned a linestring value that corresponds to a specific state trail. This is not a piece of information that is stored in a table, so no geohash value is permanently associated with it.

SET TRAIL = QSYS2.ST_LINESTRING('linestring (-92.51864276919596 44.05903689443092,
                                             -92.57146126853864 44.10851586848976, 
                                             -92.64386916754349 44.20339334725808)');

For this example, filtering will be added to the following query to improve performance. The query returns a row for every city that is less than 10 kilometers from the trail that is contained in the TRAIL global variable.

SELECT * FROM US_CITIES C
  WHERE QSYS2.ST_DISTANCE(C.CITY_GEO, TRAIL) < 10000;
  1. A geohash value is available in the US_CITIES table for each city. We know it was calculated with a depth of 23.
  2. The global variable is a linestring and needs a geohash cover to compare with the geohash value for each city. To have a meaningful compare, the depth for the geohash cover must also use a depth of 23. The geohash cover values for the linestring will be generated when the query is run.
  3. Rewrite the original query to use the CITY_GEOHASH column to reduce the amount of data that needs to be considered for the ST_DISTANCE calculation. For this example, an exact geohash value for the linestring is not needed, so the faster ST_FUZZYGEOHASHCOVEREXTEND table function is used to generate the geohash minimum bounding rectangle to use as the filter. The EXTEND version of the function is used to get the geohash regions that form a 10 kilometer buffer around the linestring for the trail. The extended geohash values are needed since the query wants to find all cities within a 10 kilometer distance of the trail.
    The rewrite of the query is as follows:
    WITH TRAIL_GEOHASH AS 
       (SELECT * FROM TABLE(QSYS2.ST_FUZZYGEOHASHCOVEREXTEND(TRAIL, 23, 10000)))
    SELECT * FROM US_CITIES C, 
                  TRAIL_GEOHASH T
    WHERE C.CITY_GEOHASH = T.GEOHASH AND
       QSYS2.ST_DISTANCE(C.CITY_GEO, TRAIL) < 10000;