Working with spatial indexes
In most database applications it is important to define indexes on columns in order to avoid scanning all the rows of a table and checking each row to see if it meets the criteria of a query. This is especially important with spatial queries, because the spatial checking is more complex than checking simple numeric or character values.
In a relational database, an index is typically created on the values in a column where there is a simple linear sequencing of values from smallest to largest. This allows a binary tree or B-tree index to be created, which allows the database to quickly navigate to a particular value or sequential range of values.
Spatial data is inherently two-dimensional (or greater), which is not directly applicable to a B-tree index, so different techniques have been developed. The approach used in Spatial Extender is a grid index, which associates spatial values with a square grid. Line and polygon spatial values may intersect more than one grid cell, while a point spatial value can intersect at most one grid cell. When defining the grid cell sizes, you need to trade off the need to minimize the number of cells that intersect a spatial value against the need to minimize the number of spatial values contained within a grid cell. In order to efficiently handle spatial values with significantly different sizes, you can specify up to three grid cell sizes. The grid cells are then indexed by Spatial Extender using the DB2 B-tree index.
A spatial index is created with the DB2
create index command with an additional
extend using clause, as shown in Listing 9.
Listing 9. Using the create index command
create index test.countiesidx on test.counties(shape) extend using db2gse.spatial_index(1.2, 3.0, 14.0);
The units to specify the grid cell sizes are the same units as the coordinate system, which are degrees of latitude and longitude for the counties data. As you might expect, the counties have quite a range of sizes, so it makes sense to specify different grid cell sizes that best fit the different county sizes.
A degree of latitude is approximately 70 miles or 110 kilometers. A degree of longitude varies from approximately 70 miles at the equator to about 45 miles or 72 kilometers at 50 degrees of latitude.
Determining an optimal grid cell size or grid cell sizes may not be obvious. In general, for line and polygon spatial values, the grid cell size should be a little larger than the size that would hold an average size line or polygon. For point spatial values, the grid cell size should be about 1/10 of the width of a typical query region.
In order to simplify this determination, two tools are provided that analyze the spatial values in a table and provide recommended grid cell sizes to use when creating the index. Don't worry about getting this precisely correct; the performance will be good for a fairly wide variation in the index parameters.
The gseidx command-line tool is provided with the installation of Spatial Extender and is available on all of the supported platforms. Significant information about spatial indexes and the details of gseidx is provided in the DB2 Infocenter in the section Using indexes and views to access spatial data.
Listing 10 shows a simple example of using gseidx.
Listing 10. Contents of advise_counties.sql
!gseidx "connect to sample get geometry statistics for column test.counties(shape) advise";
This command can be executed using the command
db2 -tvf advise_counties.sql and will
result in the output shown in Listing 11.
Listing 11. Output of db2 -tvf advise_counties.sql command
Number of Rows: 3141 Number of non-empty Geometries: 3141 Number of empty Geometries: 0 Number of null values: 0 Extent covered by data: Minimum X: -178.217598 Maximum X: -66.969271 Minimum Y: 18.921786 Maximum Y: 71.406235 Query Window Size: Suggested Grid Sizes: Index Entry Cost: -------------------- ----------------------------- ---------------------- 0.01: 1.2, 3, 14 4.6 0.02: 1.2, 3, 14 4.6 0.05: 1.2, 3, 14 4.7 0.1: 1.2, 3, 14 4.9 0.2: 1.2, 3, 14 5.3 0.5: 1.2, 3, 14 6.6 1: 1.2, 3, 14 9.3 2: 1.2, 3, 14 16 5: 1.8, 3.6, 13 49 10: 2.9, 8.7, 26 130 20: 4.6, 14, 49 390 50: 12, 72, 0 1700
The first two sections of Listing 11 provide statistics about the spatial data that was analyzed. The middle three columns of the last section provide suggested grid sizes to use in creating a spatial index.
The first column contains values that correspond to typical query window sizes. In the case of an application that is displaying a map, this is the typical width of the region that would be displayed. For example, if a user would most commonly display a map that was 10 miles across, this would correspond to about 0.25 degrees. Looking at the table, you can see that the suggested grid sizes are the same for query window sizes of 0.2 and 0.5 degrees, so you would use (1.2, 3, 14) to create the index.
The last column isn't particularly important. It is an estimate of the number of index entries that would be referenced in satisfying a query.
You can also download a Java-based spatial index advisor from the Spatial Extender web site. This tool provides a user interface to select a spatial table and column to analyze. See Resources for where to obtain this tool.
An example of the user interface for the Java index advisor is shown in Figure 1.
Figure 1. Java index advisor user interface
The algorithm used in this index advisor varies from the gseidx tool and results in different recommendations. However, the suggested values should work as well. The cost value is the estimated number of index pages that would be referenced, not the number of index entries as reported by the gseidx tool.