Manage spatial data with IBM DB2 Spatial Extender, Part 1: Acquiring spatial data and developing applications

Tips and techniques for developing efficient spatial applications

This tutorial series describes common tasks to manage spatial data with IBM DB2® Spatial Extender, including importing and creating spatial data, constructing and executing spatial queries, working with IBM, third party, and open source spatial tools, tuning performance, and considering special circumstances in a data warehouse environment. In this first article in the series, learn how to acquire spatial data and build applications. Learn how to use shapefiles, spatial data tables, and spatial indexes.

Share:

David Adler (dadler@us.ibm.com), Senior Software Engineer, IBM China

David AdlerDavid Adler has been responsible for the development of spatial database technology in IBM for over 20 years, the past 10 years in DB2 Spatial Extender Development.



16 February 2012

Also available in Chinese Portuguese Spanish

Before you start

Learn what to expect from this tutorial, and how to get the most out of it.

About this series

For over 10 years, the DB2 Spatial Extender has provided the capability of managing and analyzing spatial data stored in DB2 relational tables. Table columns can be defined with spatial types for points, lines, and polygons, which can represent objects such as the following:

Point locations
Customers, retail stores, electrical transformers, cell towers
Lines
Highways, coastline, delivery routes, electrical transmission lines
Polygons
Sales/service territories, flood/fire risk, states/provinces/counties

SQL queries can incorporate spatial functions to analyze spatial relationships, such as finding customers within a floodzone. Listing 1 shows an example of code that incorporates spatial functions.

Listing 1. Query that incorporates spatial functions
SELECT cust_name, cust_addr
FROM customers, floodzones
WHERE ST_Within(cust_loc, flood_loc) = 1

The result of this query produces a set of customer names and addresses whose houses could be at risk of flooding. This information is useful in calculating insurance rates or sending out an offer to purchase flood insurance.

This series of tutorials takes you through common tasks when working with spatial data in the DB2 Spatial Extender. This includes importing and creating spatial data, constructing and executing spatial queries, working with IBM, third party, and open source spatial tools, tuning performance, and considering special circumstances in a data warehouse environment.

Although the focus of this tutorial series is DB2 Spatial Extender on the Linux®, UNIX®, and Windows® operating systems, many of the concepts are equally applicable to other IBM database offerings with spatial capability, including the following:

  • Spatial Support for DB2 for z/OS®
  • Informix® Spatial Datablade
  • Netezza® Spatial

Tutorials in this series

The following additional tutorials are planned for this 5-part series.

Part 2: Implementing typical spatial use cases
There are a number of typical use cases in spatial application that include spatial joins, distance queries, geofencing, and nearest-n queries. This tutorial provides examples for implementing these use cases efficiently.
Part 3: Spatial application tuning
Application performance is always an important consideration. This tutorial looks at factors in spatial application performance and tools for understanding query behavior. Some of the factors considered include inline data, data clustering, compression, and generalization.
Part 4: Spatial analysis in a data warehouse environment
Achieving good scalability in an InfoSphere Warehouse shared-nothing environment requires additional considerations. This tutorial addresses the techniques to get good spatial query co-location. It also provides special considerations when working with Esri middleware.
Part 5: Spatial visualization tools from IBM, third parties, and open sources
Visualization of spatial data is often an important aspect of a spatial application. This tutorial introduces some of the tools that are available, including the free IBM Geobrowser, Esri software, web-mapping services, such as Google maps, and open-source software that uses the GeoTools driver for DB2.

About this tutorial

This tutorial introduces you to the technology provided by DB2 Spatial Extender. The tutorial also offers approaches to populating DB2 tables with spatial data: a critical step before spatial queries can be implemented. This tutorial also addresses environments for executing spatial queries, including the DB2 command line processor, IBM Data Studio, and embedded in an application using JDBC, CLI, and other interfaces.

Objectives

When you have finished this tutorial, you will:

  • Understand important spatial concepts relating to spatial types and coordinate systems
  • Know approaches to importing and creating spatial data
  • Understand spatial indexing
  • Be familiar with environments for executing spatial queries

Prerequisites

This tutorial targets application architects and developers who have a working knowledge of DB2 and the SQL language.

Download the sample SQL scripts, Java application, and sample data from the Downloads section to a convenient directory in order to follow the examples in this tutorial.

System requirements

In order to work through the steps of the tutorial, you need a working DB2 environment with Spatial Extender installed and configured. See Resources for information about setting up this environment.


Getting started

DB2 Spatial Extender overview

Installing and setting up DB2 Spatial Extender provides the following main features and components:

Spatial datatypes
A set of datatypes that can be used to define table columns that will contain spatial data. This includes ST_Point, ST_Linestring, and ST_Polygon for atomic spatial values. This also includes ST_MultiPoint, ST_MultiLinestring, and ST_MultiPolygon for homogeneous collections of spatial values.
Spatial functions and predicates
A large number of SQL UDFs to create spatial values, return information about spatial values, identify spatial relationships, and perform operations on spatial values. These spatial UDFs can be incorporated into SQL queries, which exploit all the capabilities of the SQL language.
Spatial index
A spatial index mechanism is provided to support the two-dimensional nature of spatial data. Tools are available to help specify a spatial index.
Spatial command line processor (CLP) - db2se
The db2se CLP provides a convenient command line interface to spatial stored procedures for operations such as spatially enabling a database and importing or exporting spatial data.

See Resources for more details about the Spatial Extender in the DB2 Infocenter.

Spatial reference systems and coordinate systems

Spatial data is typically represented by tuples of coordinate values, most often x, y, although support is provided for z and m coordinates as well. This tutorial considers data with only x and y coordinate values. It will also consider data that uses only latitude and longitude values in degrees as decimal values. Note that when working with latitude and longitude, longitude corresponds to x and latitude corresponds to y.

In order to correctly perform spatial operations, every spatial value must have a coordinate system associated with it that describes the relationship of the coordinate values to a position on the earth's surface. Although Spatial Extender provides over 3000 different pre-defined coordinate systems, this tutorial addresses only the two most commonly used: NAD83 for coordinates in North America and WGS84 for worldwide coordinates. Note that most GPS devices report coordinates using WGS84.

In order to efficiently store and process spatial data, Spatial Extender represents coordinates internally as 64-bit integers. Spatial Extender manages this through the use of a spatial reference system (SRS) that specifies offsets and a scale factor used to convert the user representation in a DOUBLE value back and forth from the internal representation. Each SRS also has a coordinate system associated with it. It is actually the SRS that is associated with each spatial value. An SRS may be referred to by either its 128-character-value name (referred to as srsName or SRS_NAME) or its 32-bit-integer identifier (referred to as srid or SRS_ID). The srid is used in SQL statements, and the srsName is used with the db2se CLP, most often when importing spatial data.

Spatial Extender provides the SRS named NAD83_SRS_1 with srid 1 for data using the NAD83 coordinate system. Spatial Extender provides the SRS name WGS84_SRS_1003 with srid 1003 for data using the WGS84 coordinate system. Refer to the documentation in the DB2 Infocenter if you need to work with a different coordinate system.


Importing from shapefiles

Obtaining shapefiles

The de-facto standard for storing and exchanging spatial data in a file format is the shapefile representation created by Esri. Esri provides shapefiles of sample data to download and use with Spatial Extender (see Resources), although the shapefile needed for this tutorial is included in Downloads for your convenience. Esri shapefiles represent the following types of data:

  • Country boundaries
  • State boundaries
  • County boundaries
  • Zipcode boundaries and centroids
  • Major highways
  • Cities, airports, hospitals, schools
  • Rivers

You can download many other shapefiles from other sources. There is no charge to use some shapefiles, while some require a license fee.

A shapefile is actually a collection of related files that have the same name but different file extensions. The most common files, and the ones used by Spatial Extender, include the following file types:

<name>.shp
This is a required binary file that contains a header describing the type of data (points, lines, polygons, and other geometries), the number of records, and the actual coordinate data.
<name>.dbf
This is a required binary file that contains the character and numeric attributes associated with each record. This actually uses the DBASE format and can be viewed with applications that support this format. There are significant limitations with this format. The attribute names are limited to 11 characters in length, and character data can be no longer than 255 bytes.
<name>.prj
This is an optional text file that contains the definition of the coordinate system associated with the spatial data. The coordinate system in the .prj file must match the coordinate system of the SRS of the column into which the data is being loaded. If this file is not present, Spatial Extender assumes that the data is in the coordinate system of the specified SRS.
<name>.shx
This is an optional binary file that provides an index into the .shp file. The main use of this file with Spatial Extender is to start importing at a record other than the very first record.

You aren't required to know the details of these files, but if you are interested, follow the link to the Esri specification in the Resources section.

Getting details about shapefiles

You can use the db2se shape_info Spatial Extender command to get information about a particular shapefile, including the type geometries, the geographic extents of the data, and all of the attribute names and types. Listing 2 gives an example of the command used with the sample counties shapefile provided with this tutorial (see Downloads). The list of attributes in Listing 2 has been abbreviated to save space.

Listing 2. Shapefile attributes
db2se shape_info -fileName counties.shp

Shape file information
----------------------
File code                  = 9994
File length (16-bit words) = 758872
Shape file version         = 1000
Shape type                 = 5 (ST_MULTIPOLYGON)
Number of records          = 3141

Minimum X coordinate = -178.217598
Maximum X coordinate = -66.969271
Minimum Y coordinate = 18.921786
Maximum Y coordinate = 71.406235

Shapes do not have Z coordinates.
Shapes do not have M coordinates.
Shape index file (extension .shx) is present.

Attribute file information
--------------------------
dBase file code                = 3
Date of last update            = 2002-02-04
Number of records              = 3141
Number of bytes in header      = 1569
Number of bytes in each record = 461
Number of columns              = 48

Column Number  Column Name      Data Type       Length  Decimal
-------------  ---------------  --------------  ------  -------
            1  NAME             C ( Character)      32        0
            2  STATE_NAME       C ( Character)      25        0
            3  STATE_FIPS       C ( Character)       2        0
...
           48  AVG_SALE97       N (   Numeric)       7        2

Coordinate system definition: "GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]]"

Importing a shapefile using the simplest approach

The db2se import_shape Spatial Extender command provides an easy way to import shapefiles. You can write this command into an SQL script file, such as db2 -tvf import_counties.sql. Listing 3 shows a simple example.

Listing 3. Contents of import_counties.sql
!erase counties.msg;

!db2se import_shape sample
-fileName         counties.shp
-srsName          nad83_srs_1
-tableSchema      test
-tableName        counties
-spatialColumn    shape
-client           1
-messagesFile     counties.msg
;

!db2se register_spatial_column sample
-tableSchema      test
-tableName        counties
-columnName       shape
-srsName          nad83_srs_1
;

connect to sample;

create index test.countiesidx on test.counties(shape)
extend using db2gse.spatial_index(.5, 2.0, 0);

runstats on table test.counties and indexes all;

select count(*) from test.counties;

describe table test.counties;

Notes:

  1. Operating system commands can be included in a DB2 script if they are prefixed by an exclamation mark.
  2. If the message file exists, it is erased with the erase counties.msg command. The import_shape operation will terminate if the message file already exists.
  3. Most of the parameters of the db2se import_shape command are fairly intuitive. The table with the specified schema and name is created with attribute columns corresponding to the attributes of the shapefile. If the table already exists, the operation will terminate. The spatial column name is arbitrary, but shape is commonly used.
  4. The -client 1 parameter is important. This command runs the import operation on the DB2 client, accessing the import file locally. If the parameter is defined as -client 0, the import operation will run on the server and requires the file specification to be a location on the server that is available to the DB2 server.
  5. The db2se register_spatial_column command creates a constraint that all spatial values use the SRS nad83_srs_1. It also creates an entry in the db2gse.st_geometry_columns view that associates this SRS with this spatial column. This is helpful and sometimes required by some visualization applications.
  6. The connect to sample code connects to the sample database.
  7. A spatial index is created with the create index command and reasonable values. Spatial indexes are discussed in a later section.
  8. The runstats command updates the table statistics, which is always a good idea after loading or after making significant updates.
  9. The select count(*) command lets you gather how many rows were imported.
  10. The describe table command helps you gather the table columns created.

Importing a shapefile using a slightly more complex example

Sometimes it is necessary to implement more flexibility in the import process. In particular, you might want to import into different but compatible data type columns, select a subset of columns to import, or use column names different than the shapefile attribute names. Listing 4 illustrates these capabilities.

Listing 4. Contents of import_counties2.sql
!erase counties.msg;
connect to sample;

drop table test.counties;

create table test.counties(
   county_name    varchar(32)
  ,state_name     varchar(25)
  ,state_fips     varchar(2)
  ,county_fips    varchar(3)
  ,area           double
  ,population2000 integer
  ,shape          db2gse.st_multipolygon
  )
  ;

!db2se import_shape sample
-fileName         counties.shp
-inputAttrColumns N(name,state_name,state_fips,cnty_fips,area,pop2000)
-srsName          nad83_srs_1
-tableSchema      test
-tableName        counties
-tableAttrColumns county_name,state_name,state_fips,county_fips,area,population2000
-createTableFlag  0
-spatialColumn    shape
-typeSchema       db2gse
-typeName         st_multipolygon
-messagesFile     counties.msg
-client 1
;

!db2se register_spatial_column sample
-tableSchema      test
-tableName        counties
-columnName       shape
-srsName          nad83_srs_1
;

create index test.countiesidx on test.counties(shape)
extend using db2gse.spatial_index(.5, 2.0, 0);

runstats on table test.counties and indexes all;

select count(*) from test.counties;

describe table test.counties;

Notes:

  1. The table is created with the desired columns, data types, and column names.
  2. The -inputAttrColumns parameter specifies the names of the attributes in the shapefile to be imported.
  3. The -tableAttrColumns parameter specifies the names of the table columns into which to import data. Note that the -inputAttrColumns and -tableAttrColumns names are different.
  4. The -createTableFlag parameter indicates that the table already exists and should not be created.
  5. The -typeSchema and -typeName parameters specify the type of the spatial column. One of the peculiarities of the shapefile format is that it does not specify whether the spatial type is a collection (multi), so import_shape defaults to the multi type. For the example, specifying multi type isn't necessary because the county polygons actually are MultiPolygons. But for some applications, the data to be imported is not a multi type, and you can import it into a non-multi column.

Creating a spatial table from point coordinates

Understanding point locations

Much of the spatial data important in analysis is based on point locations, including the following:

  • Customer home locations
  • Customer mobile location
  • Store locations
  • Service locations
  • Cell tower locations
  • Hospitals

The most common representation of point locations uses degrees of latitude and longitude, which are derived from a geocoding process, from a GPS, or possibly from coordinates on a map. Geocoding is the process of converting a street address to a latitude and longitude value. This is typically performed by specialized software using a large dataset of addresses or of street segments. Often this is done in a batch mode to handle a large number of addresses, although it is also often done by a web service to geocode an address typed in by a user. Many companies already have this software installed and have the locations in a file or in database columns. See Resources for a developerWorks article with more details about geocoding with DB2 Spatial Extender.

Loading and maintaining spatial values

This section of the tutorial discusses the loading and maintenance of spatial values in a table, assuming that the latitude and longitude are already available. This example addresses managing a table of hospital locations.

See Downloads to download the data and SQL scripts used in this section.

Importing and creating from a file

Listing 5 shows an example that you can execute with the command db2 -tvf create_hospitals.sql

Listing 5. Contents of create_hospitals.sql
drop table test.hospitals;

create table test.hospitals(name varchar(70), longitude double, latitude double);

import from "hospitals.csv" of del
modified by coldel, decpt.
method p (1, 2, 3)
messages "import.msg"
insert into test.hospitals (name, longitude, latitude);

alter table test.hospitals add column location db2gse.st_point;

!db2se register_spatial_column sample
-tableSchema      test
-tableName        hospitals
-columnName       location
-srsName          nad83_srs_1
;

update test.hospitals
set location = db2gse.st_point(longitude, latitude,1)
;

reorg table test.hospitals;

create index test.hospitalsidx on test.hospitals(location)
extend using db2gse.spatial_index(.1, 0, 0);

runstats on table test.hospitals and indexes all;

select count(*) from test.hospitals;

select
  substr(name, 1, 25) as name
 ,decimal(longitude, 8, 4) as longitude
 ,decimal(latitude, 8, 4) as latitude
 ,varchar(db2gse.st_astext(location), 30) as location_wkt
from test.hospitals
 where name like 'New York%'
 ;

Notes:

  1. Drop the table in case it already exists. Only do this for test purposes.
  2. The hospital table is created with columns for the name, longitude, and latitude.
  3. The DB2 import utility is used to populate the table from a file containing delimited fields with the data for the name, longitude, and latitude.
  4. The table is altered to add a column of type db2gse.ST_Point. If the table already existed with the latitude and longitude, the previous steps would have been skipped.
  5. The db2se register_spatial_column command creates a constraint that all spatial values use the SRS nad83_srs_1. It also creates an entry in the db2gse.st_geometry_columns view that associates this SRS with this spatial column.
  6. The work of creating the spatial value is done very simply with an SQL update statement that sets the location column to the result of the ST_Point constructor, which accepts longitude, latitude, and srid values. It is important to note that the longitude value comes first, the opposite of the ordering you commonly see. The srid value of 1 indicates that the spatial reference system for NAD83 should be used.
  7. The DB2 reorg utility is invoked as the process of updating the location column increases the size of the row and will likely cause page spills.
  8. A spatial index is created with reasonable values.
  9. The runstats command is executed to update the table statistics. It is always good to do this after loading data or after making significant data updates.
  10. The number of rows is queried, and a few rows for New York hospitals are selected. The db2gse.ST_AsText spatial function is used to return a readable representation of the spatial value.

Maintaining spatial values

It is very likely that a table of locations will have updates, and it is important to ensure that the spatial values are set or updated appropriately as rows are inserted or updated. The SQL mechanism for doing this is to establish triggers on the table when the latitude or longitude values are changed.

Listing 6 shows an example of creating triggers and testing the consequences.

Listing 6. Contents of create_hospitals_triggers.sql
create trigger test.hosp_loc_update no cascade
before update of latitude, longitude
on test.hospitals
referencing  new as n
for each row mode db2sql
set n.location = db2gse.st_point(n.longitude, n.latitude, 1)
;

create trigger test.hosp_loc_insert no cascade
before insert
on test.hospitals
referencing new as n
for each row mode db2sql
set n.location = db2gse.st_point(n.longitude, n.latitude, 1)
;

select
  substr(name, 1, 25) as name
 ,decimal(longitude, 8, 4) as longitude
 ,decimal(latitude, 8, 4) as latitude
 ,varchar(db2gse.st_astext(location), 30) as location_wkt
from test.hospitals
 where name like 'New York%'
 ;

insert into test.hospitals(name, longitude, latitude)
values('New York2', -76.5, -42.3)
;

update test.hospitals
set (longitude, latitude) = (-76.5, -42.3)
where name = 'New York Hospital'
;

select
  substr(name, 1, 25) as name
 ,decimal(longitude, 8, 4) as longitude
 ,decimal(latitude, 8, 4) as latitude
 ,varchar(db2gse.st_astext(location), 30) as location_wkt
from test.hospitals
 where name like 'New York%'
 ;

Notes:

  1. Two separate but very similar triggers are created: one for update and one for insert. Before the row data is inserted or updated, the spatial constructor is invoked to return the ST_Point value, which will be stored in the location column.
  2. To see the effect of the triggers, rows that will be modified are selected, as shown in Listing 7.
Listing 7. Rows to be modified
NAME                      LONGITUDE  LATITUDE   LOCATION_WKT
------------------------- ---------- ---------- ------------------------------
New York Hospital           -73.9537    40.7645 POINT (-73.953751 40.764542)
New York Hospital           -73.7537    41.0275 POINT (-73.753746 41.027599)
New York State Hospital     -73.9517    41.5273 POINT (-73.951797 41.527317)
  3 record(s) selected.
  1. A new row is inserted, and the rows for New York Hospital are updated. You can see the updated location values, as shown in Listing 8.
Listing 8. Updated location values
.
NAME                      LONGITUDE  LATITUDE   LOCATION_WKT
------------------------- ---------- ---------- ------------------------------
New York Hospital           -76.5000   -42.3000 POINT (-76.500000 -42.300000)
New York Hospital           -76.5000   -42.3000 POINT (-76.500000 -42.300000)
New York State Hospital     -73.9517    41.5273 POINT (-73.951797 41.527317)
New York2                   -76.5000   -42.3000 POINT (-76.500000 -42.300000)
  4 record(s) selected.
  1. For readability purposes, the first part of the name is selected, the longitude and latitude are converted to decimal values, and the location column is converted to the well-known text (wkt) representation.

Working with spatial indexes

In most database applications it is important to define indexes on columns in order to avoid scanning all the rows of a table and checking each row to see if it meets the criteria of a query. This is especially important with spatial queries, because the spatial checking is more complex than checking simple numeric or character values.

In a relational database, an index is typically created on the values in a column where there is a simple linear sequencing of values from smallest to largest. This allows a binary tree or B-tree index to be created, which allows the database to quickly navigate to a particular value or sequential range of values.

Spatial data is inherently two-dimensional (or greater), which is not directly applicable to a B-tree index, so different techniques have been developed. The approach used in Spatial Extender is a grid index, which associates spatial values with a square grid. Line and polygon spatial values may intersect more than one grid cell, while a point spatial value can intersect at most one grid cell. When defining the grid cell sizes, you need to trade off the need to minimize the number of cells that intersect a spatial value against the need to minimize the number of spatial values contained within a grid cell. In order to efficiently handle spatial values with significantly different sizes, you can specify up to three grid cell sizes. The grid cells are then indexed by Spatial Extender using the DB2 B-tree index.

A spatial index is created with the DB2 create index command with an additional extend using clause, as shown in Listing 9.

Listing 9. Using the create index command
create index test.countiesidx on test.counties(shape)
extend using db2gse.spatial_index(1.2, 3.0, 14.0);

The units to specify the grid cell sizes are the same units as the coordinate system, which are degrees of latitude and longitude for the counties data. As you might expect, the counties have quite a range of sizes, so it makes sense to specify different grid cell sizes that best fit the different county sizes.

A degree of latitude is approximately 70 miles or 110 kilometers. A degree of longitude varies from approximately 70 miles at the equator to about 45 miles or 72 kilometers at 50 degrees of latitude.

Determining spatial index grid cell sizes

Determining an optimal grid cell size or grid cell sizes may not be obvious. In general, for line and polygon spatial values, the grid cell size should be a little larger than the size that would hold an average size line or polygon. For point spatial values, the grid cell size should be about 1/10 of the width of a typical query region.

In order to simplify this determination, two tools are provided that analyze the spatial values in a table and provide recommended grid cell sizes to use when creating the index. Don't worry about getting this precisely correct; the performance will be good for a fairly wide variation in the index parameters.

Using the gseidx index advisor tool

The gseidx command-line tool is provided with the installation of Spatial Extender and is available on all of the supported platforms. Significant information about spatial indexes and the details of gseidx is provided in the DB2 Infocenter in the section Using indexes and views to access spatial data.

Listing 10 shows a simple example of using gseidx.

Listing 10. Contents of advise_counties.sql
!gseidx "connect to sample
  get geometry statistics
  for column test.counties(shape)
  advise";

This command can be executed using the command db2 -tvf advise_counties.sql and will result in the output shown in Listing 11.

Listing 11. Output of db2 -tvf advise_counties.sql command
Number of Rows: 3141
Number of non-empty Geometries: 3141
Number of empty Geometries: 0
Number of null values: 0

Extent covered by data:
    Minimum X: -178.217598
    Maximum X: -66.969271
    Minimum Y: 18.921786
    Maximum Y: 71.406235

Query Window Size:     Suggested Grid Sizes:           Index Entry Cost:
--------------------   -----------------------------   ----------------------
      0.01:              1.2,         3,        14             4.6
      0.02:              1.2,         3,        14             4.6
      0.05:              1.2,         3,        14             4.7
       0.1:              1.2,         3,        14             4.9
       0.2:              1.2,         3,        14             5.3
       0.5:              1.2,         3,        14             6.6
         1:              1.2,         3,        14             9.3
         2:              1.2,         3,        14              16
         5:              1.8,       3.6,        13              49
        10:              2.9,       8.7,        26             130
        20:              4.6,        14,        49             390
        50:               12,        72,         0            1700

The first two sections of Listing 11 provide statistics about the spatial data that was analyzed. The middle three columns of the last section provide suggested grid sizes to use in creating a spatial index.

The first column contains values that correspond to typical query window sizes. In the case of an application that is displaying a map, this is the typical width of the region that would be displayed. For example, if a user would most commonly display a map that was 10 miles across, this would correspond to about 0.25 degrees. Looking at the table, you can see that the suggested grid sizes are the same for query window sizes of 0.2 and 0.5 degrees, so you would use (1.2, 3, 14) to create the index.

The last column isn't particularly important. It is an estimate of the number of index entries that would be referenced in satisfying a query.

Java index advisor

You can also download a Java-based spatial index advisor from the Spatial Extender web site. This tool provides a user interface to select a spatial table and column to analyze. See Resources for where to obtain this tool.

An example of the user interface for the Java index advisor is shown in Figure 1.

Figure 1. Java index advisor user interface
Index Advisor

The algorithm used in this index advisor varies from the gseidx tool and results in different recommendations. However, the suggested values should work as well. The cost value is the estimated number of index pages that would be referenced, not the number of index entries as reported by the gseidx tool.


Exploring common problems and tips

Simplifying spatial function syntax

All spatial functions are defined in the schema db2gse, which must be prefixed to the function name in order for it to be correctly identified by DB2. There are two ways to simplify the spatial syntax.

Checking db2gse

The first way to simplify spatial syntax is to tell DB2 to assume that it should check the schema db2gse when resolving functions. This is done by using the set current function path statement before executing queries that involve spatial functions. This enables you to eliminate the schema name on the function calls. Listing 12 shows an example.

Listing 12. Set current function path statement
set current function path = current function path, db2gse;
select
   substr(name, 1, 25) as name
  ,st_distance(location,st_point(-74.237449, 42.036976,1), 'STATUTE MILE') as distance
from test.hospitals
where st_distance(location,st_point(-74.237449, 42.036976,1), 'STATUTE MILE') < 25.0

NAME                      DISTANCE
------------------------- ------------------------
Adventist Home              +2.42936545898570E+001
Greene County Memorial Ho   +2.27887983783443E+001
  2 record(s) selected.

Using method notation

The second way to simplify spatial syntax is to use method notation, which is supported by DB2 LUW, although this is not compatible with the other IBM spatial database products. Using the method notation, you specify the name of the spatial function after the spatial value, as shown in Listing 13.

Listing 13. Using method notation
set current function path = current function path, db2gse;
select
   substr(name, 1, 25) as name
  ,location..st_distance(st_point(-74.237449, 42.036976,1), 'STATUTE MILE') as distance
from test.hospitals
where st_distance(location,st_point(-74.237449, 42.036976,1), 'STATUTE MILE') < 25.0

NAME                      DISTANCE
------------------------- ------------------------
Adventist Home              +2.42936545898570E+001
Greene County Memorial Ho   +2.27887983783443E+001
  2 record(s) selected.

In the sections that follow, the examples don't specify the schema because they assume that the function path has been previously set.

Ordering the coordinates

When working with coordinates using latitude and longitude, longitude corresponds to x and latitude corresponds to y. When people give location coordinates or use a device such as a GPS, the coordinates are generally in the order latitude, longitude so it is a very common user error to specify the coordinates in the reverse order when working with latitude and longitude.

For example, to find hospitals within 20 miles of Woodstock, NY (latitude 42.036976, longitude -74.237449), the query would be written as shown in Listing 14.

Listing 14. Coordinate example
select name
from test.hospitals
where st_distance(location,st_point(-74.237449, 42.036976,1), 'STATUTE MILE') < 25.0

Working with well-known text

The Spatial Extender supports the well-known text (WKT) representation of spatial values used as constants in spatial queries. The WKT representation is an international specification created by the Open Geospatial Consortium. See the spatial section of the DB2 Infocenter (see Resources) for details about the WKT support in Spatial Extender.

Using WKT, the example query in Listing 14 could be written as shown in Listing 15.

Listing 15. Coordinate example in WKT
select name
from test.hospitals
where st_distance(location,st_point('POINT (-74.237449 42.036976)',1), 'STATUTE MILE')
      < 25.0

Note that coordinates are separated by a space instead of a comma as one might expect.

WKT can also be used to represent line and polygon spatial values. It is important to note that the values defining a coordinate pair are separated by a space, and the coordinate pairs are separated by a comma. Also, when defining polygon values, include double parentheses at the beginning and end of the coordinate list defining the polygon, as shown in Listing 16.

Listing 16. Defining polygon values
select name
from test.hospitals
where st_within(location,st_polygon(
'POLYGON ((-74.1 42.0, -74.1 42.1, -74.0 42.1, -74.0 42.0, -74.1 42.0))',1)) = 1

Selecting spatial values

Most often, spatial values are used in a query but don't actually need to be returned to the user or application as part of the result set, as in the previous example. If you execute a query such as select * from test.hospitals or select location from test.hospitals, you will likely see a long hexadecimal string that isn't very meaningful, because this is the internal, compressed representation of the spatial data. If you want to see the coordinates in a readable form, you need to use the ST_AsText function to return the spatial value in the WKT representation. It is a good idea to restrict the length of the character data returned with the SUBSTR or VARCHAR SQL functions, as shown in Listing 17.

Listing 17. Using the st_astext function
select
   substr(name, 1, 25) as name
  ,varchar(st_astext(location), 30) as location
from test.hospitals
where st_distance(location,st_point(-74.237449, 42.036976,1), 'STATUTE MILE') < 25.0

NAME                      LOCATION
------------------------- ------------------------------
Adventist Home            POINT (-73.785400 42.139526)
Greene County Memorial Ho POINT (-73.878181 42.230648)
  2 record(s) selected.

Another readable format is the Geography Markup Language (GML), which is an XML representation for spatial values defined by the Open Geospatial Consortium. You can select spatial values in this format using the ST_AsGML function, as shown in Listing 18.

Listing 18. Using the st_asgml function
select
   substr(name, 1, 25) as name
  ,varchar(st_asgml(location), 90) as location
from test.hospitals
where st_distance(location,st_point(-74.237449, 42.036976,1), 'STATUTE MILE') < 25.0

NAME                      LOCATION
------------------------- ----------------------------------------
Adventist Home            <gml:Point srsName="EPSG:4269"><gml:pos>
                          -73.785400 42.139526</gml:pos></gml:Point>

Greene County Memorial Ho <gml:Point srsName="EPSG:4269"><gml:pos>
                          -73.878181 42.230648</gml:pos></gml:Point>
  2 record(s) selected.

When working with web mapping applications and displaying markers for point objects in the database, it is often useful to select the coordinates as x and y (or more likely, longitude and latitude). This is easily done with the ST_X and ST_Y functions, as shown in Listing 19.

Listing 19. Using the st_x and st_y functions
select
   substr(name, 1, 25) as name
  ,st_x(location) as longitude
  ,st_y(location) as latitude
from test.hospitals
where st_distance(location,st_point(-74.237449, 42.036976,1), 'STATUTE MILE') < 25.0

NAME                      LONGITUDE                LATITUDE
------------------------- ------------------------ ------------------------
Adventist Home              -7.37854000000000E+001   +4.21395260000000E+001
Greene County Memorial Ho   -7.38781810000000E+001   +4.22306480000000E+001
  2 record(s) selected.

When to treat a spatial value

A number of the spatial functions that can create different types of spatial values are documented as returning the generic type ST_Geometry. If you then try to use a spatial function that is valid only for a specific spatial type, you will get an error, as shown in Listing 20.

Listing 20. Error message for incompatible spatial type
The invocation of routine "ST_PERIMETER" is ambiguous. The argument in position "1"
does not have a best fit.. SQLCODE=-245, SQLSTATE=428F5, DRIVER=3.62.80

For example, the code in Listing 21 returns this error because the function st_buffer has a return type of st_geometry, even though the actual result will be of type st_polygon.

Listing 21. Example of code using incompatible spatial type
 select
  name
 ,st_perimeter(st_buffer(location, .1)) as perimeter
from test.hospitals
where name like 'New York%'

This can be resolved by informing DB2 that it should treat the result as st_polygon, as shown in Listing 22.

Listing 22. Using the treat function
select
  name
 ,st_perimeter(TREAT(st_buffer(location, .1) as db2gse.st_polygon))
  as perimeter
from test.hospitals
where name like 'New York%'

The code executes without error and returns the output shown in Listing 23.

Listing 23. Result of using the treat function
NAME                    PERIMETER
----------------------- ------------------------
New York Hospital         +6.28206837256265E-001
New York Hospital         +6.28206837256265E-001
New York State Hospital   +6.28206837256265E-001
  3 record(s) selected.

Executing spatial queries

Because spatial queries are no different than any other SQL query, any environment that can be used to execute SQL queries can be used to execute queries that include spatial operations. In the previous sections, spatial queries were executed using the DB2 command line processor by referencing a file of SQL statements. This section discusses some of the other environments for executing spatial queries, including IBM Data Studio, Java language applications, and C language applications.

Working with IBM Data Studio

IBM Data Studio is a free, Eclipse-based environment in which to develop and test database applications. Data Studio replaces the DB2 Control Center, which has been deprecated and is no longer available with DB2 Version 10. See Resources for information on downloading and setting up Data Studio.

The IBM Data Studio full-client enables you to run SQL commands as well as do Java development. The full client is recommended for the following sections of this tutorial.

When you have installed IBM Data Studio and have brought it up, set up an environment for executing the queries by completing the following steps.

  1. Click File > New > Data Development Project.
  2. Specify a name for your project, such as SpatialProject.
  3. Select a database to which to connect.
  4. Append DB2GSE to the default path, and click Finish.
  5. Right-click your project name in the left pane, and select New SQL or XQuery Script.
  6. Name your project something like query_hospital_counties, and click Finish.
  7. Copy the code in Listing 24 into the script input area.
Listing 24. Code for script input area
SELECT h.name, st_astext(h.location)
FROM test.hospitals AS h, test.counties AS c
WHERE st_within(h.location, c.shape) = 1
AND c.state_name = 'New York'
AND c.county_name = 'Greene';
  1. Press Ctrl+S to save the script.
  2. Press F5 to run the query. The query runs and presents a screen as shown in Figure 2. Click the Result1 tab to see your actual results.
Figure 2. Hospitals in county query in Data Studio
Data Studio results of first query run

Click to see larger image

Figure 2. Hospitals in county query in Data Studio

Data Studio results of first query run

Repeat the steps for the SQL query of distance shown in Listing 25.

Listing 25. SQL distance query
select
  name
 ,st_astext(location) as location_wkt
 ,st_distance(location, st_point(-74.237449, 42.036976, 1), 'STATUTE MILE') as distance
from test.hospitals
where st_distance(location, st_point(-74.237449, 42.036976, 1), 'STATUTE MILE') < 25.0
order by distance

The query runs and presents a screen as shown in Figure 3.

Figure 3. Distance to hospitals query in Data Studio
Data Studio distance to hospitals results

Working with the JDBC interface in Java applications

This section describes how to run spatial queries using JDBC in the Data Studio environment. If you have a JAVA SDK installed, the sample JDBC program available in Downloads can also be simply compiled and executed in a command-line environment.

Complete the following steps to create a Java development project in Data Studio and execute the same queries that you ran as SQL scripts.

  1. Click File > New > Java Development Project.
  2. Specify a name for your project, such as SpatialJDBCProject, and click Next.
  3. On the Java Settings window, click the Libraries tab.
  4. Click the Add External JARs button, and navigate to the sqllib\java directory in your DB2 installation.
  5. Select the db2jcc.jar and db2jcc_license_cu.jar files, and click Finish.
  6. Right-click the src node under your new project, and click New > Class.
  7. Specify JDBCSpatialQuery as the name of your class, and click Finish.
  8. Open the sample Java program from Downloads in a text editor, copy all the text, and replace the minimal class definition in the Data Studio editor window.
  9. Click Ctrl+S to save the file.
  10. If you are not using the SAMPLE database, modify the source code to substitute the name of your database, and save the file.
  11. Right-click in the editor window, and select Run As > Java Application. Figure 4 shows the console window in the bottom pane.
Figure 4. JDBC query in Data Studio
Data Studio output of JDBCSpatialQuery

Explanation of application

The static main function simply creates a connection to the SAMPLE database using the current user authorization by default or by the specified user ID and password, if provided as parameters. An instance of the class is created, and then each of the instance methods to perform queries are invoked with the database connection and appropriate parameters.

Function hospitalInCounty1

This section describes the simplest approach, although not the most efficient. The SQL query is constructed identically to the query executed previously in the tutorial in the Data Studio script editor by substituting into the character string the parameter values for the county and state names as constants. A Java statement object is created, and then the SQL query is passed to the executeQuery method. A loop is then set up to read each row of the result set, getting the name and location as text strings, which are output to the console. When this is complete, the result set and statement are closed. Listing 26 shows an example.

Listing 26. Example using constants
 void hospitalInCounty1(Connection con, String stateName, String countyName)
                        throws SQLException {
                Statement stmt;
                ResultSet rs;
                // Query hospitals that are within the specified state and county
                String sel1 = "SELECT h.name, db2gse.st_astext(h.location) "
                                + "FROM test.hospitals AS h, test.counties AS c "
                                + "WHERE db2gse.st_within(h.location, c.shape) = 1 "
                                + "AND c.state_name = '" + stateName + "' "
                                + "AND c.county_name = '" + countyName + "' ";

                System.out.println("\n\nQuery hospitals in " + countyName + ", "
                                + stateName);
                stmt = con.createStatement();
                rs = stmt.executeQuery(sel1);

                // display the result set
                // rs.next() returns false when there are no more rows
                while (rs.next()) {
                        String name1 = rs.getString(1);
                        String location = rs.getString(2);
                        System.out.println("Hospital name: '" + name1 + "'; location: "
                                        + location);
                }
                rs.close();
                stmt.close();
        }

Function hospitalInCounty2

This function is similar to Listing 26, but it uses parameter markers for the parameters instead of constants. This is particularly important if similar SQL statements will be executed repeatedly, because DB2 will cache recently compiled SQL statements and reuse them when it sees them again, even if they come from a different application. This can be useful when multiple users are running the same application with different values.

In order to use parameter markers, place question mark (?) characters in the SQL statement instead of the actual parameter value. A Java PreparedStatement object is created from the connection, passing in the SQL statement. The setString method is then invoked to set the parameter values for the state and county names. The rest of the logic is the same.

If the same SQL statement is going to be executed multiple times within the same application, it would be even more efficient to save the PreparedStatement and reuse it while providing new parameter values. Listing 27 shows an example.

Listing 27. Example using question marks in place of parameter values
void hospitalInCounty2(Connection con, String stateName, String countyName)
                        throws SQLException {
                PreparedStatement pstmt;
                ResultSet rs;
                // Query hospitals that are within the specified state and county
                String sel1 = "SELECT h.name, db2gse.st_astext(h.location) "
                                + "FROM test.hospitals AS h, test.counties AS c "
                                + "WHERE db2gse.st_within(h.location, c.shape) = 1 "
                                + "AND c.state_name = ?" + "AND c.county_name = ? ";

                System.out.println("\n\nQuery hospitals in " + countyName + ", "
                                + stateName);
                pstmt = con.prepareStatement(sel1);
                pstmt.setString(1, stateName); // set state name parameter
                pstmt.setString(2, countyName); // set county name parameter
                rs = pstmt.executeQuery();

                // display the result set
                // rs.next() returns false when there are no more rows
                while (rs.next()) {
                        String name1 = rs.getString(1);
                        String location = rs.getString(2);
                        System.out.println("Hospital name: '" + name1 + "'; location: "
                                        + location);
                }
                rs.close();
                pstmt.close();
        }

Function hospitalDistance

This hospitalDistance function implements the query to find hospitals within 25 miles of the specified location. The approach also uses parameter markers for the coordinates of the specified location. Note the use of the CAST expression in the SQL statement. Listing 28 shows an example.

Listing 28. Using the hospitalDistance function
String sel2 = "SELECT name, db2gse.st_astext(location), "
            + "db2gse.st_distance(location, db2gse.st_point(CAST (? AS DOUBLE), "
            + "CAST (? AS DOUBLE), 1), 'STATUTE MILE') AS distance "
            + "FROM test.hospitals "
            + "WHERE db2gse.st_distance(location, "
            + "db2gse.st_point(CAST (? AS DOUBLE), CAST (? AS DOUBLE), 1), "
            + "STATUTE MILE')  < 25.0 "
            + "ORDER BY distance";

In order for the point constructor to be recognized properly, it is necessary to explicitly cast the longitude and latitude parameters as DOUBLE values.

Working with the CLI interface in C-language applications

The installation of DB2 Spatial Extender includes a sample program gseRunGseDemo in the directory sqllib/samples/extenders/spatial that can be executed directly to verify the spatial installation by loading spatial data, executing various spatial stored procedures, and executing sample spatial queries against the data that was loaded. gseRunGseDemo is a C-language application that uses the DB2 CLI interface and is provided in source form that you can examine, modify, and compile in your own environment. The application is documented in the spatial section of the DB2 Infocenter under the topic Writing applications and using the sample program (see Resources).

Working with other application languages

Generally, any language that allows you to execute SQL statements against DB2 can be used to execute spatial queries. This is particularly useful with web-oriented languages such as PHP, Ruby, and Perl, which can be used in conjunction with the web mapping APIs provided by services such as Google, Yahoo, and Esri ArcGIS Online.


Conclusion

This first tutorial of the series introduced the DB2 spatial capabilities, approaches to importing and creating spatial data, and environments for executing spatial queries. Subsequent tutorials will cover typical spatial uses cases, tuning for the best performance, working in a data warehouse environment, and using visualization tools.

Acknowledgments

I thank Amyris Rada for her review and suggestions to improve this tutorial.


Downloads

DescriptionNameSize
Sample SQL scripts and Java code for this tutorialsqlscripts.zip5KB
Sample spatial data for this tutorialsampledata.zip1400KB

Resources

Learn

Get products and technologies

  • Download DB2 Express-C and DB2 Spatial Extender, which are fully functional, no-charge versions of DB2 and the DB2 Spatial Extender.
  • Download DB2 Spatial Extender, which corresponds to an existing DB2 server installation. Sample spatial data in shapefile format for use with Spatial Extender is also available.
  • Download Geobrowser for DB2 and Informix, which is free for you to use with any version of DB2 or Informix with the spatial feature installed.
  • Download IBM Data Studio, which provides developers and DBAs with basic capabilities for database management and development for DB2 and Informix at no charge.
  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=793988
ArticleTitle=Manage spatial data with IBM DB2 Spatial Extender, Part 1: Acquiring spatial data and developing applications
publish-date=02162012