Leverage the Spatial Extender: Creating an automatically maintained spatial table from latitude-longitude column data (DB2 Edition)

With instructions for performing distance-based queries and using federated source data

If you have location information in your database in the form of separate latitude and longitude (or X and Y) columns, the value you get from that information is limited. To take full advantage of the spatial intelligence in your data, you need to turn the separate numeric columns into a single spatial column, so you can display the locations on a map and analyze spatial relationships, using a Geographic Information System (GIS) or the spatial functions provided by the IBM® DB2® Spatial Extender. This article shows you how to create and maintain such a column, using a step-by-step approach that covers all the aspects that complicate this simple process, including map projections, spatial indexing, and error handling. As an application example, it shows how to perform simple, distance-based queries. A separate appendix covers how to use a federated table from another DB2 instance as a source (including those on iSeries™ and zSeries® hosts). Sample code included.

Robert Uleman (uleman@us.ibm.com), Worldwide Technical Sales, IBM

Robert Uleman has over 20 years of experience in the development and sales of software for Geographic Information Systems, remote sensing/image processing, and geophysics. He has been in the database industry since joining Illustra, an object-relational database vendor, as a development manager responsible for what later became the Informix Geodetic DataBlade. Robert is now the spatial specialist on the technical sales team for IBM Information Management, supporting customers and sales teams worldwide, and a member of IBM's ESRI strategic alliance team.



13 April 2006

Introduction

Whether derived from address information, acquired by GPS devices, or otherwise generated, many databases contain geographic coordinates of point locations as separate numeric columns in a table. In this form, however, they are not usable for detecting spatial relationships by using the power of the DB2 Spatial Extender. To unlock the information inherent in these coordinates and take full advantage of it in spatial database queries or external Geographic Information System (GIS) software, we need to combine the separate coordinate values in each row to create a single-point geometry value. A column of such point values can be indexed to support spatial queries (for example, to find all points within a region of interest, or all points within a certain distance of a given location), and can be registered for use by an external GIS (such as ESRI's ArcGIS Desktop) for more powerful visualization and analysis.

In this article, you will add the point column in a separate table (see Figure 1), because that is the most general case. In many cases, you will not have the privileges required to alter the source table, because it is controlled by another department or a separate application (though assume that you do have the privileges to create triggers on the table). In other cases, the source table is not really a table, but a representation ("nickname") of a table that resides elsewhere, in a different database on a different server, and to which you have federated access. DB2 is particularly good at linking to other IBM databases on mainframe (zSeries) and AS/400® (iSeries) hosts, through a facility called DB2 Connect™; this scenario is common at customer sites that have a variety of IBM technology. This article does not cover how to set up a nickname, as that is a well-documented, general (non-spatial) process; I simply assume that you have access to the source data, whether it is local or not.

Figure 1 below shows an example of a source table, Things, with Latitude and Longitude columns and a derived point table, Things_pts. The tables are joined over a common key column (ID). Columns Address and CityStateZip are shown as examples of additional attribute columns in the source table. The derived spatial column, Location, uses the original coordinates: longitude as X, latitude as Y.

Figure 1. Latitude-longitude source table and derived point table

Creating and using geometry values such as points requires the DB2 Spatial Extender. This article assumes familiarity with that product and does not cover its installation, configuration, or general use. See Resources for further information.


Setting up the spatial table

In this section, you will take a given table that contains a latitude column and a longitude column, create an associated table with a spatial column, populate the new table with data from the source table, and prepare the table for use in spatial queries.

The source table

The source table generally looks similar to the one in Figure 1: One or more columns making up the primary key (here, the single column called ID); several alphanumeric columns; and, among these, two columns containing coordinate values (here, Latitude and Longitude). These coordinates may have been derived from address data in the same table (through a process known as geocoding), collected using GPS technology, acquired from commercial or public sources, or created in some other way; for the purpose of this article, it doesn't matter where they come from, and there are no assumptions about it. In the code samples and scripts, I do make the following assumptions about the coordinate values:

  • The coordinate values represent latitude and longitude: "geographic" coordinates, not projected ones. In other words, they are angular measures, in degrees with decimal fraction, not linear distances (in meters, feet, or similar units). See the next section, Spatial Reference Systems.
  • The coordinate values are numeric (decimal, float, etc.), not character strings. (Some applications create tables with columns of character data type only.)
  • The coordinate columns are called "Latitude" and "Longitude".

If any of these assumptions are not valid for your database, you have to make adjustments that vary from trivial (for column names) to somewhat involved (in the case of a projected coordinate system). These adjustments are mentioned in the appropriate places in the code samples and scripts.

Spatial Reference Systems

Coordinates are numeric values, representing either angles or linear distances. To know what they mean, you have to define both the origin and the units: that is, the frame of reference. In the terminology of the Spatial Extender, this is the Spatial Reference System (SRS), a combination of an earth-based coordinate system and a set of numeric conversion parameters. Each spatial geometry value contains an SRS ID (or SRID) to identify the associated SRS. A full discussion of SRSs is beyond the scope of this article (see Resources), but it is important to be aware of the following:

  • Latitude and longitude are geographic coordinates, angular measures, usually expressed in degrees with decimal fraction, that are valid everywhere on the globe; geographic coordinate systems are not associated with a map projection.
  • Maps (on paper or on the screen) are flat surfaces; they require a transformation from the curved surface of the earth to the flat plane: a map projection. The rectilinear (straight-line) coordinates of a projected coordinate system are distance measures, expressed in units of length.
  • Both geographic and projected coordinate systems include a definition of the assumed shape of the earth (a flattened sphere); this is called the geodetic datum.
  • While geographic coordinates can be used to represent point locations anywhere, they are not suitable for defining lines of appreciable length or computing distances. To query by distance, you must use a projected coordinate system.
  • The Spatial Extender can transform coordinates between geographic and projected SRSs in both directions, with one important restriction: the source and target SRS must both be based on the same geodetic datum. Conversion from one datum to another requires specialized algorithms and trained users, and is not supported by the Spatial Extender.

As mentioned above, I assume that the source table contains latitude and longitude values. Usually, the database itself contains no information about its geodetic datum, so you must find this out from the application that created the coordinates. Often, it is something global like WGS-84 (World Geodetic System 1984, used by all GPS devices), or a regional datum like NAD-83 (North American Datum 1983).

Since the purpose of this exercise is to prepare the data for spatial analysis, including distance-based queries, choose an SRS based on a projected coordinate system for the spatial column containing the computed points. Which projection is best depends entirely on which area your data covers (the "extent"). For a limited extent, a standard projection such as UTM or US State Plane (you have to pick the zone) may do the trick; for a larger extent, such as the continental United States, a coordinate system with broader coverage is required (e.g., one of the predefined conic projections for North America). The choice and specification of SRSs is a complex topic worthy of its own article; generally, it requires at least some background in GIS and mapping. This article simply assumes that you already have appropriate SRSs set up, one for the geographic coordinates and one for the projected coordinates (both based on the same datum); and that you know their respective names and SRIDs.

The code samples in this article use the following SRSs:

SRID   SRS name        Coordinate System Description
1      NAD83_SRS_1     Unprojected, geographic (latitude-longitude);
                       North American Datum 1983
                       (Created by Spatial Extender enablement)
1011   LOUISIANA_UTM   Projected: Universal Transverse Mercator, Zone 15;
                       North American Datum 1983
                       (Created by user)

Note that SRS names and IDs are arbitrary and local to each database, but the underlying coordinate systems are often predefined, with reference to industry authorities.
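For reference, a projected SRS such as LOUISIANA_UTM can be created with the db2se command line utility. The following is a sketch only: the offset and scale values and the exact predefined coordinate system name are illustrative, and you should check the coordinate systems available in your own Spatial Extender catalog before choosing.

```shell
db2se create_srs xypoints
      -srsName louisiana_utm -srsId 1011
      -xOffset 0 -xScale 10
      -yOffset 0 -yScale 10
      -coordsysName "NAD_1983_UTM_Zone_15N"
```

The scale factors determine the internal integer precision of stored coordinates; a larger scale preserves more decimal places at the cost of a smaller representable extent.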

Table structure

The spatial table has a simple structure: a unique integer column (objectid), one or more columns making up the primary key (copied from the source table), plus a single point-geometry column. The point column is registered as a spatial column using a Spatial Extender procedure, associating it with the right SRID. The objectid column is there to support some GIS and map viewer software, including the ArcExplorer program that is available for free download from IBM (see Resources). Figure 2 summarizes this for the example I use for the code samples throughout this article, assuming a source table that represents well locations.

In this example, source table wells has a multicolumn key (id1, id2); additional columns are omitted (represented by "..."). The derived spatial column, location, in table wells_pts now uses a projected coordinate system: instead of copying longitude into X and latitude into Y, X and Y are computed according to the associated projection transformation. All names, types, and values are for illustration purposes only. The additional column, objectid, is explained above.

Figure 2. The wells source table and derived point table

In the example, the source table is defined as follows:

CREATE TABLE wells(
   id1       INTEGER NOT NULL,
   id2       INTEGER NOT NULL,
   latitude  DOUBLE,
   longitude DOUBLE,
   name      CHAR(30),
   PRIMARY KEY (id1, id2)
)
IN regtbs INDEX IN idxtbs LONG IN lobtbs;		(1)
  • NOTE. All examples assume that a database connection (to a database named xypoints) already exists, that database objects (tables, views) are created in the schema associated with the connected user, and that the connected user has all required privileges.

Note that this statement specifies the table spaces for the table, index, and large objects (LOBs). Actual table space configuration and names depend on the system environment, of course; they are listed in the example to emphasize that table spaces should always be specified, not allowed to default to some initial system setting. Likewise, the number of key columns and their types will vary, as will the types of the coordinate columns. Usually, coordinate values are stored in numeric columns (float, decimal, or even integer), and the specific type will not affect the code presented here. Sometimes, however, they are character columns; in that case, all code using them needs to include a conversion, for instance, Float(Latitude) instead of just Latitude (beware of blank or malformed numeric strings; you may need extra validation and error handling). Finally, the name column is included as the lone attribute column in this example; in most situations, there will be many additional attributes.
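If your coordinate columns are character strings, the conversion can be sketched as follows. This is a minimal example, assuming hypothetical character-typed latitude and longitude columns; the WHERE clause guards only the obvious cases, and DOUBLE() still raises an error on malformed strings, so full validation may need additional checks.

```sql
-- Sketch: reading character coordinate columns as numbers.
SELECT id1, id2,
       DOUBLE(latitude)  AS lat,
       DOUBLE(longitude) AS lon
FROM   wells
WHERE  latitude  IS NOT NULL AND latitude  <> ''
AND    longitude IS NOT NULL AND longitude <> '';
```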

The derived table is constructed as follows:

CREATE TABLE wells_pts                  
(
  objectid INTEGER GENERATED ALWAYS AS IDENTITY,
  id1 INTEGER, id2 INTEGER,                          
  location db2gse.ST_Point               
)
IN regtbs INDEX IN idxtbs LONG IN lobtbs;		(2)
  • NOTE. The objectid column is maintained by the database: for each row inserted, the system will assign a monotonically increasing integer number, starting at 1. This guarantees that the column values remain unique and frees you from worrying about this column later on (don't try to assign a value to it, though). In fact, in the rest of this article, I will treat this column as hidden and not mention it again in either text or figures.

For convenience, create a view that integrates the spatial column of the new table with the attribute columns of the source table by joining over the key columns; the view looks like the source table with the spatial column added:

CREATE VIEW wells_all
AS
  SELECT t.*, f.location
  FROM  wells  t, wells_pts f
  WHERE (t.id1, t.id2) = (f.id1, f.id2);		(3)

Since this is an inner join, this view has no rows for records that appear in wells but not in wells_pts (because they have null or invalid coordinates). That makes sense, given that the purpose is to work with the spatial locations as well as the attributes; if you want only the attributes and want to see all rows, work directly with the source table, wells.
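If you do want all source rows in a single result, with a null location where no point feature exists, you can define a left outer join variant alongside the view above. The view name wells_all_outer is just a suggestion, not part of the article's naming scheme.

```sql
CREATE VIEW wells_all_outer AS
  SELECT t.*, f.location
  FROM wells t
  LEFT OUTER JOIN wells_pts f
    ON t.id1 = f.id1 AND t.id2 = f.id2;
```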

  • NOTE. If you have full control over the source table, the effect of statements (2) and (3) can be achieved by directly adding the spatial column to the table:
    ALTER TABLE wells ADD COLUMN location db2gse.ST_Point;(2')

Referential integrity

The point table (wells_pts) has a foreign-key relationship with the source table (wells). This means that every row in the point table must have a key value (consisting, in this case, of two column values) that must also exist in the source table. The database has facilities, including constraints, to enforce this relationship, but you need not worry about that, since wells_pts is directly derived from wells and only modified by automatic processes (triggers) responding to changes in wells (see Keeping the spatial table up to date). As long as there is no other way to update wells_pts (that is, you must revoke privileges from all other users), referential integrity is guaranteed.

In particular, the FOREIGN KEY constraint cannot be used in combination with the triggers described below, because any insertion into wells_pts, triggered by an insertion into wells, will result in an attempt to read from wells to verify that a row with the key value exists; and that read will fail because the row it needs is still locked by the process that inserted it.
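Locking down the point table so that only the triggers maintain it can be sketched as follows; the authorization names are illustrative, and you would grant SELECT (or finer-grained privileges) to whichever users and applications need read access.

```sql
-- Keep wells_pts read-only for everyone but the trigger owner,
-- so that referential integrity is maintained by the triggers alone.
REVOKE INSERT, UPDATE, DELETE ON TABLE wells_pts FROM PUBLIC;
GRANT  SELECT ON TABLE wells_pts TO PUBLIC;
```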

Geometry column registration

To mark the spatial table wells_pts as spatial, and make it available for discovery by OpenGIS-compliant clients, register the column location in the Spatial Extender catalog. Using the command line, this is done as follows for the current example:

db2se register_spatial_column xypoints -tableName wells_pts
      -columnName location -srsName louisiana_utm	(4)

While you're at it, do the same for the combined view:

db2se register_spatial_column xypoints -tableName wells_all
      -columnName location -srsName louisiana_utm	(4a)

The parameters tableName, columnName, and srsName are self-explanatory; xypoints is the current database name. Note that the db2se command line utility uses the SRS name, not the SRID; the SRID is needed later on, in SQL statements.

The same step can be accomplished using the DB2 Control Center (Figure 3): right-click on the current database in the object tree or on the specific table (wells_pts) in the contents pane, choose Spatial Extender | Spatial Columns ...; in the Spatial Columns dialog, select the item for the table, and click Register...; in the Select Spatial Reference System dialog, select the right SRS (LOUISIANA_UTM), and click OK; close the Spatial Columns dialog.

Figure 3. Registering column location of table wells_pts in the Control Center, using SRS Louisiana_UTM, SRID 1011

Upon registration of a spatial column, the Spatial Extender catalog records its name and associated SRS (Figure 4); in addition, a check constraint on the registered column ensures that only geometry values with the correct SRID can be inserted.

The figure below shows a registered spatial column (location in table wells_pts) in the Spatial Extender catalog. The Spatial Extender uses schema DB2GSE. The catalog tables shown are actually views; from each, several columns are omitted in this figure. Note the definition of the coordinate system, specifying both the geodetic datum and the projection parameters in the Well-Known Text format specified by the Open Geospatial Consortium's Simple Features Specification for SQL.

Figure 4. Spatial column in Spatial Extender catalog
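The same information can be retrieved in SQL from the catalog views. This is a sketch: the column list of DB2GSE.ST_GEOMETRY_COLUMNS shown here is abbreviated, and object names are assumed to be stored in uppercase in the catalog.

```sql
SELECT table_schema, table_name, column_name, srs_name, srs_id
FROM   db2gse.st_geometry_columns
WHERE  table_name = 'WELLS_PTS';
```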

Initializing the spatial table

Now you are ready to initialize the spatial table by copying whatever rows already exist in the source table; along the way, of course, the separate latitude and longitude values are transformed into point geometry values. This is depicted in Figure 5.

Creating a point geometry value for the spatial column involves constructing a point geometry with latitude and longitude coordinates (geographic coordinate system, SRID = 1) and then transforming it to UTM (projected) coordinates (SRID = 1011), as shown in the figure below.

Figure 5. Creating a point geometry value

In the simplest possible SQL, this is accomplished as follows:

INSERT INTO wells_pts( id1, id2, location )
SELECT
  id1, id2,
  TREAT (
    db2gse.ST_Transform (
      db2gse.ST_Point (
        longitude,
        latitude,
        1
      ),
      1011
    ) AS db2gse.ST_Point
  )
FROM wells;						(5)
  • NOTE. The equivalent statement for a single table with added geometry column is:
    UPDATE wells SET location = TREAT ( ... );(5')
    with the expression inside the TREAT specification as in statement (5).

This statement bears a closer look. The first two columns selected and inserted are simply the primary key. The third column is constructed by two nested functions inside a TREAT specification. The inner call, to ST_Point(), creates a point geometry with latitude-longitude coordinates (SRID = 1); it returns a value of type db2gse.ST_Point. The outer call, to ST_Transform(), projects this point geometry to the UTM coordinate system indicated by SRID 1011. This follows exactly the flow shown in Figure 5.

One more thing is needed, however: ST_Transform() returns a value of type db2gse.ST_Geometry, the supertype of all geometry types; as a function that applies to all geometry types, it is properly a method of the supertype. The spatial column, however, expects an ST_Point value and will not accept an ST_Geometry, so we need to "downcast" it from supertype to subtype. This is what TREAT (... AS ...) does. It works because the ST_Geometry value in this case really is an instance of ST_Point. Unlike the related, and perhaps more familiar, CAST specification, TREAT involves no conversion, truncation, or other transformation. Since the entire TREAT expression in statement (5) appears in many code fragments, I will abbreviate it to TREAT ( ... ) from now on.
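To see the two-step construction in isolation, you can run it on a literal coordinate pair. The values below are an illustrative point in Louisiana, and the exact output coordinates depend on your SRS definitions, so no specific result is shown here.

```sql
-- Construct one point from literal latitude-longitude values,
-- project it to SRID 1011, and render it as Well-Known Text.
SELECT VARCHAR(db2gse.ST_AsText(
         db2gse.ST_Transform(
           db2gse.ST_Point(-91.19, 30.45, 1),
           1011
         )
       ), 100)
FROM sysibm.sysdummy1;
```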

Note that this bulk copy makes some assumptions, which may not be safe ones in real applications. First, it assumes that nothing goes wrong in the construction of the final point value. If it does, and one of the functions throws an exception, the entire process rolls back. Second, it assumes that there is enough log space available to complete the entire copy. I will come back to this later and refine the process to deal with rejected rows and to limit log usage.

Spatial indexing

Now that the spatial table is populated, it is time to add a spatial index to support spatial queries; in DB2 that is a grid index. Since every data set is different, this example uses an arbitrary value for the grid cell size: 1 km. This should work for a wide range of project extents, including the state of Louisiana (the project area of the example); you may have to adjust this depending on the extent of your project area and the density of your data (larger cells for a large area and low density; and vice versa). Note that it is never advantageous to specify more than one grid level for point data.

The command looks like this:

CREATE INDEX wells_pts_grx
ON wells_pts (location)
EXTEND USING db2gse.spatial_index (1000, 0, 0);		(6)
  • For the single, integrated table, the command is the same except for the table (and optionally the index) name:
    CREATE INDEX wells_grx
    ON wells (location)
    EXTEND USING db2gse.spatial_index (1000, 0, 0);
    (6')

The first parameter (1000) specifies the grid cell size in meters, because that is the linear unit of the SRS associated with the location column.

In most cases, a non-spatial index on the key columns will be required as well; this is standard database practice and will not be covered here.
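As a quick check that the table and index are usable, a simple distance query can be run against the combined view. The coordinates below are illustrative UTM values in meters; note that for large tables, a predicate such as db2gse.ST_Intersects against a buffered point may exploit the grid index more directly than ST_Distance, which in general forces a full evaluation.

```sql
-- Hypothetical query: wells within 5 km of a given location.
SELECT w.id1, w.id2, w.name
FROM   wells_all w
WHERE  db2gse.ST_Distance(
         w.location,
         db2gse.ST_Point(675000.0, 3350000.0, 1011)
       ) <= 5000;
```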


Keeping the spatial table up to date

Triggers and procedures

Now that the spatial column is in place and populated with the initial contents of the source table, you need to ensure that the spatial column is kept up to date, always reflecting the current contents of the source table. This is done using database triggers. You can set up a trigger for each of the operations that modify the source table: inserts, updates, and deletions. In principle, you can specify everything that needs to happen right in the body of the trigger; in practice, it is often preferable to call a user-defined procedure and put all the logic there. This tends to be more powerful and more flexible, in particular with respect to error processing.

Insert trigger

When a new row is inserted into the source table, a corresponding row (or feature, the GIS term for a database item that represents something in the real world) must be added to the spatial table. This is done as follows:

CREATE PROCEDURE wells_ins_proc (
  id1 INTEGER, id2 INTEGER,
  latitude  FLOAT, longitude FLOAT
)
  SPECIFIC wells_ins_proc MODIFIES SQL DATA DETERMINISTIC
  CALLED ON NULL INPUT LANGUAGE SQL NO EXTERNAL ACTION
BEGIN ATOMIC
  INSERT INTO wells_pts( id1, id2, location )
  VALUES (
    id1, id2,
    TREAT ( ... ) );
END@ 							(7)

CREATE TRIGGER wells_ins
  AFTER INSERT ON wells
  REFERENCING NEW AS t
  FOR EACH ROW CALL wells_ins_proc (
    t.id1, t.id2,
    t.latitude, t.longitude
  )@ 							(8)
  • NOTE. The "@" is used to terminate the entire CREATE PROCEDURE statement, because the ";" is needed to terminate statements within the procedure. On the command line, this requires the -td@ option of the db2 command line processor.
  • NOTE. For the single, integrated table, no separate insert procedure (7) is needed; instead, the insert trigger calls the same procedure as the update trigger (see below), because once the new row is inserted, all that needs to happen is to assign a value to the spatial column:
    ... FOR EACH ROW CALL wells_upd_proc ...(8')
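Assuming the procedure and trigger definitions are saved in a script file (the file name below is illustrative), they can be executed with the DB2 command line processor like this:

```shell
db2 -td@ -vf wells_triggers.sql
```

Here -td@ sets "@" as the statement terminator and -v echoes each statement as it is executed.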

Update trigger

When either of the coordinate columns in the source table is updated, a new point geometry must be constructed. The update trigger looks like this:

CREATE PROCEDURE wells_upd_proc (
  id1 INTEGER, id2 INTEGER,
  latitude FLOAT, longitude FLOAT
)
  SPECIFIC wells_upd_proc MODIFIES SQL DATA DETERMINISTIC
  CALLED ON NULL INPUT LANGUAGE SQL NO EXTERNAL ACTION
BEGIN ATOMIC
  UPDATE wells_pts f
  SET location = TREAT ( ... )
  WHERE (f.id1, f.id2) =
        (wells_upd_proc.id1, wells_upd_proc.id2);
END@ 							(9)

CREATE TRIGGER wells_upd
  AFTER UPDATE OF latitude, longitude ON wells
  REFERENCING NEW AS t
  FOR EACH ROW CALL wells_upd_proc (
    t.id1, t.id2,
    t.latitude,
    t.longitude
  )@ 							(10)

Note that this assumes that key columns are never updated; once a row exists, it is immutably tied to its primary-key value. If this assumption is invalid, additional conditions must be handled. I will not address this here.

  • NOTE For the single, integrated table, the update procedure serves both the insert and update triggers:
    ... UPDATE wells f ...(9')

Delete trigger

When a row in the source table is deleted, the corresponding row in the feature table must be deleted as well (since you cannot have the system do this automatically with the ON DELETE CASCADE clause of a foreign key constraint, as explained above). The trigger is defined as follows:

CREATE TRIGGER wells_del
  AFTER DELETE
  ON wells
  REFERENCING OLD AS t
  FOR EACH ROW
    DELETE FROM wells_pts f
    WHERE (f.id1, f.id2) = (t.id1, t.id2);		(11)

This trigger is so simple that it does not require a procedure. The only thing that can go "wrong" is that no qualifying feature exists, in which case there is nothing to do.

  • NOTE In the case of a single, integrated table, this trigger is not needed at all; once a row is deleted, that's it.

Null values, exceptions, and commit intervals

Null values

Sometimes, a row in the source table does not have valid coordinate values. The simplest case is when either or both of the coordinates is null. In that case, there is no purpose to creating an associated point feature, because the location value would not be a usable point geometry (the ST_Point constructor function returns a "POINT EMPTY" rather than null).

  • NOTE. This discussion does not apply to the case of a single, integrated table. The row exists, so the spatial column must have a value, whether null or "POINT EMPTY". If it is important to avoid "POINT EMPTY" (usually, this does not cause any problems), put the UPDATE statement in the trigger procedure in an
    IF latitude IS NOT NULL AND longitude IS NOT NULL ... END IF;
    block.

In the bulk copy, all you need to do is select only those rows that have no nulls in the latitude and longitude columns:

INSERT INTO wells_pts( id1, id2, location )
SELECT
  id1, id2,
  TREAT ( ... )
FROM wells
WHERE latitude IS NOT NULL AND longitude IS NOT NULL;	(12)

Likewise, in the insert trigger it's better to do nothing in case one of the coordinates is null. Unfortunately, you cannot prevent the trigger from calling the procedure, since in DB2 a procedure is always CALLED ON NULL INPUT. Therefore, you have to detect nulls inside the procedure (see statement (5) for the abbreviated TREAT expression; to conserve space and retain clarity, code snippets from here on show only the changes from previous samples):

CREATE PROCEDURE wells_ins_proc (
  ... [common SQL code omitted]
BEGIN ATOMIC
    -- Insert the corresponding feature, unless any
    -- coordinate is null
  IF latitude IS NOT NULL AND longitude IS NOT NULL THEN
    INSERT INTO wells_pts( id1, id2, location )
    VALUES ( id1, id2, TREAT ( ... ) );
  END IF;
END@							(13)

In the update trigger, you need to delete the corresponding feature if one of the coordinates becomes null:

CREATE PROCEDURE wells_upd_proc (
  ... [common SQL code omitted]
BEGIN ATOMIC
  IF latitude IS NULL OR longitude IS NULL THEN
    -- If the new row has null coordinates, delete the
    -- corresponding feature
    DELETE FROM wells_pts f
    WHERE (f.id1, f.id2) = (id1, id2);
  ELSE
    UPDATE wells_pts f SET location = TREAT ( ... )
    WHERE (f.id1, f.id2) =
          (wells_upd_proc.id1, wells_upd_proc.id2);
  END IF;
END@							(14)

Note that there may be nothing to delete, if the coordinates were already null or invalid; but that does not constitute an error.

The delete trigger does not depend on the coordinate values, so nothing changes there.

Exceptions in triggers and procedures

As mentioned above, the code given so far does not take into account that things may go wrong. When an exception is thrown during processing of a trigger, the trigger aborts, and the entire statement that caused the trigger to fire is rolled back. This is not what you want, because that would interrupt the application's normal workflow. So you must be able to trap exceptions and do something intelligent: usually, that means to do nothing and let the overall process continue, silently ignoring the error.

In this code, the main thing that can go wrong is that the ST_Transform() function cannot compute a proper point value in projected coordinates from the given latitude and longitude. Usually, this is because the data is incorrect and the coordinates indicate a location outside the expected area, and out of range for the projection used; sometimes, it is because "magic" values, like 0.0, are used (instead of null) to indicate missing data. If the coordinates are completely wrong (for example, an impossible latitude value of 100.0), the ST_Point() function may be the one to raise an exception.

DB2 stored procedure language allows for the creation of exception handlers for specific conditions; you will use this to trap anticipated errors. (Note that other conditions may cause exceptions in ST_Transform() and ST_Point(), such as SRS definition-related problems; these should be resolved, however, during development and testing, and should not occur once the process is in operation.)

In the bulk copy used to initialize the contents of the point table, you want to trap coordinate-based exceptions. Since dynamic SQL does not support exception handlers, you need to create a procedure, call it, and then (optionally) drop the procedure again:

CREATE PROCEDURE wells_init_proc ()
  SPECIFIC wells_init_proc MODIFIES SQL DATA DETERMINISTIC
  CALLED ON NULL INPUT LANGUAGE SQL NO EXTERNAL ACTION
BEGIN
DECLARE dummy INTEGER;
init: FOR t AS copy_cursor CURSOR WITH HOLD FOR
    SELECT id1, id2, longitude, latitude FROM wells
    WHERE latitude IS NOT NULL AND longitude IS NOT NULL
  DO
ins: BEGIN
    -- If one of the coordinates is bad, the transform fails.
    -- Trap the exception and continue with the next row.
    DECLARE TransformFailed CONDITION FOR SQLSTATE '38SUC';
    DECLARE BadCoord        CONDITION FOR SQLSTATE '38SSG';
    DECLARE CONTINUE HANDLER FOR TransformFailed, BadCoord
      SET dummy = 1;
    INSERT INTO wells_pts ( id1, id2, location )
    VALUES ( id1, id2, TREAT ( ... ) );
  END ins;
  END FOR init;
END@
CALL wells_init_proc()@
DROP PROCEDURE wells_init_proc@				(15)

Discussion of the precise syntax and workings of the FOR statement is beyond the scope of this article, but the CURSOR WITH HOLD clause is required to ensure that the process continues with the next row after an exception is trapped, instead of the next statement after the FOR (which would mean exiting the procedure). Also, the response to each of the anticipated exceptions is to do nothing, but the CONTINUE HANDLER declaration expects a complete SQL statement; hence the dummy assignment. The specific conditions declared with their SQLSTATEs are described in the Spatial Extender documentation (messages GSE3015N and GSE3416N, respectively).

Apply similar exception handling to the insert trigger:

CREATE PROCEDURE wells_ins_proc (
  ... [common SQL code omitted]
BEGIN ATOMIC
  DECLARE dummy INTEGER;
  DECLARE TransformFailed CONDITION FOR SQLSTATE '38SUC';
  DECLARE BadCoord        CONDITION FOR SQLSTATE '38SSG';
  DECLARE EXIT HANDLER FOR TransformFailed, BadCoord
    SET dummy = 1;
  IF latitude IS NOT NULL AND ... [common SQL code omitted]
END@							(16)

The update trigger is more complicated: if the update causes an exception, you have to delete the existing feature row, just as for null coordinates. If the row is not found (because the old coordinates were null or invalid), you have to turn the update into an insert; and in that case, any exception should just be ignored (no need to delete anything), so the exception handler must be overridden.

CREATE PROCEDURE wells_upd_proc (
  ... [common SQL code omitted]
BEGIN ATOMIC
  DECLARE dummy INTEGER;
  IF latitude IS NULL OR ... [common SQL code omitted]
  ELSE
Upd: BEGIN
    DECLARE TransformFailed CONDITION FOR SQLSTATE '38SUC';
    DECLARE BadCoord        CONDITION FOR SQLSTATE '38SSG';
    DECLARE EXIT HANDLER FOR TransformFailed, BadCoord
      DELETE FROM wells_pts f
      WHERE (f.id1, f.id2) = (id1, id2);
    DECLARE EXIT HANDLER FOR NOT FOUND
  nf: BEGIN
      DECLARE EXIT HANDLER FOR TransformFailed, BadCoord
        SET dummy = 1;
      INSERT INTO wells_pts( id1, id2, location )
      VALUES ( id1, id2, TREAT ( ... ) );
    END nf;
    UPDATE wells_pts f SET location = TREAT ( ... )
    WHERE (f.id1, f.id2) =
          (wells_upd_proc.id1, wells_upd_proc.id2);
  END Upd;
  END IF;
END@							(17)

The delete trigger needs no exception handling. If the corresponding spatial row does not exist, there is simply nothing to delete; this condition is not an exception that causes a transaction to roll back.
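For completeness, here is a minimal sketch of what the delete trigger procedure might look like, following the naming and parameter conventions of the earlier procedures (the name wells_del_proc and its parameter list are assumptions; the actual version is in the download package):

```sql
CREATE PROCEDURE wells_del_proc ( id1 INTEGER, id2 INTEGER )
  SPECIFIC wells_del_proc MODIFIES SQL DATA DETERMINISTIC
  CALLED ON NULL INPUT LANGUAGE SQL NO EXTERNAL ACTION
BEGIN ATOMIC
  -- If no matching row exists, the DELETE simply affects zero rows;
  -- no exception is raised, so no handler is needed.
  DELETE FROM wells_pts f
  WHERE (f.id1, f.id2) = (wells_del_proc.id1, wells_del_proc.id2);
END@
```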

Silent processing with log: The error table

With exception handlers in the trigger and bulk copy procedures, you can now be sure that bad coordinates will not block transactions on the source table, and that the application user will not see messages caused by a back-end process he did not initiate. Bad coordinates are reflected in the absence of associated point geometry values, and the trigger procedures fail silently. This is user-friendly, but it would be nice if you could check whether any rows from the source table caused problems, complete with identification of these rows and a description of what happened. For this, you need to keep a log; bulk load and import programs often log to a file, but inside the database you need to use a table. There are many ways to do this; the following is simply a proposed structure and mechanism.

Along with the wells_pts table, create a wells_err table for logging purposes. Each log record will have a time stamp, a descriptive character string, the primary key from the source table for identification of the offending row, and, because most if not all failures result from bad or null coordinates, the original latitude and longitude. See Figure 6.

Figure 6 shows exception logging with an error table. Upon insertion of the highlighted record in table "wells", the insert trigger calls the point constructor and the transform function. The latter raises an exception, because the longitude of +90.525 degrees (Eastern hemisphere) places the point somewhere in Tibet, out of reach of the UTM zone for Louisiana. In the exception handler, insert a row into the error log table, recording the time and date, reason code, primary key, and coordinates. Inspection of the error table conveniently reveals the source row as well as the problematic coordinate.

Figure 6. Exception logging with an error table

The error logging table from Figure 6 is defined as follows:

CREATE TABLE wells_err
(
  when      TIMESTAMP,
  reason    CHAR(20),
  id1       INTEGER, id2 INTEGER,
  latitude  FLOAT,
  longitude FLOAT
)
IN regtbs INDEX IN idxtbs LONG IN lobtbs;		(18)

You can now add error logging to the various procedures you've defined. Let's look at the trigger procedures first; come back to the bulk copy in the next section, where you'll use the error-logging mechanism for additional purposes.

The insert trigger procedure now looks as follows. Note that the dummy variable declaration has disappeared: you don't need it, because you now have an actual statement in the exception handler. Also, the IF statement has been inverted: you now test for null coordinates (instead of NOT NULL), write a log record if found, and proceed with the insert otherwise.

CREATE PROCEDURE wells_ins_proc (
  ... [common SQL code omitted]
BEGIN ATOMIC
  ... [condition declarations omitted]
  DECLARE EXIT HANDLER FOR TransformFailed, BadCoord
    INSERT INTO wells_err
    VALUES ( CURRENT TIMESTAMP, 'INS Invalid Coord',
             id1, id2, latitude, longitude );
  IF latitude IS NULL OR longitude IS NULL THEN
    INSERT INTO wells_err
    VALUES ( CURRENT TIMESTAMP, 'INS Null Coord',
             id1, id2, latitude, longitude );
  ELSE
    ... [common SQL code omitted]
END@							(19)

The update trigger procedure is similarly modified:

CREATE PROCEDURE wells_upd_proc (
  ... [common SQL code omitted]
BEGIN ATOMIC
  IF latitude IS NULL OR longitude IS NULL THEN
    DELETE FROM wells_pts f
    WHERE (f.id1, f.id2) = (id1, id2);
    INSERT INTO wells_err
    VALUES ( CURRENT TIMESTAMP, 'UPD Null Coord',
             id1, id2, latitude, longitude );
  ELSE
Upd: BEGIN
    ... [condition declarations omitted]
    DECLARE EXIT HANDLER FOR TransformFailed, BadCoord
  tf: BEGIN
      DELETE FROM wells_pts f
      WHERE (f.id1, f.id2) = (id1, id2);
      INSERT INTO wells_err
      VALUES ( CURRENT TIMESTAMP, 'UPD Invalid Coord',
               id1, id2, latitude, longitude );
    END tf;
    DECLARE EXIT HANDLER FOR NOT FOUND
  nf: BEGIN
      DECLARE EXIT HANDLER FOR TransformFailed, BadCoord
        INSERT INTO wells_err
        VALUES ( CURRENT TIMESTAMP, 'UPD Invalid Coord',
                   id1, id2, latitude, longitude );
      INSERT INTO ... [common SQL code omitted]
END@							(20)

As before, the delete trigger needs no modification.

Commit interval

All triggers defined here work separately "FOR EACH ROW". The bulk copy used to initialize the table, however, processes all rows from the source table in a single transaction. If the source table is large, this may cause an overflow of the logical log. As with bulk loads from a file, therefore, it is better to issue a COMMIT statement periodically, so the logical log files can be flushed. As shown in the code snippet below, this requires keeping a counter so you know when to commit and write a log record, taking care not to count rows that cause an exception. While you're at it, you'll write diagnostic records every time you perform a COMMIT, as well as at the start and end of the process, for performance-measurement purposes. Note that the test for null values is gone from the FOR statement, because you want to handle that condition by writing a diagnostic record.

CREATE PROCEDURE wells_init_proc ( commit_interval INTEGER )
  ... [common SQL code omitted]
BEGIN
  DECLARE count INTEGER DEFAULT 0;
  DECLARE failed INTEGER DEFAULT 0;
  IF commit_interval < 1 THEN SET commit_interval = 1; END IF;
  INSERT INTO wells_err (when, reason)
  VALUES ( CURRENT TIMESTAMP, 'CPY Start' );
init: FOR t AS copy_cursor CURSOR WITH HOLD FOR
    SELECT id1, id2, latitude, longitude FROM wells
  DO
ins: BEGIN
    ... [condition declarations omitted]
    DECLARE CONTINUE HANDLER FOR TransformFailed, BadCoord
tf: BEGIN
      INSERT INTO wells_err
      VALUES ( CURRENT TIMESTAMP, 'CPY Invalid Coord',
               id1, id2, latitude, longitude );
      SET failed = 1;
    END tf;
    IF latitude IS NULL OR longitude IS NULL THEN
      INSERT INTO wells_err
      VALUES ( CURRENT TIMESTAMP, 'CPY Null Coord',
               id1, id2, latitude, longitude );
    ELSE
      INSERT INTO wells_pts ( id1, id2, location )
      VALUES ( id1, id2, TREAT ( ... ) );
      IF failed = 0 THEN
        SET count = count + 1;
        IF Mod( count, commit_interval ) = 0 THEN
          INSERT INTO wells_err (when, reason)
          VALUES ( CURRENT TIMESTAMP,
                   'CPY Commit ' || CHAR(count) );
          COMMIT;
        END IF;
      ELSE
        SET failed = 0;
      END IF;
    END IF;
  END ins;
  END FOR init;
  INSERT INTO wells_err (when, reason)
  VALUES ( CURRENT TIMESTAMP, 'CPY Total ' || CHAR(count) );
  COMMIT;
END@
CALL wells_init_proc( 1000 )@
DROP PROCEDURE wells_init_proc@				(21)

Conclusion

In principle, turning given latitude and longitude coordinates into a point location is simple, but taking care of some of the practical contingencies makes it more complex than you might expect. In this article you have built up, in increasing levels of sophistication, the procedures and triggers needed to maintain a spatial table with point geometry values, associated with a source table containing latitude and longitude values. This prepares the database for use by a GIS and for spatial business logic implemented in SQL, including distance-based queries.

The download package contains a fully documented Windows® command script that sets up and executes the entire process, including exception handling, bulk copy commit interval, and error logging, for practically any source table. In addition, it includes a sample dataset, so you can try it out. The script creates separate SQL command files for each step. Additional scripts test the implemented mechanisms on the provided sample data and show an example of a distance-based query, complete with map images. Use the scripts as they are, or customize them to suit your configuration and requirements.

If parts of the article are not clear, if there are bugs in the scripts, or if you have suggestions for improvement or follow-on articles, don't hesitate to contact the author.


Appendix A: Distance-based queries

The end goal of this article's exercise was to prepare an existing dataset for spatial queries, including distance-based ones. The result is not necessarily a map; it may be simply another table with selected attribute columns (no spatial column) for reporting and analysis in, say, Excel®. This appendix shows what such queries might look like.

The simplest and most common query is to find out which objects are within a specified distance of a given location; usually, you want to see the results sorted in ascending order of distance. The following query does this for the example (wells_all), finding all locations within 15 km of the given point, with latitude and longitude equal to 30.4 degrees and −91.1 degrees, respectively:

SELECT id1, id2, name,
  DECIMAL (
    db2gse.ST_Distance ( w.location,
      db2gse.ST_Point( -91.1, 30.4, 1 ), 'KILOMETER' ),
    6, 2 ) AS km
FROM wells_all w
WHERE
  db2gse.ST_Distance ( w.location,
    db2gse.ST_Point( -91.1, 30.4, 1 ), 'KILOMETER' ) <= 15
ORDER BY 4;						(A1)

Two observations are worth making here. The first is that the ST_Distance() function, which measures the linear distance between two geometries, takes a third argument to specify the linear unit of measurement, independent of the units of the underlying SRS (note, though, that this only works for projected SRSs). This is a nice convenience that makes the SQL code more readable. Second, the ST_Distance() function implicitly applies a projection transformation to the second argument (a point geometry with SRS = 1, an unprojected coordinate system) to bring it into the same SRS as the first argument (the column location, which has a projected SRS). Alternatively, you can make the transform explicit by including the whole TREAT( ... ) expression as in statement (5) (with longitude and latitude replaced by the given values). This is a matter of coding preference; it does not affect query execution.
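For illustration, the explicit form of query (A1) might look as follows. This is a sketch: the numeric identifier 1002 for the projected SRS is an assumed placeholder, not a value from this article; substitute the ID of the projected SRS created earlier.

```sql
SELECT id1, id2, name,
  DECIMAL (
    db2gse.ST_Distance ( w.location,
      -- Explicitly transform the search point into the projected SRS
      -- (1002 is a placeholder SRS ID; use the one defined earlier).
      TREAT ( db2gse.ST_Transform (
                db2gse.ST_Point( -91.1, 30.4, 1 ), 1002 )
              AS db2gse.ST_Point ),
      'KILOMETER' ),
    6, 2 ) AS km
FROM wells_all w
WHERE
  db2gse.ST_Distance ( w.location,
    TREAT ( db2gse.ST_Transform (
              db2gse.ST_Point( -91.1, 30.4, 1 ), 1002 )
            AS db2gse.ST_Point ),
    'KILOMETER' ) <= 15
ORDER BY 4;
```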

A similar query can be used to populate a results table to be used for further analysis:

CREATE TABLE wells_found
(
  id1      INTEGER, id2 INTEGER,
  name     CHAR(30),
  distance FLOAT
) IN regtbs INDEX IN idxtbs LONG IN lobtbs;		(A2)

INSERT INTO wells_found ( id1, id2, name, distance )
SELECT id1, id2, name,
    db2gse.ST_Distance ( w.location, db2gse.ST_Point( -91.1, 30.4, 1 ),
                         'KILOMETER' )
FROM wells_all w
WHERE
    db2gse.ST_Distance ( w.location, db2gse.ST_Point( -91.1, 30.4, 1 ),
                         'KILOMETER' ) <= 15;	        (A3)

(In this case, there is no need to sort by distance, or to format the distance values nicely.)

As a slightly more sophisticated example, here is a query involving a spatial join, that is, a join where the join predicate is a spatial function. In a hypothetical analysis of the spatial distribution of point locations, this query finds those that have one or more other points nearby, as defined by a given distance. The following statement finds all locations from wells_all with one or more neighbors within a distance of three KM:

SELECT a.id1, a.id2, a.name, COUNT(*) AS number
FROM wells_all a, wells_all b
WHERE
    db2gse.ST_Distance ( a.location, b.location,
                         'KILOMETER' ) <= 3.0
AND NOT (a.id1, a.id2) = (b.id1, b.id2)
GROUP BY a.id1, a.id2, a.name;				(A4)

From an SQL standpoint, this query, with its GROUP BY clause and COUNT(*) aggregate function, is not as elementary as the previous one. Spatially, however, it is a straightforward query. It is a self-join, which means that it joins a single table with itself: it performs a table scan (this takes time!) and, for each row, it searches for those rows whose location is within the specified distance of the current row's location. You're only interested in the total number of neighbors for each location, not the detailed list of neighbors, so group the results for each location and only ask for a count. Every location, of course, is its own neighbor by this query's spatial search criterion, so exclude that row from the search.
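If the analysis calls for only those locations exceeding a minimum number of neighbors, the same query can be extended with a HAVING clause. A sketch (the threshold of 3 neighbors is an arbitrary example):

```sql
-- Variant of (A4): report only locations with at least 3 neighbors
-- within 3 km, ordered from most to fewest neighbors.
SELECT a.id1, a.id2, a.name, COUNT(*) AS number
FROM wells_all a, wells_all b
WHERE
    db2gse.ST_Distance ( a.location, b.location,
                         'KILOMETER' ) <= 3.0
AND NOT (a.id1, a.id2) = (b.id1, b.id2)
GROUP BY a.id1, a.id2, a.name
HAVING COUNT(*) >= 3
ORDER BY number DESC;
```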


Appendix B: Federated source data

One of the reasons for creating a separate table, rather than adding a spatial column to the source table, is that the source may not be a local table at all, but a representation (nickname) of a federated table residing elsewhere, in another database on another server. DB2 is particularly good at linking to other IBM databases on mainframe (zSeries) and AS/400 (iSeries) hosts, through a facility called DB2 Connect. As this is a common scenario at sites with DB2, this appendix shows in brief how to deal with this situation.

Nicknames do not support triggers, nor do the Materialized Query Tables (MQTs) sometimes used to cache federated data. If you have the cooperation of the remote host's DBA, you may be able to set up replication, rather than federation, for the required data. The replication target (on the local system) would be a real table, so the trigger-based mechanism outlined in this article would work.

If replication is not possible, you can still populate an associated spatial table using the nickname as the source, and join the two in a combined view. Since triggers are not available to keep the spatial table up to date, you need to run the bulk copy mechanism as part of a regularly scheduled batch process. At a time when it is acceptable to take the spatial table off-line, first clear the table (DELETE FROM wells_pts;), and then call the procedure from statement (21) (but do not drop the procedure as shown there).
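Such a scheduled refresh might be scripted as follows (a sketch; the procedure body is statement (21), kept installed rather than dropped, and the commit interval of 1000 is just an example value):

```sql
-- Periodic full refresh of the spatial table from the nickname source:
-- clear the point table, then re-run the bulk copy procedure.
DELETE FROM wells_pts@
CALL wells_init_proc( 1000 )@
```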

Obviously, it is not efficient to reinitialize the spatial table each time, but it may be the only way. If the source table does not allow updates to the key and coordinate columns (for example, location changes and corrections may be managed by creating new features and keeping the old ones as history), you can leave existing rows alone and look only for new rows (insertions) and rows that have disappeared (deletions):

BEGIN ATOMIC
 FOR new AS
  SELECT id1, id2, latitude, longitude
  FROM wells t
  WHERE NOT EXISTS (
    SELECT * FROM wells_pts f
    WHERE (f.id1, f.id2) = (t.id1, t.id2)
  )
 DO
  CALL wells_ins_proc ( id1, id2, latitude, longitude );
 END FOR;
END@							(B1)

DELETE FROM wells_pts f
WHERE NOT EXISTS (
  SELECT * FROM wells t
  WHERE (t.id1, t.id2) = (f.id1, f.id2)
);							(B2)

If updates to the coordinate columns are possible, you may still be able to avoid bulk copies if the source table contains a time stamp column that records the time of last update for each row. In that case, you can set up your batch process in such a way that only rows updated since the last run are considered. For example, assuming that the source table has a column last_updated of type TIMESTAMP, and your batch process runs every 24 hours, you could use the following:

FOR changed AS
  SELECT id1, id2, latitude, longitude
  FROM wells
  WHERE Date(last_updated) >= CURRENT DATE - 1 DAYS
DO
  CALL wells_upd_proc ( id1, id2, latitude, longitude );
END FOR;						(B3)

Of course, it would be better to compare the last_updated value to the actual time the batch process last ran (to completion, successfully), but you get the idea.
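One way to sketch that comparison is to reuse the 'CPY Total' record that statement (21) writes at the end of each successful bulk copy as a marker of the last completed run. This is an assumption layered on the article's logging scheme (the LIKE pattern and the use of wells_err as a run log are not part of the provided scripts):

```sql
FOR changed AS
  SELECT id1, id2, latitude, longitude
  FROM wells
  -- Process only rows updated since the last completed bulk copy,
  -- as recorded by the final 'CPY Total ...' log entry.
  WHERE last_updated >=
        ( SELECT MAX(when) FROM wells_err
          WHERE reason LIKE 'CPY Total%' )
DO
  CALL wells_upd_proc ( id1, id2, latitude, longitude );
END FOR;
```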


Download

Description               Name                          Size
Command scripts and data  Spatial_from_latlong_DB2.zip  2654KB
