Whether derived from address information, acquired by GPS devices, or otherwise generated, many databases contain geographic coordinates of point locations as separate numeric columns in a table. In this form, however, they are not usable for detecting spatial relationships by using the power of the DB2 Spatial Extender. To unlock the information inherent in these coordinates and take full advantage of it in spatial database queries or external Geographic Information System (GIS) software, we need to combine the separate coordinate values in each row to create a single-point geometry value. A column of such point values can be indexed to support spatial queries (for example, to find all points within a region of interest, or all points within a certain distance of a given location), and can be registered for use by an external GIS (such as ESRI's ArcGIS Desktop) for more powerful visualization and analysis.
In this article, you will add the point column in a separate table (see Figure 1), because that is the most general case. In many cases, you will not have the privileges required to alter the source table, because it is controlled by another department or a separate application (though assume that you do have the privileges to create triggers on the table). In other cases, the source table is not really a table, but a representation ("nickname") of a table that resides elsewhere, in a different database on a different server, and to which you have federated access. DB2 is particularly good at linking to other IBM databases on mainframe (zSeries) and AS/400® (iSeries) hosts, through a facility called DB2 Connect™; this scenario is common at customer sites that have a variety of IBM technology. This article does not cover how to set up a nickname, as that is a well-documented, general (non-spatial) process; I simply assume that you have access to the source data, whether it is local or not.
Figure 1 below shows an example of a source table, Things, with Latitude and Longitude columns and a derived point table, Things_pts. The tables are joined over a common key column (ID). Columns Address and CityStateZip are shown as examples of additional attribute columns in the source table. The derived spatial column, Location, uses the original coordinates: longitude as X, latitude as Y.
Figure 1. Latitude-longitude source table and derived point table
Creating and using geometry values such as points requires the DB2 Spatial Extender. Familiarity with this product is assumed throughout this article; the article does not cover installation, configuration, and general use of the Extender here. See Resources for further information.
Setting up the spatial table
In this section, you will take a given table that contains a latitude column and a longitude column, create an associated table with a spatial column, populate the new table with data from the source table, and prepare the table for use in spatial queries.
The source table
The source table generally looks similar to the one in Figure 1: One or more columns making up the primary key (here, the single column called ID); several alphanumeric columns; and, among these, two columns containing coordinate values (here, Latitude and Longitude). These coordinates may have been derived from address data in the same table (through a process known as geocoding), collected using GPS technology, acquired from commercial or public sources, or created in some other way; for the purpose of this article, it doesn't matter where they come from, and there are no assumptions about it. In the code samples and scripts, I do make the following assumptions about the coordinate values:
- The coordinate values represent latitude and longitude: "geographic" coordinates, not projected ones. In other words, they are angular measures, in degrees with decimal fraction, not linear distances (in meters, feet, or similar units). See the next section, Spatial Reference Systems.
- The coordinate values are numeric (decimal, float, etc.), not character strings. (Some applications create tables with columns of character data type only.)
- The coordinate columns are called "Latitude" and "Longitude".
If any of these assumptions are not valid for your database, you have to make adjustments that vary from trivial (for column names) to somewhat involved (in the case of a projected coordinate system). These adjustments are mentioned in the appropriate places in the code samples and scripts.
Spatial Reference Systems
Coordinates are numeric values, representing either angles or linear distances. To know what they mean, you have to define both the origin and the units: that is, the frame of reference. In the terminology of the Spatial Extender, this is the Spatial Reference System (SRS), a combination of an earth-based coordinate system and a set of numeric conversion parameters. Each spatial geometry value contains an SRS ID (or SRID) to identify the associated SRS. A full discussion of SRSs is beyond the scope of this article (see Resources), but it is important to be aware of the following:
- Latitude and longitude are geographic coordinates, angular measures, usually expressed in degrees with decimal fraction, that are valid everywhere on the globe; geographic coordinate systems are not associated with a map projection.
- Maps (on paper or on the screen) are flat surfaces; they require a transformation from the curved surface of the earth to the flat plane: a map projection. The rectilinear (straight-line) coordinates of a projected coordinate system are distance measures, expressed in units of length.
- Both geographic and projected coordinate systems include a definition of the assumed shape of the earth (a flattened sphere); this is called the geodetic datum.
- While geographic coordinates can be used to represent point locations anywhere, they are not suitable for defining lines of appreciable length or computing distances. To query by distance, you must use a projected coordinate system.
- The Spatial Extender can transform coordinates between geographic and projected SRSs in both directions, with one important restriction: the source and target SRS must both be based on the same geodetic datum. Conversion from one datum to another requires specialized algorithms and trained users, and is not supported by the Spatial Extender.
As mentioned above, you must assume that the source table contains latitude and longitude values. Usually, the database itself contains no information about its geodetic datum, so you must find this out from the application that created the coordinates. Often, it is something global like WGS-84 (World Geodetic System 1984, used by all GPS devices), or a regional datum like NAD-83 (North American Datum 1983).
Since the purpose of this exercise is to prepare the data for spatial analysis, including distance-based queries, choose an SRS based on a projected coordinate system for the spatial column containing the computed points. Which projection is best depends entirely on which area your data covers (the "extent"). For a limited extent, a standard projection such as UTM or US State Plane (you have to pick the zone) may do the trick; for a larger extent, such as the continental United States, a coordinate system with broader coverage is required (e.g., one of the predefined conic projections for North America). The choice and specification of SRSs is a complex topic worthy of its own article; generally, it requires at least some background in GIS and mapping. This article simply assumes that you already have appropriate SRSs set up, one for the geographic coordinates and one for the projected coordinates (both based on the same datum); and that you know their respective names and SRIDs.
The code samples in this article use the following SRSs:
|SRID||SRS name||Coordinate System Description|
|1||NAD83_SRS_1||Unprojected, geographic (latitude-longitude),|
North American Datum 1983
(Created by Spatial Extender enablement)
|1011||LOUISIANA_UTM||Projected: Universal Transverse Mercator, Zone 15;|
North American Datum 1983
(Created by user)
Note that SRS names and IDs are arbitrary and local to each database, but the underlying coordinate systems are often predefined, with reference to industry authorities.
The spatial table has a simple structure: a unique, integer column (objectid), one or more columns making up the primary key (copied from the source table), plus a single point-geometry column. The point column is registered as a spatial column using a Spatial Extender procedure, associating it with the right SRID. The objectid column is there to support some GIS and map viewer software, including the ArcExplorer program that is available for free download from IBM (see Resources). Figure 2 summarizes this for the example I use for the code samples throughout this article, assuming a source table that represents well locations.
In this example, source table wells has a multicolumn key (id1, id2); additional columns are omitted (represented by "..."). The derived spatial column, location, in table wells_pts now uses a projected coordinate system: instead of copying longitude into X and latitude into Y, X and Y are computed according to the associated projection transformation. All names, types, and values are for illustration purposes only. The additional column, objectid, is explained above.
Figure 2. The wells source table and derived point table
In the example, the source table is defined as follows:
CREATE TABLE wells( id1 INTEGER NOT NULL, id2 INTEGER NOT NULL, latitude DOUBLE, longitude DOUBLE, name CHAR(30), PRIMARY KEY (id1, id2) ) IN regtbs INDEX IN idxtbs LONG IN lobtbs; (1)
- NOTE. All examples assume that a database connection (to a database named xypoints) already exists, that database objects (tables, views) are created in the schema associated with the connected user, and that the connected user has all required privileges.
Note that this statement specifies the table spaces for the table, index,
and large objects (LOBs). Actual table space configuration and names
depend on the system environment, of course; they are listed in the
example to emphasize that table spaces should always be specified, not
allowed to default to some initial system setting. Likewise, the number of
key columns and their types will vary, as will the types of the coordinate
columns. Usually, coordinate values are stored in numeric columns (float,
decimal, or even integer), and the specific type will not affect the code
presented here. Sometimes, however, they are character columns; in that
case, all code using them needs to include a conversion, for instance,
Float(Latitude) instead of just
Latitude (beware of blank or malformed numeric
strings; you may need extra validation and error handling). Finally, the
column name is included as the lone attribute column in this
example; in most situations, there will be many additional attributes.
The derived table is constructed as follows:
CREATE TABLE wells_pts ( objectid INTEGER GENERATED ALWAYS AS IDENTITY, id1 INTEGER, id2 INTEGER, location db2gse.ST_Point ) IN regtbs INDEX IN idxtbs LONG IN lobtbs; (2)
- NOTE. The objectid column is maintained by the database: for each row inserted, the system will assign a monotonically increasing integer number, starting at 1. This guarantees that the column values remain unique and frees you from worrying about this column later on (don't try to assign a value to it, though). In fact, in the rest of this article, I will treat this column as hidden and not mention it again in either text or figures.
For convenience, create a view that integrates the spatial column of the new table with the attribute columns of the source table by joining over the key columns; the view looks like the source table with the spatial column added:
CREATE VIEW wells_all AS SELECT t.*, f.location FROM wells t, wells_pts f WHERE (t.id1, t.id2) = (f.id1, f.id2); (3)
Since this is an inner join, this view has no rows for records that appear in wells but not in wells_pts (because they have null or invalid coordinates). That makes sense, given that the purpose is to work with the spatial locations as well as the attributes; if you want only the attributes and want to see all rows, work directly with the source table, wells.
- NOTE. If you have full control over the source table, the
effect of statements (2) and (3) can be achieved by directly adding the spatial column to
ALTER TABLE wells ADD COLUMN location db2gse.ST_Point;
The point table (wells_pts) has a foreign-key relationship with the source table (wells). This means that every row in the point table must have a key value (consisting, in this case, of two column values) that must also exist in the source table. The database has facilities, including constraints, to enforce this relationship, but you need not worry about that, since wells_pts is directly derived from wells and only modified by automatic processes (triggers) responding to changes in wells (see Keeping the spatial table up to date). As long as there is no other way to update wells_pts (that is, you must revoke privileges from all other users), referential integrity is guaranteed.
In particular, the
FOREIGN KEY constraint cannot
be used in combination with the triggers described below, because any
insertion into wells_pts, triggered by an insertion into
wells, will result in an attempt to read from wells to
verify that a row with the key value exists; and that read will fail
because the row it needs is still locked by the process that inserted
Geometry column registration
To mark the spatial table wells_pts as spatial, and make it available for discovery by OpenGIS-compliant clients, register the column location in the Spatial Extender catalog. Using the command line, this is done as follows for the current example:
db2se register_spatial_column xypoints -tableName wells_pts -columnName location -srsName louisiana_utm (4)
While you're at it, do the same for the combined view:
db2se register_spatial_column xypoints -tableName wells_all -columnName location -srsName louisiana_utm (4a)
The parameters tableName, columnName, and srsName are self-explanatory; xypoints is the current database name. Note that the db2se command line utility uses the SRS name, not the SRID; the SRID is needed later on, in SQL statements.
The same step can be accomplished using the DB2 Control Center (Figure 3): right-click on the current database in the object tree or on the specific table (wells_pts) in the contents pane, choose Spatial Extender | Spatial Columns ...; in the Spatial Columns dialog, select the item for the table, and click Register...; in the Select Spatial Reference System dialog, select the right SRS (LOUISIANA_UTM), and click OK; close the Spatial Columns dialog.
Figure 3. Registering column location of table wells_pts in the Control Center, using SRS Louisiana_UTM, SRID 1011
Upon registration of a spatial column, the Spatial Extender catalog records its name and associated SRS (Figure 4); in addition, a check constraint on the registered column ensures that only geometry values with the correct SRID can be inserted.
The figure below shows a registered spatial column (location in table wells_pts) in the Spatial Extender catalog. The Spatial Extender uses schema DB2GSE. The catalog tables shown are actually views; from each, several columns are omitted in this figure. Note the definition of the coordinate system, specifying both the geodetic datum and the projection parameters in the Well-Known Text format specified by the Open Geospatial Consortium's Simple Features Specification for SQL.
Figure 4. Spatial column in Spatial Extender catalog
Initializing the spatial table
Now you are ready to initialize the spatial table by copying whatever rows already exist in the source table; along the way, of course, the separate latitude and longitude values are transformed into point geometry values. This is depicted in Figure 5.
Creating a point geometry value for the spatial column involves constructing a point geometry with latitude and longitude coordinates (geographic coordinate system, SRID = 1) and then transforming it to UTM (projected) coordinates (SRID = 1011), as shown in the figure below.
Figure 5. Creating a point geometry value
In the simplest possible SQL, this is accomplished as follows:
INSERT INTO wells_pts( id1, id2, location ) SELECT id1, id2, TREAT ( db2gse.ST_Transform ( db2gse.ST_Point ( longitude, latitude, 1 ), 1011 ) AS db2gse.ST_Point ) FROM wells; (5)
- NOTE. The equivalent statement for a single table with added
geometry column is:
with the expression inside the TREAT specification as in statement (5).
UPDATE wells SET location = TREAT ( ... );
This statement bears a closer look. The first two columns selected and
inserted are simply the primary key. The third column, highlighted in
bold, is constructed by two nested functions inside a
TREAT specification. The inner call, to
ST_Point(), creates a point geometry with
latitude-longitude coordinates (SRID = 1). It returns a value of type
db2gse.ST_Point. The outer call, to
ST_Transform(), projects this point geometry to
the UTM coordinate system indicated by SRID 1011. This follows exactly the
flow shown in Figure 5. One more thing is needed, however:
ST_Transform() returns a value of type
db2gse.ST_Geometry, which is the supertype to
all geometry types; as a function that applies to all geometry types, it
is properly a method of the supertype. The spatial column, however,
ST_Point value and will not accept
ST_Geometry, so we need to "downcast" it
from supertype to subtype. This is what
TREAT (... AS ...) does. It works, because the
ST_Geometry value in this case is really an
ST_Point. Unlike in the related,
and perhaps more familiar,
there is no conversion, truncation, or other transformation involved in
TREAT. Since the entire boldfaced expression in
statement (5) appears in many code fragments, I will
abbreviate it to
TREAT ( ... ) from now
Note that this bulk copy makes some assumptions, which may not be safe ones in real applications. First, it assumes that nothing goes wrong in the construction of the final point value. If it does, and one of the functions throws an exception, the entire process rolls back. Second, it assumes that there is enough log space available to complete the entire copy. I will come back to this later and refine the process to deal with rejected rows and to limit log usage.
Now that the spatial table is populated, it is time to add a spatial index to support spatial queries; in DB2 that is a grid index. Since every data set is different, this example uses an arbitrary value for the grid cell size: 1 km. This should work for a wide range of project extents, including the state of Louisiana (the project area of the example); you may have to adjust this depending on the extent of your project area and the density of your data (larger cells for a large area and low density; and vice versa). Note that it is never advantageous to specify more than one grid level for point data.
The command looks like this:
CREATE INDEX wells_pts_grx ON wells_pts (location) EXTEND USING db2gse.spatial_index (1000, 0, 0); (6)
- For the single, integrated table, the command is the same except for
the table (and optionally the index) name:
CREATE INDEX wells_grx
ON wells (location)
EXTEND USING db2gse.spatial_index (1000, 0, 0);
The first parameter (1000) specifies the grid cell size in meters, because that is the linear unit of the SRS associated with the location column.
In most cases, a non-spatial index on the key columns will be required as well; this is standard database practice and will not be covered here.
Keeping the spatial table up to date
Triggers and procedures
Now that the spatial column is in place and populated with the initial contents of the source table, you need to ensure that the spatial column is kept up to date, always reflecting the current contents of the source table. This is done using database triggers. You can set up a trigger for each of the operations that modify the source table: inserts, updates, and deletions. In principle, you can specify everything that needs to happen right in the body of the trigger; in practice, it is often preferable to call a user-defined procedure and put all the logic there. This tends to be more powerful and more flexible, in particular with respect to error processing.
When a new row is inserted into the source table, a corresponding row (or feature, the GIS term for a database item that represents something in the real world) must be added to the spatial table. This is done as follows:
CREATE PROCEDURE wells_ins_proc ( id1 INTEGER, id2 INTEGER, latitude FLOAT, longitude FLOAT ) SPECIFIC wells_ins_proc MODIFIES SQL DATA DETERMINISTIC CALLED ON NULL INPUT LANGUAGE SQL NO EXTERNAL ACTION BEGIN ATOMIC INSERT INTO wells_pts( id1, id2, location ) VALUES ( id1, id2, TREAT ( ... ); END@ (7) CREATE TRIGGER wells_ins AFTER INSERT ON wells REFERENCING NEW AS t FOR EACH ROW CALL wells_ins_proc ( t.id1, t.id2, t.latitude, t.longitude )@ (8)
- NOTE. The "
@" is used to terminate the entire
CREATE PROCEDUREstatement, because the "
;" is needed to terminate statements within the procedure. On the command line, this requires the
- NOTE. For the single, integrated table, no separate insert
procedure (7) is needed; instead, the insert trigger calls the same
procedure as the update trigger (see below), because once the new row
is inserted, all that needs to happen is to assign a value to the
... FOR EACH ROW CALL wells_upd_proc ...
When either of the coordinate columns in the source table is updated, a new point geometry must be constructed. The update trigger looks like this:
CREATE PROCEDURE wells_upd_proc ( id1 INTEGER, id2 INTEGER, latitude FLOAT, longitude FLOAT ) SPECIFIC wells_upd_proc MODIFIES SQL DATA DETERMINISTIC CALLED ON NULL INPUT LANGUAGE SQL NO EXTERNAL ACTION BEGIN ATOMIC UPDATE wells_pts f SET location = TREAT ( ... ) WHERE (f.id1, f.id2) = (wells_upd_proc.id1, wells_upd_proc.id2); END@ (9) CREATE TRIGGER wells_upd AFTER UPDATE OF latitude, longitude ON wells REFERENCING NEW AS t FOR EACH ROW CALL wells_upd_proc ( t.id1, t.id2, t.latitude, t.longitude )@ (10)
Note that this assumes that key columns are never updated; once a row exists, it is immutably tied to its primary-key value. If this assumption is invalid, additional conditions must be handled. I will not address this here.
- NOTE For the single, integrated table, the update procedure
serves both the insert and update triggers:
... UPDATE wells f ...
When a row in the source table is deleted, the corresponding row in the
feature table must be deleted as well (since you cannot have the system do
this automatically with the
ON DELETE CASCADE
clause of a foreign key constraint, as explained above). The trigger is
defined as follows:
CREATE TRIGGER wells_del AFTER DELETE ON wells REFERENCING OLD AS t FOR EACH ROW DELETE FROM wells_pts f WHERE (f.id1, f.id2) = (t.id1, t.id2); (11)
This trigger is so simple that it does not require a procedure. The only thing that can go "wrong" is that no qualifying feature exist, in which case there is nothing to do.
- NOTE In the case of a single, integrated table, this trigger is not needed at all; once a row is deleted, that's it.
Null values, exceptions, and commit intervals
Sometimes, a row in the source table does not have valid coordinate values.
The simplest case is when either or both of the coordinates is null. In
that case, there is no purpose to creating an associated point feature,
because the location value would not be a usable point geometry (the
ST_Point constructor function returns a "POINT
EMPTY" rather than null).
- NOTE. This discussion does not apply to the case of a single,
integrated table. The row exists, so the spatial column must have a
value, whether null or "POINT EMPTY". If it is important to avoid
"POINT EMPTY" (usually, this does not cause any problems), put the
UPDATEstatement in the trigger procedure in an
IF latitude IS NOT NULL AND longitude IS NOT NULL ... END IF;
In the bulk copy, all you need to do is select only those rows that have no nulls in the latitude and longitude columns:
INSERT INTO wells_pts( id1, id2, location ) SELECT id1, id2, TREAT ( ... ) FROM wells WHERE latitude IS NOT NULL AND longitude IS NOT NULL; (12)
Likewise, in the insert trigger it's better to do nothing in case one of
the coordinates is null. Unfortunately, you cannot prevent the trigger
from calling the procedure, since in DB2 a procedure is always
CALLED ON NULL INPUT. Therefore, you have to
detect nulls inside the procedure (see statement (5); to conserve space and retain clarity, code snippets from here on
only show highlighted changes from previous samples):
CREATE PROCEDURE wells_ins_proc ( ... [common SQL code omitted] BEGIN ATOMIC -- Insert the corresponding feature, unless any -- coordinate is null IF latitude IS NOT NULL AND longitude IS NOT NULL INSERT INTO wells_pts( id1, id2, location ) VALUES ( id1, id2, TREAT ( ... ) ); END IF; END@ (13)
In the update trigger, you need to delete the corresponding feature if one of the coordinates becomes null:
CREATE PROCEDURE wells_upd_proc ( ... [common SQL code omitted] BEGIN ATOMIC IF latitude IS NULL OR longitude IS NULL THEN -- If the new row has null coordinates, delete the -- corresponding feature DELETE FROM wells_pts f WHERE (f.id1, f.id2) = (id1, id2); ELSE UPDATE wells_pts f SET location = TREAT ( ... ) WHERE (f.id1, f.id2) = (wells_upd_proc.id1, wells_upd_proc.id2); END IF; END@ (14)
Note that there may be nothing to delete, if the coordinates were already null or invalid; but that does not constitute an error.
The delete trigger does not depend on the coordinate values, so nothing changes there.
Exceptions in triggers and procedures
As mentioned above, the code given so far does not take into account that things may go wrong. When an exception is thrown during processing of a trigger, the trigger aborts, and the entire statement that caused the trigger to fire is rolled back. This is not what you want, because that would interrupt the application's normal workflow. So you must be able to trap exceptions and do something intelligent: usually, that means to do nothing and let the overall process continue, silently ignoring the error.
In this code, the main thing that can go wrong is that the
ST_Transform() function cannot compute a proper
point value in projected coordinates from the given latitude and
longitude. Usually, this is because the data is incorrect and the
coordinates indicate a location outside the expected area, and out of
range for the projection used; sometimes, it is because "magic" values,
like 0.0, are used (instead of null) to indicate missing data. If the
coordinates are completely wrong (for example, an impossible latitude
value of 100.0), the
ST_Point() function may be
the one to raise an exception.
DB2 stored procedure language allows for the creation of exception handlers
for specific conditions; you will use this to trap anticipated errors.
(Note that other conditions may cause exceptions in
ST_Point(), such as SRS definition-related
problems; these should be resolved, however, during development and
testing, and should not occur once the process is in operation.)
In the bulk copy used to initialize the contents of the point table, you want to trap coordinate-based exceptions. Since dynamic SQL does not support exception handlers, you need to create a procedure, call it, and then (optionally) drop the procedure again:
CREATE PROCEDURE wells_init_proc () SPECIFIC wells_init_proc MODIFIES SQL DATA DETERMINISTIC CALLED ON NULL INPUT LANGUAGE SQL NO EXTERNAL ACTION BEGIN DECLARE dummy INTEGER; init: FOR t AS copy_cursor CURSOR WITH HOLD FOR SELECT id1, id2, longitude, latitude FROM wells WHERE latitude IS NOT NULL AND longitude IS NOT NULL DO ins: BEGIN -- If one of the coordinates is bad, the transform fails. -- Trap the exception and continue with the next row. DECLARE TransformFailed CONDITION FOR SQLSTATE '38SUC'; DECLARE BadCoord CONDITION FOR SQLSTATE '38SSG'; DECLARE CONTINUE HANDLER FOR TransformFailed, BadCoord SET dummy = 1; INSERT INTO wells_pts ( id1, id2, location ) VALUES ( id1, id2, TREAT ( ... ) ); END ins; END FOR init; END@ CALL wells_init_proc()@ DROP PROCEDURE wells_init_proc@ (15)
Discussion of the precise syntax and workings of the
FOR statement is beyond the scope of this
article, but the
CURSOR WITH HOLD clause is
required to ensure that the process continues with the next row after an
exception is trapped, instead of the next statement after the
FOR (which would mean exiting the procedure).
Also, the response to each of the anticipated exceptions is to do nothing,
CONTINUE HANDLER declaration expects a
complete SQL statement; hence the dummy assignment. The specific
conditions declared with their SQLSTATEs are described in the Spatial
Extender documentation (messages GSE3015N and GSE3416N, respectively).
Apply similar exception handling to the insert trigger:
CREATE PROCEDURE wells_ins_proc ( ... [common SQL code omitted] BEGIN ATOMIC DECLARE dummy INTEGER; DECLARE TransformFailed CONDITION FOR SQLSTATE '38SUC'; DECLARE BadCoord CONDITION FOR SQLSTATE '38SSG'; DECLARE EXIT HANDLER FOR TransformFailed, BadCoord SET dummy = 1; IF latitude IS NOT NULL AND ... [common SQL code omitted] END@ (16)
The update trigger is more complicated: if the update causes an exception, you have to delete the existing feature row, just as for null coordinates. If the row is not found (because the old coordinates were null or invalid), you have to turn the update into an insert; and in that case, any exception should just be ignored (no need to delete anything), so the exception handler must be overridden.
CREATE PROCEDURE wells_upd_proc ( ... [common SQL code omitted] BEGIN ATOMIC DECLARE dummy INTEGER; IF latitude IS NULL OR ... [common SQL code omitted] ELSE Upd: BEGIN DECLARE TransformFailed CONDITION FOR SQLSTATE '38SUC'; DECLARE BadCoord CONDITION FOR SQLSTATE '38SSG'; DECLARE EXIT HANDLER FOR TransformFailed, BadCoord DELETE FROM wells_pts f WHERE (f.id1, f.id2) = (id1, id2); DECLARE EXIT HANDLER FOR NOT FOUND nf: BEGIN DECLARE EXIT HANDLER FOR TransformFailed, BadCoord SET dummy = 1; INSERT INTO wells_pts( id1, id2, location ) VALUES ( id1, id2, TREAT ( ... ) ); END nf; UPDATE wells_pts f SET location = TREAT ( ... ) WHERE (f.id1, f.id2) = (wells_upd_proc.id1, wells_upd_proc.id2); END Upd; END IF; END@ (17)
The delete trigger needs no exception handling. If the corresponding spatial row does not exist, there is simply nothing to delete; this condition is not an exception that causes a transaction to roll back.
Silent processing with log: The error table
With exception handlers in the trigger and bulk copy procedures, you can now be sure that bad coordinates will not block transactions on the source table, and that the application user will not see messages caused by a back-end process he did not initiate. Bad coordinates are reflected in the absence of associated point geometry values, and the trigger procedures fail silently. This is user-friendly, but it would be nice if you could check whether any rows from the source table caused problems, complete with identification of these rows and a description of what happened. For this, you need to keep a log; bulk load and import programs often log to a file, but inside the database you need to use a table. There are many ways to do this; the following is simply a proposed structure and mechanism.
Along with the wells_pts table, create a wells_err table for logging purposes. Each log record will have a time stamp, a descriptive character string, the primary key from the source table for identification of the offending row, and, because most if not all failures result from bad or null coordinates, the original latitude and longitude. See Figure 6.
Figure 6 shows exception logging with an error table. Upon insertion of the highlighted record in table "wells", the insert trigger calls the point constructor and the transform function. The latter raises an exception, because the longitude of +90.525 degrees (Eastern hemisphere) places the point somewhere in Tibet, out of reach of the UTM zone for Louisiana. In the exception handler, insert a row into the error log table, recording the time and date, reason code, primary key, and coordinates. Inspection of the error table conveniently reveals the source row as well as the problematic coordinate.
Figure 6. Exception logging with an error table
The error logging table from Figure 6 is defined as follows:
CREATE TABLE wells_err ( when TIMESTAMP, reason CHAR(20), id1 INTEGER, id2 INTEGER, latitude FLOAT, longitude FLOAT ) IN regtbs INDEX IN idxtbs LONG IN lobtbs; (18)
You can now add error logging to the various procedures you've defined. Let's look at the trigger procedures first; come back to the bulk copy in the next section, where you'll use the error-logging mechanism for additional purposes.
The insert trigger procedure now looks as follows. Note that the dummy
variable declaration has disappeared: you don't need it because you now
have an actual statement in the exception handler. Also, the
IF statement has been inverted: you now test
for null coordinates (instead of
write a log record if found, and proceed with the insert otherwise.
CREATE PROCEDURE wells_ins_proc ( ... [common SQL code omitted] BEGIN ATOMIC ... [condition declarations omitted] DECLARE EXIT HANDLER FOR TransformFailed, BadCoord INSERT INTO wells_err VALUES ( CURRENT TIMESTAMP, 'INS Invalid Coord', id1, id2, latitude, longitude ); IF latitude IS NULL OR longitude IS NULL THEN INSERT INTO wells_err VALUES ( CURRENT TIMESTAMP, 'INS Null Coord', id1, id2, latitude, longitude ); ELSE ... [common SQL code omitted] END@ (19)
The update trigger procedure is similarly modified:
CREATE PROCEDURE wells_upd_proc ( ... [common SQL code omitted] BEGIN ATOMIC IF latitude IS NULL OR longitude IS NULL THEN DELETE FROM wells_pts f WHERE (f.id1, f.id2) = (id1, id2); INSERT INTO wells_err VALUES ( CURRENT TIMESTAMP, 'UPD Null Coord", id1, id2, latitude, longitude ); ELSE Upd: BEGIN ... [condition declarations omitted] DECLARE EXIT HANDLER FOR TransformFailed, BadCoord tf: BEGIN DELETE FROM wells_pts f WHERE (f.id1, f.id2) = (id1, id2); INSERT INTO wells_err VALUES ( CURRENT TIMESTAMP, 'UPD Invalid Coord', id1, id2, latitude, longitude ); END tf; DECLARE EXIT HANDLER FOR NOT FOUND nf: BEGIN DECLARE EXIT HANDLER FOR TransformFailed, BadCoord INSERT INTO wells_err VALUES ( CURRENT TIMESTAMP, 'UPD Invalid Coord', id1, id2, latitude, longitude ); INSERT INTO ... [common SQL code omitted] END@ (20)
As before, the delete trigger needs no modification.
All triggers defined here work separately
FOR EACH ROW". The bulk copy used to
initialize the table, however, processes all rows from the source table in
a single transaction. If the source table is large, this may cause an
overflow of the logical log. As with bulk loads from a file, therefore, it
is better to issue a
periodically, so the logical log files can be flushed. As shown in the
code snippet below, this requires keeping a counter so you know when to
commit and write a log record, taking care not to count rows that cause an
exception. While you're at it, you'll write diagnostic records every time
you perform a
COMMIT, as well as at the start
and end of the process, for performance-measurement purposes. Note that
the test for null values is gone from the
statement, because you want to handle that condition by writing a
CREATE PROCEDURE wells_init_proc ( commit_interval INTEGER ) ... [common SQL code omitted] BEGIN DECLARE count INTEGER DEFAULT 0; DECLARE failed INTEGER DEFAULT 0; IF commit_interval < 1 THEN SET commit_interval = 1; END IF; INSERT INTO wells_err (when, reason) VALUES ( CURRENT TIMESTAMP, 'CPY Start' ); init: FOR t AS copy_cursor CURSOR WITH HOLD FOR SELECT id1, id2, latitude, longitude FROM wells DO ins: BEGIN ... [condition declarations omitted] DECLARE CONTINUE HANDLER FOR TransformFailed, BadCoord tf: BEGIN INSERT INTO wells_err VALUES ( CURRENT TIMESTAMP, 'CPY Invalid Coord', id1, id2, latitude, longitude ); SET failed = 1; END tf; IF latitude IS NULL OR longitude IS NULL THEN INSERT INTO wells_err VALUES ( CURRENT TIMESTAMP, 'CPY Null Coord', id1, id2, latitude, longitude ); ELSE INSERT INTO wells_pts ( id1, id2, location ) VALUES ( id1, id2, TREAT ( ... ) ); IF failed = 0 THEN SET count = count + 1; IF Mod( count, commit_interval ) = 0 THEN INSERT INTO wells_err (when, reason) VALUES ( CURRENT TIMESTAMP, 'CPY Commit ' || CHAR(count) ); COMMIT; END IF; ELSE SET failed = 0; END IF; END IF; END ins; END FOR init; INSERT INTO wells_err (when, reason) VALUES ( CURRENT TIMESTAMP, 'CPY Total ' || CHAR(count) ); COMMIT; END@ CALL wells_init_proc( 1000 )@ DROP PROCEDURE wells_init_proc@ (21)
In principle, turning given latitude and longitude coordinates into a point location is simple, but taking care of some of the practical contingencies makes it more complex than you might expect. In this article you have built up, in increasing levels of sophistication, the procedures and triggers needed to maintain a spatial table with point geometry values, associated with a source table containing latitude and longitude values. This prepares the database for use by a GIS and for spatial business logic implemented in SQL, including distance-based queries.
The download package contains a fully documented Windows® command script that sets up and executes the entire process, including exception handling, bulk copy commit interval, and error logging, for practically any source table. In addition, it includes a sample dataset, so you can try it out. The script creates separate SQL command files for each step. Additional scripts test the implemented mechanisms on the provided sample data and show an example of a distance-based query, complete with map images. Use the scripts as they are, or customize them to suit your configuration and requirements.
If parts of the article are not clear, if there are bugs in the scripts, or if you have suggestions for improvement or follow-on articles, don't hesitate to contact the author.
Appendix A: Distance-based queries
The end goal of this article's exercise was to prepare an existing dataset for spatial queries, including distance-based ones. The result is not necessarily a map; it may be simply another table with selected attribute columns (no spatial column) for reporting and analysis in, say, Excel®. This appendix shows what such queries might look like.
The simplest and most common query is to find out which objects are within a specified distance of a given location; usually, you want to see the results sorted in ascending order of distance. The following query does this for the example (wells_all), searching all locations within 15 KM from the given point, with latitude and longitude equal to 30.4 degrees and â91.1 degrees, respectively:
SELECT id1, id2, name, DECIMAL ( db2gse.ST_Distance ( w.location, db2gse.ST_Point( -91.1, 30.4, 1 ), 'KILOMETER' ), 6, 2 ) AS km FROM wells_all w WHERE db2gse.ST_Distance ( w.location, db2gse.ST_Point( -91.1, 30.4, 1 ), 'KILOMETER' ) <= 15 ORDER BY 4; (A1)
Two observations are worth making here. The first is that the
ST_Distance() function, which measures the
linear distance between two geometries, takes a third argument to specify
the linear unit of measurement, regardless of the underlying SRS (note,
though, that this only works for projected SRSs). This is a nice
convenience for making the SQL code more readable. Second, the
ST_Distance() function implicitly applies a
projection transformation to the second argument (a point geometry with
SRS = 1, an unprojected coordinate system), to bring it into the same SRS
as the first argument (the column location, which has a projected
SRS). Alternatively, you can make the transform explicit by including the
TREAT( ... ) expression
as in statement (5) (with longitude and
latitude replaced by the given values). This is a matter of
coding preference; it does not affect query execution.
A similar query can be used to populate a results table to be used for further analysis:
CREATE TABLE wells_found ( id1 INTEGER, id2 INTEGER, name CHAR(30), distance FLOAT ) IN regtbs INDEX IN idxtbs LONG IN lobtbs; (A2) INSERT INTO wells_found ( id1, id2, name, distance ) SELECT id1, id2, name, db2gse.ST_Distance ( w.location, db2gse.ST_Point( -91.1, 30.4, 1 ), 'KILOMETER' ) FROM wells_all w WHERE db2gse.ST_Distance ( w.location, db2gse.ST_Point( -91.1, 30.4, 1 ), 'KILOMETER' ) <= 15; (A3)
(In this case, there is no need to sort by distance, or to format the distance values nicely.)
As a slightly more sophisticated example, here is a query involving a spatial join, that is, a join where the join predicate is a spatial function. In a hypothetical analysis of the spatial distribution of point locations, this query finds those that have one or more other points nearby, as defined by a given distance. The following statement finds all locations from wells_all with one or more neighbors within a distance of three KM:
SELECT a.id1, a.id2, a.name, COUNT(*) AS number FROM wells_all a, wells_all b WHERE db2gse.ST_Distance ( a.location, b.location, 'KILOMETER' ) <= 3.0 AND NOT (a.id1, a.id2) = (b.id1, b.id2) GROUP BY a.id1, a.id2, a.name; (A4)
From an SQL standpoint, this query, with its
GROUP BY clause and
COUNT(*) aggregate function, is not as
elementary as the previous one. Spatially, however, it is a
straightforward query. It is a self-join, which means that it joins a
single table with itself: it performs a table scan (this takes time!) and,
for each row, it searches for those rows whose location is within
the specified distance of the current row's location. You're only
interested in the total number of neighbors for each location, not the
detailed list of neighbors, so group the results for each location and
only ask for a count. Every location, of course, is its own neighbor by
this query's spatial search criterion, so exclude that row from the
Appendix B: Federated source data
One of the reasons for creating a separate table, rather than adding a spatial column to the source table, is that the source may not be a local table at all, but a representation (nickname) of a federated table residing elsewhere, in another database on another server. DB2 is particularly good at linking to other IBM databases on mainframe (zSeries) and AS/400 (iSeries) hosts, through a facility called DB2 Connect. As this is a common scenario at sites with DB2, this appendix shows in brief how to deal with this situation.
Nicknames do not support triggers, nor do the Materialized Query Tables (MQTs) sometimes used to cache federated data. If you have the cooperation of the remote host's DBA, you may be able to set up replication, rather than federation, for the required data. The replication target (on the local system) would be a real table, so the trigger-based mechanism outlined in this article would work.
If replication is not possible, you can still populate an associated
spatial table using the nickname as the source, and join the two in a
combined view. Since triggers are not available to keep the spatial table
up to date, you need to use the bulk copy mechanism as part of regularly
scheduled mechanism. At times when it is acceptable to take the spatial
table off-line, first clear the table
DELETE FROM wells_pts;), and then call the
procedure from statement (21) (do not drop the
procedure as shown there).
Obviously, it is not efficient to reinitialize the spatial table each time, but it may be the only way. If the source table does not allow updates to the key and coordinate columns (for example location changes and corrections may be managed by creating new features and keeping the old ones as history), however, you can leave existing rows alone and look only for new rows (insertions) and rows that have disappeared (deletions):
BEGIN ATOMIC FOR new AS SELECT id1, id2, latitude, longitude FROM wells t WHERE NOT EXISTS ( SELECT * FROM wells_pts f WHERE (f.id1, f.id2) = (t.id1, t.id2) ) DO CALL wells_ins_proc ( id1, id2, latitude, longitude ); END FOR; END@ (B1) DELETE FROM wells_pts f WHERE NOT EXISTS ( SELECT * FROM wells t WHERE (t.id1, t.id2) = (f.id1, f.id2) ); (B2)
If updates to the coordinate columns are possible, you may still be able to
avoid bulk copies if the source table contains a time stamp column that
records the time of last update for each row. In that case, you can set up
your batch process in such a way that only rows updated since the last run
are considered. For example, assuming that the source table has a column
last_updated of type
your batch process runs every 24 hours, you could use the following:
FOR changed AS SELECT id1, id2, latitude, longitude FROM wells WHERE Date(last_updated) >= CURRENT DATE - 1 DAYS DO CALL wells_upd_proc ( id1, id2, latitude, longitude ); (B3)
Of course, it would be better to compare the last_updated value to the actual time the batch process last ran (to completion, successfully), but you get the idea.
|Command scripts and data||Spatial_from_ latlong_DB2.zip||2654KB|
- IBM Software - DB2 Spatial Extender for Linux, UNIX and Windows - Product Overview: The public overview Web site for the DB2 Spatial Extender
- DB2 Spatial Extender and Geodetic Extender User's Guide and Reference, V8.2: The fundamental guide to using the DB2 Spatial Extender, including installation and configuration.
- Setting up spatial reference systems: From the Administering Spatial Extender section in the DB2 UDB InfoCenter.
- Accessing the attributes of subtypes: From the Database applications section in the DB2 UDB InfoCenter.
- Geodetic Datum Overview: a nice explanation of geodetic datums and related topics, from Peter Dana at the University of Colorado.
- Learn more about DB2 on the developerWorks DB2 for Linux, UNIX, and Windows page.
Get products and technologies
- Trial download: IBM DB2 Universal Database Enterprise Server Edition V8.2: Download a trial version from developerWorks.
- DB2 Spatial Extender for Linux, UNIX and Windows Downloads: Download a trial version of the DB2 Spatial Extender.
- ArcExplorer for DB2 Version 3.0: Download this free, Windows-based map viewer for DB2 Spatial Extender.
- IBM trial products for download: Build your next development project with IBM trial software, available for download directly from developerWorks.
- Participate in the discussion forum.
- developerWorks blogs: Get involved in the developerWorks community.