Skip to main content

skip to main content

developerWorks  >  Information Management  >

Extending the DB2 Spatial Extender, Part 1: Add your customized spatial functionality to the DB2 Spatial Extender

developerWorks
Document options

Document options requiring JavaScript are not displayed

Discuss

Sample code


Rate this page

Help us improve this content


Level: Intermediate

Knut Stolze (stolze@de.ibm.com), DB2 WebSphere Information Integration Development, IBM Germany

17 Nov 2005

The IBM® DB2® Spatial Extender provides a set of data types tailored to the management of spatial data. Along with the data types comes a variety of functions and methods to perform spatial comparisons or operations. However, the field of spatial data processing is quite complex and the extender's functionality might not provide the specific function you need. This article explains the available mechanisms to enhance the Spatial Extender. It describes the infrastructure you use to access the spatial information inside your own user-defined function, which can be used to implement whatever logic you need.

Introduction

Spatial data and its most prominent representative, geographic information, can be found in many database systems. But very often the spatial information is only hidden in form of addresses. The DB2 Spatial Extender (see Resources to find more information on the extender) can be used to represent the spatial information explicitly in your data model. It provides specialized data types and associated functions.

The field of spatial data processing is very broad. The DB2 Spatial Extender offers more than 100 different functions, for example to import or export spatial data from and to certain external data format, collect spatial attributes, compare two geometries and thus implement spatial joins, compute new geometries based on existing ones, and some additional support functions like the transformation between different coordinate systems, including geographic projections.

Sometimes, however, the very function that is needed for your spatial processing might not (yet) be implemented in the extender. In this article, we present how you can close this gap by writing your own user-defined functions (UDFs) and integrate them with the DB2 Spatial Extender. In the first section we explain how DB2 Universal Database™ (UDB) (see the Resources for the DB2 Information Center) communicates with external functions if structured types, which are used for the implementation of the spatial data types, are in the picture. That information builds the foundation for additional spatial logic. The next section walks you through an example and demonstrates how to add functions that measures the distance of a point along a line string. A complementary function is also described.



Back to top


Spatial data and user-defined functions

DB2 UDB provides a series of object-relational features. One feature is structured types (see the Resources for an article on structured types in DB2 UDB). Structures types can be compared to structs in C/C++ or classes in Java™ or other object-oriented programming languages. The major difference between Java/C++ classes and SQL structured types lies in the information hiding of the class' attributes, which is handled differently due to the inherent openess of the attributes of structured types. Other properties like inheritance (subtyping) and methods, including overriding methods in subtypes, are available.

The DB2 Spatial Extender was built on top of those object-relational features. It implements a hierarchy of 12 types, along with many spatial-specific methods and functions. The root type of the hierarchy is ST_Geometry. Figure 1 shows the complete spatial hierarchy as it is provided by the extender. The data types in the boxes with grey background are not instantiable, which is comparable to abstract classes in object-oriented programming languages like C++ or Java.


Figure 1. Spatial type hierarchy
Spatial type hierarchy

The DB2 Spatial Extender implements various methods on each of the types in the hierarchy. The general spatial functionality is located at the root of the hierarchy, but type-specific methods can also be found further down in the hierarchy. For example, the method ST_Length is defined for the type ST_Curve and computes the length of the curve or linestring.

The logic implemented by the spatial functions and methods is rather complex (see the Resources for an introduction to computational geometry). Although implementing it completely with SQL only might be possible, implementing it in an external programming language like C or Java is much more efficient for the specialized data structures that can be used. DB2 allows for the integration of functions written in an external, non-SQL programming language. The DB2 kernel needs to communicate with the external function in order to pass input parameters to the function and to take the result and represent it as SQL values. Structured types complicate the communication further because such types are, as the name already indicates, structured and are comprised of several scalar values (attributes) that need to be send to the external function in some fashion. This is the point where transform functions come into play to serialize the structured type. Once you understand how transform functions work, you can use them to implement your own functions that work with spatial data. Alternatively, you can wrap your external function in another SQL function to take care of the serialization of the structured data explicitly. Both approaches are described in the subsequent sections.

Transform functions

The primary purpose of a transform function is the conversion or transformation of a value of a structured type into one or more values of the DB2 built-in types. Those built-in types are well known and the infrastructure for the user-defined functions (UDFs) is capable of handling them. So a structured value is converted to a set of INTEGER, VARCHAR, ... values and those values are then sent to the external function. Additionally, the reverse operation is supported, in other words, converting a set of INTEGER, VARCHAR, ... values, that are returned by the external function, to a structured value. The first category denotes the so-called FROM SQL transform functions and the second category is the TO SQL functions. Listing 1 sketches the interface of a FROM SQL and a TO SQL transform function. The type StrucType has two attributes, which are extracted and sent to the external function by the fromSQLTransform. When an external function will return a value of a structured type, the toSQLTransform function takes two parameters, which are actually returned by the external function, as input and constructs the value of type StrucType, and populates its attributes with those two parameters.


Listing 1. Interface of FROM SQL and TO SQL functions.
CREATE TYOE StrucType AS (
   attr1  INTEGER,
   attr2  VARCHAR(100),
)
MODE DB2SQL@

CREATE FUNCTION fromSQLTransform ( strucValue StrucType )
   RETURNS ROW ( value1 INTEGER, value2 VARCHAR(100) )
   LANGUAGE SQL
   ...
   RETURN VALUES ( strucValue..attr1, strucValue..attr2 )@

CREATE FUNCTION toSQLTransform ( value1 INTEGER, value2 VARCHAR(100) )
   RETURNS StrucType
   LANGUAGE SQL
   ...
   RETURN StrucType()..attr1(value1)..attr2(value2)@

A FROM SQL and a TO SQL function can be grouped together in a transform group. Transform groups are the means to refer to transform functions. When you define your own user-defined functions that will take a structured value as input paramater and return a structured value as result, you have to include the TRANSFORM GROUP clause in the CREATE FUNCTION statement to specify the transform group to be used. Listing 2 shows such a SQL statement. You will notice that there is no further specification necessary regarding the structured types and their treatment.


Listing 2. Using a transform function in a UDF.
CREATE FUNCTION myFunction ( strucValue StrucType )
   RETURNS StrucType
   EXTERNAL NAME 'library!function'
   LANGUAGE C
   ...
   TRANSFORM GROUP myGroup

When you invoke the UDF in a SQL statement, DB2 will automatically compile the necessary transform function into the access plan. You can think about this procedure as in Listing 3. The first SQL statement is the statement that gets sent from the application to the DB2 server and the second statement is basically the logic that gets executed internally.


Listing 3. Idea behind the automatic use of transform functions.
SELECT ...
FROM   ...
WHERE  myFunction(strucValue)..attr1 = 10

SELECT ...
FROM   ...
WHERE  toSQLTransform(
          <external-function>(
             fromSQLTransform(strucValue)))..attr1 = 10

So far, we have covered the specifics of transform functions on the DB2 engine side. Essentially, a structured value is taken apart and its attributes are send to the external code. So the external code has to accept the separate attributes and its null indicators -- just like any other user-defined function. However, one additional information is sent to the UDF and that is the null indication for the whole structured value. So for each structured value that is passed to a UDF as input parameter or returned as output parameter, one additional null indicator must be handled.

Manually wrapping external functions

The previous section explained the mechanisms of transform functions and how they fit into the picture if you implement methods or functions that deal with structured types in an external programming language. The current section explains how you can achieve the same without transforms by mimicking the logic yourself. Essentially, you have to pursue a two-step approach. The first step is to write your UDF without structured types in such a way that it takes only parameters with the built-in data types as parameters. Next you wrap another UDF, written in SQL, around it to handle the structured types. In Listing 4, you will see such an example. The function internalUDF implements the first step and myUDF covers the second step.


Listing 4. Wrapping the external function to handle structured types.
CREATE FUNCTION internalUDF ( value1 INTEGER, value2 VARCHAR(100) )
   RETURNS INTEGER
   EXTERNAL NAME 'library!function'
   LANGUAGE C
   ...

CREATE FUNCTION myUDF ( strucValue StrucType )
   RETURNS INTEGER
   LANGUAGE SQL
   ...
   RETURN internalUDF ( strucValue..attr1, strucValue..attr2 )

The example in Listing 4 avoided a structured type as the data type for the result. The processing needs to be more complex in such a case because (scalar) UDFs are tailored to taking multiple input parameters but only returning a single value. If you want the function to return multiple values so that you can construct the complete structured value from that, you need to resort to table functions. Table functions return a whole table and you need to query it using a SELECT statement. Listing 5 demonstrates the idea.


Listing 5. Using table functions to handle structured return types.
CREATE FUNCTION internalUDF ( value1 INTEGER, value2 VARCHAR(100) )
   RETURNS TABLE ( INTEGER, VARCHAR(100) )
   EXTERNAL NAME 'library!function'
   LANGUAGE C
   ...

CREATE FUNCTION myUDF ( strucValue StrucType )
   RETURNS StrucType
   LANGUAGE SQL
   ...
   RETURN SELECT StrucType()..attr1(v1)..attr2(v2)
          FROM   TABLE ( internalUDF ( strucValue..attr1,
                    strucValue..attr2 ) ) AS t(v1, v2)

The wrapping approach is quite simple and obvious because everything regarding the transformation is explicit in the function definition. There are two shortcomings with the wrapping, which makes the use of transform functions often the better choice:

  • In the external code, you cannot simply handle the case where the whole structured value is NULL. A way around this issue is the use of an additional parameter in the internal function.
  • If you have several functions that deal with the same structured types, you will have to code the transformation in each and every function. Such a redundancy can lead to maintenance problems.


Back to top


A practical example

In this section, we explain how to adopt the above-described techniques to enhance the DB2 Spatial Extender with additional functionality. To that end, two functions are added. The first function takes a linestring and a point as input parameter. The point is projected onto the linestring and then the distance from the start of the linestring is measured (along the linestring) to the projected point. The second function is the complement to the first: It takes a linestring as input along with a distance and returns the point which can be found on the linestring at that specific distance. Again, the distance is measured from the start of the linestring along the linestring.

For the first function, DistanceToPoint, we use a transform function in order to implicitly convert the two input geometries to their well-known binary (WKB) representation. In addition to the WKB, we also pass the identifier of the associated spatial reference system (SRS ID) to the external Java code so that we can ensure that the processing logic does not try to work on geometries in different systems. The second function is named PointAtDistance and it employs the wrapping technique.

Using transform groups

Before we can define the user-defined function that actually calls the external Java routine, we need to define a transform group. For the definition of the transform group, we require the existence of the proper FROM SQL transform function. Note that we do not need a TO SQL transform function in this special case because DistanceToPoint does not return a structured type as result. Listing 6 shows the necessary steps up to the point where the final function is defined. The script install_fct.sql to perform those steps can also be found in the download section.


Listing 6. Defining the transform function and the DistanceToPoint function.
-- function to serialize the geometry to WKB + SRS-ID
CREATE FUNCTION spatial.wkbTransform ( geometry db2gse.ST_Geometry ) 
   RETURNS ROW ( wkb BLOB(10M), srsId INTEGER )
   SPECIFIC spatial."wkbTransform"
   LANGUAGE SQL  DETERMINISTIC
   NO EXTERNAL ACTION  CONTAINS SQL
   RETURN VALUES ( geometry..ST_AsBinary(), geometry..ST_SrsId() )@

-- transform group for the UDF
CREATE TRANSFORM FOR db2gse.ST_Geometry
   wkbUdfTransformGrp ( FROM SQL WITH FUNCTION spatial.wkbTransform )@

-- add the function/method to the ST_LineString type
ALTER TYPE db2gse.ST_LineString
   ADD METHOD DistanceToPoint(point db2gse.ST_Point)
      RETURNS DOUBLE
      SPECIFIC db2gse."dist2pt"
      LANGUAGE JAVA  PARAMETER STYLE DB2GENERAL
      DETERMINISTIC  FENCED  THREADSAFE
      RETURNS NULL ON NULL INPUT
      NO SQL  NO EXTERNAL ACTION
      NO SCRATCHPAD  NO FINAL CALL
      ALLOW PARALLEL  NO DBINFO@

-- define the method itself
CREATE METHOD DistanceToPoint FOR db2gse.ST_LineString
   EXTERNAL NAME 'PointDistance.getDistanceToPoint'
   TRANSFORM GROUP wkbUdfTransformGrp@

The DB2 Application Development Guide recommends the use of the PARAMETER STYLE JAVA for methods implemented in Java. However, that parameter style cannot be used if structured types are involved. Another comment is that DistanceToPoint is defined as method. That way the method inherits the schema name db2gse from the type. The method can also be invoked using functional notation due to the way the ST_Geometry type is defined using the WITH FUNCTION ACCESS clause. Of course, you can also define DistanceToPoint as function only -- as is demonstrated for the next function.

Although the above-listed SQL statements are essential to be able to invoke the new functionality from your SQL code, you also need to implement the actual logic. We wrote a Java class PointDistance with a method named getDistanceToPoint for that purpose. The Java method needs to be written like any other Java UDF. However, there is a difference in the handling of null values if structured types and transform functions are involved (as was explained before). DB2 will pass a null indicator for each parameter that is produced by the transform function. This are the parameters for the WKB and the SRS ID. Additionally, there is one null indicator for the whole structured value right after the null indicators for the serialized/transformed parameters. This is done for both input parameter because both are structured types on the SQL-side. The effect of this is illustrated in Listing 7. The first condition verifies that none of the input parameters is null. You will notice the gap in the numbering. This is due to the null indicators for the geometries themselves, and those null indicators are tested in the second condition.


Listing 7. Java code for the function.
public void getDistanceToPoint(
        COM.ibm.db2.app.Blob lineStringWkb, int lineSrsId,
        COM.ibm.db2.app.Blob pointWkb, int pointSrsId,
        double distance) throws Exception
{
    // NULL WKB/SRS-ID?
    if (isNull(1) || isNull(2) || isNull(4) || isNull(5)) {
        return;
    }
    // NULL geometries?
    if (isNull(3) || isNull(6)) { 
        return;
    }

    // ...

    set(5, linestring.getDistanceToProjectedPoint(point));
}

The final step lies in testing the logic. Listing 8 illustrates a few calls to the function and the expected results. The script test_dist2pt.sql also includes a test case where the point cannot be projected on the linestring at all.


Listing 8. Testing the function DistanceToPoint.
VALUES db2gse.ST_LineString('linestring(10 10, 11 11)', 0)..
          DistanceToPoint(db2gse.ST_Point(10, 10, 0))@

1
------------------------
  +0.00000000000000E+000

  1 record(s) selected.

VALUES db2gse.ST_LineString('linestring(10 10, 20 10, 20 20, 10 20)', 0)..
          DistanceToPoint(db2gse.ST_Point(15, 16, 0))@

1
------------------------
  +1.60000000000000E+001

  1 record(s) selected.

VALUES db2gse.DistanceToPoint(db2gse.ST_LineString(
             'linestring(10 10, 20 10, 20 20, 10 20)', 0),
          db2gse.ST_Point(25, 25, 0))@

1
------------------------
SQL4302N  Procedure or user-defined function "DB2GSE.DISTANCETOPOINT",
specific name "dist2pt" aborted with an exception "Point could not be
projected onto lines".  SQLSTATE=38501

VALUES db2gse.DistanceToPoint(db2gse.ST_LineString(
             'linestring(10 10, 20 10, 20 20, 10 20)', 0),
          db2gse.ST_Point(20, 11, 0))@

1
------------------------
  +1.10000000000000E+001

  1 record(s) selected.

Wrapping the external function

The second function to locate the point at a given distance on the linestring uses the wrapping approach. The actual external Java routine is accessed using a UDF named PointAtDist_int. The suffix _int indicates that this is an internal function. The function takes a BLOB as input, along with the desired distance. The BLOB contains the well-known binary representation of the linestring. Likewise, the function returns the well-known binary encoding of the resulting point as a BLOB. The wrapping function PointAtDistance takes care of the necessary conversions and calls the ST_AsBinary method to get the well-known binary representation to the internal function. And the result of the internal function is fed to the constructor ST_Point to create the final result.

For the previous function, we passed the ID of the spatial reference system to the external code to work with. Here, this is not necessary and the resulting point will get the SRS ID carried over directly in the wrapping function. If you want to use the SRS ID (or any other additional information) in the external Java code, you have to pass it to the internal function as an additional parameter. Listing 9 contains the complete definitions of the internal and wrapping functions.


Listing 9. Defining the function PointAtDistance.
-- create internal function that works on BLOBs
CREATE FUNCTION spatial.PointAtDist_int(
      linestring BLOB(10M), distance DOUBLE)
   RETURNS  BLOB(200)
   SPECIFIC spatial."pt@dist_int"
   EXTERNAL NAME 'PointDistance.getPointAtDistance'
   LANGUAGE JAVA  PARAMETER STYLE DB2GENERAL
   DETERMINISTIC  FENCED  THREADSAFE
   RETURNS NULL ON NULL INPUT
   NO SQL  NO EXTERNAL ACTION  NO SCRATCHPAD
   NO FINAL CALL  ALLOW PARALLEL  NO DBINFO@

-- create wrapping function
CREATE FUNCTION spatial.PointAtDistance(
      linestring db2gse.ST_LineString, distance DOUBLE)
   RETURNS db2gse.ST_Point
   SPECIFIC spatial."pt@dist_ext"
   LANGUAGE SQL  READS SQL DATA
   DETERMINISTIC  NO EXTERNAL ACTION
   RETURN db2gse.ST_Point(
             spatial.PointAtDist_int(
                linestring..ST_AsBinary(), distance),
             linestring..ST_SrsId())@

The Java code that implements the method getPointAtDistance is a straightforward Java UDF with no special handling for structured types. This is due to the fact that the function does not deal with structured types. After all, it just gets a BLOB and a DOUBLE as input and returns a BLOB as result. The handling of the structured types is completely done on the SQL side and hidden from the external logic. Listing 10 shows the entry point of the Java function to illustrate that no special handling for null indicators needs to be done. The complete code can be found in the download section.


Listing 10. Java code for the function.
public void getPointAtDistance(COM.ibm.db2.app.Blob lineStringWkb,
        double distance, COM.ibm.db2.app.Blob pointWkb) throws Exception
{
    // NULL WKB/distance?
    if (isNull(1) || isNull(2)) {
        return;
    }
    if (distance < 0.0) {
        throw new Exception("Invalid distance " + distance);
    }

    // ...

    set(3, endPoint.convertToWKB());
}

As usual, we test the function right away in Listing 11. The distance is set in bold italics to highlight it in between all those numbers. The tests involve the success cases, as well as error situations. One error is, for example, if the given distance exceeds the length of the linestring. For presentation purposes, the resulting point is converted to its well-known text representation and shortened to 50 characters. In your own application, you can do without the call to ST_AsText. All the test cases can be found in the script test_ptatdist.sql, which is available in the download section.


Listing 11. Testing the function PointAtDistance.
VALUES VARCHAR(spatial.PointAtDistance(
          db2gse.ST_LineString(
             'linestring(10 10, 20 10, 20 20, 10 20)', 0),
          
                    5.0
                )..ST_AsText(), 50)

1
--------------------------------------------------
POINT (15 10)

  1 record(s) selected.

VALUES VARCHAR(spatial.PointAtDistance(
          db2gse.ST_LineString(
             'linestring(10 10, 20 10, 20 20, 10 20)', 0),
          
                    20.0
                )..ST_AsText(), 50)

1
--------------------------------------------------
POINT (20 20)

  1 record(s) selected.

VALUES VARCHAR(spatial.PointAtDistance(
          db2gse.ST_LineString(
             'linestring(10 10, 20 10, 20 20, 10 20)', 0),
          
                    40.0
                )..ST_AsText(), 50)

1
--------------------------------------------------
SQL4302N  Procedure or user-defined function "SPATIAL.POINTATDIST_INT",
specific name "pt@dist_int" aborted with an exception "Distance 40.0
exceeds the length 30.0".  SQLSTATE=38501

VALUES VARCHAR(spatial.PointAtDistance(
          db2gse.ST_LineString(
             'linestring(10 10, 20 10, 20 20, 10 20)', 0),
          
                    13.0
                )..ST_AsText(), 50)

1
--------------------------------------------------
POINT (20 13)

  1 record(s) selected.



Back to top


Installing the Java code

The Java code for the three classes used to implement the desired logic can be found in the download section. After unzipping the file, you will find the source code for the classes in the local directory. Simply byte-compile it using your favorite Java compiler and copy the three class files to the sqllib/function/ directory of your instance. Once this is done, DB2 will be able to find the classes and call your functions.



Back to top


Summary

This article explained how you can enhance the DB2 Spatial Extender with your own functionality by adding new functions or methods. The spatial-type hierarchy was explained and that it is based on DB2's structured types. Transform functions are employed by DB2 to serialize structured types in such a way that the parameters can be communicated to the external function or vice versa. Based on those concepts, we presented the two functions DistanceToPoint and PointAtDistance, which add new logic to the spatial data type ST_LineString. You can easily adapt the described techniques to your own purposes and specialize the extender to satisfy our requirements.




Back to top


Download

DescriptionNameSizeDownload method
Java source code and SQL scriptsspatial-extension.zip11 KBFTP|HTTP
Information about download methods


Resources

Learn

Discuss


About the author

Author photo

Knut Stolze started his work with DB2 when he joined IBM as a visiting scientist at the Silicon Valley Lab where he worked on the DB2 Image Extender. For the following two years he moved on to the DB2 Spatial Extender team and was responsible for several enhancements to improve the usability, the performance, and the standard-conformance of the Extender. Currently, he works as a teaching assistant at the University of Jena, Germany, and continues his work for IBM in the area of the Information Integrator development. He can be reached through the newsgroups comp.databases.ibm-db2 and ibm.software.db2.udb.spatial or via stolze@de.ibm.com.




Rate this page


Please take a moment to complete this form to help us better serve you.



 


 


Not
useful
Extremely
useful
 


Share this....

digg Digg this story del.icio.us del.icio.us Slashdot Slashdot it!



Back to top