Setting up geocoding operations

Db2® Spatial Extender lets you set, in advance, the work that must be done when a geocoder is invoked.

Before you begin

Before you can set geocoding operations for a particular geocoder, your user ID must hold one of the following authorities or privileges:
  • DATAACCESS authority on the database that contains the tables that the geocoder will operate on
  • CONTROL privilege on each table that the geocoder operates on
  • SELECT privilege and UPDATE privilege on each table that the geocoder operates on

About this task

You can specify the following parameters when a geocoder is invoked:
  • What column the geocoder is to provide data for.
  • Whether the input that the geocoder reads from a table or view should be limited to a subset of rows in the table or view.
  • The range or number of records that the geocoder should geocode in batch sessions within a unit of work.
  • Requirements for geocoder-specific operations. For example, a geocoder may geocode only those records that match their counterparts in the reference data to a specified degree or higher. This degree is called the minimum match score.

You must specify the parameters in the preceding list before you set up the geocoder to run in automatic mode. From then on, each time the geocoder is invoked (not only automatically, but also for batch runs), geocoding operations will be performed in accordance with your specifications. For example, if you specify that 45 records should be geocoded in batch mode within each unit of work, a commit will be issued after every forty-fifth record is geocoded. (Exception: You can override your specifications for individual sessions of batch geocoding.)

You do not have to establish defaults for geocoding operations before you run the geocoder in batch mode. Rather, at the time that you initiate a batch session, you can specify how the operations are to be performed for the length of the run. If you do establish defaults for batch sessions, you can override them, as needed, for individual sessions.

Procedure

To set up geocoding operations:

Choose which way you want to set up geocoding operations:
  • Issue the db2se setup_gc command.
  • Run an application that calls the DB2GSE.ST_SETUP_GEOCODING procedure.

What to do next

Recommendations: When a geocoder reads a record of address data, it tries to match that record with a counterpart in the reference data. In broad outline, the way it proceeds is as follows: First, it searches the reference data for streets whose zip code is the same as the zip code in the record. If it finds a street name that is similar to the one in the record to a certain minimum degree, or to a degree higher than this minimum, it goes on to look for an entire address. If it finds an entire address that is similar to the one in the record to a certain minimum degree, or to a degree higher than this minimum, it geocodes the record. If it does not find such an address, it returns a null.

The minimum degree to which the street names must match is referred to as spelling sensitivity. The minimum degree to which the entire addresses must match is called the minimum match score. For example, if the spelling sensitivity is 80, then the match between the street names must be at least 80 percent accurate before the geocoder will search for the entire address. If the minimum match score is 60, then the match between the addresses must be at least 60 percent accurate before the geocoder will geocode the record.

You can specify what the spelling sensitivity and minimum match score should be. Be aware that you might need to adjust them. For example, suppose that the spelling sensitivity and minimum match score are both 95. If the addresses that you want geocoded have not been carefully validated, matches of 95 percent accuracy are highly unlikely. As a result, the geocoder is likely to return a null when it processes these records. In such a case, it is advisable to lower the spelling sensitivity and minimum match score, and run the geocoder again. Recommended scores for spelling sensitivity and the minimum match score are 70 and 60, respectively.

As noted at the start of this discussion, you can determine whether the input that the geocoder reads from a table or view should be limited to a subset of rows in the table or view. For example, consider the following scenarios :
  • You invoke the geocoder to geocode addresses in a table in batch mode. Unfortunately, the minimum match score is too high, causing the geocoder to return a null when it processes most of the addresses. You reduce the minimum match score when you run the geocoder again. To limit its input to those addresses that were not geocoded, you specify that it should select only those rows that contain the null that it had returned earlier.
  • The geocoder selects only rows that were added after a certain date.
  • The geocoder selects only rows that contain addresses in a particular area; for example, a block of counties or a state.
As noted at the start of this discussion, you can determine the number of records that the geocoder should process in batch sessions within a unit of work. You can have the geocoder process the same number of records in each unit of work, or you can have it process all the records of a table within a single unit of work. If you choose the latter alternative, be aware that:
  • You have less control over the size of the unit of work than the former alternative affords. Consequently, you cannot control how many locks are held or how many log entries are made as the geocoder operates.
  • If the geocoder encounters an error that necessitates a rollback, you need to run the geocoder to run against all the records again. The resulting cost in resources can be expensive if the table is extremely large and the error and rollback occur after most records have been processed.