Importing from shapefiles
The de-facto standard for storing and exchanging spatial data in a file format is the shapefile representation created by Esri. Esri provides shapefiles of sample data to download and use with Spatial Extender (see Resources), although the shapefile needed for this tutorial is included in Downloads for your convenience. Esri shapefiles represent the following types of data:
- Country boundaries
- State boundaries
- County boundaries
- Zipcode boundaries and centroids
- Major highways
- Cities, airports, hospitals, schools
You can download many other shapefiles from other sources. There is no charge to use some shapefiles, while some require a license fee.
A shapefile is actually a collection of related files that have the same name but different file extensions. The most common files, and the ones used by Spatial Extender, include the following file types:
- This is a required binary file that contains a header describing the type of data (points, lines, polygons, and other geometries), the number of records, and the actual coordinate data.
- This is a required binary file that contains the character and numeric attributes associated with each record. This actually uses the DBASE format and can be viewed with applications that support this format. There are significant limitations with this format. The attribute names are limited to 11 characters in length, and character data can be no longer than 255 bytes.
- This is an optional text file that contains the definition of the coordinate system associated with the spatial data. The coordinate system in the .prj file must match the coordinate system of the SRS of the column into which the data is being loaded. If this file is not present, Spatial Extender assumes that the data is in the coordinate system of the specified SRS.
- This is an optional binary file that provides an index into the .shp file. The main use of this file with Spatial Extender is to start importing at a record other than the very first record.
You aren't required to know the details of these files, but if you are interested, follow the link to the Esri specification in the Resources section.
You can use the
db2se shape_info Spatial
Extender command to get information about a particular shapefile,
including the type geometries, the geographic extents of the data, and
all of the attribute names and types. Listing 2 gives
an example of the command used with the sample counties
shapefile provided with this tutorial (see Downloads). The list of attributes in Listing 2 has been abbreviated to save space.
Listing 2. Shapefile attributes
db2se shape_info -fileName counties.shp Shape file information ---------------------- File code = 9994 File length (16-bit words) = 758872 Shape file version = 1000 Shape type = 5 (ST_MULTIPOLYGON) Number of records = 3141 Minimum X coordinate = -178.217598 Maximum X coordinate = -66.969271 Minimum Y coordinate = 18.921786 Maximum Y coordinate = 71.406235 Shapes do not have Z coordinates. Shapes do not have M coordinates. Shape index file (extension .shx) is present. Attribute file information -------------------------- dBase file code = 3 Date of last update = 2002-02-04 Number of records = 3141 Number of bytes in header = 1569 Number of bytes in each record = 461 Number of columns = 48 Column Number Column Name Data Type Length Decimal ------------- --------------- -------------- ------ ------- 1 NAME C ( Character) 32 0 2 STATE_NAME C ( Character) 25 0 3 STATE_FIPS C ( Character) 2 0 ... 48 AVG_SALE97 N ( Numeric) 7 2 Coordinate system definition: "GEOGCS["GCS_North_American_1983", DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]], PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]]"
db2se import_shape Spatial Extender
command provides an easy way to import shapefiles. You can write this
command into an SQL script file, such as
import_counties.sql. Listing 3 shows a simple example.
Listing 3. Contents of import_counties.sql
!erase counties.msg; !db2se import_shape sample -fileName counties.shp -srsName nad83_srs_1 -tableSchema test -tableName counties -spatialColumn shape -client 1 -messagesFile counties.msg ; !db2se register_spatial_column sample -tableSchema test -tableName counties -columnName shape -srsName nad83_srs_1 ; connect to sample; create index test.countiesidx on test.counties(shape) extend using db2gse.spatial_index(.5, 2.0, 0); runstats on table test.counties and indexes all; select count(*) from test.counties; describe table test.counties;
- Operating system commands can be included in a DB2 script if they are prefixed by an exclamation mark.
- If the message file exists, it is erased with the
erase counties.msgcommand. The import_shape operation will terminate if the message file already exists.
- Most of the parameters of the
db2se import_shapecommand are fairly intuitive. The table with the specified schema and name is created with attribute columns corresponding to the attributes of the shapefile. If the table already exists, the operation will terminate. The spatial column name is arbitrary, but shape is commonly used.
- The -client 1 parameter is important. This command runs the import operation on the DB2 client, accessing the import file locally. If the parameter is defined as -client 0, the import operation will run on the server and requires the file specification to be a location on the server that is available to the DB2 server.
db2se register_spatial_columncommand creates a constraint that all spatial values use the SRS nad83_srs_1. It also creates an entry in the db2gse.st_geometry_columns view that associates this SRS with this spatial column. This is helpful and sometimes required by some visualization applications.
connect to samplecode connects to the sample database.
- A spatial index is created with the
create indexcommand and reasonable values. Spatial indexes are discussed in a later section.
runstatscommand updates the table statistics, which is always a good idea after loading or after making significant updates.
select count(*)command lets you gather how many rows were imported.
describe tablecommand helps you gather the table columns created.
Sometimes it is necessary to implement more flexibility in the import process. In particular, you might want to import into different but compatible data type columns, select a subset of columns to import, or use column names different than the shapefile attribute names. Listing 4 illustrates these capabilities.
Listing 4. Contents of import_counties2.sql
!erase counties.msg; connect to sample; drop table test.counties; create table test.counties( county_name varchar(32) ,state_name varchar(25) ,state_fips varchar(2) ,county_fips varchar(3) ,area double ,population2000 integer ,shape db2gse.st_multipolygon ) ; !db2se import_shape sample -fileName counties.shp -inputAttrColumns N(name,state_name,state_fips,cnty_fips,area,pop2000) -srsName nad83_srs_1 -tableSchema test -tableName counties -tableAttrColumns county_name,state_name,state_fips,county_fips,area,population2000 -createTableFlag 0 -spatialColumn shape -typeSchema db2gse -typeName st_multipolygon -messagesFile counties.msg -client 1 ; !db2se register_spatial_column sample -tableSchema test -tableName counties -columnName shape -srsName nad83_srs_1 ; create index test.countiesidx on test.counties(shape) extend using db2gse.spatial_index(.5, 2.0, 0); runstats on table test.counties and indexes all; select count(*) from test.counties; describe table test.counties;
- The table is created with the desired columns, data types, and column names.
- The -inputAttrColumns parameter specifies the names of the attributes in the shapefile to be imported.
- The -tableAttrColumns parameter specifies the names of the table columns into which to import data. Note that the -inputAttrColumns and -tableAttrColumns names are different.
- The -createTableFlag parameter indicates that the table already exists and should not be created.
- The -typeSchema and -typeName parameters specify
the type of the spatial column. One of the peculiarities of the
shapefile format is that it does not specify whether the spatial
type is a collection (multi), so
import_shapedefaults to the multi type. For the example, specifying multi type isn't necessary because the county polygons actually are MultiPolygons. But for some applications, the data to be imported is not a multi type, and you can import it into a non-multi column.