If you want to write high-quality software that uses any sort of map, it pays to understand the subject well. There are dozens of possible uses for a map in your software — some crucial, such as showing locations of homes in a realty website, and some less crucial but still nice, such as showing a map of an address in a customer relationship management application.
A crash course in geographic information science
Geographic data can be stored in any form. A written longhand address, such as 1384 Elm Street, is technically geographic data. Specialized formats are used to make the exchange and use of spatial data easier. Most data files consist of a single dataset, referred to as a layer. This naming convention derives from a map being built with multiple layers of data placed on top of one another.
Vector data and common formats
Vector data refers to data defined by geometric constructs, such as line segments representing a street or points representing single locations. Vector data contains geometric shapes defined by x and y coordinates, and frequently z (elevation) coordinates. Some data formats also provide an m (measure) coordinate, most often used for time increments.
Similarly, some formats support multipart shapes (for example, multiline or multipolygon). These are shapes in which separate, disconnected shapes are grouped together to form a single shape element. An example is Hawaii, which consists of several discrete shapes (islands), usually represented as a single, multipart shape. More advanced data formats also allow specifying arcs, circles, or other complex shapes, and there are data formats that allow you to mix multiple shape types (for example, a point shape and a line shape in the same file). Geometric vector shapes are often called features, leading to the term feature data, which is synonymous with vector data or shape data.
Most vector data formats also allow storage of attribute data — typically, a simple database associating a set of values to each shape, much like a spreadsheet, with each row corresponding to a single shape. Figure 1 provides an example of vector data and an attribute table.
Figure 1. Vector data and an attribute table
The most common vector format by far is the ESRI Shapefile (see Resources), which stores only a single shape type; each file can contain only points or only polygons or only lines. The format also supports multipart shapes in three dimensions, as well as a measurement (m) dimension. The format consists of at least three individual files:
- example.shp — Stores actual shape geometry
- example.shx — Stores a spatial index of shapes, making rendering and other operations faster
- example.dbf — Stores attribute table data (actually a dBASE format file)
- example.prj — Defines the spatial reference system (SRS) or projection
Raster data and common formats
Raster data is non-geometric data defined by a grid of values covering a spatial area. Taking a 10-meter-by-10-meter grid and making one observation for each intersection (100 observations) would constitute a raster dataset. Layers of raster data are more often called bands, although the term layer still applies to raster as well as vector data.
Elevation data is a good example of a single-band raster dataset, often visualized with a color ramp, as shown in Figure 2. Electromagnetic radiation is another common dataset (for example, near-infrared radiation or visible light). Standard aerial or satellite photographs are three-band datasets consisting of red, green, and blue bands.
Figure 2. An elevation dataset, colored from red to deep blue depending on height
The most common open data format for raster data is the GeoTIFF — a TIFF image that has been extended to include additional tags, such as spatial reference/projection information and location information. See the link to the GDAL raster format information page in Resources for more information on other raster data formats.
There are multiple ways to define a coordinate system for Earth coordinates; some are better for particular purposes than others. Every coordinate system has its weak points. The common aim of all such systems is to represent an ellipsoidal shape on a flat surface — distortion in some places is almost inevitable. The best known is World Geodetic System [WGS84] geographic, or latitude and longitude (corresponding to the y and x axes, respectively). Using latitude and longitude is easiest along the equator, where the grid lines formed are roughly square. Massive distortion is seen in northern or southern areas (for example, the huge appearance of Greenland). Preserving accuracy when using a 2-D map requires understanding spatial reference systems and using them appropriately.
The Earth is not round: Ellipsoid models
The Earth is an ellipsoid, slightly flattened at the poles and bulging at the equator. Different mathematical models describing the shape of the Earth have been developed over time. The most common is the WGS1984; see the link to the Geodetic Models for the Earth page in Resources for more information.
Using mismatched ellipsoid models in your data is not generally a major problem. The ellipsoid model is the first part of a spatial reference system definition.
The second part of a spatial reference system is its projection, which defines how to lay out the Earth surface onto a 2-D surface. Unlike the ellipsoid model, using the correct projection is essential, and using data with mismatched projections often results in comedic errors, such as the layout error seen in Figure 3.
Figure 3. Two shapefiles with differing Albers Equal Area projections, failing to line up
The most common projection after geographic (latitude and longitude) is the transverse Mercator projection. Imagine taking a tube, placing the Earth inside of it, and projecting the Earth's image onto the tube along that line. The place where the Earth touches the tube is called the central meridian, and the region around it preserves angles and distances quite well but with sharply decreasing quality as you move away from the central meridian. Worldwide, 60 transverse Mercator zones have been defined for the northern and southern hemispheres, referred to as Universal Transverse Mercator (UTM).
Local and national governments often publish free data. Visit local city offices to inquire about geographic information science (GIS) data, and you'll usually leave with more data than you need. Arguably, the best web-based data source is the USGS Seamless Data Server (see Resources), which offers an enormous amount of data coverage.
Useful software for working with data
Several open source tools are handy for working with GIS data:
- Quantum GIS (QGIS) — QGIS is a full GUI available for Windows®, Linux®, and Mac OS X. One of the best tools for creating and editing data, it includes a plug-in architecture for extensions.
- MapWindow GIS — Another full GUI solution available for Windows, this tool also features a plug-in architecture for extensibility using Microsoft® C# and Microsoft Visual Basic.NET®.
- GDAL/OGR Simple Feature Library — Primarily intended
for programming, this tool ships with several invaluable command-line
tools, such as
gdal_translateandogr2ogr(raster and vector format translators, respectively) as well asgdaltransform(a tool for converting between data projections). - MapServer — This server generates imagery from data sources, providing symbology, labels, and all other details needed to make a meaningful map.
This is complicated: Can't I just use Google Maps or Google Earth?
Sure! These are both fantastic tools, but they're not right for every use. Google Maps is great for inclusion on a website when you need a simple map without hunting for data, and it's great for people without a strong programming background. However, it lacks the ability to truly use your own data beyond adding simple point locations, and it relies on a third party. Similarly, Google Earth has its applications and is a fantastic tool, but it's not intended for scientific analysis or large-scale applications in the same spirit as ArcMap, QGIS, GRASS, and others.
All of your existing language skills and tools can still be useful. GDAL, which includes tools for working with a huge variety of GIS data, comes with bindings for many common languages, including C/C++, PHP, Python, Perl, C#, and even Microsoft Visual Basic® V6. Use a language that you're comfortable with: All that changes is the subject matter.
Example program: Golf course litter reporter
The goal of this sample application is to create a public web page visitors can use to view a golf course and report problems such as litter or flooding. Problem reports should immediately become available on the map to viewers, and a shapefile stored on the server should be updated for use from a desktop GIS or a handheld device carried by golf course maintenance staff member. The project can be see in its completed state in Figure 4.
Figure 4. The completed basic web application example
The main data — the 2008 aerial image — was retrieved from USGS Seamless Server. The only other dataset in use is a shapefile representing hole boundaries. This was manually digitized using QGIS simply to provide labels to the map and to provide bounding boxes (minimum and maximum x and y values defining a rectangle) for each hole, for a Zoom To option.
The front-end GUI with HTML and jQuery
The tools you use for the front end include standard CSS/HTML as well as jQuery, Smarty, and Raphaël. jQuery is a fantastic extension to JavaScript that allows you to write code extremely quickly without sacrificing quality. A related project — jQuery UI — provides a set of widgets and common GUI elements for quick use. Smarty is a PHP-driven template engine that makes it easy to separate code and logic: You don't use many of its features here, but having Smarty in place from the beginning helps improve extensibility. Finally, Raphaël is a JavaScript drawing library that makes it trivial to draw geometric shapes over the top of an HTML element.
The primary purpose of the GUI is to render the map. In addition, the GUI must allow
the user to perform basic operations like zooming and panning and makes it
easy to perform the specialized tasks you have in mind. MapServer is used
to generate images, which requires formulating a request to MapServer that
includes a desired image width and height as well as the spatial
coordinates (bounding box) of the image you want. These coordinates can
come from JavaScript — as in the case of the Zoom To
feature — or you can provide them. You request a map by
assembling the request and posting it to the server, loading it directly
into a <div> element. As Listing 1 shows,
you can do this in one JavaScript line.
Listing 1. Loading a map into a
div element
// Order of bbox is important - MinX, MinY, MaxX, MaxY
$('#map').css('background-image', 'url(' + ret.mapservPath + '?MAP=' + ret.mapFile +
'&SERVICE=WMS&LAYERS=' + layers +
'&MODE=MAP&REQUEST=GetMap&VERSION=1.1&SRS=epsg:32612&BBOX=' + MinX + ',' + MinY +
',' + MaxX + ',' + MaxY + '&FORMAT=PNG&WIDTH=700&HEIGHT=600&cacheavoidance=' +
Math.random() + ')');
|
You use Raphaël to allow the user to draw a rectangle on the map for zoom in and zoom out operations. This provides a set of pixel coordinates corresponding to what the new zoom level should be. However, you also need to know the geographic coordinates corresponding to this selection, so you project it using the "pixels per projected unit" in each direction, as illustrated in Listing 2.
Listing 2. Converting (projecting) coordinates from pixel to projected
var x1 = Math.min(lastMouseX, e.pageX);
var x2 = Math.max(lastMouseX, e.pageX);
var y1 = Math.min(lastMouseY, e.pageY);
var y2 = Math.max(lastMouseY, e.pageY);
// Adjust for map's position
var position = $('#map').position();
x1 -= position.left;
y1 -= position.top;
x2 -= position.left;
y2 -= position.top;
var pixelPerProjectionX = (currentMaxX - currentMinX) / 700;
var pixelPerProjectionY = (currentMaxY - currentMinY) / 600;
var corner1ProjectedX = currentMinX + x1 * pixelPerProjectionX;
var corner1ProjectedY = currentMaxY - y1 * pixelPerProjectionY;
var corner2ProjectedX = currentMinX + x2 * pixelPerProjectionX;
var corner2ProjectedY = currentMaxY - y2 * pixelPerProjectionY;
if (corner2ProjectedY < corner1ProjectedY) {
// We're upside down!
var temp = corner2ProjectedY;
corner2ProjectedY = corner1ProjectedY;
corner1ProjectedY = temp;
}
prepareMap(corner1ProjectedX, corner1ProjectedY, corner2ProjectedX, corner2ProjectedY);
|
This code provides the coordinates to send to MapServer: Width and height can be
whatever you want (hard-coded in this example). One important
consideration remains: MapServer will give you what you ask for, without
complaint. If the user draws a tall, narrow box, MapServer will happily
oblige and give him exactly the rectangle asked for at the width
and height asked for, leading to a distorted, squashed, or flattened
image. You can prevent this by making sure the scale (how many pixels
correspond to each projected unit of distance) is the same (or close) in
both the x and y directions. Listing 3 illustrates this
concept but uses a shortcut: roughly approximate scale by subtracting
Min from Max for
X and Y coordinates defining the box.
Listing 3. Correction for image skew
var scaleX = (MaxX - MinX);
var scaleY = (MaxY - MinY);
// Prevent image warping due to oddly shaped requests -
// fiddle with X until scale is matched
// This dataset's projection is UTM zone 12, so tolerance is in meters -
// 5 meter skew tolerance
while (!equalWithinTolerance(scaleX, scaleY, 5)) {
if (scaleX > scaleY) {
// Bring X in
MinX += 0.00005;
MaxX -= 0.00005;
}
else {
// Push X out
MinX -= 0.00005;
MaxX += 0.00005;
}
scaleX = (MaxX - MinX);
}
|
Essentially, this is all the front end needs to do beyond standard web application design (UI elements, instructions, headings, etc.) and special behavior.
Back-end processing with PHP and C++
MapServer, to provide a map image, must be told what layers to use and how to render them. You do this with a map file — a flat text file. Luckily, you rarely need to build the map file by hand. QGIS includes a plug-in to export a map file from a loaded project, letting you design your map via a GUI and manually correct it. In general, map file exports work well, but sometimes contain common errors:
- A symbol set file is defined, called Symbols.txt, that doesn't always exist. If you don't use custom symbols, just delete this line. The same is true for a font set file called Fontset.txt.
- Some color transparencies and minor coloring differences may be present. You can tweak these differences by hand through trial and error.
- File paths are often exported incorrectly, particularly if you're uploading data to a server. Search and replace to the rescue.
The remainder of back-end processing for the web application is handled with PHP.
The Application class handles all requests,
which consist of just two operations: getHoleBounds and reportProblem.
These functions demonstrate opening a shapefile using the OGR Simple
Feature Library. For example, adding a new shape/problem is as simple as
opening the file, opening the layer within the file (remember that some
data formats support multiple layers of data in a single file), creating
the feature, and adding it. Listing 4 illustrates the complete function to
add a problem report.
Listing 4. Completed example application, Golf Course Problem Reporter
function reportProblem() {
// Data validation
foreach (array('action', 'problemType', 'description', 'x', 'y')
as $requiredItem) {
if (!isset($_REQUEST[$requiredItem]))
throw new Exception("Missing required item $requiredItem.");
}
if (!is_numeric($_REQUEST['x']))
throw new Exception("Non-numeric X coordinate provided.");
if (!is_numeric($_REQUEST['y']))
throw new Exception("Non-numeric Y coordinate provided.");
// Fire up OGR/GDAL drivers...
OGRRegisterAll();
// Open shapefile for writing (1 as second arg)
$driver = NULL;
$ds = OGROpen(Config::problemsShapefile, true, $driver);
// Open the layer - shapefiles only have one layer per file (0)
$layer = OGR_DS_GetLayer($ds, 0);
$layerDef = OGR_L_GetLayerDefn($layer);
// Create a feature and set field values
$newFeature = OGR_F_Create($layerDef);
OGR_F_SetFieldString($newFeature, 0, $_REQUEST['description']);
OGR_F_SetFieldString($newFeature, 1, $_REQUEST['problemType']);
// Create geometry and insert our point
$newGeom = OGR_G_CreateGeometry(wkbPoint);
OGR_G_SetPoint($newGeom, 0, $_REQUEST['x'], $_REQUEST['y'], 0);
// Tie it together
OGR_F_SetGeometry($newFeature, $newGeom);
OGR_L_CreateFeature($layer, $newFeature);
// Close it
OGR_DS_Destroy($ds);
}
|
For the sake of example, interaction with the problems shapefile is
demonstrated in C++ by creating a command-line
utility to add and list problems. Interacting with a shapefile in C++
follows the same basic approach as with PHP, but using a more strongly
object-oriented approach. A wrapper for the tool using zenity
(a GTK+ UI tool) is provided so you can use it without having to have
command-line experience (for example, golf course office staff).
Understanding and using GIS concepts and data in your software can, when implemented in the right place, make it much more useful for your users. Any developer working on web, desktop, mobile, or other platforms can benefit from using spatial and geographic data in their work. The popularity of mapping and related topics is exploding — particularly as more mobile devices come equipped with GPS location information. Nearly every facet of our lives can tie into our location in some manner — and so can our software.
Learn
-
Learn more about ESRI
shapefiles.
-
The OGR vector format information page
provides additional information on alternative vector formats beyond the ESRI
Shapefile.
-
Check out the GDAL raster format information
for more information on raster formats, listing nearly every format known.
-
To learn more about spatial reference systems, see the
Geodetic Models
for the Earth page to learn about ellipsoid models and the
USGS Map
Projections Poster page to learn more about projection systems.
-
USGS Seamless Data Server is one of the best
data resources available.
-
MapServer map file documentation
can be handy for troubleshooting error messages from MapServer or customizing
your maps.
-
Golf Course Problem Reporter: See
the complete sample application live demonstration. Problems are reset nightly.
- See other approaches to working with
Geospatial Information in the Building a geospatial information system series on
developerWorks.
-
To listen to interesting interviews and discussions for software developers, check out developerWorks podcasts.
-
Stay current with developerWorks' Technical events and webcasts.
-
Follow developerWorks on Twitter.
-
Check out upcoming conferences, trade shows, webcasts, and other Events around the world that are of interest to IBM open source developers.
-
Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM's products, as well as our most popular articles and tutorials.
-
Watch and learn about IBM and open source technologies and product functions with the no-cost developerWorks On demand demos.
Get products and technologies
-
QGIS Desktop GIS is a fantastic multiplatform
desktop GIS application.
-
Alternatively, MapWindow GIS Desktop GIS
is an excellent desktop GIS for Windows.
-
The GDAL/OGR libraries and command-line tools
are invaluable for both programming and daily operations on data.
-
MapServer provides map imagery on demand
from data.
-
If you don't already have them installed, download the Smarty
template engine and the Raphaël JavaScript
library.
-
Innovate your next open source development project with IBM trial software, available for download or on DVD.
- Download
IBM product evaluation versions
or explore
the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from
DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
-
Participate in developerWorks blogs and get involved in the developerWorks community.
-
Get involved in the developerWorks community.
Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
Christopher Michaelis is a developer specializing in web hosting and web applications. He attended Utah State University and Idaho State University, studying computer science and geographic information science. His experience covers a wide range of development, including applications for environmental modeling, education, diabetes data analysis, systems administration tools, web hosting, platform integration, and location-based services.




