Using geospatial data in applications on Linux with GDAL

What is geospatial data, and how can I use it?

Learn the concepts of geospatial data, including data formats, coordinate reference systems, and where to find data, and get an overview of tools available and how to use data in code. You'll construct an example application, using the Geospatial Data Abstraction Layer (GDAL) and UMN MapServer to demonstrate the concepts learned.

Christopher Michaelis, Developer, Consultant

Christopher Michaelis is a developer specializing in web hosting and web applications. He attended Utah State University and Idaho State University, studying computer science and geographic information science. His experience covers a wide range of development, including applications for environmental modeling, education, diabetes data analysis, systems administration tools, web hosting, platform integration, and location-based services.



01 February 2011

Also available in Russian Japanese

If you want to write high-quality software that uses any sort of map, it pays to understand the subject well. There are dozens of possible uses for a map in your software — some crucial, such as showing locations of homes in a realty website, and some less crucial but still nice, such as showing a map of an address in a customer relationship management application.

A crash course in geographic information science

Geographic data can be stored in any form. A written longhand address, such as 1384 Elm Street, is technically geographic data. Specialized formats are used to make the exchange and use of spatial data easier. Most data files consist of a single dataset, referred to as a layer. This naming convention derives from a map being built with multiple layers of data placed on top of one another.

Vector data and common formats

Vector data refers to data defined by geometric constructs, such as line segments representing a street or points representing single locations. Vector data contains geometric shapes defined by x and y coordinates, and frequently z (elevation) coordinates. Some data formats also provide an m (measure) coordinate, most often used for time increments.

Similarly, some formats support multipart shapes (for example, multiline or multipolygon). These are shapes in which separate, disconnected shapes are grouped together to form a single shape element. An example is Hawaii, which consists of several discrete shapes (islands), usually represented as a single, multipart shape. More advanced data formats also allow specifying arcs, circles, or other complex shapes, and there are data formats that allow you to mix multiple shape types (for example, a point shape and a line shape in the same file). Geometric vector shapes are often called features, leading to the term feature data, which is synonymous with vector data or shape data.

OGR vector format information

See the OGR vector format information in Resources for more information about other vector data formats.

Most vector data formats also allow storage of attribute data — typically, a simple database associating a set of values to each shape, much like a spreadsheet, with each row corresponding to a single shape. Figure 1 provides an example of vector data and an attribute table.

Figure 1. Vector data and an attribute table
Table overlayed on a map shows country data, including NAME, GMI_CNTRY and REGION

The most common vector format by far is the ESRI Shapefile (see Resources), which stores only a single shape type; each file can contain only points or only polygons or only lines. The format also supports multipart shapes in three dimensions, as well as a measurement (m) dimension. The format consists of at least three individual files:

  • example.shp— Stores actual shape geometry
  • example.shx— Stores a spatial index of shapes, making rendering and other operations faster
  • example.dbf— Stores attribute table data (actually a dBASE format file)
  • example.prj— Defines the spatial reference system (SRS) or projection

Raster data and common formats

Raster data is non-geometric data defined by a grid of values covering a spatial area. Taking a 10-meter-by-10-meter grid and making one observation for each intersection (100 observations) would constitute a raster dataset. Layers of raster data are more often called bands, although the term layer still applies to raster as well as vector data.

Elevation data is a good example of a single-band raster dataset, often visualized with a color ramp, as shown in Figure 2. Electromagnetic radiation is another common dataset (for example, near-infrared radiation or visible light). Standard aerial or satellite photographs are three-band datasets consisting of red, green, and blue bands.

Figure 2. An elevation dataset, colored from red to deep blue depending on height
Graphic showing a topographical map, color-coded as described in the heading

The most common open data format for raster data is the GeoTIFF— a TIFF image that has been extended to include additional tags, such as spatial reference/projection information and location information. See the link to the GDAL raster format information page in Resources for more information on other raster data formats.


Spatial reference systems

There are multiple ways to define a coordinate system for Earth coordinates; some are better for particular purposes than others. Every coordinate system has its weak points. The common aim of all such systems is to represent an ellipsoidal shape on a flat surface — distortion in some places is almost inevitable. The best known is World Geodetic System [WGS84] geographic, or latitude and longitude (corresponding to the y and x axes, respectively). Using latitude and longitude is easiest along the equator, where the grid lines formed are roughly square. Massive distortion is seen in northern or southern areas (for example, the huge appearance of Greenland). Preserving accuracy when using a 2-D map requires understanding spatial reference systems and using them appropriately.

The Earth is not round: Ellipsoid models

The Earth is an ellipsoid, slightly flattened at the poles and bulging at the equator. Different mathematical models describing the shape of the Earth have been developed over time. The most common is the WGS1984; see the link to the Geodetic Models for the Earth page in Resources for more information.

Using mismatched ellipsoid models in your data is not generally a major problem. The ellipsoid model is the first part of a spatial reference system definition.

Projection systems

The second part of a spatial reference system is its projection, which defines how to lay out the Earth surface onto a 2-D surface. Unlike the ellipsoid model, using the correct projection is essential, and using data with mismatched projections often results in comedic errors, such as the layout error seen in Figure 3.

Figure 3. Two shapefiles with differing Albers Equal Area projections, failing to line up
Graphic shows a map of the United States in purple, with a scattering of green circles incorrectly offeset from the map

The most common projection after geographic (latitude and longitude) is the transverse Mercator projection. Imagine taking a tube, placing the Earth inside of it, and projecting the Earth's image onto the tube along that line. The place where the Earth touches the tube is called the central meridian, and the region around it preserves angles and distances quite well but with sharply decreasing quality as you move away from the central meridian. Worldwide, 60 transverse Mercator zones have been defined for the northern and southern hemispheres, referred to as Universal Transverse Mercator (UTM).

Map projections

Refer to the link to the Map Projections Poster published by the U.S. Geological Survey (USGS) provided in Resources for more information on various map projection systems.


Common free data sources

Local and national governments often publish free data. Visit local city offices to inquire about geographic information science (GIS) data, and you'll usually leave with more data than you need. Arguably, the best web-based data source is the USGS Seamless Data Server (see Resources), which offers an enormous amount of data coverage.


Useful software for working with data

Several open source tools are handy for working with GIS data:

  • Quantum GIS (QGIS)— QGIS is a full GUI available for Windows®, Linux®, and Mac OS X. One of the best tools for creating and editing data, it includes a plug-in architecture for extensions.
  • MapWindow GIS— Another full GUI solution available for Windows, this tool also features a plug-in architecture for extensibility using Microsoft® C# and Microsoft Visual Basic.NET®.
  • GDAL/OGR Simple Feature Library— Primarily intended for programming, this tool ships with several invaluable command-line tools, such as gdal_translate and ogr2ogr (raster and vector format translators, respectively) as well as gdaltransform (a tool for converting between data projections).
  • MapServer— This server generates imagery from data sources, providing symbology, labels, and all other details needed to make a meaningful map.

This is complicated: Can't I just use Google Maps or Google Earth?

Sure! These are both fantastic tools, but they're not right for every use. Google Maps is great for inclusion on a website when you need a simple map without hunting for data, and it's great for people without a strong programming background. However, it lacks the ability to truly use your own data beyond adding simple point locations, and it relies on a third party. Similarly, Google Earth has its applications and is a fantastic tool, but it's not intended for scientific analysis or large-scale applications in the same spirit as ArcMap, QGIS, GRASS, and others.


Coding with geographic data

All of your existing language skills and tools can still be useful. GDAL, which includes tools for working with a huge variety of GIS data, comes with bindings for many common languages, including C/C++, PHP, Python, Perl, C#, and even Microsoft Visual Basic® V6. Use a language that you're comfortable with: All that changes is the subject matter.

Example program: Golf course litter reporter

The goal of this sample application is to create a public web page visitors can use to view a golf course and report problems such as litter or flooding. Problem reports should immediately become available on the map to viewers, and a shapefile stored on the server should be updated for use from a desktop GIS or a handheld device carried by golf course maintenance staff member. The project can be see in its completed state in Figure 4.

Figure 4. The completed basic web application example
Screenshot shows the application with a photo of a golf course and various graphical data element laid over it with buttons to zoom, pan and report a problem

The main data — the 2008 aerial image — was retrieved from USGS Seamless Server. The only other dataset in use is a shapefile representing hole boundaries. This was manually digitized using QGIS simply to provide labels to the map and to provide bounding boxes (minimum and maximum x and y values defining a rectangle) for each hole, for a Zoom To option.

The front-end GUI with HTML and jQuery

The tools you use for the front end include standard CSS/HTML as well as jQuery, Smarty, and Raphaël. jQuery is a fantastic extension to JavaScript that allows you to write code extremely quickly without sacrificing quality. A related project — jQuery UI — provides a set of widgets and common GUI elements for quick use. Smarty is a PHP-driven template engine that makes it easy to separate code and logic: You don't use many of its features here, but having Smarty in place from the beginning helps improve extensibility. Finally, Raphaël is a JavaScript drawing library that makes it trivial to draw geometric shapes over the top of an HTML element.

The primary purpose of the GUI is to render the map. In addition, the GUI must allow the user to perform basic operations like zooming and panning and makes it easy to perform the specialized tasks you have in mind. MapServer is used to generate images, which requires formulating a request to MapServer that includes a desired image width and height as well as the spatial coordinates (bounding box) of the image you want. These coordinates can come from JavaScript — as in the case of the Zoom To feature — or you can provide them. You request a map by assembling the request and posting it to the server, loading it directly into a <div> element. As Listing 1 shows, you can do this in one JavaScript line.

Listing 1. Loading a map into a div element
// Order of bbox is important - MinX, MinY, MaxX, MaxY
$('#map').css('background-image', 'url(' + ret.mapservPath + '?MAP=' + ret.mapFile + 
	'&SERVICE=WMS&LAYERS=' + layers + 
	'&MODE=MAP&REQUEST=GetMap&VERSION=1.1&SRS=epsg:32612&BBOX=' + MinX + ',' + MinY +
	',' + MaxX + ',' + MaxY + '&FORMAT=PNG&WIDTH=700&HEIGHT=600&cacheavoidance=' + 
	Math.random() + ')');

You use Raphaël to allow the user to draw a rectangle on the map for zoom in and zoom out operations. This provides a set of pixel coordinates corresponding to what the new zoom level should be. However, you also need to know the geographic coordinates corresponding to this selection, so you project it using the "pixels per projected unit" in each direction, as illustrated in Listing 2.

Listing 2. Converting (projecting) coordinates from pixel to projected
var x1 = Math.min(lastMouseX, e.pageX);
var x2 = Math.max(lastMouseX, e.pageX);
var y1 = Math.min(lastMouseY, e.pageY);
var y2 = Math.max(lastMouseY, e.pageY);

// Adjust for map's position
var position = $('#map').position();
x1 -= position.left;
y1 -= position.top;
x2 -= position.left;
y2 -= position.top;

var pixelPerProjectionX = (currentMaxX - currentMinX) / 700;
var pixelPerProjectionY = (currentMaxY - currentMinY) / 600;
var corner1ProjectedX = currentMinX + x1 * pixelPerProjectionX;
var corner1ProjectedY = currentMaxY - y1 * pixelPerProjectionY;
var corner2ProjectedX = currentMinX + x2 * pixelPerProjectionX;
var corner2ProjectedY = currentMaxY - y2 * pixelPerProjectionY;

if (corner2ProjectedY < corner1ProjectedY) {
	// We're upside down!
	var temp = corner2ProjectedY;
	corner2ProjectedY = corner1ProjectedY;
	corner1ProjectedY = temp;
}

prepareMap(corner1ProjectedX, corner1ProjectedY, corner2ProjectedX, corner2ProjectedY);

This code provides the coordinates to send to MapServer: Width and height can be whatever you want (hard-coded in this example). One important consideration remains: MapServer will give you what you ask for, without complaint. If the user draws a tall, narrow box, MapServer will happily oblige and give him exactly the rectangle asked for at the width and height asked for, leading to a distorted, squashed, or flattened image. You can prevent this by making sure the scale (how many pixels correspond to each projected unit of distance) is the same (or close) in both the x and y directions. Listing 3 illustrates this concept but uses a shortcut: roughly approximate scale by subtracting Min from Max for X and Y coordinates defining the box.

Listing 3. Correction for image skew
var scaleX = (MaxX - MinX);
var scaleY = (MaxY - MinY);
// Prevent image warping due to oddly shaped requests - 
// fiddle with X until scale is matched
// This dataset's projection is UTM zone 12, so tolerance is in meters - 
// 5 meter skew tolerance
while (!equalWithinTolerance(scaleX, scaleY, 5)) {
	if (scaleX > scaleY) {
		// Bring X in
		MinX += 0.00005;
		MaxX -= 0.00005;
	}
	else {
		// Push X out
		MinX -= 0.00005;
		MaxX += 0.00005;
	}
	scaleX = (MaxX - MinX);
}

Essentially, this is all the front end needs to do beyond standard web application design (UI elements, instructions, headings, etc.) and special behavior.

Back-end processing with PHP and C++

MapServer, to provide a map image, must be told what layers to use and how to render them. You do this with a map file— a flat text file. Luckily, you rarely need to build the map file by hand. QGIS includes a plug-in to export a map file from a loaded project, letting you design your map via a GUI and manually correct it. In general, map file exports work well, but sometimes contain common errors:

  • A symbol set file is defined, called Symbols.txt, that doesn't always exist. If you don't use custom symbols, just delete this line. The same is true for a font set file called Fontset.txt.
  • Some color transparencies and minor coloring differences may be present. You can tweak these differences by hand through trial and error.
  • File paths are often exported incorrectly, particularly if you're uploading data to a server. Search and replace to the rescue.

The remainder of back-end processing for the web application is handled with PHP. The Application class handles all requests, which consist of just two operations: getHoleBounds and reportProblem. These functions demonstrate opening a shapefile using the OGR Simple Feature Library. For example, adding a new shape/problem is as simple as opening the file, opening the layer within the file (remember that some data formats support multiple layers of data in a single file), creating the feature, and adding it. Listing 4 illustrates the complete function to add a problem report.

Listing 4. Completed example application, Golf Course Problem Reporter
function reportProblem() {
	// Data validation
	foreach (array('action', 'problemType', 'description', 'x', 'y')
		as $requiredItem) {
		if (!isset($_REQUEST[$requiredItem])) 
			throw new Exception("Missing required item $requiredItem.");
	}
	if (!is_numeric($_REQUEST['x']))
		throw new Exception("Non-numeric X coordinate provided.");
	if (!is_numeric($_REQUEST['y']))
		throw new Exception("Non-numeric Y coordinate provided.");

	// Fire up OGR/GDAL drivers...
	OGRRegisterAll();

	// Open shapefile for writing (1 as second arg)
	$driver = NULL;
	$ds = OGROpen(Config::problemsShapefile, true, $driver);

	// Open the layer - shapefiles only have one layer per file (0)
	$layer = OGR_DS_GetLayer($ds, 0);
	$layerDef = OGR_L_GetLayerDefn($layer);

	// Create a feature and set field values
	$newFeature = OGR_F_Create($layerDef);
	OGR_F_SetFieldString($newFeature, 0, $_REQUEST['description']); 
	OGR_F_SetFieldString($newFeature, 1, $_REQUEST['problemType']); 

	// Create geometry and insert our point
	$newGeom = OGR_G_CreateGeometry(wkbPoint);
	OGR_G_SetPoint($newGeom, 0, $_REQUEST['x'], $_REQUEST['y'], 0);

	// Tie it together
	OGR_F_SetGeometry($newFeature, $newGeom);
	OGR_L_CreateFeature($layer, $newFeature);

	// Close it
	OGR_DS_Destroy($ds);
}

For the sake of example, interaction with the problems shapefile is demonstrated in C++ by creating a command-line utility to add and list problems. Interacting with a shapefile in C++ follows the same basic approach as with PHP, but using a more strongly object-oriented approach. A wrapper for the tool using zenity (a GTK+ UI tool) is provided so you can use it without having to have command-line experience (for example, golf course office staff).


Conclusion

Understanding and using GIS concepts and data in your software can, when implemented in the right place, make it much more useful for your users. Any developer working on web, desktop, mobile, or other platforms can benefit from using spatial and geographic data in their work. The popularity of mapping and related topics is exploding — particularly as more mobile devices come equipped with GPS location information. Nearly every facet of our lives can tie into our location in some manner — and so can our software.

Resources

Learn

Get products and technologies

Discuss

  • Participate in developerWorks blogs and get involved in the developerWorks community.
  • Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Open source on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Open source
ArticleID=619781
ArticleTitle=Using geospatial data in applications on Linux with GDAL
publish-date=02012011