Previously restricted to geographers and geologists, GIS software has become extremely popular since global mapping services became available on the Web and mobile-phone companies began offering Global Positioning System (GPS) services. GIS applications acquire and process spatial data describing the Earth's surface. Sonar, radar, cameras, and other observation platforms acquire the data; geographic data becomes geographic information once spatial processing software formats, processes, and displays it. GIS software stores the data in a 3-D database, and formats and transforms the data, sometimes even showing development over time in so-called 4-D (3-D + time) transformations.
Of course, all GIS data can and often must be edited, and most data sets arrive in a format that GIS specialists are trained to handle. Many, if not all, data formats adhere to open standards, and all operations within the GIS space can be conducted using open source applications running on Linux, the major Berkeley Software Distribution (BSD) flavors, and some UNIX varieties. Most of the important GIS software suites are available on Mac OS X, as well.
GIS software has an enormous application range. The very notion of natural-resource management depends on the availability of maps and the ability to overlay topographical data with data layers referring to -- among other things -- geomorphological and hydrological processes. Archaeologists use GIS to reconstruct ancient trading networks. Urban planners use GIS to model infrastructure. Environmental scientists need GIS software to model erosion in coastal areas and mountain valleys. Global-warming effects become far more apparent when visualized in GIS data viewers. Cartographers depend on GIS software to aggregate data and produce data-rich maps. GPS-enabled mobile-telephone users can use GPS and GIS software to locate the people they are talking to and, sometimes, their own position.
Quantum GIS: An open source GIS data viewer
GIS applications on Linux have multiplied in recent years. In the 1980s, programmers developed the Geographic Resources Analysis Support System (GRASS). After modification in the late 1990s, GRASS has enabled anyone with a knowledge of GIS and some expertise in Linux to run a complete GIS system from the Linux command line or a graphical user interface (GUI). Unfortunately, the very completeness of GRASS led to complications for GIS beginners on Linux. The GUI struggled to keep up with the large number of features and command-line flags, which were aimed at GIS experts rather than at novices trying to create maps for the Web.
In May 2002, developers created GPLed Quantum GIS (QGIS), a project aimed at beginning and intermediate users who need to access, display, and possibly edit GIS data sets. GIS users can deploy QGIS as a stand-alone GIS data viewer and editor or as part of a GIS tool chain. A GIS tool chain might include QGIS, the GRASS software suite, a 3-D PostGIS database, and a map server delivering data sets and maps to users accessing mapping sites over the Internet.
The developers of QGIS decided to use the C++-based Qt tool kit to build the QGIS interface, which was a major departure from previous practice. (Programmers built the GRASS interface principally with Tcl/Tk, whose foundation we can trace back to the late 1980s.) Because the GPLed Qt tool kit is cross-platform, QGIS runs on most Linux and UNIX varieties, Microsoft® Windows®, and Mac OS X.
Applications store GIS data using two distinct data structures: raster data and vector data. You can add database storage in a 3-D data format optimized for handling by PostgreSQL -- known as the PostGIS data format. For reasons you'll see later, we classify PostGIS data as vector data.
QGIS handles all three -- raster, vector, and database -- a state of affairs that took considerable effort from the programmers writing GIS data programming libraries.
You can easily visualize the raster data structure by imagining a grid of square or hexagonal cells. (In practice, applications use square cells most of the time.) These cells overlay a geographic area like a matrix, with a mathematical representation formalized in a field called map algebra. A GIS specialist can add data, like precipitation values or economic data, to each cell, but describing complex irregular geographic shapes is difficult. The software often must rely on the similarity of values and their location in cells to classify features (a street or a coastline, for example), rather than on feature descriptions encapsulated in metadata. Another possible approach uses the color values associated with individual cells to classify groups of raster cells into features.
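The grid-of-cells model and the cell-by-cell operations of map algebra can be illustrated with a minimal Python sketch. It is not how QGIS or GRASS store rasters internally; the layer names, the values, and the helper functions `local_op` and `classify` are all made up for illustration.

```python
# A raster layer as a grid of square cells: rows and columns of values.
# All values below are invented for illustration.
precipitation_mm = [
    [120, 130, 125],
    [110, 140, 150],
    [100, 105, 160],
]
elevation_m = [
    [10, 12, 300],
    [8, 15, 320],
    [5, 9, 350],
]


def local_op(layer_a, layer_b, func):
    """Apply func cell by cell to two layers of identical shape."""
    if len(layer_a) != len(layer_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    ):
        raise ValueError("layers must have the same shape")
    return [
        [func(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


def classify(layer, threshold):
    """Mark each cell 1 if its value is at or above threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in layer]


# Overlay two layers: flag cells that are both wet (>= 140 mm)
# and high (>= 300 m). The thresholds are arbitrary.
wet = classify(precipitation_mm, 140)
high = classify(elevation_m, 300)
wet_and_high = local_op(wet, high, lambda a, b: a & b)

for row in wet_and_high:
    print(row)
```

Note that the result says only which cells match. Nothing in the grid states that those cells form a ridge or a slope, which is the classification problem described above.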
Cells are arranged in rows and columns, a layout that maps naturally onto the sequential way files are stored on disk. Many raster-based formats have roots in image formats: A common raster-based image format is bitmap (.bmp). Tagged Image File Format (.tiff) is another commonly used raster-based image format, which GIS specialists adapted to geographers' needs and renamed GeoTIFF. Raster-based data formats tend to behave much like images. Their accuracy improves as the number of cells grows and each cell has to describe fewer features.
In the raster data model, the map's accuracy also depends on the map's scale. The map's resolution and, hence, accuracy depend on the real-world area that each grid cell represents. The data model's comparative simplicity lends itself to modeling data acquired by GPS devices and satellite imaging. Several data formats fit the raster model well. For example, Digital Elevation Model (DEM) data points are evenly arranged in a grid pattern. The DEM format encodes elevation data to create highly detailed terrains. The U.S. Geological Survey (USGS) released an extremely popular global DEM data set into the public domain several years ago.
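To make the link between cell size and resolution concrete, the following Python sketch computes how many cells a grid needs to cover an area and how much uncompressed storage that implies. The 10 km square area, the cell sizes, and the two bytes per cell (a 16-bit elevation value) are assumptions chosen for illustration, not properties of any particular DEM product.

```python
import math


def grid_shape(width_m, height_m, cell_size_m):
    """Return (rows, cols) needed to cover an area with square cells."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows, cols


def raster_bytes(width_m, height_m, cell_size_m, bytes_per_cell=2):
    """Uncompressed size of one raster band (default: 16-bit values)."""
    rows, cols = grid_shape(width_m, height_m, cell_size_m)
    return rows * cols * bytes_per_cell


# A hypothetical 10 km x 10 km area at three cell sizes.
for cell in (90, 30, 10):
    rows, cols = grid_shape(10_000, 10_000, cell)
    size = raster_bytes(10_000, 10_000, cell)
    print(f"{cell:>3} m cells: {rows} x {cols} grid, {size} bytes")
```

Because the cell count grows with the square of the linear resolution, dividing the cell size by three multiplies the storage by roughly nine.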
Life changes for the GIS specialist when vector-based data formats appear. New data viewers and editors like Thuban and QGIS never had to struggle with the much more expressive vector-based formats because the ability to edit and add vector-based data layers was built in from the start. GRASS, with its more than 20-year history, acquired this ability fairly recently.
In short, vector data take the simplest topological entities -- points, lines, and polygons -- and anchor them within a 2-D Cartesian coordinate system to describe geographical features. The connecting lines are arcs, and the points within the Cartesian coordinate system are nodes. The data structures are reminiscent of graphs, and the mathematical basis lies in graph theory. So-called arc node lists contain arcs and nodes. The lists define polygons and can be layered on top of each other to map data sets of completely different origins while describing the same geographical area to form extremely data-rich maps.
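The arc-node idea can be sketched in a few lines of Python. The node coordinates, arc names, and the `parcel` polygon below are invented, and the layout is a simplified illustration rather than the on-disk structure of any real vector format. The area calculation uses the standard shoelace formula.

```python
# Nodes: id -> (x, y) in a 2-D Cartesian coordinate system.
nodes = {
    1: (0.0, 0.0),
    2: (4.0, 0.0),
    3: (4.0, 3.0),
    4: (0.0, 3.0),
}

# Arcs: id -> (start node, end node).
arcs = {
    "a": (1, 2),
    "b": (2, 3),
    "c": (3, 4),
    "d": (4, 1),
}

# A polygon is an ordered list of arcs that closes on itself.
polygons = {"parcel": ["a", "b", "c", "d"]}


def polygon_ring(name):
    """Resolve a polygon's arc list into an ordered, closed coordinate ring."""
    ring = []
    for arc_id in polygons[name]:
        start, end = arcs[arc_id]
        if not ring:
            ring.append(nodes[start])
        elif ring[-1] != nodes[start]:
            raise ValueError(f"arc {arc_id} does not connect to the previous arc")
        ring.append(nodes[end])
    if ring[0] != ring[-1]:
        raise ValueError("polygon is not closed")
    return ring


def ring_area(ring):
    """Shoelace formula; the ring repeats its first vertex at the end."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(ring_area(polygon_ring("parcel")))
```

Four nodes and four arcs describe the whole rectangle exactly, whatever its real-world size. A raster would need one value per cell to cover the same area.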
QGIS and other simpler data viewers lend themselves to exploring various data formats and the world of data sets covering this planet. As opposed to some of the commercial offerings and even GRASS, you can install these viewers easily and use them on almost all major operating systems. Your mileage may vary slightly, but installation usually succeeds.
QGIS supports many vector data formats, like Shapefiles, MapInfo layers, and ArcInfo coverages. Vector data require far less storage than raster data because arc node lists simplify and reduce the data required to make sense of the features contained in the map. These data also make it much easier to search the map or the vector representation's various layers. Shapefiles became available when the commercial ArcView GIS software entered the market in the early 1990s. Other file and data formats have since emerged, but without the free and open source programming community, none of these efforts would have evolved beyond the specialist concerns of academic geographers and military planners.
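Why vector layers are easy to search becomes clearer with a small Python sketch. Each feature carries its own attributes and extent, so a query can filter on either without scanning cells. The feature names, the `kind` attribute, and the bounding boxes are hypothetical, and real formats such as Shapefiles store attributes and geometry quite differently.

```python
# Each feature carries attributes plus a bounding box
# (min_x, min_y, max_x, max_y). All values are invented.
features = [
    {"name": "Main Street", "kind": "road", "bbox": (0, 0, 10, 1)},
    {"name": "Mill Pond", "kind": "water", "bbox": (3, 4, 5, 6)},
    {"name": "Harbour Road", "kind": "road", "bbox": (8, 0, 9, 12)},
]


def bbox_intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(layer, kind=None, window=None):
    """Return names of features matching an attribute and/or a map window."""
    hits = []
    for feature in layer:
        if kind is not None and feature["kind"] != kind:
            continue
        if window is not None and not bbox_intersects(feature["bbox"], window):
            continue
        hits.append(feature["name"])
    return hits


print(search(features, kind="road"))
print(search(features, kind="road", window=(0, 5, 20, 20)))
```

A bounding-box test is only a coarse first pass. A real GIS would follow it with an exact geometry test on the remaining candidates.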
GRASS recently reached version 6.0 and supports about 40 data formats. It goes beyond 2-D raster formats to include voxel, or 3-D, raster format. The large number of imaging and mapping modules makes it a lot easier for GIS specialists to analyze data in new ways. Long-term simulations and sophisticated map-making are possible.
However, the problem users still need to address lies in a fairly cluttered interface and complex installation routine that favor UNIX and Linux specialists. To some extent, this complexity results from the large number of libraries and facilities accompanying GRASS. Fortunately, students at all levels of GIS expertise wrote most of the documentation available with GRASS, thus enabling the Linux and UNIX neophyte to cope with a GRASS installation's considerable demands.
GDAL and OGR
Any discussion of GIS data formats must account for the large number of formats that a data viewer or GIS application has to support to be widely usable. Open source GIS applications have to cover most open data format standards, from ArcInfo to X Window System. Formats such as GeoTIFF are fairly common examples of open standards routinely supported by most applications.
Within the open source world, GRASS, QGIS, Thuban, and many other GIS applications use a base library known as the Geospatial Data Abstraction Library (GDAL). Written in C and C++, GDAL proper covers raster formats only. A companion library, the OGR Simple Features Library (originally known as OpenGIS Simple Features Reference Implementation), handles vector formats and exists as part of the GDAL source tree. OGR relies on GDAL. In fact, without the open source-licensed GDAL, most modern geodata viewers would be unthinkable. The library gives programmers a common data model covering all raster data formats and -- through OGR -- vector data formats. GDAL also enables programmers to tie raster data to real-world geographical coordinates, a process known as georeferencing.
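Georeferencing a raster comes down to mapping a cell's column and row to a real-world coordinate. The Python sketch below reimplements, without calling GDAL itself, the six-coefficient affine transform convention that GDAL exposes through `GetGeoTransform`. The coefficient values are invented, the function names are hypothetical, and the inverse function assumes a north-up raster with no rotation.

```python
import math


def pixel_to_geo(geotransform, col, row):
    """Map a pixel (col, row) to the coordinates of its upper-left corner.

    geotransform follows the six-value order used by GDAL:
    (origin_x, pixel_width, row_rotation,
     origin_y, column_rotation, pixel_height)
    """
    origin_x, pixel_w, row_rot, origin_y, col_rot, pixel_h = geotransform
    x = origin_x + col * pixel_w + row * row_rot
    y = origin_y + col * col_rot + row * pixel_h
    return x, y


def geo_to_pixel(geotransform, x, y):
    """Inverse mapping, for north-up rasters (no rotation terms) only."""
    origin_x, pixel_w, row_rot, origin_y, col_rot, pixel_h = geotransform
    if row_rot != 0 or col_rot != 0:
        raise ValueError("this sketch only inverts north-up rasters")
    col = math.floor((x - origin_x) / pixel_w)
    row = math.floor((y - origin_y) / pixel_h)
    return col, row


# Hypothetical 30 m raster; pixel height is negative because
# row numbers increase southward while y coordinates increase northward.
gt = (500000.0, 30.0, 0.0, 4200000.0, 0.0, -30.0)

print(pixel_to_geo(gt, 10, 20))
print(geo_to_pixel(gt, 500315.0, 4199385.0))
```

With the real library installed, the same six values would come from an opened data set rather than from a hand-written tuple.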
PostGIS and OpenGIS
Public domain GIS data cannot exist without some fairly sophisticated data-storage mechanism. However, storage might not be quite as important for raster data, whose spatial component likely contains fairly simple numerical data. Programmers must take only a few higher-level constructs into account, although most programmers regard raster data as cumbersome and memory-intensive.
The OpenGIS standard addresses these problems by making vector data -- namely, geometrical objects, such as points, lines, polygons, and their composites -- accessible in a spatially enabled relational database such as PostgreSQL. (The implementation of the OpenGIS standard for PostgreSQL is known as PostGIS.) GIS data stored within a PostgreSQL database are fully searchable using SQL-92.
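As a rough illustration of how geometrical objects reach such a database, the Python sketch below builds Well-Known Text (WKT), the textual geometry notation defined by the OpenGIS Simple Features specification, and pairs it with a query string. The table `parcels` and its columns `name` and `geom` are hypothetical, the `%s` placeholders assume a typical Python database driver, and the `ST_` function names assume a current PostGIS release. The query is only assembled here, not executed.

```python
def _coords(points):
    """Format (x, y) pairs as WKT coordinate text."""
    return ", ".join(f"{x} {y}" for x, y in points)


def point_wkt(x, y):
    """WKT for a single point."""
    return f"POINT({x} {y})"


def linestring_wkt(points):
    """WKT for a line through the given points."""
    return f"LINESTRING({_coords(points)})"


def polygon_wkt(ring):
    """WKT for a polygon with one outer ring; closes the ring if needed."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return f"POLYGON(({_coords(ring)}))"


# Hypothetical table and column names; a driver would fill in the %s
# placeholders with the WKT string and the spatial reference ID.
QUERY = (
    "SELECT name FROM parcels "
    "WHERE ST_Contains(geom, ST_GeomFromText(%s, %s))"
)
params = (point_wkt(2, 1), 4326)

print(polygon_wkt([(0, 0), (4, 0), (4, 3), (0, 3)]))
print(QUERY, params)
```

Passing the geometry as a query parameter instead of pasting it into the SQL string keeps the statement safe to reuse with arbitrary input.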
Today, programmers can access a whole continuum of open source GIS applications, largely developed on UNIX and Linux systems. These efforts rest on standards that are usually open, and most of the Internet map-making area tends to rely on these standards. Any programmer dealing with geographical data in any form or fashion is likely to encounter the base libraries in the same way any Linux systems programmer would encounter glibc. But no GIS programmer, even one who wields the keyboard just to script data filters or hack a tool chain, should have to figure out which data formats and base libraries to use.
The realm of geographical data sets and GIS applications may seem a bit difficult to understand in a world that talks about open source and global environmental phenomena. However, open source applications, such as GRASS and QGIS, seek to make public-domain GIS data sets available to programmers and technical users so they can avoid commercial alternatives. Libraries like GDAL and OGR make it possible to put GIS data processing on a common open source foundation without compromising the integrity of open GIS data standards.
- Check out Quantum GIS for full download links and references to documentation.
- The GRASS site contains the download links for GRASS.
- Be sure to bookmark the GRASS tutorials page, which is important not so much for its links to GRASS documentation but because it provides links to GIS software in several languages. The English-language tutorials are excellent. For readers of German, the GRASS 6 tutorial is vital.
- If you need global data sets, start your search with the Global GIS Data Resources.
- GDAL contains the complete API reference and a summary of all data formats accessible to GDAL programmers and command-line users.
- Vector data information for C++ programmers can be hard to come by. The terse but thorough OGR Simple Features Library provides the documentation you need.
- Check out PostGIS for information about spatial data support in PostgreSQL.
- For standards documents, including and going beyond OpenGIS, and general news on ongoing work in GIS standards, visit the Open Geospatial Consortium.
- The USGS National Map Viewer integrates maps from various U.S. government sources and displays them on a browser.