Geo-routing tasks with PostgreSQL, PostGIS, and pgRouting on IBM Cloud Databases for PostgreSQL
If you need to plan your UK castle tour but don’t know how to start, let us show you how PostgreSQL, PostGIS, and pgRouting on IBM Cloud Databases for PostgreSQL can get you started on this and any other geo-routing tasks you may need your database to perform.
With IBM Cloud Databases (ICD) for PostgreSQL, you have the capability to run complex geospatial queries using PostGIS and pgRouting. In this article, we’ll show you how to get started using pgRouting by getting the shortest paths between castles in the United Kingdom (UK).
While PostGIS is an extension for PostgreSQL, pgRouting can be viewed as an extension for PostGIS. What pgRouting does is extend PostGIS’s functionality by providing you with several functions based on popular network algorithms to solve problems like the shortest path and the traveling salesperson problem.
In order to do that, however, pgRouting must have a valid topology. Take, for example, a road network. Each road segment is joined together at intersections, which we’ll call “nodes.” In order for pgRouting to work, each node must be connected to a road segment so that it can figure out how to get from one node to the next. pgRouting helps us do that using a few functions create network topologies and inspect their validity.
In addition to topology functions, pgRouting gives us various routing functions that we can use to find the shortest path between nodes. For this article, we’ll only look at two of them—
pgr_dijkstra (or Dijkstra’s algorithm) and
pgr_ksp (or Yen’s algorithm).
Before using these pgRouting functions, we’ll give you a short introduction to setting up a dataset so that it’s capable of using pgRouting because having only a dataset of geographic points is not enough. First, we’ll create a network of nodes using the locations of castles in the UK and connect them together. These connections are called “edges.” From there, we’ll let pgRouting do all the heavy lifting by analyzing our routing topology and computing the shortest routes from one node to the next.
To show you how this is done, let’s first work on a sample dataset of castles in the UK where we’ll create the nodes and edges manually. In the next article, we’ll move on to using an ESRI shapefile that has nodes and edges already established and how to make sure that it has a valid topology.
Getting set up
You’ll need an IBM Cloud Databases (ICD) for PostgreSQL database up and running first. Then, add the PostGIS and pgRouting extensions from the PostgreSQL shell – psql. To do that, we’ll create the PostGIS and pgRouting extensions together using the
CASCADE parameter for
CREATE EXTENSION that comes with PostgreSQL 9.6. It will automatically download all the extensions the pgRouting depends on:
CREATE EXTENSION pgrouting CASCADE; -- adds pgRouting and PostGIS 2.2.5 SELECT upgrade_postgis_23x(); -- upgrade PostGIS
Now that we have the extensions set up, let’s create a table to hold some data. For this exercise, our table will contain a list of the names and coordinates of 12 popular castles in the UK based on The Telegraph’s twelve best castles in the UK. So, the table will contain three columns named
geom is defined as a geometry data type containing a
POINT object since each castle is a point on the map.
CREATE TABLE castles( id SERIAL PRIMARY KEY, name VARCHAR(75), geom GEOMETRY(POINT) );
Next, we’ll insert the castles into the table. Notice that we use the PostGIS functions
ST_Point. We’ll have to use
ST_Point so that PostGIS can process the coordinates. Then, we’ll set each coordinate to the Esri code WGS 84 or SRID 4326. For more information on Esri and SRIDs, see the article on spatial reference systems and databases.
insert into castles (name, geom) values ('Edinburgh Castle', ST_SetSRID(ST_Point(-3.200833, 55.948611), 4326)), ('Dover Castle', ST_SetSRID(ST_Point(1.3214, 51.1297), 4326)), ('Tower of London', ST_SetSRID(ST_Point(-0.076111, 51.508056), 4326)), ('Warwick Castle', ST_SetSRID(ST_Point(-1.584722, 52.279167), 4326)), ('Caernarfon Castle', ST_SetSRID(ST_Point(-4.2769, 53.1393), 4326)), ('Stirling Castle', ST_SetSRID(ST_Point(-3.948889, 56.123889), 4326)), ('Carrickfergus Castle', ST_SetSRID(ST_Point(-5.806446, 54.713314), 4326)), ('Cardiff Castle', ST_SetSRID(St_Point(-3.1811, 51.4824), 4326)), ('York Castle', ST_SetSRID(ST_Point(-1.08, 53.9558), 4326)), ('Leeds Castle', ST_SetSRID(ST_Point(0.628889, 51.248056), 4326)), ('Lancaster Castle', ST_SetSRID(ST_Point(-2.80562, 54.04981), 4326)), ('Arundel Castle', ST_SetSRID(ST_Point(-0.553611, 50.856111), 4326));
The output of this table will look something like the following:
Map of UK castles
Connecting the castles
Now that we have castles and their locations, we can set up a table that will connect them all together called
SELECT DISTINCT ON (LEAST(X.name || Y.name, Y.name || X.name)) dense_rank() OVER(ORDER BY LEAST(X.name || Y.name, Y.name || X.name))::INTEGER AS id, X.name AS name_1, Y.name AS name_2, ST_MakeLine(X.geom, Y.geom)::GEOMETRY(LineString,4326) AS geom, ROUND(ST_Distance(X.geom::GEOGRAPHY, Y.geom::GEOGRAPHY)/1000) AS distance, X.id AS source, Y.id AS target INTO castle_routes FROM castles X CROSS JOIN castles Y WHERE X.name <> Y.name;
What we’re doing above is creating a new table called
castle_routes that contains edges that connect each castle to one another, but not to themselves. Using the PostgreSQL window function
idcolumn of our edges table will create a sequence of edges. We define the origin and destination names of each edge with
geom column contains the line string we created to connect each castle one another. The line strings are also set to SRID 4326, the same as our castle coordinates, using the geometry typemod construct. The
distance column contains the length of the edge, which is the distance between one castle and another in kilometers. To get the distance in kilometers, we’ll have to cast our geometry type data to geography. Why? Because the geography data type calculates distances in meters, unlike the geometry data type which uses degrees. Then, we’ll use the ids from the
castles table to populate the
target columns, which indicate the beginning and end nodes of each edge, respectively.
When this SQL query is executed, we’ll get a table containing 66 edges like the following (we’ve kept the
geom column out for brevity):
Since we’re creating our own map that has routes connecting from one castle to the next, we’ll have to reconstruct the table of nodes (our castles) based on the edges table we’ve created (
castle_routes). To do that, use the pgRouting function
pgr_createVerticesTable using the
castle_routes as the table name, the geometry column
geom, and the
target column names.
The only name the function requires is the edges table name
castle_routes. If you omit
geom, the function will look for
the_geom as the geometry column name; therefore, it must be defined.
target also are default names that the function looks for, but for showcasing how the function works, they’ve been added.
The output of running this function will show the process of creating a vertices table with containing an id and
the_geom column with the geometries of each castle location. The output will show that it has created 12 vertices from the 66 edges, which corresponds to the number of castles we have.
From this, a new table called
castle_routes_vertices_pgr is created that looks like this:
Map of UK castles with vertices
That’s all that’s needed now for the general set up of the routes. Creating your own points and generating the edges is not really necessary if you’re using a shapefile or OpenStreetMap data. But, it’s always best to understand how routes are generated if you decide to generate your own routes from your maps.
Getting the shortest path
Now that we have the routes prepared, we can start finding out the optimal path to get from one castle to the next. To do that, pgRouting provides various functions that’ll calculate the distance and cost of each route, but we’ll focus on perhaps the most basic shortest path function —
To use the function to calculate the distance from the Tower of London to Stirling Castle, we’d write something like this:
SELECT p.seq, p.node, p.cost, r.geom as edge_geom, c.name FROM pgr_dijkstra( 'SELECT id, source, target, distance AS cost FROM castle_routes', (SELECT id FROM castles WHERE name = 'Tower of London'), (SELECT id FROM castles WHERE name = 'Stirling Castle'), FALSE ) AS p LEFT JOIN castle_routes AS r ON p.edge = r.id LEFT JOIN castles AS c ON p.node = c.id ORDER BY p.seq;
The first argument of the function takes a string which needs the
cost columns from the edge table. Since the
castle_routes table doesn’t include a
cost column, we’re assigning
cost to the
distance column. The second and third arguments are the
target columns for the route. We can either hard code an id, or if we don’t remember the id of the castle, we can select the id of the castle we want using an SQL query like above with a WHERE filter. The fourth argument we provided is set to FALSE since we’re using an undirected graph. If you experiment with the function and set it to TRUE, the function will expect another column in
reverse_cost that shows the distance in reverse, which we don’t need here. For maps with road networks or OpenStreetMap data, you’ll usually find the reverse cost added in the data they give you.
The output from this query will give us the
node (castle ids), the cost (distance) of 573 kilometers, the castle names, and the edge geometry connecting both castles.
It’s well known that shortest path between two points is a straight line. Selecting the shortest path becomes more interesting, however, if we select other castles we’d like to visit in between the Tower of London and Stirling Castle. Let’s say that we want to determine what the cost of going to Stirling Castle and visiting other castles along the way. For that, we’d use the pgRouting function
pgr_dijkstra, it will give you multiple paths to your destination that we can choose from.
Below is an example that will calculate five of the shortest paths between the Tower of London and Stirling Castle. You can substitute the number
5 in the SQL query with any number to get a different number of paths.
SELECT x.path_id, x.path_seq, COALESCE(s.name || ' - ' || t.name, 'Total Trip') as route, CASE WHEN edge = -1 THEN agg_cost ELSE NULL END AS "total_cost(distance)" FROM pgr_ksp( 'SELECT id, source, target, distance AS cost FROM castle_routes', (SELECT id FROM castles WHERE name = 'Tower of London'), (SELECT id FROM castles WHERE name = 'Stirling Castle'), 5, directed := TRUE ) as x LEFT JOIN castle_routes AS r ON x.edge = r.id LEFT JOIN castles AS s ON r.source = s.id LEFT JOIN castles AS t ON r.target = t.id ORDER BY x.path_id, x.path_seq;
pgr_ksp function, we return a
path_id that groups the rows containing the same route together. The
path_seq column gives us the sequence of the route (i.e., 1, 2, …). For the
route column, we’ll show the names of the origin and destination of each route segment with “Total Trip” as the last row of each route. The last column is
total_cost, which gives us the total aggregated distance (
agg_cost) of a route. We probably don’t want the distance for each segment of a route, so those have been taken out and only the aggregated distance is shown next to the “Total Trip.”
pgr_ksp takes as the first argument the same SQL query string to our
castle_routes table. Also, we select the origin and destination of our castles in the next two arguments. The number of routes that you want is the next argument. We’ve chosen five, but you can experiment with as many as you’d like. Finally, the last argument we added is
directed := TRUE, which means that the route goes from the Tower of London to Stirling Castle. If this is set to FALSE, then we’ll still go to Stirling Castle, but Stirling Castle might not be the end of the route.
Executing this query, we’ll get five of routes with the total distance of each.
We’ve introduced some of the basics of pgRouting in this article: nodes, edges, and creating paths between them with
pgr_ksp. Next time, we’ll use ESRI shapefiles to see what else pgRouting can do.
Enjoyed this article? Get started with Databases for PostgreSQL now.
Databases for PostgreSQL is a fully managed, enterprise-ready PostgreSQL service with built-in security, high availability, and backup orchestration. Learn more here.