January 3, 2019 By Phil Alger 8 min read

Geo-routing tasks with PostgreSQL, PostGIS, and pgRouting on IBM Cloud Databases for PostgreSQL

If you need to plan your UK castle tour but don’t know how to start, let us show you how PostgreSQL, PostGIS, and pgRouting on IBM Cloud Databases for PostgreSQL can get you started on this and any other geo-routing tasks you may need your database to perform.

With IBM Cloud Databases (ICD) for PostgreSQL, you have the capability to run complex geospatial queries using PostGIS and pgRouting. In this article, we’ll show you how to get started using pgRouting by getting the shortest paths between castles in the United Kingdom (UK).

While PostGIS is an extension for PostgreSQL, pgRouting can be viewed as an extension for PostGIS. What pgRouting does is extend PostGIS’s functionality by providing you with several functions based on popular network algorithms to solve problems like the shortest path and the traveling salesperson problem.

In order to do that, however, pgRouting must have a valid topology. Take, for example, a road network. Each road segment is joined together at intersections, which we’ll call “nodes.” In order for pgRouting to work, each node must be connected to a road segment so that it can figure out how to get from one node to the next. pgRouting helps us do that using a few functions create network topologies and inspect their validity.

In addition to topology functions, pgRouting gives us various routing functions that we can use to find the shortest path between nodes. For this article, we’ll only look at two of them— pgr_dijkstra (or Dijkstra’s algorithm) and pgr_ksp (or Yen’s algorithm).

Before using these pgRouting functions, we’ll give you a short introduction to setting up a dataset so that it’s capable of using pgRouting because having only a dataset of geographic points is not enough. First, we’ll create a network of nodes using the locations of castles in the UK and connect them together. These connections are called “edges.” From there, we’ll let pgRouting do all the heavy lifting by analyzing our routing topology and computing the shortest routes from one node to the next.

To show you how this is done, let’s first work on a sample dataset of castles in the UK where we’ll create the nodes and edges manually. In the next article, we’ll move on to using an ESRI shapefile that has nodes and edges already established and how to make sure that it has a valid topology.

Getting set up

You’ll need an IBM Cloud Databases (ICD) for PostgreSQL database up and running first. Then, add the PostGIS and pgRouting extensions from the PostgreSQL shell – psql. To do that, we’ll create the PostGIS and pgRouting extensions together using the CASCADE parameter for CREATE EXTENSION that comes with PostgreSQL 9.6. It will automatically download all the extensions the pgRouting depends on:

CREATE EXTENSION pgrouting CASCADE; -- adds pgRouting and PostGIS 2.2.5
SELECT upgrade_postgis_23x(); -- upgrade PostGIS

Now that we have the extensions set up, let’s create a table to hold some data. For this exercise, our table will contain a list of the names and coordinates of 12 popular castles in the UK based on The Telegraph’s twelve best castles in the UK. So, the table will contain three columns named idname, and geomgeom is defined as a geometry data type containing a POINT object since each castle is a point on the map.

CREATE TABLE castles(
    id SERIAL PRIMARY KEY,
    name VARCHAR(75),
    geom GEOMETRY(POINT)
);

Next, we’ll insert the castles into the table. Notice that we use the PostGIS functions ST_SetSRID and ST_Point. We’ll have to use ST_Point so that PostGIS can process the coordinates. Then, we’ll set each coordinate to the Esri code WGS 84 or SRID 4326. For more information on Esri and SRIDs, see the article on spatial reference systems and databases.

insert into castles (name, geom)
values
('Edinburgh Castle', ST_SetSRID(ST_Point(-3.200833, 55.948611), 4326)),
('Dover Castle', ST_SetSRID(ST_Point(1.3214, 51.1297), 4326)),
('Tower of London', ST_SetSRID(ST_Point(-0.076111, 51.508056), 4326)),
('Warwick Castle', ST_SetSRID(ST_Point(-1.584722, 52.279167), 4326)),
('Caernarfon Castle', ST_SetSRID(ST_Point(-4.2769, 53.1393), 4326)),
('Stirling Castle', ST_SetSRID(ST_Point(-3.948889, 56.123889), 4326)),
('Carrickfergus Castle', ST_SetSRID(ST_Point(-5.806446, 54.713314), 4326)),
('Cardiff Castle', ST_SetSRID(St_Point(-3.1811, 51.4824), 4326)),
('York Castle', ST_SetSRID(ST_Point(-1.08, 53.9558), 4326)),
('Leeds Castle', ST_SetSRID(ST_Point(0.628889, 51.248056), 4326)),
('Lancaster Castle', ST_SetSRID(ST_Point(-2.80562, 54.04981), 4326)),
('Arundel Castle', ST_SetSRID(ST_Point(-0.553611, 50.856111), 4326));

The output of this table will look something like the following:

Map of UK castles

Connecting the castles

Now that we have castles and their locations, we can set up a table that will connect them all together called castle_routes.

SELECT DISTINCT ON 
    (LEAST(X.name || Y.name, Y.name || X.name)) 
    dense_rank()
        OVER(ORDER BY LEAST(X.name || Y.name, Y.name || X.name))::INTEGER AS id,
    X.name AS name_1, 
    Y.name AS name_2,
    ST_MakeLine(X.geom, Y.geom)::GEOMETRY(LineString,4326) AS geom,
    ROUND(ST_Distance(X.geom::GEOGRAPHY, Y.geom::GEOGRAPHY)/1000) AS distance,
    X.id AS source, 
    Y.id AS target
INTO 
    castle_routes
FROM 
    castles X CROSS JOIN castles Y
WHERE 
    X.name <> Y.name;

What we’re doing above is creating a new table called castle_routes that contains edges that connect each castle to one another, but not to themselves. Using the PostgreSQL window function dense_rank, the idcolumn of our edges table will create a sequence of edges. We define the origin and destination names of each edge with X.name and Y.name. The geom column contains the line string we created to connect each castle one another. The line strings are also set to SRID 4326, the same as our castle coordinates, using the geometry typemod construct. The distance column contains the length of the edge, which is the distance between one castle and another in kilometers. To get the distance in kilometers, we’ll have to cast our geometry type data to geography. Why? Because the geography data type calculates distances in meters, unlike the geometry data type which uses degrees. Then, we’ll use the ids from the castles table to populate the source and target columns, which indicate the beginning and end nodes of each edge, respectively.

When this SQL query is executed, we’ll get a table containing 66 edges like the following (we’ve kept the geom column out for brevity):

...

Since we’re creating our own map that has routes connecting from one castle to the next, we’ll have to reconstruct the table of nodes (our castles) based on the edges table we’ve created (castle_routes). To do that, use the pgRouting function pgr_createVerticesTable using the castle_routes as the table name, the geometry column geom, and the source and target column names.

select pgr_createVerticesTable('castle_routes','geom','source','target');

The only name the function requires is the edges table name castle_routes. If you omit geom, the function will look for the_geom as the geometry column name; therefore, it must be defined. source and target also are default names that the function looks for, but for showcasing how the function works, they’ve been added.

The output of running this function will show the process of creating a vertices table with containing an id and the_geom column with the geometries of each castle location. The output will show that it has created 12 vertices from the 66 edges, which corresponds to the number of castles we have.

From this, a new table called castle_routes_vertices_pgr is created that looks like this:

 

Map of UK castles with vertices

That’s all that’s needed now for the general set up of the routes. Creating your own points and generating the edges is not really necessary if you’re using a shapefile or OpenStreetMap data. But, it’s always best to understand how routes are generated if you decide to generate your own routes from your maps.

Getting the shortest path

Now that we have the routes prepared, we can start finding out the optimal path to get from one castle to the next. To do that, pgRouting provides various functions that’ll calculate the distance and cost of each route, but we’ll focus on perhaps the most basic shortest path function — pgr_dijkstra.

To use the function to calculate the distance from the Tower of London to Stirling Castle, we’d write something like this:

SELECT 
    p.seq, p.node, p.cost, r.geom as edge_geom, c.name
FROM 
    pgr_dijkstra(
        'SELECT id, source, target, distance AS cost FROM castle_routes',
        (SELECT id FROM castles WHERE name = 'Tower of London'), 
        (SELECT id FROM castles WHERE name = 'Stirling Castle'), 
        FALSE
    ) AS p
    LEFT JOIN castle_routes AS r ON p.edge = r.id
    LEFT JOIN castles AS c ON p.node = c.id
ORDER BY 
    p.seq;

The first argument of the function takes a string which needs the idsourcetarget, and cost columns from the edge table. Since the castle_routes table doesn’t include a cost column, we’re assigning cost to the distance column. The second and third arguments are the source and target columns for the route. We can either hard code an id, or if we don’t remember the id of the castle, we can select the id of the castle we want using an SQL query like above with a WHERE filter. The fourth argument we provided is set to FALSE since we’re using an undirected graph. If you experiment with the function and set it to TRUE, the function will expect another column in castle_routes called reverse_cost that shows the distance in reverse, which we don’t need here. For maps with road networks or OpenStreetMap data, you’ll usually find the reverse cost added in the data they give you.

The output from this query will give us the node (castle ids), the cost (distance) of 573 kilometers, the castle names, and the edge geometry connecting both castles.

It’s well known that shortest path between two points is a straight line. Selecting the shortest path becomes more interesting, however, if we select other castles we’d like to visit in between the Tower of London and Stirling Castle. Let’s say that we want to determine what the cost of going to Stirling Castle and visiting other castles along the way. For that, we’d use the pgRouting function pgr_ksp. Unlike pgr_dijkstra, it will give you multiple paths to your destination that we can choose from.

Below is an example that will calculate five of the shortest paths between the Tower of London and Stirling Castle. You can substitute the number 5 in the SQL query with any number to get a different number of paths.

SELECT x.path_id, x.path_seq, COALESCE(s.name || ' - ' || t.name, 'Total Trip') as route,
CASE 
    WHEN edge = -1 THEN agg_cost ELSE NULL END AS "total_cost(distance)"
FROM 
    pgr_ksp(
        'SELECT id, source, target, distance AS cost FROM castle_routes',
        (SELECT id FROM castles WHERE name = 'Tower of London'),
        (SELECT id FROM castles WHERE name = 'Stirling Castle'),
        5,
        directed := TRUE
    ) as x
    LEFT JOIN castle_routes AS r ON x.edge = r.id
    LEFT JOIN castles AS s ON r.source = s.id
    LEFT JOIN castles AS t ON r.target = t.id
ORDER BY 
    x.path_id, x.path_seq;

Using the pgr_ksp function, we return a path_id that groups the rows containing the same route together. The path_seq column gives us the sequence of the route (i.e., 1, 2, …). For the route column, we’ll show the names of the origin and destination of each route segment with “Total Trip” as the last row of each route. The last column is total_cost, which gives us the total aggregated distance (agg_cost) of a route. We probably don’t want the distance for each segment of a route, so those have been taken out and only the aggregated distance is shown next to the “Total Trip.”

The function pgr_ksp takes as the first argument the same SQL query string to our castle_routes table. Also, we select the origin and destination of our castles in the next two arguments. The number of routes that you want is the next argument. We’ve chosen five, but you can experiment with as many as you’d like. Finally, the last argument we added is directed := TRUE, which means that the route goes from the Tower of London to Stirling Castle. If this is set to FALSE, then we’ll still go to Stirling Castle, but Stirling Castle might not be the end of the route.

Executing this query, we’ll get five of routes with the total distance of each.

Summing up

We’ve introduced some of the basics of pgRouting in this article: nodes, edges, and creating paths between them with pgr_dijkstra and pgr_ksp. Next time, we’ll use ESRI shapefiles to see what else pgRouting can do.

Learn more

Enjoyed this article? Get started with Databases for PostgreSQL now.

Databases for PostgreSQL is a fully managed, enterprise-ready PostgreSQL service with built-in security, high availability, and backup orchestration. Learn more here.

Was this article helpful?
YesNo

More from Cloud

Hyperscale vs. colocation: Go big or go rent?

9 min read - Here’s the situation: You’re the CIO or similarly empowered representative of an organization. Different voices within your business are calling attention to the awesome scalability and power of hyperscale computing, which you’ve also noticed with increasing interest. Now the word comes down from on high that you’ve been tasked with designing and implementing your company’s hyperscale computing solution—whatever that should be. Your organization already has an ambitious agenda in mind for whatever IT infrastructure you wind up choosing. The company…

IBM Tech Now: March 25, 2024

< 1 min read - ​Welcome IBM Tech Now, our video web series featuring the latest and greatest news and announcements in the world of technology. Make sure you subscribe to our YouTube channel to be notified every time a new IBM Tech Now video is published. IBM Tech Now: Episode 95 On this episode, we're covering the IBM X-Force Threat Intelligence Index 2024: IBM X-Force Cyber Range Combating deepfakes Stay plugged in You can check out the IBM Blog Announcements for a full rundown…

Types of 5G: Which one is right for your organization?

7 min read - 5G technology isn’t a one-size-fits-all solution that can enable digital transformation at the touch of a button. There are three kinds of 5G, each with its own specific use cases and capabilities, that business leaders need to understand. 5G wireless is broken down into three types—low, mid and high band—named for the spectrum of radio frequencies they support. Low-band 5G transmits data on frequencies between 600 and 900 MHz Mid-band 5G transmits between 1 and 6 GHz High-band 5G transmits…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters