January 3, 2019 By Phil Alger 8 min read

Geo-routing tasks with PostgreSQL, PostGIS, and pgRouting on IBM Cloud Databases for PostgreSQL

If you need to plan your UK castle tour but don’t know how to start, let us show you how PostgreSQL, PostGIS, and pgRouting on IBM Cloud Databases for PostgreSQL can get you started on this and any other geo-routing tasks you may need your database to perform.

With IBM Cloud Databases (ICD) for PostgreSQL, you have the capability to run complex geospatial queries using PostGIS and pgRouting. In this article, we’ll show you how to get started using pgRouting by getting the shortest paths between castles in the United Kingdom (UK).

While PostGIS is an extension for PostgreSQL, pgRouting can be viewed as an extension for PostGIS. What pgRouting does is extend PostGIS’s functionality by providing you with several functions based on popular network algorithms to solve problems like the shortest path and the traveling salesperson problem.

In order to do that, however, pgRouting must have a valid topology. Take, for example, a road network. Each road segment is joined together at intersections, which we’ll call “nodes.” In order for pgRouting to work, each node must be connected to a road segment so that it can figure out how to get from one node to the next. pgRouting helps us do that using a few functions create network topologies and inspect their validity.

In addition to topology functions, pgRouting gives us various routing functions that we can use to find the shortest path between nodes. For this article, we’ll only look at two of them— pgr_dijkstra (or Dijkstra’s algorithm) and pgr_ksp (or Yen’s algorithm).

Before using these pgRouting functions, we’ll give you a short introduction to setting up a dataset so that it’s capable of using pgRouting because having only a dataset of geographic points is not enough. First, we’ll create a network of nodes using the locations of castles in the UK and connect them together. These connections are called “edges.” From there, we’ll let pgRouting do all the heavy lifting by analyzing our routing topology and computing the shortest routes from one node to the next.

To show you how this is done, let’s first work on a sample dataset of castles in the UK where we’ll create the nodes and edges manually. In the next article, we’ll move on to using an ESRI shapefile that has nodes and edges already established and how to make sure that it has a valid topology.

Getting set up

You’ll need an IBM Cloud Databases (ICD) for PostgreSQL database up and running first. Then, add the PostGIS and pgRouting extensions from the PostgreSQL shell – psql. To do that, we’ll create the PostGIS and pgRouting extensions together using the CASCADE parameter for CREATE EXTENSION that comes with PostgreSQL 9.6. It will automatically download all the extensions the pgRouting depends on:

CREATE EXTENSION pgrouting CASCADE; -- adds pgRouting and PostGIS 2.2.5
SELECT upgrade_postgis_23x(); -- upgrade PostGIS

Now that we have the extensions set up, let’s create a table to hold some data. For this exercise, our table will contain a list of the names and coordinates of 12 popular castles in the UK based on The Telegraph’s twelve best castles in the UK. So, the table will contain three columns named idname, and geomgeom is defined as a geometry data type containing a POINT object since each castle is a point on the map.

    name VARCHAR(75),

Next, we’ll insert the castles into the table. Notice that we use the PostGIS functions ST_SetSRID and ST_Point. We’ll have to use ST_Point so that PostGIS can process the coordinates. Then, we’ll set each coordinate to the Esri code WGS 84 or SRID 4326. For more information on Esri and SRIDs, see the article on spatial reference systems and databases.

insert into castles (name, geom)
('Edinburgh Castle', ST_SetSRID(ST_Point(-3.200833, 55.948611), 4326)),
('Dover Castle', ST_SetSRID(ST_Point(1.3214, 51.1297), 4326)),
('Tower of London', ST_SetSRID(ST_Point(-0.076111, 51.508056), 4326)),
('Warwick Castle', ST_SetSRID(ST_Point(-1.584722, 52.279167), 4326)),
('Caernarfon Castle', ST_SetSRID(ST_Point(-4.2769, 53.1393), 4326)),
('Stirling Castle', ST_SetSRID(ST_Point(-3.948889, 56.123889), 4326)),
('Carrickfergus Castle', ST_SetSRID(ST_Point(-5.806446, 54.713314), 4326)),
('Cardiff Castle', ST_SetSRID(St_Point(-3.1811, 51.4824), 4326)),
('York Castle', ST_SetSRID(ST_Point(-1.08, 53.9558), 4326)),
('Leeds Castle', ST_SetSRID(ST_Point(0.628889, 51.248056), 4326)),
('Lancaster Castle', ST_SetSRID(ST_Point(-2.80562, 54.04981), 4326)),
('Arundel Castle', ST_SetSRID(ST_Point(-0.553611, 50.856111), 4326));

The output of this table will look something like the following:

Map of UK castles

Connecting the castles

Now that we have castles and their locations, we can set up a table that will connect them all together called castle_routes.

    (LEAST(X.name || Y.name, Y.name || X.name)) 
        OVER(ORDER BY LEAST(X.name || Y.name, Y.name || X.name))::INTEGER AS id,
    X.name AS name_1, 
    Y.name AS name_2,
    ST_MakeLine(X.geom, Y.geom)::GEOMETRY(LineString,4326) AS geom,
    ROUND(ST_Distance(X.geom::GEOGRAPHY, Y.geom::GEOGRAPHY)/1000) AS distance,
    X.id AS source, 
    Y.id AS target
    castles X CROSS JOIN castles Y
    X.name <> Y.name;

What we’re doing above is creating a new table called castle_routes that contains edges that connect each castle to one another, but not to themselves. Using the PostgreSQL window function dense_rank, the idcolumn of our edges table will create a sequence of edges. We define the origin and destination names of each edge with X.name and Y.name. The geom column contains the line string we created to connect each castle one another. The line strings are also set to SRID 4326, the same as our castle coordinates, using the geometry typemod construct. The distance column contains the length of the edge, which is the distance between one castle and another in kilometers. To get the distance in kilometers, we’ll have to cast our geometry type data to geography. Why? Because the geography data type calculates distances in meters, unlike the geometry data type which uses degrees. Then, we’ll use the ids from the castles table to populate the source and target columns, which indicate the beginning and end nodes of each edge, respectively.

When this SQL query is executed, we’ll get a table containing 66 edges like the following (we’ve kept the geom column out for brevity):


Since we’re creating our own map that has routes connecting from one castle to the next, we’ll have to reconstruct the table of nodes (our castles) based on the edges table we’ve created (castle_routes). To do that, use the pgRouting function pgr_createVerticesTable using the castle_routes as the table name, the geometry column geom, and the source and target column names.

select pgr_createVerticesTable('castle_routes','geom','source','target');

The only name the function requires is the edges table name castle_routes. If you omit geom, the function will look for the_geom as the geometry column name; therefore, it must be defined. source and target also are default names that the function looks for, but for showcasing how the function works, they’ve been added.

The output of running this function will show the process of creating a vertices table with containing an id and the_geom column with the geometries of each castle location. The output will show that it has created 12 vertices from the 66 edges, which corresponds to the number of castles we have.

From this, a new table called castle_routes_vertices_pgr is created that looks like this:


Map of UK castles with vertices

That’s all that’s needed now for the general set up of the routes. Creating your own points and generating the edges is not really necessary if you’re using a shapefile or OpenStreetMap data. But, it’s always best to understand how routes are generated if you decide to generate your own routes from your maps.

Getting the shortest path

Now that we have the routes prepared, we can start finding out the optimal path to get from one castle to the next. To do that, pgRouting provides various functions that’ll calculate the distance and cost of each route, but we’ll focus on perhaps the most basic shortest path function — pgr_dijkstra.

To use the function to calculate the distance from the Tower of London to Stirling Castle, we’d write something like this:

    p.seq, p.node, p.cost, r.geom as edge_geom, c.name
        'SELECT id, source, target, distance AS cost FROM castle_routes',
        (SELECT id FROM castles WHERE name = 'Tower of London'), 
        (SELECT id FROM castles WHERE name = 'Stirling Castle'), 
    ) AS p
    LEFT JOIN castle_routes AS r ON p.edge = r.id
    LEFT JOIN castles AS c ON p.node = c.id

The first argument of the function takes a string which needs the idsourcetarget, and cost columns from the edge table. Since the castle_routes table doesn’t include a cost column, we’re assigning cost to the distance column. The second and third arguments are the source and target columns for the route. We can either hard code an id, or if we don’t remember the id of the castle, we can select the id of the castle we want using an SQL query like above with a WHERE filter. The fourth argument we provided is set to FALSE since we’re using an undirected graph. If you experiment with the function and set it to TRUE, the function will expect another column in castle_routes called reverse_cost that shows the distance in reverse, which we don’t need here. For maps with road networks or OpenStreetMap data, you’ll usually find the reverse cost added in the data they give you.

The output from this query will give us the node (castle ids), the cost (distance) of 573 kilometers, the castle names, and the edge geometry connecting both castles.

It’s well known that shortest path between two points is a straight line. Selecting the shortest path becomes more interesting, however, if we select other castles we’d like to visit in between the Tower of London and Stirling Castle. Let’s say that we want to determine what the cost of going to Stirling Castle and visiting other castles along the way. For that, we’d use the pgRouting function pgr_ksp. Unlike pgr_dijkstra, it will give you multiple paths to your destination that we can choose from.

Below is an example that will calculate five of the shortest paths between the Tower of London and Stirling Castle. You can substitute the number 5 in the SQL query with any number to get a different number of paths.

SELECT x.path_id, x.path_seq, COALESCE(s.name || ' - ' || t.name, 'Total Trip') as route,
    WHEN edge = -1 THEN agg_cost ELSE NULL END AS "total_cost(distance)"
        'SELECT id, source, target, distance AS cost FROM castle_routes',
        (SELECT id FROM castles WHERE name = 'Tower of London'),
        (SELECT id FROM castles WHERE name = 'Stirling Castle'),
        directed := TRUE
    ) as x
    LEFT JOIN castle_routes AS r ON x.edge = r.id
    LEFT JOIN castles AS s ON r.source = s.id
    LEFT JOIN castles AS t ON r.target = t.id
    x.path_id, x.path_seq;

Using the pgr_ksp function, we return a path_id that groups the rows containing the same route together. The path_seq column gives us the sequence of the route (i.e., 1, 2, …). For the route column, we’ll show the names of the origin and destination of each route segment with “Total Trip” as the last row of each route. The last column is total_cost, which gives us the total aggregated distance (agg_cost) of a route. We probably don’t want the distance for each segment of a route, so those have been taken out and only the aggregated distance is shown next to the “Total Trip.”

The function pgr_ksp takes as the first argument the same SQL query string to our castle_routes table. Also, we select the origin and destination of our castles in the next two arguments. The number of routes that you want is the next argument. We’ve chosen five, but you can experiment with as many as you’d like. Finally, the last argument we added is directed := TRUE, which means that the route goes from the Tower of London to Stirling Castle. If this is set to FALSE, then we’ll still go to Stirling Castle, but Stirling Castle might not be the end of the route.

Executing this query, we’ll get five of routes with the total distance of each.

Summing up

We’ve introduced some of the basics of pgRouting in this article: nodes, edges, and creating paths between them with pgr_dijkstra and pgr_ksp. Next time, we’ll use ESRI shapefiles to see what else pgRouting can do.

Learn more

Enjoyed this article? Get started with Databases for PostgreSQL now.

Databases for PostgreSQL is a fully managed, enterprise-ready PostgreSQL service with built-in security, high availability, and backup orchestration. Learn more here.

Was this article helpful?

More from Cloud

Enhance your data security posture with a no-code approach to application-level encryption

4 min read - Data is the lifeblood of every organization. As your organization’s data footprint expands across the clouds and between your own business lines to drive value, it is essential to secure data at all stages of the cloud adoption and throughout the data lifecycle. While there are different mechanisms available to encrypt data throughout its lifecycle (in transit, at rest and in use), application-level encryption (ALE) provides an additional layer of protection by encrypting data at its source. ALE can enhance…

Attention new clients: exciting financial incentives for VMware Cloud Foundation on IBM Cloud

4 min read - New client specials: Get up to 50% off when you commit to a 1- or 3-year term contract on new VCF-as-a-Service offerings, plus an additional value of up to USD 200K in credits through 30 June 2025 when you migrate your VMware workloads to IBM Cloud®.1 Low starting prices: On-demand VCF-as-a-Service deployments begin under USD 200 per month.2 The IBM Cloud benefit: See the potential for a 201%3 return on investment (ROI) over 3 years with reduced downtime, cost and…

The history of the central processing unit (CPU)

10 min read - The central processing unit (CPU) is the computer’s brain. It handles the assignment and processing of tasks, in addition to functions that make a computer run. There’s no way to overstate the importance of the CPU to computing. Virtually all computer systems contain, at the least, some type of basic CPU. Regardless of whether they’re used in personal computers (PCs), laptops, tablets, smartphones or even in supercomputers whose output is so strong it must be measured in floating-point operations per…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters