Getting Started with pgRouting on IBM Cloud Databases for PostgreSQL Using Esri Shapefiles
8 min read
pgRouting on IBM Cloud Databases for PostgreSQL Using Esri Shapefiles
We’re going to show you how to plan your travel around Seattle’s museums and sights, building on our introduction to pgRouting on IBM Cloud Databases for PostgreSQL.
Today, we are going to look at how to upload an Esri shapefile of Seattle streets, layer on a set of locations throughout Seattle, and then find the shortest paths to those points using the street network.
In the previous article, “Getting Started with pgRouting on IBM Cloud Databases for PostgreSQL,” we covered adding the extension to IBM Cloud Databases (ICD) for PostgreSQL, creating a dataset of nodes, connecting them together, and then finding the shortest path using the pgRouting routing functions
pgr_ksp. Now, we’ll build on that.
Importing an Esri shapefile
If you don’t already have PostGIS installed and you’re using macOS with homebrew, you can install it using the terminal with:
brew install postgis
Directions on how to install PostGIS on other platforms are found here.
We’ll also be using shapefiles to get the streets that we’ll be routing on and the tool we’ll be using is shp2pgsql, which comes with PostGIS out of the box. PostGIS has some good documentation on all its features for more advanced scenarios.
The shapefile we’ll be using is the City of Seattle street network. It’s a large ZIP file that includes all the streets within the Seattle area. Once you’ve downloaded it, open the ZIP file, and in it, you’ll have two folders named “StatePlain” and “WGS84.” We’ll use the shapefile included in the “WGS84” file. Why? Because it’s already set to SRID 4326, which is a frequently used SRID for geographic data.
Now, we’ll load the shapefile into PostgreSQL using the
shp2pgsql tool. The syntax we’ll use is the following:
shp2pgsql -I -s 4326 -S -t 2D Street_Network_Database.shp streets | \ PGPASSWORD=mypass PGSSLROOTCERT=path/to/cert/file \ psql --host 28f31285-eddd-4999-80c8-359909fde2cc.8q7bfd8f3faa4218aec56e069eb46187.databases.appdomain.cloud \ --port 99999 --dbname ibmclouddb --user ibm_cloud --set=sslmode=verify-full
Let’s go over the options we passed to
shp2pgsql. The two most common options that are frequently used are the
-s switches. The
-I switch automatically sets up the index for the geometry column, which is important for any geospatial data set, and the
-s switch sets the SRID to 4326. Our shapefile already is set to SRID 4326, but we’ll use the flag as a precaution.
The next two options focus on converting the geometry of the dataset. The
-S switch converts any multilinestrings to linestrings. pgRouting is known to have difficulties with multi-geometries and will only consider the first linestring in a multilinestring so it’s best to just do the conversion to avoid any issues that might arise. We’ll also use the
-t switch to set the dimensions of the streets to “2D,” or two dimensions. The reasoning behind this is that sometimes Z or M coordinates are added to indicate three-dimensional data for underpasses, intersections, etc. This can lead to data that’s not noded correctly leading to problems when creating a routing topology. Therefore, by setting the shapefile dimension to “2D”, we’re solving this problem.
The last parts of the command are the shapefile and an optional name of the table that will be created that’ll contain the shapefile data; we’ve called ours “streets.” That name is optional. By default,
shp2pgsql will create a table using the shapefile file name.
psql command, you’ll have to substitute it by entering your ICD PostgreSQL credentials. You can find them by clicking on the Service Credentials link in the left panel and then generating a new credential by clicking on the New Credential button. Once you’ve done that, decode the
certificate_base64 certificate and save that somewhere on your system. Then, within
composed, use that command to connect to PostgreSQL and substitute the location of your self-signed certificate with the default location of the
PGPASSWORD=mypass PGSSLROOTCERT=path/to/cert/file \ psql --host 28f31285-eddd-4999-80c8-359909fde2cc.8q7bfd8f3faa4218aec56e069eb46187.databases.appdomain.cloud \ --port 99999 --dbname ibmclouddb --user ibm_cloud --set=sslmode=verify-full
Once we’ve executed
shp2pgsql, it will take a couple minutes for the shapefile to be converted and imported into PostgreSQL (depending on your internet connection). When it’s finished, you’ll have a table named “streets” in your database.
Now, let’s start exploring the table and preparing the data for pgRouting.
Preparing shapefiles for pgRouting
We have a table called “streets” inside our database, so we’ll now need to set up the table to use it with pgRouting. In the last article, we mentioned that tables containing routes, called edges, need to have “source,” “target,” and “cost” columns (or a column that will hold a cost value) in order to know where nodes begin and end and a meaningful cost to each edge.
Let’s set up three columns: “source,” “target,” and “length” (for cost). We can do that with these SQL commands:
ALTER TABLE streets ADD COLUMN source INTEGER; ALTER TABLE streets ADD COLUMN target INTEGER; ALTER TABLE streets ADD COLUMN length FLOAT8;
Next, we’ll populate the “source” and “target” columns using
pgr_createTopology. In the last article, we populated these columns ourselves since we created the nodes and edges. With the imported street data, the edges and nodes have been created for us so we’ll let pgRouting figure out the source and target nodes by running:
This function takes the table name “streets,” a tolerance number, the name of the geometry column, and the name id field. We’re using the names
gid in the function because the default column names the function looks for are
the_geom. As for the tolerance number, it should be as low as possible—the lower the better. The tolerance should be lower than the distance between nodes; otherwise, the topology function will think that some nodes are connected and they’ll be snapped together and we don’t want that otherwise some streets will be connected at the wrong intersections.
Running this function, the “source” and “target” columns will be populated with the source and target nodes for each edge. Additionally, a table called “streets_vertices_pgr” will be created which includes the id and geometry of each node in the “streets” table. This table serves as a lookup for each node that pgRouting goes to when finding the locations of the nodes.
The next column in the “streets” table that needs to be populated is the “length” column. This serves as the cost of each edge. For the length, we’ll use the edge length between the source and target nodes using the
geom column from streets. To do that, we’ll let PostGIS determine the length of each street segment using
ST_Length and casting the
geom column to the geography data type to get the street segment length in meters:
UPDATE streets SET length = ST_Length(geom::geography);
Once, we’ve done that, all the lengths will be inserted into the database and we’ll be set up to do the routing.
What about locations to route to?
Technically, we don’t need point locations to make pgRouting work; all you have to know are the source and target nodes for the origin and destination edges. That means we’ll have to figure out what the source and target nodes are for each street using, for instance, some SQL to find the street name from the “streets” table or by using a geospatial tool like QGIS to find the source and target node ids through a UI.
Instead of trying to figure out the nodes of each street, let’s overlay some locations and use those as source and target nodes. How will we do that?
First, let’s download a CSV file with the locations from a community-driven project by the City of Seattle called My Neighborhood Map. The CSV file has a lot of different points of interest, some of which will not relevant for our use case, such as traffic cameras. Next, create another table called “places.”
CREATE TABLE places ( id SERIAL PRIMARY KEY, city_feature TEXT, common_name TEXT, address VARCHAR(256), website VARCHAR(256), longitude FLOAT8, latitude FLOAT8, location VARCHAR(256), place_id INTEGER, geom GEOMETRY(POINT,4326) );
Use the following command in the shell to copy the CSV file into the “places” table:
\COPY places (city_feature, common_name, address, website, longitude, latitude, location) FROM 'path_to_file/My_Neighborhood_Map.csv' CSV HEADER
Now we’ll populate the
geom column using PostGIS, set the SRID to 4326 like our streets, and add an index on the column:
UPDATE places SET geom = ST_SetSRID(ST_Point(longitude,latitude),4326); CREATE INDEX idx_places_geom ON places_t USING GIST(geom);
Now we’ll update the
place_id column by using the “streets_vertices_pgr” which stored the locations of each node from “streets.” The
place_id column is where we’ll snap the locations to the street nodes using the by associating each street node with a location. Using some SQL, we can search for street nodes that are within a very minimal distance from each location using the PostGIS function
ST_DWithin and specifying an optional distance parameter with a distance of
0.001 meters. This function will populate the
place_idcolumn with the ids of edges that are within 0.001 meters from a location.
UPDATE places SET place_id = v.id FROM streets_vertices_pgr v WHERE ST_DWithin(places.geom, v.the_geom, 0.001);
You’ll find that some of the places in the table do not have an id associated with them. That’s because they are not within the specified distance to a street node. In this case, we can delete them or just ignore them. To delete them execute:
DELETE FROM places WHERE place_id IS NULL;
Routing between locations
We have streets and locations all set up for routing, and now we’ll use pgRouting to get the shortest routes between locations. If you’ve read the previous article “Getting Started with pgRouting” then you’ll be familiar with how to set up the routes. We’ll be using the same pgRouting routing functions here:
Let’s say we want to go to the Museum of Flight and then we want to go to a concert at Benaroya Hall. Using
pgr_dijkstra, we’ll set up the query with:
SELECT d.seq, d.node, d.edge, d.cost, e.geom AS edge_geom FROM pgr_dijkstra( -- edges 'SELECT gid AS id, source, target, length AS cost FROM streets', -- source node (SELECT place_id FROM places WHERE common_name = 'Museum Of Flight'), -- target node (SELECT place_id FROM places WHERE common_name = 'Benaroya Hall' AND city_feature = 'General Attractions'), FALSE ) as d LEFT JOIN streets AS e ON d.edge = e.gid ORDER BY d.seq;
In this query, we’re selecting the source and target nodes from their
place_id using the
common_namecolumn to select them by name. Some names are repeated because they have two or more category names in
city_feature. Therefore, for this data set, you might have to specify the
city_feature, which we’ve done for Benaroya Hall; otherwise, you’ll get the following error when running the above query:
ERROR: more than one row returned by a subquery used as an expression
Executing this query in the shell will give us a long list of nodes with the edge geometries. However, to give this query a little more context, we’ll execute this query in QGIS using our “streets” and “place” tables giving us the route:
pgr_ksp, the set up will be similar to that of
pgr_dijkstra within the function, but now we’ll add an extra parameter to indicate how many routes we want to be returned to the destination. In the following example, we’ll determine the three shortest paths from the Pacific Science Center to the Museum of History and Industry. Set up the query like this:
SELECT k.seq, k.path_id, k.node, k.edge, k.cost, e.geom AS edge_geom FROM pgr_ksp( -- edges 'SELECT gid AS id, source, target, length AS cost FROM streets', -- source node (SELECT place_id FROM places WHERE common_name = 'Pacific Science Center' AND city_feature = 'Seattle Center'), -- target node (SELECT place_id FROM places WHERE common_name = 'Museum Of History And Industry'), -- # of routes 3, directed := TRUE ) as k LEFT JOIN streets AS e ON k.edge = e.gid ORDER BY k.seq;
Again, giving this some context, we’ve put the query in QGIS, and the route looks something like:
You’ll notice that the majority of all three routes overlap. We’ve labeled all three routes in the image above. Number one is the fastest route, number two take another street but merges into number one, and number three takes a small side street off of number one, which is barely visible, but then merges back to the same route as number one.
To get an idea about the distances each route covers, we’ll use
psql. If we modify the query a little to get only the locations that each route goes through and the number of kilometers they have to travel to get from the Pacific Science Center to the Museum of History and Industry, we’d substitute everything in the query except what’s within the
SELECT k.path_id AS path_id, p.common_name AS route, MAX(CASE WHEN k.edge = -1 THEN k.agg_cost/1000 ELSE NULL END) AS "distance/km" FROM pgr_ksp( ... ) as k INNER JOIN places AS p ON k.node = p.place_id GROUP BY k.path_id, p.common_name, k.seq ORDER BY k.seq;
We’ll be able to see each place that the route travels past and the total distance in kilometers:
In the two articles that have covered pgRouting, we’ve only scratched the surface of what’s available and how the extension can be used in your GIS applications. pgRouting is particularly interesting when using it with OpenStreetMap data, which we’ll cover in the near future. Nevertheless, we encourage you to explore more and do more with pgRouting on IBM Cloud Databases for PostgreSQL.
Enjoyed this article? Get started with Databases for PostgreSQL now.
Databases for PostgreSQL is a fully managed, enterprise-ready PostgreSQL service with built-in security, high availability, and backup orchestration. Learn more here.