Importing OneStreetMap data into IBM Cloud Databases for PostgreSQL
GeoFile is a series dedicated to looking at geographical data, its features, and uses. In today’s article, we’re going to introduce you to OpenStreetMap data, import that data into an IBM Cloud Databases for PostgreSQL deployment, and make queries on some of the non-conventional data stored in hstore columns.
OpenStreetMap (OSM) is one of the most recognized and popular community-driven, open data sources for maps worldwide. If you’re going to do anything that happens to use maps, more than likely, you’ll run into it and use it or know someone who does. Most web and mobile applications use it because of its reliability since it’s continuously updated by a vast number of contributors.
This article comprises part one of a two-part series on OSM data and IBM Cloud Databases (ICD) for PostgreSQL. In this post, we’ll be taking a look at how to import OSM data into an ICD for PostgreSQL database. We’ll also show you some of the OSM data features and how to query city amenities like restaurants. As we query restaurants, we’ll find the type of cuisine they serve, which is information that is only found in the hstore column.
Let’s get some OSM.
Retrieving OSM Data
For our examples, we’ll be using OSM data for Seattle, WA. On OSM’s website, you can search for Seattle in the search bar at the top left of the screen. Once Seattle’s on the screen, we’ll export the map.
Since OSM provides us with a map of more than just the city of Seattle, we’ll want to zoom in on Seattle. We’ll zoom in because OSM will export what’s shown on the screen, not just Seattle. Therefore, let’s zoom in a little to get just the city in our browser window.
Once we’re happy with what we see on the screen, we can export the map by pressing the Export button at the top of the window. It will give you several options to export data.
If we click on the blue Export button, we’ll receive an error in our browser. This error happens when we are trying to download an area that contains too many nodes (> 50000). By running CURL in a terminal to reproduce the error, we’ll see the following message appear:
If we were to zoom in on a specific area of Seattle, we could successfully export a portion of the city. However, since we’re interested in exporting a map of the entire city, go ahead and press the Overpass APIlink, which gives us more nodes and will automatically start downloading a file called map that includes the entire city.
Adding OSM data to PostgreSQL
You will need to add .osm to the map file in order to import it to PostgreSQL. To import OSM data, we’ll use the command line tool osm2pgsql.
osm2pgsql is a command line tool that transforms OSM data into a format that can be saved into PostgreSQL. If you’re using MacOS, you can download it using Homebrew
brew install osm2pgsql. For other operating systems, osm2pgsql provides installation instructions on their Github repository.
Once osm2pgsql has been installed, we need to create a database. You can create a database via the terminal after logging into your ICD for PostgreSQL deployment. We’ll use the terminal which requires only the command line string provided in your deployment’s Connection info panel. Once we’re logged in, create a database called
osm by typing:
After that, connect to the database using
\c osm; and then install the PostGIS and HStore extensions using:
We’ll need PostGIS installed to successfully import OSM data since it uses geometry data that PostGIS requires. If you don’t install the PostGIS extension and try to import data, you’ll receive an error. We’ll also install the hstore extension, which will be used to store and retrieve non-standardized data that doesn’t fit into a column. This data is inserted as key value pairs in a column named
Once the database and extensions have been set up, we can use osm2pgsql in our terminal like this:
Start out with the
osm2pgsql command and add
-U with your database username (usually
-W, indicating that you need a password prompt. Next, add
-d and your database name (here osm). The
-Hoption is the deployment’s hostname and
-P is the port number. We’ve added the
--hstore to create a tagscolumns for each table that contains the supplemental non-standardized data and the
--hstore-add-index option sets up indexes on those columns. Finally, we add the
map.osm file that we downloaded when we exported the Seattle map.
After running the command, enter the deployment password,and we’ll see our data processed in the terminal and imported to the osm PostgreSQL database, creating tables and indexes for primary key and tagscolumns. After the data has been processed and imported, log into the PostgreSQL deployment and connect to the database.
Now, let’s look at some of the tables that have been created. When listing the tables using the
\d command, we should see a list of tables like this:
The tables that we are mostly concerned about are those that begin with _planet_osm_*_. These are the tables that we’ll use when setting up our queries. They contain the necessary geometry data that is used to view the map on a GIS client. Here’s a breakdown of what each of these tables contains:
planet_osm_polygon contains all the polygon and multipolygon data.
planet_osm_point contains points of interest, business names, and tags which contain additional information stored as key value data.
planet_osm_line contains all pathways.
planet_osm_roads contains a subset of planet_osm_line for rendering at low-level zooms.
Now that our data has been imported and indexes have been created, let’s see what it looks like and make some queries.
Querying OSM Data
Loading our data from PostgreSQL into a GIS mapping program will provide us with a visual representation of our OSM data, which will look similar to this:
As we can see, the map is made up of the geodata that was exported by OSM that includes streets, structures, and location points. The abrupt cut off from the sides of the map is due to us only exporting the portion of the OSM map that we needed which was visible in the browser window.
Now that we have a map, let’s query some of the data to find all of the restaurants within our Seattle map. We’ll find various types of facilities, like restaurants, banks, and schools, in the amenity column of our tables. For these queries, we’ll be looking at the planet_osm_point table that will provide us with the exact location of restaurants and other facilities throughout the city. To get all the restaurants from the table, the SQL would look like the following:
This will give us a list of about 644 restaurants and the number of branches they have in the city. If we refine this query a little more, we can select only those restaurants that have three or more branches.
This refines our data and gives us five restaurants that have three branches:
The names of restaurants, however, are not very helpful since we don’t know what type of cuisine they serve. So how do we get this data? This is where our hstore data comes in handy. Non-standard information like “cuisine” is stored in the tags column we set up when importing our data to PostgreSQL. To get the type of cuisine for each of these restaurants, we could write the query as follows:
This query will retrieve the “cuisine” key for each restaurant, which will give us:
What’s noticeable is that two restaurants don’t have values in the cuisine column. This is perhaps one of the drawbacks of using OSM data since it can be inconsistent at times. However, community members can update the missing data and update their map as needed.
Since we’re interested in looking at cuisines, let’s look at the top 10 most popular cuisines in Seattle. To do that, we’d run the following SQL query, which will group the number of restaurants that serve the same type of cuisine:
Here. we’re selecting the restaurants that have a “cuisine” key and then grouping and counting all of those restaurants according to the cuisine they serve. This query produces the following results:
Overwhelmingly, Seattlites love their coffee, followed by sandwiches, Mexican food, and pizza. However, we can also see the strong Asian influence in Seattle with Vietnamese, Thai, Chinese, and Japanese food all very popular throughout the city. On the map, it shows us that the largest concentration of these cuisines is served downtown.
To make this data a little more interesting, we might want to see what coffee shops dominate Seattle. To view the top 10 coffee shops in the city, we’d run the following SQL query:
This will group the names of all the coffee shops that have “coffee_shop” as their cuisine. Then it will give us the total number of branches that they have in the city, and give us the top 10 results. The result will look like:
So, Starbucks wins as expected. However, it’s also surprising that there are at least 95 different coffee shops throughout the city.
OSM data is perhaps the most popular GIS data used by organizations all over the world. In this article, we showed you how to select data from OSM, export it, import it to your PostgreSQL deployment, query some of the data, and show what it looks like on a map. For the next installment of GeoFile, we’ll be working more with our Seattle dataset and use PostGIS functions to narrow down some of our queries as well as add and transform external datasets on our OSM data.
Enjoyed this article? Get started with Databases for PostgreSQL now.
Databases for PostgreSQL is a fully managed, enterprise-ready PostgreSQL service with built-in security, high availability, and backup orchestration. Learn more here.