This post describes my work setting up a mapping pipeline. I was initially inspired by Eric Theise’s presentation on creating a Geostack. However, the tutorial was from a few years ago and I had trouble setting up some of the applications on my Ubuntu 18.04 machine. I used this as an opportunity to leverage Docker containers and, specifically, the docker-compose functionality to run and synchronize multiple Docker containers.
This post covers everything from downloading free map data, to creating and styling map layers, to rendering the map dynamically in a browser. I chose to create a map of Pittsburgh, but the pipeline will work with any location. The open source ecosystem supporting map data processing and visualization is also pretty large. I only use a subset of tools here, and there are likely alternatives that could be used at each step of the pipeline.
The core components of the pipeline are a database to store the map data, a program to process the map data into vector tiles, and a web server to serve the files.
For the database, we will be using the open source database PostgreSQL. PostgreSQL is then spatially enabled with PostGIS, a PostgreSQL extension that adheres to the OpenGIS Simple Features Specification for SQL. A typical database is optimized for processing numeric and character data types. The PostGIS extension enables additional functionality for storing and querying data that represents objects defined in geometric space.
For more information on configuring PostgreSQL with PostGIS, refer to the overview of kartoza’s docker-postgis container, which is used in this pipeline.
For processing the mapping data, we will use MapBox’s TileMill. TileMill is a tool used to design maps for the web using custom data. TileMill can export files using the SQLite-based MBTiles file format. These maps are always in the “Web Mercator” projection, which can be displayed in the browser using tools like Leaflet.
This application has been deprecated in favor of MapBox Studio, but it can be run in a docker container and it has enough basic functionality for the purposes of this project.
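To make the “Web Mercator” projection mentioned above a little more concrete, here is a minimal sketch of the standard spherical Mercator formula that maps longitude and latitude to projected metres. The function and constant names are my own and are not part of TileMill or Leaflet.

```python
import math

EARTH_RADIUS_M = 6378137.0   # sphere radius used by Web Mercator (EPSG:3857)
MAX_LATITUDE = 85.05112878   # latitude at which the projected world is square


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 longitude/latitude (degrees) to Web Mercator metres."""
    # Clamp the latitude: the projection is undefined at the poles.
    lat_deg = max(-MAX_LATITUDE, min(MAX_LATITUDE, lat_deg))
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Example: approximate coordinates for downtown Pittsburgh.
pittsburgh_x, pittsburgh_y = lonlat_to_web_mercator(-79.9959, 40.4406)
```

The clamp to roughly 85.05 degrees is why web maps built this way never show the poles.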
Our primary source of data will be from the OpenStreetMap project. This is a long-running open map project built on user-contributed data.
Interline provides city- and region-sized extracts of the entire OpenStreetMap planet. These are updated on a daily basis. You need to sign up for a free account to get an API token in order to download data. In this example, I’m creating a map of Pittsburgh, so I’m going to download the extract for Pittsburgh.
If you are interested in other datasets, LearnOSM has a guide to even more ways to get OSM data.
Running the Pipeline
Now that we have the data downloaded, we want to use our pipeline to process it. To get PostgreSQL/PostGIS and TileMill up and running quickly, we will use Hans Meine’s tilemill_docker repository.
Clone the repository and navigate to the osm-bright directory. Inside this directory is the docker-compose file, which defines and configures the containers that will be run. We can see that the kartoza/postgis docker image will be loaded as well as the hansmeine/osm-bright docker image.
Also take note of the PostgreSQL environment variables. We’ll use these later on to extract data from the database into TileMill.
From the directory, run docker-compose up to start the services. You can run docker container ls to verify that the containers are up and running.
Both of these containers can now talk to each other. You should be able to connect to http://localhost:20009 and see the TileMill GUI.
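If you would rather check from a script than from the browser, something like the following sketch can confirm that the GUI port is answering. It uses only the Python standard library. The function name is my own, and the URL is the one quoted in this post.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP GET to `url` gets a non-error response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors, and malformed URLs.
        return False


# Example (run after `docker-compose up`):
#     is_reachable("http://localhost:20009")
```

A False result right after starting the services may only mean that the containers are still booting, so it can be worth retrying after a few seconds.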
Loading the Data into PostGIS
OSM data comes in two formats, XML and PBF (Protocolbuffer Binary Format). The download from Interline will have the .osm.pbf file extension.
First, we want to copy the data from our host machine into the tilemill docker container. We’ll use the docker cp command to copy the OSM extract (pittsburgh_pennsylvania.osm.pbf) to the root directory of the tilemill container:
docker cp ~/Documents/MapBox/pittsburgh_pennsylvania.osm.pbf tilemill:/pitt.osm.pbf
Next, we use imposm to import the OpenStreetMap data into the PostgreSQL/PostGIS database. We will use docker exec to run a command in the tilemill container to import the pitt.osm.pbf data into the database named gis. The database name, host, username, and password can all be referenced from the environment variables stored in the docker-compose.yaml file discussed previously.
The full command is:
docker exec tilemill imposm -m /osm-bright/imposm-mapping.py --connection postgis://docker:docker@postgis/gis --read --write --optimize --deploy-production-tables /pitt.osm.pbf
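Since the same connection settings show up in two different formats in this post (a postgis:// URL for imposm and a key=value string for TileMill), it can help to derive both from one place. The sketch below does that and also assembles the docker exec argument list. The helper names are hypothetical, and the values (user docker, password docker, host postgis, database gis) are the ones from the TileMill connection string quoted later in this post.

```python
def imposm_connection_url(user, password, host, database):
    """Build the postgis:// URL that imposm takes via --connection."""
    return "postgis://{}:{}@{}/{}".format(user, password, host, database)


def tilemill_connection_string(user, password, host, database):
    """Build the key=value string for TileMill's PostGIS connection field."""
    return "dbname={} host={} user={} password={}".format(database, host, user, password)


def imposm_import_command(container, mapping, connection_url, pbf_path):
    """Return the `docker exec` argument list for a full imposm import."""
    return [
        "docker", "exec", container,
        "imposm", "-m", mapping,
        "--connection", connection_url,
        "--read", "--write", "--optimize", "--deploy-production-tables",
        pbf_path,
    ]


# Settings assumed from the compose environment described in this post.
settings = {"user": "docker", "password": "docker", "host": "postgis", "database": "gis"}
command = imposm_import_command(
    "tilemill",
    "/osm-bright/imposm-mapping.py",
    imposm_connection_url(**settings),
    "/pitt.osm.pbf",
)
```

The resulting list could be handed to subprocess.run(command, check=True) to run the import from a script instead of typing it out.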
Imposm creates about a dozen tables for the most important features. Default tables are categorized as point (such as places), polygon (such as water areas), or linestring (such as motorways) tables.
We can log into the postgis container to examine the database. First, log into the container with an interactive session using the following command:
docker exec -tiu postgres postgis psql
Connect to the gis database from the psql prompt (the \c gis meta-command does this). The \d command can then be run to describe the database. This displays a list of the different table names and types.
It’s also possible to look at the schema of individual tables. To examine the osm_mainroads table, run \d with the table name (\d osm_mainroads).
The output appears below.
You can view the mapping from OSM data types to imposm tables here.
Styling the Map Data
Now that the data is loaded into the PostGIS database, we will use TileMill to load and style the data. Once we are happy with the map content and styling, we will export the data.
First, we can connect to the TileMill container and load the GUI on our host machine by navigating to http://localhost:20009 in the browser. Create a new project using the form.
Now we want to start loading some data. We will start by adding a layer for one of the road tables. In the connection field, we will enter the PostGIS database information:
dbname=gis host=postgis user=docker password=docker
In the table field, we will just load the entire table by entering its name. The unique key field will be osm_id. When adding additional data, we will keep the unique key field the same and just update the table field.
Save the query and navigate to the style.mss file in the side panel. Here we will enter some basic styling parameters to view the content we just loaded. Add the following code block to the end of the file and save.
Now navigate and zoom into the Pittsburgh region and the roads should be visible.
We can repeat this process, creating layers for the other road tables, including osm_mainroads, and using the following styling blocks:
The resulting image is shown below.
In its current form, the styling is fairly generic. All of the different subclasses of data are styled the same way, and the styles don’t change based on the zoom level.
We can create a more specific set of styling commands that distinguishes between the different subclasses and changes the style based on different zoom levels.
Delete the code blocks added to the style.mss file. Click on the + to create a new Carto stylesheet and name it roads.mss. Copy in the content of this roads.mss file and save.
Looking at the stylesheet, we can see that the mainroads data is broken down by road type (tertiary, for example). Then, based on the zoom level, the line-width and line-color properties are set. This distinguishes the different types by color and also varies the thickness of the line by zoom level to create a dynamic map.
The map with the new stylesheet should look similar to the image below:
We can follow a similar process to load and view water data. We create new layers for the water tables, including osm_waterways, and style them with their own stylesheet rules.
The updated map with the water features is shown below:
Lastly, we can visualize the name field in each table to label the different features of the map. For this example, we will just label the road features. Since we want to style the labels differently from the features, we will create a new layer for the labels and style the labels specifically.
We’ll start by creating a new layer for each of the road data tables: motorways, mainroads, and minorroads. This time, we will use a SQL query to extract only a subset of data from the table. In the table field, use the following query to create the label layer for the motorways table.
(SELECT name, osm_id, type, geometry
Use a similar query for the other road data tables. Next, update the roads.mss style sheet with styling for these specific layers using this file as a reference.
Save the style and the updated map should appear as below.
Exporting the Map Data
Once all the map layers are loaded and styled, we can export the content into MBTiles. MBTiles is an open specification for storing map tiles in a SQLite database.
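Because an MBTiles file is just a SQLite database, it can be inspected with Python's built-in sqlite3 module. The sketch below assumes the layout from the MBTiles specification: a metadata table of name/value pairs and a tiles table (or view) with zoom_level, tile_column, tile_row, and tile_data columns. The function name is my own.

```python
import sqlite3


def summarize_mbtiles(path):
    """Return (metadata dict, {zoom_level: tile_count}) for an MBTiles file."""
    connection = sqlite3.connect(path)
    try:
        # Each metadata row is a (name, value) pair, e.g. name, bounds, minzoom.
        metadata = dict(connection.execute("SELECT name, value FROM metadata"))
        counts = dict(connection.execute(
            "SELECT zoom_level, COUNT(*) FROM tiles "
            "GROUP BY zoom_level ORDER BY zoom_level"
        ))
    finally:
        connection.close()
    return metadata, counts


# Example, once an export such as pitt_from_osm.mbtiles exists locally:
#     metadata, counts = summarize_mbtiles("pitt_from_osm.mbtiles")
```

Checking the per-zoom tile counts is a quick way to confirm that the export covered the zoom levels you selected.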
Using the export GUI, set the bounding box dimensions and zoom levels you want to extract and export the files.
The exported file, for example pitt_from_osm.mbtiles, is now available in the tilemill docker container. We can copy the file to our host machine using the docker cp command.
docker cp tilemill:/root/Documents/MapBox/export/pitt_from_osm.mbtiles .
Extracting and Serving the Data
Any web server can serve tiles as individual image files, organized into z/x/y subdirectories. MBUtil is a utility for importing and exporting the MBTiles format, typically created with Mapbox TileMill.
We’ll use mbutil to export our mbtiles file to a directory of files.
docker exec -tiu root tilemill bash
apt-get install python-setuptools
Once installed, run the primary mbutil command:
mb-util pitt_from_osm.mbtiles ./tiles/
Once the export process is finished, you can view the tiles directory. It contains a folder for each of the zoom levels exported. Each of these folders contains .png images of the map, organized by their x and y tile coordinates.
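To show what this kind of export involves, here is a rough Python approximation of unpacking an MBTiles file into a z/x/y directory tree. It is a sketch of the idea, not mbutil's actual code. It assumes the tiles are PNG images and that rows are stored in the TMS scheme described in the MBTiles specification, so each row number is flipped to get the XYZ numbering used by web maps. The function name is hypothetical.

```python
import os
import sqlite3


def export_tiles(mbtiles_path, output_dir):
    """Write every tile to output_dir/z/x/y.png and return the tile count."""
    connection = sqlite3.connect(mbtiles_path)
    written = 0
    try:
        rows = connection.execute(
            "SELECT zoom_level, tile_column, tile_row, tile_data FROM tiles"
        )
        for zoom, column, tms_row, data in rows:
            # MBTiles counts rows from the bottom (TMS); web maps count
            # from the top (XYZ), so flip the row index for this zoom level.
            xyz_row = (2 ** zoom - 1) - tms_row
            tile_dir = os.path.join(output_dir, str(zoom), str(column))
            os.makedirs(tile_dir, exist_ok=True)
            tile_path = os.path.join(tile_dir, "{}.png".format(xyz_row))
            with open(tile_path, "wb") as handle:
                handle.write(data)
            written += 1
    finally:
        connection.close()
    return written
```

The row flip is the detail most likely to cause confusion: if tiles appear vertically mirrored in the browser, the export and the viewer probably disagree about TMS versus XYZ numbering.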
Python has a built-in HTTP server that provides standard GET and HEAD request handlers. This doesn’t require any additional installations or configuration and is useful when you want to turn a directory into a quick web server.
python -m SimpleHTTPServer 8887
We specify port 8887 since the tilemill container was configured with this port open.
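Note that the SimpleHTTPServer module only exists in Python 2. In Python 3 the same functionality lives in http.server (python3 -m http.server 8887 from the command line). If you prefer to start the server from a script, a minimal sketch might look like the following. The function name is my own, and the directory argument to the handler assumes Python 3.7 or newer.

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_tile_server(directory, port=8887, host=""):
    """Create (but do not start) an HTTP server rooted at `directory`."""
    # Bind the handler to a fixed directory instead of the current one.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer((host, port), handler)


# Example: serve the exported tiles until interrupted with Ctrl+C.
#     make_tile_server("tiles").serve_forever()
```

Like the one-line command, this is meant for local experimentation rather than production use.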
Open http://localhost:8887 and browse the directory listing until you get to a single tile. The tile is displayed in the web browser as shown below.
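Rather than clicking through folders to find a tile, you can compute which z/x/y tile contains a given point. The sketch below uses the standard slippy map tile formula for XYZ-numbered tiles. The function names are my own, and the base URL in the example assumes the server is serving the tiles directory under /tiles.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) XYZ tile indices containing a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(base_url, lon_deg, lat_deg, zoom):
    """Build the URL of the tile image covering the given point."""
    x, y = lonlat_to_tile(lon_deg, lat_deg, zoom)
    return "{}/{}/{}/{}.png".format(base_url.rstrip("/"), zoom, x, y)


# Example: the zoom-12 tile covering downtown Pittsburgh.
pittsburgh_tile = tile_url("http://localhost:8887/tiles", -79.9959, 40.4406, 12)
```

The URL will only resolve if that zoom level and area were included in the export.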
Viewing the Data
Eric Theise has some HTML files already pre-built. Clone his geostack-map-pages repository from GitHub.
We’ll make some minor modifications to the
- update the setView command with latitude and longitude values of the area you are mapping.
- update the
- update the marker coordinates and popup text.
Once the updates are made, you can drag the file into your browser and view the map.