Utilising OSM data in geospatial representation learning
07-19, 16:00–16:20 (Poland), Room CA3

In this talk, we will present several geospatial representation learning methods based on OpenStreetMap features. We will cover contextual embeddings, road network representations and word2vec-like semantic embeddings. Finally, we will present a library that brings these methods together and adds data engineering capabilities.


Representation learning has proven highly effective in many domains of artificial intelligence, such as Natural Language Processing (NLP), Computer Vision (CV) and Network Science (NS). Representation learning for spatial and geographic data is a rapidly developing field that enables similarity detection between areas and high-quality inference using deep neural networks. Existing approaches concentrate on embedding raster imagery (maps, street or satellite photos), mobility data or road networks.

We propose methods for learning vector representations of OpenStreetMap (OSM) regions that capture urban functions, land use, POI locations and road networks with additional features. We also propose a way to include public transport availability in these representations. Our methods operate on various types of data from OSM, one of the largest open databases of structured geospatial data. To increase reproducibility, all of them use the hexagonal grid from Uber's H3 spatial index. We presented our results at several workshops accompanying the SIGSPATIAL conference.
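As a rough illustration of what a vector representation of a region and similarity detection between areas can look like, the sketch below builds simple count-based vectors from OSM tags grouped by grid cell and compares them with cosine similarity. It is not the method presented in the talk, nor the API of the library. The cell identifiers, the toy feature list and the function names `count_embeddings` and `cosine` are all assumptions made for this example; in practice each cell identifier would come from an H3 index lookup of the feature's coordinates.

```python
from collections import Counter
from math import sqrt

# Toy input (assumed for illustration): (cell_id, OSM tag) pairs.
# The cell ids are placeholders, not real H3 indexes.
FEATURES = [
    ("cell_a", "amenity=restaurant"),
    ("cell_a", "amenity=cafe"),
    ("cell_a", "shop=bakery"),
    ("cell_b", "amenity=restaurant"),
    ("cell_b", "amenity=cafe"),
    ("cell_c", "landuse=forest"),
    ("cell_c", "leisure=park"),
]


def count_embeddings(features):
    """Return (vocabulary, {cell_id: tag-count vector}) for the given features."""
    vocab = sorted({tag for _, tag in features})
    counts = {}
    for cell, tag in features:
        counts.setdefault(cell, Counter())[tag] += 1
    # A Counter returns 0 for tags that never occur in a cell.
    vectors = {cell: [c[tag] for tag in vocab] for cell, c in counts.items()}
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocab, emb = count_embeddings(FEATURES)
sim_ab = cosine(emb["cell_a"], emb["cell_b"])  # cells sharing food-related tags
sim_ac = cosine(emb["cell_a"], emb["cell_c"])  # cells with no tags in common
```

Here the two cells with restaurants and cafes score much higher than the pair with no shared tags. Learned embeddings such as those covered in the talk aim to capture richer structure than raw counts, for example the context provided by neighbouring cells.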

See also: slides (3.2 MB)