Quality assessment of OSM address points for selected administrative units in Poland
07-20, 12:40–12:50 (Poland), Room CA4

As part of research into selected aspects of the quality of the OSM database, a collection of address points stored in the OpenStreetMap (OSM) database was analysed for selected administrative units of Poland. This presentation reports the results of that analysis. Validation was performed with a purpose-built tool and was based on comparing the address data with the reference database, the State Register of Borders (PRG). The descriptive and spatial attributes of the OSM address data, such as locality name, street, building number and coordinates, were compared with the corresponding attributes in the PRG database. The analysis aimed to improve the quality of address data in OSM through systematic validation and comparison with reference data, and to develop good practices for working with address data. It produced a number of interesting results and conclusions. Among other things, the tests and analyses confirmed that the state of addresses in OpenStreetMap depends on the involvement of the community that looks after the integrity of the points. Relevant examples are presented, including cases where OSM holds very good quality data that is more up to date than its state equivalent. Needs and directions for further research are also identified.


The purpose of the study was to analyze the set of address points collected in the OpenStreetMap
(OSM) database for the territory of Poland. This required a dedicated tool that validates the data by
comparing OSM addresses with the reference database of the State Register of Borders (PRG),
maintained by the Head Office of Geodesy and Cartography (GUGiK).
The tool developed as part of this work, the osm_matcha application, analyzes descriptive and
spatial features of address data, such as town name, street, building number and coordinates, and
compares them with the corresponding attributes in the PRG database. The osm_matcha application
establishes as many address matches between the two databases as possible. It uses a sophisticated
set of processing and matching techniques that can pair even records containing errors or
discrepancies caused by the different origins of the data, by acts of vandalism, or by mistakes of
those responsible for collecting the data. Throughout, the analyzed collection is treated as Big Data.
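To illustrate the kind of tolerant matching described above, here is a minimal Python sketch that scores one OSM address against PRG candidates. It is not the osm_matcha implementation: the `Address` class, the `normalize`, `match_score` and `best_match` functions, the weights (0.3 locality, 0.5 street, 0.2 distance), the 100 m distance limit and the 0.8 acceptance threshold are all hypothetical choices made for this example. Names are normalized (lowercased, Polish diacritics removed, prefixes such as "ul." dropped) so that spelling variants compare equal, `difflib` similarity tolerates small typos, and the building number and distance act as hard filters.

```python
import math
import re
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional, Sequence


@dataclass
class Address:
    """Hypothetical address record; coordinates are WGS84 degrees."""
    locality: str
    street: str
    housenumber: str
    lat: float
    lon: float


def normalize(text: str) -> str:
    """Lowercase, strip Polish diacritics and drop street-type prefixes like 'ul.'."""
    text = text.lower().replace("ł", "l")  # 'ł' has no Unicode decomposition
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"\b(ul|al|pl|os)\.?\s+", "", text)  # ulica, aleja, plac, osiedle
    return re.sub(r"\s+", " ", text).strip()


def normalize_number(number: str) -> str:
    """Treat '12 a' and '12A' as the same building number."""
    return re.sub(r"\s+", "", number).upper()


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] of two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(osm: Address, prg: Address, max_dist_m: float = 100.0) -> float:
    """Score in [0, 1]; 0.0 means the pair fails a hard filter."""
    if normalize_number(osm.housenumber) != normalize_number(prg.housenumber):
        return 0.0
    dist = haversine_m(osm.lat, osm.lon, prg.lat, prg.lon)
    if dist > max_dist_m:
        return 0.0
    spatial = 1.0 - dist / max_dist_m  # closer points score higher
    return (0.3 * similarity(osm.locality, prg.locality)
            + 0.5 * similarity(osm.street, prg.street)
            + 0.2 * spatial)


def best_match(osm: Address, candidates: Sequence[Address],
               threshold: float = 0.8) -> Optional[Address]:
    """Return the highest-scoring PRG candidate, or None if none is good enough."""
    scored = [(match_score(osm, c), c) for c in candidates]
    if not scored:
        return None
    score, cand = max(scored, key=lambda pair: pair[0])
    return cand if score >= threshold else None
```

With these choices, "ul. Marszałkowska 12 a" in OSM still matches "Marszałkowska 12A" in PRG, and a typo such as "Marszałkowsak" only lowers the score slightly. In a real pipeline, candidates would normally be pre-selected with a spatial index rather than by scoring every pair.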
The application combines several technologies. Data processing is carried out in SQL, using the
PostgreSQL database management system with the PostGIS extension. Python is used to communicate with
the database, and Python together with Docker standardizes the process and makes the tool easier for
users to deploy.
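As a hedged illustration of this split of responsibilities, the sketch below keeps the spatial candidate search in SQL (PostGIS) and uses Python only to send the query and collect the rows. The table names `osm_addresses` and `prg_addresses`, the `id`, `geom` and `housenumber` columns, and the `fetch_candidates` helper are hypothetical. The query also assumes both layers were loaded in the same metric coordinate system (for example EPSG:2180, the Polish PUWG 1992 grid), so that `ST_DWithin` and `ST_Distance` work in metres. Any DB-API connection whose cursor supports the `with` statement, such as one from psycopg2, can be passed in.

```python
from typing import Any, List, Tuple

# Hypothetical schema: both tables hold point geometries in the same metric CRS
# (e.g. EPSG:2180), so the distance threshold and dist_m are in metres.
CANDIDATE_SQL = """
SELECT o.id AS osm_id,
       p.id AS prg_id,
       ST_Distance(o.geom, p.geom) AS dist_m
FROM osm_addresses AS o
JOIN prg_addresses AS p
  ON ST_DWithin(o.geom, p.geom, %(max_dist_m)s)
 AND upper(replace(o.housenumber, ' ', '')) = upper(replace(p.housenumber, ' ', ''))
ORDER BY o.id, dist_m;
"""


def fetch_candidates(conn: Any, max_dist_m: float = 100.0) -> List[Tuple]:
    """Let PostGIS find nearby OSM/PRG pairs with equal house numbers; return the rows."""
    with conn.cursor() as cur:
        # Parameters are passed separately so the driver handles quoting.
        cur.execute(CANDIDATE_SQL, {"max_dist_m": max_dist_m})
        return cur.fetchall()
```

With psycopg2 this could be called as `fetch_candidates(psycopg2.connect(dsn), 50.0)`. A GiST index on each `geom` column lets PostGIS answer `ST_DWithin` through the index instead of comparing every pair, which matters when the whole country is processed.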
The analysis carried out with the tool aimed to improve the quality of address data in OSM through
systematic validation and comparison with reference data, and to develop good practices for working
with address data. The tool can assist the OSM editing community in improving the data and gives
users a means of analyzing address quality. It is an important step towards providing the community
with a dedicated analysis tool and improving the OpenStreetMap database, as well as towards
developing cooperation between Warsaw University of Technology and OpenStreetMap Poland.

🧠As a trained surveyor, I have honed my skills in data analysis and interpretation, which I have leveraged in my transition towards becoming a Junior Data Scientist. Over the last two years, I have been exploring the realm of GIS, remote sensing, LiDAR, and data processing, which have become my primary areas of interest. My vision of the spatial data industry is that of a puzzle with open datasets, Python, SQL, and Machine Learning techniques being the pieces that need to fit together. In my opinion, the future of GIS lies in Big Data solutions and cloud computing, and I am fortunate to be developing my skills in this direction while working at CloudFerro.

🌎I am familiar with:
👉🏻 Python programming: NumPy, PyQt, ArcPy, GeoPandas and SciPy libraries
👉🏻 PostgreSQL, PostGIS
👉🏻 QGIS and ArcGIS
👉🏻 Remote sensing, EO
👉🏻 Extensive processing and interpretation of spatial data

😸Privately I am interested in drawing, cats, dogs and alternative rock.
