Open City Data Pipeline

Bischof, Stefan and Kämpgen, Benedikt and Harth, Andreas and Polleres, Axel and Schneider, Patrik (2017) Open City Data Pipeline. Working Papers on Information Systems, Information Business and Operations, 01/2017. Department für Informationsverarbeitung und Prozessmanagement, WU Vienna University of Economics and Business, Vienna. ISSN 2518-6809



Statistical data at city, regional, and country level is collected for various purposes and from various institutions. Yet, while access to recent, high-quality data of this kind is crucial both for decision makers and for the public, all too often such data collections remain isolated and not reusable, let alone properly integrated. In this paper we present the Open City Data Pipeline, a focused attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and to republish this data in a reusable manner as Linked Data. The main features of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a modular, extensible, and always up-to-date fashion; (ii) we use both Machine Learning techniques and ontological reasoning over equational background knowledge to enrich the data by imputing missing values; (iii) we assess the estimated accuracy of such imputations per indicator. Additionally, (iv) we make the integrated and enriched data available both in a web browser interface and as machine-readable Linked Data, using standard vocabularies such as QB and PROV, and linking to, e.g., DBpedia. Lastly, in an exhaustive evaluation of our approach, we compare our enrichment and cleansing techniques to a preliminary version of the Open City Data Pipeline presented at ISWC2015: firstly, we demonstrate that the combination of equational knowledge and standard machine learning techniques significantly helps to improve the quality of our missing value imputations; secondly, we arguably show that the more data we integrate, the more reliable our predictions become. Hence, over time, the Open City Data Pipeline is intended to provide a sustainable effort to serve Linked Data about cities of increasing quality.
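The enrichment idea in (ii) and (iii), deriving missing values from equational knowledge first and falling back to statistical imputation with a per-indicator accuracy estimate, can be illustrated with a minimal sketch. It assumes a single hypothetical equation (density = population / area) as the background knowledge, uses plain mean imputation as a stand-in for the machine learning techniques the paper actually evaluates, and estimates each indicator's accuracy as the leave-one-out RMSE of that stand-in. The city names, indicator names, values, and functions (`apply_equations`, `loo_rmse`, `enrich`) are all made up for illustration; this is not the pipeline's implementation.

```python
import math
from statistics import mean

# Hypothetical background equation: density = population / area_km2.
# One solver per variable, so any single missing term can be derived.
EQUATION_SOLVERS = {
    "density": lambda r: r["population"] / r["area_km2"],
    "population": lambda r: r["density"] * r["area_km2"],
    "area_km2": lambda r: r["population"] / r["density"],
}


def apply_equations(record):
    """If exactly one term of the equation is missing, derive it from the other two."""
    missing = [k for k in EQUATION_SOLVERS if record.get(k) is None]
    if len(missing) == 1:
        target = missing[0]
        record[target] = EQUATION_SOLVERS[target](record)


def loo_rmse(values):
    """Leave-one-out RMSE of predicting each value by the mean of the others."""
    errors = []
    for i, v in enumerate(values):
        rest = values[:i] + values[i + 1:]
        errors.append((mean(rest) - v) ** 2)
    return math.sqrt(mean(errors))


def enrich(records, indicators):
    """Fill missing values in place; return an accuracy estimate per indicator."""
    # 1. Equational knowledge first: derived values follow exactly from the inputs.
    for r in records.values():
        apply_equations(r)

    accuracy = {}
    for ind in indicators:
        observed = [r[ind] for r in records.values() if r.get(ind) is not None]
        # 2. Estimate the fallback's accuracy on observed (incl. derived) values only.
        accuracy[ind] = loo_rmse(observed) if len(observed) > 1 else None
        # 3. Fallback: mean imputation, a stand-in for the paper's ML models.
        if observed:
            guess = mean(observed)
            for r in records.values():
                if r.get(ind) is None:
                    r[ind] = guess
    return accuracy


# Made-up sample data: None marks a missing value.
cities = {
    "A": {"population": 1_000_000, "area_km2": 200.0, "density": None, "unemployment": 5.0},
    "B": {"population": None, "area_km2": 100.0, "density": 3000.0, "unemployment": 7.0},
    "C": {"population": 500_000, "area_km2": None, "density": 2500.0, "unemployment": None},
}
INDICATORS = ["population", "area_km2", "density", "unemployment"]

accuracy = enrich(cities, INDICATORS)
print(cities["C"])  # {'population': 500000, 'area_km2': 200.0, 'density': 2500.0, 'unemployment': 6.0}
print(accuracy["unemployment"])  # 2.0
```

Running the equation step before the statistical fallback means a value that can be derived exactly is never replaced by an estimate. A fuller combination could alternate the two steps, letting imputed values feed further equations; this sketch runs each step once.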

Item Type: Paper
Additional Information: The published article can be found under: Stefan Bischof, Andreas Harth, Benedikt Kämpgen, Axel Polleres, Patrik Schneider: Enriching integrated statistical open city data by combining equational knowledge and missing value imputation, Journal of Web Semantics, Volume 48, 2018, Pages 22-47, ISSN 1570-8268. For citation, please refer to the published article.
Keywords: open data, data cleaning, data integration
Divisions: Departments > Informationsverarbeitung u Prozessmanag. > Informationswirtschaft
Depositing User: Stefan Bischof
Date Deposited: 08 Mar 2017 13:14
Last Modified: 04 Aug 2020 04:29

