
Open City Data Pipeline

Bischof, Stefan and Kämpgen, Benedikt and Harth, Andreas and Polleres, Axel and Schneider, Patrik (2017) Open City Data Pipeline. Working Papers on Information Systems, Information Business and Operations, 01/2017. Department für Informationsverarbeitung und Prozessmanagement, WU Vienna University of Economics and Business, Vienna. ISSN 2518-6809

Abstract

Statistical data about cities, regions, and countries is collected for various purposes and by various institutions. Yet, while access to recent, high-quality data of this kind is crucial both for decision makers and for the public, all too often such collections of data remain isolated and not reusable, let alone properly integrated. In this paper we present the Open City Data Pipeline, a focused attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and to republish this data in a reusable manner as Linked Data. The main features of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a modular and extensible, always up-to-date fashion; (ii) we use both Machine Learning techniques and ontological reasoning over equational background knowledge to enrich the data by imputing missing values; (iii) we assess the estimated accuracy of such imputations per indicator. Additionally, (iv) we make the integrated and enriched data available both in a web browser interface and as machine-readable Linked Data, using standard vocabularies such as QB and PROV, and linking to e.g. DBpedia. Lastly, in an exhaustive evaluation of our approach, we compare our enrichment and cleansing techniques to a preliminary version of the Open City Data Pipeline presented at ISWC 2015: firstly, we demonstrate that the combination of equational knowledge and standard machine learning techniques significantly helps to improve the quality of our missing value imputations; secondly, we arguably show that the more data we integrate, the more reliable our predictions become. Hence, over time, the Open City Data Pipeline shall provide a sustainable effort to serve Linked Data about cities in increasing quality.
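
As a rough illustration of feature (ii), the sketch below first fills a missing value from an equation and only then falls back to a statistical estimate. Everything in it is an assumption made for illustration and is not taken from the paper: the city records, the indicator names, the equation density = population / area, and the median fallback, which merely stands in for the Machine Learning techniques the abstract mentions.

```python
from statistics import median

# Hypothetical city records; None marks a missing indicator value.
cities = {
    "A": {"population": 1_000_000, "area_km2": 400.0, "density": None},
    "B": {"population": 500_000, "area_km2": 250.0, "density": 2000.0},
    "C": {"population": None, "area_km2": None, "density": None},
}


def impute_by_equation(record):
    """Fill density from population / area when both are known.

    The equation is an assumed example of equational background knowledge.
    """
    filled = dict(record)
    can_compute = filled["population"] is not None and filled["area_km2"]
    if filled["density"] is None and can_compute:
        filled["density"] = filled["population"] / filled["area_km2"]
    return filled


def impute_by_median(records, indicator):
    """Fill remaining gaps with the median of the known values.

    A deliberately simple stand-in for a learned model.
    """
    known = [r[indicator] for r in records.values() if r[indicator] is not None]
    if not known:
        return records  # nothing to estimate from
    fallback = median(known)
    return {
        name: {**r, indicator: fallback if r[indicator] is None else r[indicator]}
        for name, r in records.items()
    }


# Equations first, statistical fallback second.
after_equations = {name: impute_by_equation(r) for name, r in cities.items()}
result = impute_by_median(after_equations, "density")
```

Applying the equation first means the fallback estimate is computed from more known values, which is one plausible reading of why combining the two techniques could help.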

Item Type: Paper
Keywords: open data, data cleaning, data integration
Divisions: Departments > Informationsverarbeitung u Prozessmanag. > Informationswirtschaft
Depositing User: Stefan Bischof
Date Deposited: 08 Mar 2017 13:14
Last Modified: 30 Mar 2017 14:39
URI: http://epub.wu.ac.at/id/eprint/5438
