Linked open data aggregation: Conflict resolution and aggregate quality

Publication at Faculty of Mathematics and Physics | 2012

Abstract

The paradigm of publishing governmental data is shifting from data trapped in relational databases, scanned images, or PDF files to open data, or even linked open data. This shift gives information consumers (citizens, companies) unrestricted access to the data and enables agile information aggregation, which has not been possible until now. Such aggregation comes with inherent problems, such as the provision of poor-quality, inaccurate, irrelevant, or fraudulent information.

As part of the OpenData.cz initiative, we are developing projects that will enable the creation, maintenance, and usage of the data infrastructure formed by Czech governmental linked open data. In particular, the ODCleanStore project will give data consumers seamless, automated data aggregation in place of the manual aggregation they would otherwise have to perform. It will also provide provenance tracking and justifications of why the consumer should trust the aggregated data in a given situation.

In this paper, we describe two crucial aspects of the data aggregation process in ODCleanStore: the resolution of data conflicts and the computation of aggregate quality, which helps consumers decide whether the aggregated data are worth using. Since the data aggregation algorithm is executed at query time, we show that the proposed algorithm is fast enough to work in real-world settings.
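
To make the two notions above concrete, the following is a minimal sketch of one simple way conflicting values could be resolved and scored. It is not the algorithm used in ODCleanStore, which the abstract does not specify. The function name resolve_conflict, the per-source quality scores in [0, 1], and the "highest summed quality wins" rule are all assumptions made for illustration only.

```python
from collections import defaultdict


def resolve_conflict(candidates):
    """Pick one value among conflicting candidates (hypothetical helper).

    candidates: list of (value, source_quality) pairs, where source_quality
    is an assumed score in [0, 1] attached to the source of each value.
    Returns (chosen_value, aggregate_quality).
    """
    if not candidates:
        raise ValueError("no candidate values to resolve")

    # Sum the quality of all sources that agree on the same value.
    support = defaultdict(float)
    for value, quality in candidates:
        support[value] += quality

    # Assumed resolution rule: the value with the highest summed quality wins.
    best = max(support, key=support.get)

    # Assumed aggregate quality: the winner's share of the total quality.
    total = sum(support.values())
    aggregate_quality = support[best] / total if total > 0 else 0.0
    return best, aggregate_quality


# Three sources report the same property; two agree, one conflicts.
candidates = [("Prague", 0.9), ("Praha", 0.4), ("Prague", 0.7)]
value, quality = resolve_conflict(candidates)
print(value, round(quality, 3))
```

In this sketch, a single dictionary pass keeps the cost linear in the number of candidate values, which is the kind of property that matters when resolution runs at query time.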