Highlights
• A framework for a data quality aware query system is proposed.
• User preferences on data quality are used as the basis for query answering.
• Advanced techniques to estimate data quality of query results have been developed and evaluated.
• The framework is demonstrated through a prototype implementation.
Data quality is increasingly important as individuals and corporations rely on multiple, often external, sources of data to make decisions. Traditional query systems do not factor data quality considerations into their responses. Further, studies of the diverse interpretations of data quality indicate that fitness for use is a fundamental criterion in its evaluation. In this paper, we address the issue of data quality aware query systems by developing a query answering framework that considers user data quality preferences over a collaborative information systems architecture. Our work is motivated by an extensive study of the data quality literature, which revealed a lack of holistic solutions that encompass both the business and technological aspects of data quality management. Accordingly, the developed framework takes an end-to-end view of the problem. We focus on three major aspects of quality aware query systems: measuring data quality, modeling users' data quality preferences, and answering queries in consideration of the defined preferences and measures. We address these aspects by introducing data quality profiling, data quality aware SQL, and data quality aware query answering methods. The contributions of this paper have been evaluated on real and simulated data, and the individual components have been assembled into a running prototype.