Progressive diversification for column-based data exploration platforms

Khan, Hina A. and Sharaf, Mohamed A. (2015). Progressive diversification for column-based data exploration platforms. In: 2015 IEEE 31st International Conference on Data Engineering, ICDE 2015. IEEE International Conference on Data Engineering, Seoul, South Korea, (327-338). 13-17 April 2015. doi:10.1109/ICDE.2015.7113295


Author Khan, Hina A.; Sharaf, Mohamed A.
Title of paper Progressive diversification for column-based data exploration platforms
Conference name IEEE International Conference on Data Engineering
Conference location Seoul, South Korea
Conference dates 13-17 April 2015
Convener IEEE
Proceedings title 2015 IEEE 31st International Conference on Data Engineering, ICDE 2015
Journal name Proceedings - International Conference on Data Engineering
Place of Publication Piscataway, NJ, United States
Publisher IEEE
Publication Year 2015
Sub-type Fully published paper
DOI 10.1109/ICDE.2015.7113295
ISBN 9781479979639; 9781479979646
ISSN 1084-4627
Volume 2015-May
Start page 327
End page 338
Total pages 12
Collection year 2016
Abstract/Summary In data exploration platforms, diversification has become an essential method for extracting representative data, which provide users with a concise and meaningful view of the results of their queries. However, the benefits of diversification come at the expense of an additional cost for post-processing the query results. For large, high-dimensional result sets, the cost of diversification escalates further because of the massive number of distance computations required to evaluate the similarity between results. To address this challenge, in this paper we propose the Progressive Data Diversification (pDiverse) scheme. The main idea underlying pDiverse is to utilize partial distance computation to reduce the amount of processed data. Our extensive experimental results on both synthetic and real data sets show that our proposed scheme outperforms existing diversification methods in terms of both I/O and CPU costs.
Subjects 1710 Information Systems; 1711 Signal Processing; 1712 Software
Q-Index Code E1
Q-Index Status Confirmed Code
Institutional Status UQ

 
Created: Tue, 15 Sep 2015, 03:24:40 EST by System User on behalf of Examinations