Efficient common items extraction from multiple sorted lists

Lu, Wei, Rong, Chuitian, Chen, Jinchuan, Du, Xiaoyong, Cheong Fung, Gabriel Pui and Zhou, Xiaofang (2010). Efficient common items extraction from multiple sorted lists. In: Advances in Web Technologies and Applications - Proceedings of the 12th Asia-Pacific Web Conference, APWeb 2010. 12th International Asia Pacific Web Conference, APWeb 2010, Busan, Korea, (219-225). 6 - 8 April 2010. doi:10.1109/APWeb.2010.16


Author Lu, Wei
Rong, Chuitian
Chen, Jinchuan
Du, Xiaoyong
Cheong Fung, Gabriel Pui
Zhou, Xiaofang
Title of paper Efficient common items extraction from multiple sorted lists
Conference name 12th International Asia Pacific Web Conference, APWeb 2010
Conference location Busan, Korea
Conference dates 6 - 8 April 2010
Proceedings title Advances in Web Technologies and Applications - Proceedings of the 12th Asia-Pacific Web Conference, APWeb 2010
Place of Publication Piscataway, NJ United States
Publisher I E E E
Publication Year 2010
Year available 2010
Sub-type Fully published paper
DOI 10.1109/APWeb.2010.16
Open Access Status
ISBN 9780769540122
9781424466009
Start page 219
End page 225
Total pages 7
Collection year 2011
Language eng
Abstract/Summary Given a set of lists, where items of each list are sorted by the ascending order of their values, the objective of this paper is to figure out the common items that appear in all of the lists efficiently. This problem is sometimes known as common items extraction from sorted lists. To solve this problem, one common approach is to scan all items of all lists sequentially in parallel until one of the lists is exhausted. However, we observe that if the overlap of items across all lists is not high, such sequential access approach can be significantly improved. In this paper, we propose two algorithms, MergeSkip and MergeESkip, to solve this problem by taking the idea of skipping as many items of lists as possible. As a result, a large number of comparisons among items can be saved, and hence the efficiency can be improved. We conduct extensive analysis of our proposed algorithms on one real dataset and two synthetic datasets with different data distributions. We report all our findings in this paper.
Subjects 1705 Computer Networks and Communications
1706 Computer Science Applications
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status UQ

 
Versions
Version Filter Type
Citation counts: Scopus Citation Count Cited 1 times in Scopus Article | Citations
Google Scholar Search Google Scholar
Created: Wed, 27 Nov 2013, 12:32:19 EST by System User on behalf of School of Information Technol and Elec Engineering