Approximate processing of massive continuous quantile queries over high-speed data streams

Lin, Xuemin, Xu, Jian, Zhang, Qing, Lu, Hongjun, Yu, Jeffrey Xu, Zhou, Xiaofang and Yuan, Yidong (2006) Approximate processing of massive continuous quantile queries over high-speed data streams. Ieee Transactions On Knowledge And Data Engineering, 18 5: 683-698. doi:10.1109/TKDE.2006.73

Attached Files (Some files may be inaccessible until you login with your UQ eSpace credentials)
Name Description MIMEType Size Downloads

Author Lin, Xuemin
Xu, Jian
Zhang, Qing
Lu, Hongjun
Yu, Jeffrey Xu
Zhou, Xiaofang
Yuan, Yidong
Title Approximate processing of massive continuous quantile queries over high-speed data streams
Journal name Ieee Transactions On Knowledge And Data Engineering   Check publisher's open access policy
ISSN 1041-4347
Publication date 2006
Sub-type Article (original research)
DOI 10.1109/TKDE.2006.73
Volume 18
Issue 5
Start page 683
End page 698
Total pages 16
Editor X. Wu
Place of publication California, USA
Publisher IEEE Computer Society
Collection year 2006
Language eng
Subject C1
280103 Information Storage, Retrieval and Management
700103 Information processing services
Abstract Quantile computation has many applications including data mining and financial data analysis. It has been shown that an is an element of-approximate summary can be maintained so that, given a quantile query d (phi, is an element of), the data item at rank [phi N] may be approximately obtained within the rank error precision is an element of N over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different phi and is an element of poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
Keyword Query Processing
Online Computation
Data Mining
Computer Science, Artificial Intelligence
Computer Science, Information Systems
Engineering, Electrical & Electronic
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ

Version Filter Type
Citation counts: TR Web of Science Citation Count  Cited 5 times in Thomson Reuters Web of Science Article | Citations
Scopus Citation Count Cited 8 times in Scopus Article | Citations
Google Scholar Search Google Scholar
Created: Wed, 15 Aug 2007, 08:26:21 EST