One-class classification of text streams with concept drift

Zhang, Y., Li, X. and Orlowska, M. E. (2008). One-class classification of text streams with concept drift. In: F. Bonchi, B. Berendt, F. Giannotti, D. Gunopulos, F. Turini, C. Zaniolo, N. Ramakrishnan and X. Wu, IEEE International Conference on Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on Data Mining Workshops 2008 (ICDMW '08), Pisa, Italy, (116-125). 15-19 December 2008. doi:10.1109/ICDMW.2008.54

Author Zhang, Y.
Li, X.
Orlowska, M. E.
Title of paper One-class classification of text streams with concept drift
Conference name IEEE International Conference on Data Mining Workshops 2008 (ICDMW '08)
Conference location Pisa, Italy
Conference dates 15-19 December 2008
Proceedings title IEEE International Conference on Data Mining Workshops, 2008. ICDMW '08.
Journal name Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008
Place of Publication Piscataway, NJ, U.S.A.
Publisher IEEE
Publication Year 2008
Sub-type Fully published paper
DOI 10.1109/ICDMW.2008.54
ISBN 978-0-7695-3503-6
Editor F. Bonchi
B. Berendt
F. Giannotti
D. Gunopulos
F. Turini
C. Zaniolo
N. Ramakrishnan
X. Wu
Start page 116
End page 125
Total pages 10
Collection year 2009
Language eng
Abstract/Summary Research on streaming data classification has mostly been based on the assumption that data can be fully labelled. However, this is impractical. Firstly, it is impossible to complete the labelling before all data has arrived. Secondly, it is generally very expensive to obtain fully labelled data through manual effort. Thirdly, user interests may change with time, so labels issued earlier may be inconsistent with labels issued later; this represents concept drift. In this paper, we consider the problem of one-class classification on text streams with respect to concept drift, where a large volume of documents arrives at high speed and both user interests and the data distribution change. In this case, only a small number of positively labelled documents are available for training. We propose a stacking-style ensemble-based approach and have compared it to all other window-based approaches, such as the single window, fixed window, and full memory approaches. Our experimental results demonstrate that the proposed ensemble approach outperforms all other approaches.
Subjects E1
890301 Electronic Information Storage and Retrieval Services
080201 Analysis of Algorithms and Complexity
Keyword Text analysis
Q-Index Code E1
Q-Index Status Confirmed Code

 
Created: Fri, 17 Apr 2009, 09:17:00 EST by Ms Kimberley Nunes on behalf of School of Information Technol and Elec Engineering