Learning from data streams with only positive and unlabeled data

Qin, Xiangju, Zhang, Yang, Li, Chen and Li, Xue (2013) Learning from data streams with only positive and unlabeled data. Journal of Intelligent Information Systems, 40(3): 405-430. doi:10.1007/s10844-012-0231-6

Author Qin, Xiangju
Zhang, Yang
Li, Chen
Li, Xue
Title Learning from data streams with only positive and unlabeled data
Journal name Journal of Intelligent Information Systems
ISSN 0925-9902
Publication date 2013-06-01
Year available 2013
Sub-type Article (original research)
DOI 10.1007/s10844-012-0231-6
Open Access Status Not Open Access
Volume 40
Issue 3
Start page 405
End page 430
Total pages 26
Place of publication New York, United States
Publisher Springer
Language eng
Abstract Many studies on streaming data classification have been based on a paradigm in which a fully labeled stream is available for learning purposes. However, it is often too labor-intensive and time-consuming to manually label a data stream for training. This difficulty may make conventional supervised learning approaches infeasible in many real-world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested that these applications be treated as a Positive and Unlabeled learning problem, and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, which are easily obtainable in a streaming environment, making it widely applicable to real-life applications. Here, we enhance Li et al.’s solution by adding three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution (called PUVFDT) has very good classification performance and a strong ability to learn from data streams with only positive and unlabeled examples. Furthermore, our enhanced solution reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT can still achieve classification performance competitive with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.
Keyword Positive and unlabeled learning
Data stream classification
Incremental learning
Functional leaves
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: Official 2014 Collection
School of Information Technology and Electrical Engineering Publications
Citation counts: Thomson Reuters Web of Science: cited 6 times
Scopus: cited 7 times
Created: Sun, 23 Jun 2013, 10:22:25 EST by System User on behalf of School of Information Technol and Elec Engineering