Ensemble approach for the classification of imbalanced data

Nikulin, Vladimir, McLachlan, Geoffrey J. and Ng, Shu Kay (2009). Ensemble approach for the classification of imbalanced data. In: Ann Nicholson, Xiaodong Li, Randy Goebel, Jörg Siekmann and Wolfgang Wahlster, Lecture Notes in Computer Science. AI 2009: Advances in Artificial Intelligence. 22nd Australasian Joint Conference. Proceedings. AI 2009: Advances in Artificial Intelligence, Melbourne, VIC, Australia, (291-300). 1-4 December 2009. doi:10.1007/978-3-642-10439-8


Author Nikulin, Vladimir
McLachlan, Geoffrey J.
Ng, Shu Kay
Title of paper Ensemble approach for the classification of imbalanced data
Conference name AI 2009: Advances in Artificial Intelligence
Conference location Melbourne, VIC, Australia
Conference dates 1-4 December 2009
Convener Faculty of Information Technology, Monash University
Proceedings title Lecture Notes in Computer Science. AI 2009: Advances in Artificial Intelligence. 22nd Australasian Joint Conference. Proceedings
Journal name AI 2009: Advances in Artificial Intelligence, Proceedings
Place of Publication Berlin, Germany
Publisher Springer
Publication Year 2009
Sub-type Fully published paper
DOI 10.1007/978-3-642-10439-8
ISBN 9783642104381
364210438X
ISSN 0302-9743
Editor Ann Nicholson
Xiaodong Li
Randy Goebel
Jörg Siekmann
Wolfgang Wahlster
Volume 5866
Start page 291
End page 300
Total pages 10
Collection year 2010
Language eng
Formatted Abstract/Summary
Ensembles are often capable of greater prediction accuracy than any of their individual members. As a consequence of the diversity between individual base learners, an ensemble will not suffer from overfitting. On the other hand, in many cases we are dealing with imbalanced data, and a classifier built using all of the data has a tendency to ignore the minority class. As a solution to this problem, we propose to consider a large number of relatively small and balanced subsets, in which representatives from the larger class are selected randomly. As an outcome, the system produces a matrix of linear regression coefficients whose rows represent the random subsets and whose columns represent the features. Based on this matrix, we assess how stable the influence of each feature is, and we propose to retain in the model only those features with stable influence. The final model is an average of the base learners, which are not necessarily linear regressions. Test results against datasets from the PAKDD-2007 data-mining competition are presented.
© Springer-Verlag Berlin Heidelberg 2009
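The subset-based scheme described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it uses synthetic data in place of the PAKDD-2007 datasets, least-squares linear regression as the base learner, and an arbitrary stability threshold; all variable names and the |mean|/std stability statistic are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic imbalanced data: 2 informative features out of 6,
# minority class at ~10% (stand-in for the PAKDD-2007 data).
n_maj, n_min, p = 900, 100, 6
X = rng.normal(size=(n_maj + n_min, p))
y = np.r_[np.zeros(n_maj), np.ones(n_min)]
X[y == 1, :2] += 1.5  # only the first two features carry signal

n_subsets, m = 50, 80                 # 50 balanced subsets, 80 points per class
coefs = np.zeros((n_subsets, p + 1))  # rows = subsets, cols = intercept + features
maj_idx, min_idx = np.where(y == 0)[0], np.where(y == 1)[0]

for b in range(n_subsets):
    # Balanced subset: random representatives from the larger (majority) class,
    # paired with an equal-sized random draw from the minority class.
    idx = np.r_[rng.choice(maj_idx, m, replace=False),
                rng.choice(min_idx, m, replace=False)]
    Xb = np.c_[np.ones(2 * m), X[idx]]
    coefs[b], *_ = np.linalg.lstsq(Xb, y[idx], rcond=None)

# Stability of each feature's influence across the random subsets:
# here measured as |mean| / std of its coefficient (an illustrative choice).
stab = np.abs(coefs.mean(axis=0)) / (coefs.std(axis=0) + 1e-12)
keep = stab[1:] > 2.0  # threshold chosen arbitrarily for this sketch
print("stable features:", np.where(keep)[0])

# Final model: the average of the base learners.
w = coefs.mean(axis=0)
scores = np.c_[np.ones(len(X)), X] @ w
pred = (scores > 0.5).astype(int)
```

Because every subset is balanced, each base learner sees the minority class as often as the majority class, which is what keeps the averaged model from ignoring the minority class.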
Subjects E1
970101 Expanding Knowledge in the Mathematical Sciences
010401 Applied Statistics
Keyword Ensemble classifier
Gradient-based optimisation
Boosting
Random forest
Decision trees
Q-Index Code E1
Q-Index Status Confirmed Code
Institutional Status UQ
Additional Notes Presented during the session "A2: Data Mining & Statistical Learning IV". Published under "Data Mining and Statistical Learning".

Citation counts Cited 4 times in Thomson Reuters Web of Science
Cited 2 times in Scopus
Created: Tue, 06 Apr 2010, 16:42:54 EST by Kay Mackie on behalf of School of Mathematics & Physics