Random sets approach and its applications

Nikulin, Vladimir (2008). Random sets approach and its applications. In: I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes and A. Statnikov, JMLR Workshop and Conference Proceedings Volume 3: Causation and Prediction Challenge (WCCI 2008). Workshop and Conference Proceedings of the Journal of Machine Learning Research, Whistler, Canada, (65-76). 12 December, 2008.


Author Nikulin, Vladimir
Title of paper Random sets approach and its applications
Conference name Workshop and Conference Proceedings of the Journal of Machine Learning Research
Conference location Whistler, Canada
Conference dates 12 December, 2008
Proceedings title JMLR Workshop and Conference Proceedings Volume 3: Causation and Prediction Challenge (WCCI 2008)
Place of Publication U.S.
Publisher Journal of Machine Learning Research (Online)
Publication Year 2008
Sub-type Fully published paper
ISSN 1938-7228
Editor I. Guyon
C. Aliferis
G. Cooper
A. Elisseeff
J.-P. Pellet
P. Spirtes
A. Statnikov
Volume 3
Start page 65
End page 76
Total pages 12
Language eng
Abstract/Summary The random sets approach is heuristic in nature and was inspired by the growing speed of computation. For example, we can consider a large number of classifiers, each based on a relatively small, randomly selected subset of features (a random set). Using cross-validation, we can rank all random sets according to a selected criterion and use this ranking for further feature selection. Another application of random sets was motivated by large, imbalanced data sets, which pose a significant problem because the corresponding classifier tends to ignore patterns with smaller representation in the training set. Again, we propose to consider a large number of balanced training subsets in which representatives of both patterns are selected randomly. These models demonstrated competitive results in two data mining competitions.
Subjects 08 Information and Computing Sciences
280502 Data Storage Representations
Keyword Causal relationships
Random forest
Boosting
SVM
CLOP
Cross validation
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status Non-UQ

 
Created: Tue, 02 Feb 2010, 13:42:56 EST by Rosalind Blair on behalf of School of Mathematics & Physics
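
Illustrative sketch (not part of the record): the two ideas in the abstract can be made concrete with a small, self-contained Python example. This is a minimal sketch under stated assumptions, not the paper's implementation. The nearest-centroid classifier, the vote-counting rule, and all function names and parameters (`rank_features`, `balanced_subsets`, `n_sets`, `set_size`, `keep`) are hypothetical choices made for illustration; the record's keywords suggest the paper itself worked with classifiers such as random forests, boosting, and SVMs.

```python
import random
from collections import Counter


def nearest_centroid_accuracy(train, test, feats):
    """Fit per-class centroids on `train` and return accuracy on `test`,
    using only the feature indices in `feats`. Rows are (x, y) pairs."""
    sums, counts = {}, Counter()
    for x, y in train:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += x[f]
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
    hits = 0
    for x, y in test:
        pred = min(
            centroids,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(feats, centroids[c])),
        )
        hits += pred == y
    return hits / len(test)


def cv_score(data, feats, k=5):
    """Mean k-fold cross-validated accuracy for one feature subset."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(nearest_centroid_accuracy(train, folds[i], feats))
    return sum(scores) / k


def rank_features(data, n_features, n_sets=200, set_size=3, keep=0.25, seed=0):
    """Score many random feature sets by cross-validation, then rank features
    by how often they appear in the best-scoring sets (an assumed rule)."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_sets):
        feats = rng.sample(range(n_features), set_size)
        scored.append((cv_score(data, feats), feats))
    scored.sort(key=lambda t: t[0], reverse=True)
    top = scored[: max(1, int(keep * n_sets))]
    votes = Counter(f for _, feats in top for f in feats)
    return [f for f, _ in votes.most_common()]


def balanced_subsets(data, n_subsets=10, seed=0):
    """Draw training subsets with equal numbers of rows from every class,
    each class downsampled at random to the size of the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in data:
        by_class.setdefault(row[1], []).append(row)
    m = min(len(rows) for rows in by_class.values())
    return [
        [r for rows in by_class.values() for r in rng.sample(rows, m)]
        for _ in range(n_subsets)
    ]
```

In this sketch, `rank_features(data, n_features)` returns feature indices ordered by how often they occur in the top-scoring random sets, and `balanced_subsets(data)` returns a list of class-balanced training sets on which separate classifiers could be trained and combined.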