Projected outlier detection in high dimensional mixed-attributes data set

Ye, Mao, Li, Xue and Orlowska, Maria E. (2009) Projected outlier detection in high dimensional mixed-attributes data set. Expert Systems With Applications, 36 3: 7104-7113. doi:10.1016/j.eswa.2008.08.030


Author Ye, Mao
Li, Xue
Orlowska, Maria E.
Title Projected outlier detection in high dimensional mixed-attributes data set
Journal name Expert Systems With Applications
ISSN 0957-4174
Publication date 2009-04
Year available 2008
Sub-type Article (original research)
DOI 10.1016/j.eswa.2008.08.030
Volume 36
Issue 3
Start page 7104
End page 7113
Total pages 10
Publisher Elsevier
Language eng
Subject 080109 Pattern Recognition and Data Mining
890205 Information Processing Services (incl. Data Entry and Capture)
Abstract Detecting outliers efficiently is an active research issue in data mining, with important applications in fraud detection, network intrusion detection, monitoring of criminal activities in electronic commerce, etc. Because high-dimensional data are sparse, it is reasonable and meaningful to detect outliers in suitable projected subspaces. We call such a subspace an anomaly subspace and the outliers in it projected outliers. Many efficient algorithms have already been proposed for outlier detection based on different approaches, but there is little literature on projected outlier detection for high-dimensional data sets with mixed continuous and categorical attributes. In this paper, a novel algorithm is proposed to detect projected outliers in high-dimensional mixed-attribute data sets. Our main contributions are: (1) a novel measure of anomaly subspace, based on information entropy, is proposed. In such an anomaly subspace, meaningful outliers can be detected and explained. Unlike previous projected outlier detection methods, the dimension of the anomaly subspace is not decided beforehand; (2) a theoretical analysis of this measure is presented; (3) a bottom-up method is proposed to find the interesting anomaly subspaces; (4) the outlying degree of a projected outlier is defined, which has a good explanation; (5) data sets with mixed data types are handled; (6) experiments on synthetic and real data sets are performed to evaluate the effectiveness of our approach.
Keyword Outlier detection
High-dimensional spaces
Data mining
Mixed-attribute data sets
Q-Index Code C1
Q-Index Status Provisional Code

Document type: Journal Article
Collections: Excellence in Research Australia (ERA) - Collection
School of Information Technology and Electrical Engineering Publications
 
Citation counts: Cited 10 times in Thomson Reuters Web of Science
Cited 19 times in Scopus
Created: Mon, 01 Mar 2010, 14:35:59 EST by Dr Xue Li on behalf of School of Information Technol and Elec Engineering