Missing value estimation for mixed-attribute data sets

Zhu, Xiaofeng, Zhang, Shichao, Jin, Zhi, Zhang, Zili and Xu, Zhuoming (2011) Missing value estimation for mixed-attribute data sets. IEEE Transactions on Knowledge and Data Engineering, 23 1: 110-121. doi:10.1109/TKDE.2010.99

Author Zhu, Xiaofeng
Zhang, Shichao
Jin, Zhi
Zhang, Zili
Xu, Zhuoming
Title Missing value estimation for mixed-attribute data sets
Journal name IEEE Transactions on Knowledge and Data Engineering
ISSN 1041-4347
Publication date 2011-01-01
Year available 2010
Sub-type Article (original research)
DOI 10.1109/TKDE.2010.99
Open Access Status Not yet assessed
Volume 23
Issue 1
Start page 110
End page 121
Total pages 12
Place of publication Piscataway, NJ, United States
Publisher IEEE
Language eng
Subject 1710 Information Systems
1706 Computer Science Applications
1703 Computational Theory and Mathematics
Abstract Missing data imputation is a key issue in learning from incomplete data. Various techniques have been developed with great success in dealing with missing values in data sets with homogeneous attributes (their independent attributes are all either continuous or discrete). This paper studies a new setting of missing data imputation: imputing missing data in data sets with heterogeneous attributes (their independent attributes are of different types), referred to as imputing mixed-attribute data sets. Although many real applications fall into this setting, there is no estimator designed specifically for imputing mixed-attribute data sets. This paper first proposes two consistent estimators, one for discrete and one for continuous missing target values. It then advocates a mixture-kernel-based iterative estimator to impute mixed-attribute data sets. The proposed method is evaluated in extensive experiments against several typical imputation algorithms, and the results show that it outperforms these existing methods in terms of classification accuracy and root mean square error (RMSE) at different missing ratios. © Copyright 2011 IEEE – All Rights Reserved
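The record does not describe how the estimators work. As a rough illustration only, the sketch below shows one generic way to impute a mixed-attribute table iteratively with a kernel that treats continuous and discrete attributes together, plus the RMSE measure named in the abstract. It is not the authors' mixture kernel or their consistent estimators: the Gaussian kernel, the match/mismatch weight `lam`, the bandwidth `h`, the mean/mode initialisation and the function names `mixed_kernel`, `impute_mixed` and `rmse` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


# Hypothetical kernel over mixed attributes (an assumption, not the paper's kernel):
# continuous attributes get a Gaussian kernel with bandwidth h; discrete attributes
# get weight 1.0 on a match and lam (0 < lam < 1) on a mismatch.
# The attribute `skip` (the target being imputed) is left out of the similarity.
def mixed_kernel(a, b, cont_idx, disc_idx, h=1.0, lam=0.1, skip=None):
    k = 1.0
    for j in cont_idx:
        if j != skip:
            d = (a[j] - b[j]) / h
            k *= math.exp(-0.5 * d * d)
    for j in disc_idx:
        if j != skip:
            k *= 1.0 if a[j] == b[j] else lam
    return k


def impute_mixed(rows, cont_idx, disc_idx, n_iter=5, h=1.0, lam=0.1):
    """Iterative kernel imputation sketch; None marks a missing value."""
    data = [list(r) for r in rows]  # work on a copy; observed cells are never changed
    missing = [(i, j) for i, r in enumerate(rows) for j, v in enumerate(r) if v is None]

    # Step 1: initial fill with the column mean (continuous) or mode (discrete).
    # Assumes every column has at least one observed value.
    for j in list(cont_idx) + list(disc_idx):
        observed = [r[j] for r in rows if r[j] is not None]
        if j in cont_idx:
            fill = sum(observed) / len(observed)
        else:
            fill = Counter(observed).most_common(1)[0][0]
        for r in data:
            if r[j] is None:
                r[j] = fill

    # Step 2: repeatedly re-estimate each missing cell from rows whose value in
    # that column was observed, weighting them by kernel similarity computed on
    # the current (partly imputed) data.
    for _ in range(n_iter):
        for i, j in missing:
            donors = [k for k in range(len(rows)) if k != i and rows[k][j] is not None]
            weights = [mixed_kernel(data[i], data[k], cont_idx, disc_idx, h, lam, skip=j)
                       for k in donors]
            if j in cont_idx:
                total = sum(weights)
                if total > 0:  # kernel-weighted mean (Nadaraya-Watson style)
                    data[i][j] = sum(w * rows[k][j] for w, k in zip(weights, donors)) / total
            else:
                votes = Counter()
                for w, k in zip(weights, donors):
                    votes[rows[k][j]] += w
                if votes:  # kernel-weighted majority vote
                    data[i][j] = votes.most_common(1)[0][0]
    return data


def rmse(estimates, truth):
    """Root mean square error between imputed and true continuous values."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth))
```

For example, `impute_mixed(rows, cont_idx=[0, 2], disc_idx=[1])` on a list of rows with `None` in the missing cells returns a completed copy and leaves `rows` untouched. Continuous attributes on very different scales should be standardised first, since a single bandwidth `h` is shared across them in this sketch.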
Keyword Classification
Data mining
Machine learning
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ
Additional Notes Article number 5487520. Date of Publication: 17 June 2010.

Document type: Journal Article
Collections: Official 2011 Collection
School of Information Technology and Electrical Engineering Publications
Citation counts: Cited 95 times in Thomson Reuters Web of Science
Cited 127 times in Scopus
Created: Sun, 12 Dec 2010, 10:07:10 EST