Optimized cartesian k-means

Wang, Jianfeng, Wang, Jingdong, Song, Jingkuan, Xu, Xin-Shun, Shen, Heng Tao and Li, Shipeng (2014) Optimized cartesian k-means. IEEE Transactions on Knowledge and Data Engineering, 27 1: 180-192. doi:10.1109/TKDE.2014.2324592

Author Wang, Jianfeng
Wang, Jingdong
Song, Jingkuan
Xu, Xin-Shun
Shen, Heng Tao
Li, Shipeng
Title Optimized cartesian k-means
Journal name IEEE Transactions on Knowledge and Data Engineering   Check publisher's open access policy
ISSN 1041-4347
ISBN 978-3-319-10983-1
Publication date 2014-05-16
Year available 2014
Sub-type Article (original research)
DOI 10.1109/TKDE.2014.2324592
Open Access Status Not yet assessed
Volume 27
Issue 1
Start page 180
End page 192
Total pages 13
Place of publication Piscataway, NJ, United States
Publisher Institute of Electrical and Electronics Engineers
Language eng
Subject 1710 Information Systems
1706 Computer Science Applications
1703 Computational Theory and Mathematics
Abstract Product quantization-based approaches are effective to encode high-dimensional data points for approximate nearest neighbor search. The space is decomposed into a Cartesian product of low-dimensional subspaces, each of which generates a sub codebook. Data points are encoded as compact binary codes using these sub codebooks, and the distance between two data points can be approximated efficiently from their codes by the precomputed lookup tables. Traditionally, to encode a subvector of a data point in a subspace, only one sub codeword in the corresponding sub codebook is selected, which may impose strict restrictions on the search accuracy. In this paper, we propose a novel approach, named Optimized Cartesian K-Means (ock-means), to better encode the data points for more accurate approximate nearest neighbor search. In ock-means, multiple sub codewords are used to encode the subvector of a data point in a subspace. Each sub codeword stems from different sub codebooks in each subspace, which are optimally generated with regards to the minimization of the distortion errors. The highdimensional data point is then encoded as the concatenation of the indices of multiple sub codewords from all the subspaces. This can provide more flexibility and lower distortion errors than traditional methods. Experimental results on the standard real-life datasets demonstrate the superiority over state-of-the-art approaches for approximate nearest neighbor search.
Keyword Biology
Computer Science, Artificial Intelligence
Life Sciences & Biomedicine - Other Topics
Computer Science
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ
Additional Notes Published online ahead of print 16 May 2014

Document type: Journal Article
Sub-type: Article (original research)
Collections: Official 2015 Collection
School of Information Technology and Electrical Engineering Publications
Version Filter Type
Citation counts: TR Web of Science Citation Count  Cited 13 times in Thomson Reuters Web of Science Article | Citations
Scopus Citation Count Cited 17 times in Scopus Article | Citations
Google Scholar Search Google Scholar
Created: Fri, 16 May 2014, 19:21:55 EST by Dr Heng Tao Shen on behalf of School of Information Technol and Elec Engineering