Scalable video event retrieval by visual state binary embedding

Yu, Litao, Huang, Zi, Cao, Jiewei and Shen, Heng Tao (2016) Scalable video event retrieval by visual state binary embedding. IEEE Transactions on Multimedia, 18(8): 1590-1603. doi:10.1109/TMM.2016.2557059

Author Yu, Litao
Huang, Zi
Cao, Jiewei
Shen, Heng Tao
Title Scalable video event retrieval by visual state binary embedding
Journal name IEEE Transactions on Multimedia
ISSN 1520-9210
Publication date 2016-08-01
Sub-type Article (original research)
DOI 10.1109/TMM.2016.2557059
Open Access Status Not yet assessed
Volume 18
Issue 8
Start page 1590
End page 1603
Total pages 14
Place of publication Piscataway, NJ, United States
Publisher Institute of Electrical and Electronics Engineers
Language eng
Subject 1711 Signal Processing
2214 Media Technology
1706 Computer Science Applications
2208 Electrical and Electronic Engineering
Abstract With the exponential growth of media data on the web, fast media retrieval has become a significant research topic in multimedia content analysis. Among the variety of techniques, learning binary embedding (hashing) functions is one of the most popular approaches to scalable information retrieval in large databases, and it is mainly used in near-duplicate multimedia search. To date, however, most hashing methods have been designed specifically for near-duplicate retrieval at the visual level rather than the semantic level. In this paper, we propose a visual state binary embedding (VSBE) model that encodes video frames into binary matrices preserving the essential semantic information, to facilitate fast video event retrieval in unconstrained cases. Compared with other video binary embedding models, one advantage of the proposed VSBE model is that it needs only a limited number of key frames from the training videos for hash function training, so the computational complexity of the training phase is much lower. At the same time, we apply pairwise constraints generated from the visual states to sketch the local properties of the events at the semantic level, so accuracy is also ensured. We conducted extensive experiments on the challenging TRECVID MED dataset, and the results demonstrate the superiority of the proposed VSBE model.
Keyword Hashing
video event retrieval
visual state
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: HERDC Pre-Audit
School of Information Technology and Electrical Engineering Publications
Citation counts: Web of Science: cited 3 times
Scopus: cited 3 times
Created: Sun, 14 Aug 2016, 10:21:46 EST by System User on behalf of Learning and Research Services (UQ Library)
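The abstract describes binary embedding (hashing): mapping high-dimensional frame descriptors to short binary codes so that retrieval reduces to fast Hamming-distance comparison. The sketch below is a minimal, generic illustration of that idea using random-projection (LSH-style) hashing; it is not the paper's VSBE model, whose projections are learned from pairwise visual-state constraints rather than drawn at random, and the toy Gaussian "frame descriptors" are stand-ins for real key-frame features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy database: 100 frame descriptors of dimension 64 (hypothetical
# stand-ins for key-frame features; not the paper's visual-state features).
d, n_bits = 64, 32
database = rng.standard_normal((100, d))

# Random-projection hashing: the sign of a linear projection yields an
# n_bits binary code per frame. VSBE would learn these projections from
# pairwise semantic constraints; random hyperplanes are illustration only.
planes = rng.standard_normal((d, n_bits))

def encode(x):
    """Map real-valued descriptors (rows) to {0, 1} binary codes."""
    return (x @ planes > 0).astype(np.uint8)

codes = encode(database)

def hamming_search(query, codes, k=5):
    """Return indices of the k database items nearest in Hamming distance."""
    q = encode(query.reshape(1, -1))
    dists = np.count_nonzero(codes != q, axis=1)  # per-row bit disagreements
    return np.argsort(dists)[:k]

# Querying with a database item retrieves that item itself first
# (its Hamming distance to its own code is zero).
top = hamming_search(database[0], codes)
```

Because the codes are compact bit vectors, a database of millions of frames can be scanned with cheap XOR/popcount operations, which is what makes hashing-based retrieval scale where exhaustive real-valued comparison does not.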