Web video event recognition by semantic analysis from ubiquitous documents

Yu, Litao, Yang, Yang, Huang, Zi, Wang, Peng, Song, Jingkuan and Shen, Heng Tao (2016) Web video event recognition by semantic analysis from ubiquitous documents. IEEE Transactions on Image Processing, 26 12: 5689-5701. doi:10.1109/TIP.2016.2614136

Author Yu, Litao
Yang, Yang
Huang, Zi
Wang, Peng
Song, Jingkuan
Shen, Heng Tao
Title Web video event recognition by semantic analysis from ubiquitous documents
Journal name IEEE Transactions on Image Processing
ISSN 1057-7149
Publication date 2016-12-01
Sub-type Article (original research)
DOI 10.1109/TIP.2016.2614136
Open Access Status Not yet assessed
Volume 26
Issue 12
Start page 5689
End page 5701
Total pages 13
Place of publication Piscataway, NJ, United States
Publisher Institute of Electrical and Electronics Engineers
Language eng
Subject 1712 Software
1704 Computer Graphics and Computer-Aided Design
Abstract In recent years, the task of event recognition from videos has attracted increasing interest in the multimedia area. Most existing research has focused on exploring visual cues to handle relatively small-granularity events, yet it is difficult to analyze video content directly without any prior knowledge. Synthesizing visual and semantic analysis is therefore a natural approach to video event understanding. In this paper, we study the problem of Web video event recognition, where Web videos often describe large-granularity events and carry limited textual information. Key challenges include how to accurately represent event semantics from incomplete textual information and how to effectively explore the correlation between visual and textual cues for video event understanding. We propose a novel framework to perform complex event recognition from Web videos. To compensate for the insufficient expressive power of visual cues, we construct an event knowledge base by deeply mining semantic information from ubiquitous Web documents. This event knowledge base is capable of describing each event with comprehensive semantics, and using it significantly enriches the textual cues for a video. Furthermore, we introduce a two-view adaptive regression model that explores the intrinsic correlation between the visual and textual cues of videos to learn reliable classifiers. Extensive experiments on two real-world video data sets demonstrate the effectiveness of the proposed framework and confirm that the event knowledge base indeed improves the performance of Web video event recognition.
Keyword Video event recognition
Event knowledge base
Two-view adaptive regression
Q-Index Code C1
Q-Index Status Provisional Code
Grant ID 61572108
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: HERDC Pre-Audit
School of Information Technology and Electrical Engineering Publications
Citation counts: Web of Science: cited 4 times
Scopus: cited 5 times
Created: Sun, 20 Nov 2016, 10:24:58 EST by System User on behalf of Learning and Research Services (UQ Library)
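Note: The two-view model described in the abstract can be illustrated with a generic co-regularized least-squares sketch, in which each view (visual, textual) fits the event labels while a coupling term pulls the two views' predictions toward each other. This is a minimal illustration of the two-view idea only, not the authors' actual "two-view adaptive regression" formulation; the function name two_view_ridge, the parameters lam and gamma, and the alternating closed-form updates are assumptions made for exposition.

import numpy as np

def two_view_ridge(Xv, Xt, y, lam=1.0, gamma=1.0, iters=50):
    """Generic co-regularized least squares over two feature views.

    Alternating closed-form ridge updates minimize:
        ||Xv wv - y||^2 + ||Xt wt - y||^2
        + gamma * ||Xv wv - Xt wt||^2      # the two views must agree
        + lam * (||wv||^2 + ||wt||^2)      # ridge penalty on both views

    Xv: (n, dv) visual features; Xt: (n, dt) textual features;
    y:  (n,) event labels in {0, 1}.
    """
    dv, dt = Xv.shape[1], Xt.shape[1]
    wv, wt = np.zeros(dv), np.zeros(dt)
    for _ in range(iters):
        # Solve for visual weights given the textual view's predictions.
        wv = np.linalg.solve((1 + gamma) * Xv.T @ Xv + lam * np.eye(dv),
                             Xv.T @ (y + gamma * (Xt @ wt)))
        # Solve for textual weights given the visual view's predictions.
        wt = np.linalg.solve((1 + gamma) * Xt.T @ Xt + lam * np.eye(dt),
                             Xt.T @ (y + gamma * (Xv @ wv)))
    return wv, wt

# Toy usage: random stand-ins for visual and (knowledge-base enriched)
# textual features; a video's event score averages the two views.
rng = np.random.default_rng(0)
Xv, Xt = rng.normal(size=(100, 20)), rng.normal(size=(100, 30))
y = (rng.random(100) > 0.5).astype(float)
wv, wt = two_view_ridge(Xv, Xt, y)
scores = 0.5 * (Xv @ wv + Xt @ wt)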