A fuzzy document clustering approach based on domain-specified ontology

Yue, Lin, Zuo, Wanli, Peng, Tao, Wang, Ying and Han, Xuming (2015) A fuzzy document clustering approach based on domain-specified ontology. Data and Knowledge Engineering, 100 Part A: 148-166. doi:10.1016/j.datak.2015.04.008

Author Yue, Lin
Zuo, Wanli
Peng, Tao
Wang, Ying
Han, Xuming
Title A fuzzy document clustering approach based on domain-specified ontology
Journal name Data and Knowledge Engineering
ISSN 0169-023X
Publication date 2015-11-01
Year available 2015
Sub-type Critical review of research, literature review, critical commentary
DOI 10.1016/j.datak.2015.04.008
Open Access Status Not yet assessed
Volume 100
Issue Part A
Start page 148
End page 166
Total pages 19
Place of publication Amsterdam, Netherlands
Publisher Elsevier
Language eng
Formatted abstract
Applications of document clustering include automatic document organization, topic extraction, and fast information retrieval or filtering. Numerous methods have been developed for document clustering research. Despite the advances achieved, however, document clustering still presents certain challenges, such as optimizing feature selection for low-dimensional document representation and incorporating mutual information between the documents into a clustering algorithm. This paper focuses on these two challenges. First, we construct a domain-specific ontology that provides the controlled vocabulary describing the hazards related to dairy products. Synonyms of the controlled vocabulary in the document set are considered to be relatively prevalent and fundamentally important for feature selection. Second, in combination with the vector space model (VSM), we perform singular value decomposition (SVD) to translate all of the term-document vectors into a concept space. We then obtain the mutual information between documents by calculating the similarity of every two document vectors in the orthogonal matrix of right singular vectors. As the mutual information matrix is also a fuzzy compatibility relation, a fuzzy equivalence relation can be derived by calculating its max–min transitive closure. Finally, based on the fuzzy equivalence relation, all of the data sequences are allocated into clusters under the guidance of a cluster validation index. Our method both reduces the dimensionality of the original data and considers the correlation between the terms. The experimental results show that encoding the ontologies in the aggregation process can provide better clustering results. Moreover, the proposed work has been applied to food safety supervision, which benefits government and society.
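The pipeline the abstract outlines (SVD of the term-document matrix, pairwise document similarity, max–min transitive closure, then a cut of the resulting fuzzy equivalence relation) might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the use of cosine similarity, the clipping of negative similarities to 0, and the hand-picked threshold `lam` (the paper selects the partition with a cluster validation index, which is not shown here) are all assumptions.

```python
import numpy as np

def doc_similarity(term_doc, k):
    """Fuzzy compatibility relation between documents (hypothetical helper).

    term_doc has one row per term and one column per document.
    """
    _, _, vt = np.linalg.svd(term_doc, full_matrices=False)
    docs = vt[:k].T  # one k-dimensional row per document (right singular vectors)
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    docs = docs / np.where(norms == 0, 1.0, norms)
    # Assumption: cosine similarity, with negative values clipped to 0
    # so that every entry lies in [0, 1].
    sim = np.clip(docs @ docs.T, 0.0, 1.0)
    np.fill_diagonal(sim, 1.0)  # reflexivity
    return sim

def maxmin_closure(rel):
    """Max-min transitive closure of a reflexive, symmetric fuzzy relation."""
    closure = rel.copy()
    while True:
        # composed[i, j] = max over m of min(closure[i, m], closure[m, j])
        composed = np.minimum(closure[:, :, None], closure[None, :, :]).max(axis=1)
        updated = np.maximum(closure, composed)
        if np.allclose(updated, closure):
            return updated
        closure = updated

def lambda_cut_clusters(equiv, lam):
    """Cluster labels from the lambda-cut of a fuzzy equivalence relation."""
    n = equiv.shape[0]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            for j in range(n):
                if equiv[i, j] >= lam:
                    labels[j] = next_label
            next_label += 1
    return labels

# Toy term-document matrix: documents 0-1 share terms, as do documents 2-3.
toy = np.array([[2.0, 2.0, 0.0, 0.0],
                [1.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0, 1.0]])
toy_labels = lambda_cut_clusters(maxmin_closure(doc_similarity(toy, k=2)), lam=0.5)
```

Raising `lam` splits the documents into more, tighter clusters, and lowering it merges them; in the approach described above, that choice is guided by a validation index rather than fixed by hand.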
Keyword Domain-specified ontology
Document clustering
Feature selection
Singular value decomposition (SVD)
Fuzzy equivalence relation
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ

Document type: Journal Article
Collections: Official 2016 Collection
School of Information Technology and Electrical Engineering Publications
Citation counts: Cited 4 times in Thomson Reuters Web of Science
Cited 7 times in Scopus
Created: Sun, 10 Jan 2016, 10:25:17 EST by System User on behalf of Scholarly Communication and Digitisation Service