On authorship attribution via Markov chains and sequence kernels

Sanderson, Conrad and Guenter, Simon (2006). On authorship attribution via Markov chains and sequence kernels. In: Pattern Recognition 2006 (ICPR 2006). 18th International Conference on Pattern Recognition 2006 (ICPR 2006), Hong Kong, China, (437-440). 20-24 August 2006. doi:10.1109/ICPR.2006.899


Author Sanderson, Conrad
Guenter, Simon
Title of paper On authorship attribution via Markov chains and sequence kernels
Conference name 18th International Conference on Pattern Recognition 2006 (ICPR 2006)
Conference location Hong Kong, China
Conference dates 20-24 August 2006
Proceedings title Pattern Recognition 2006 (ICPR 2006)
Journal name 18th International Conference on Pattern Recognition, Vol 3, Proceedings
Place of Publication Piscataway, NJ, U.S.A.
Publisher IEEE - Institute of Electrical and Electronics Engineers Inc.
Publication Year 2006
Year available 2006
Sub-type Fully published paper
DOI 10.1109/ICPR.2006.899
ISBN 0-7695-2521-0
ISSN 1051-4651
Volume 3
Start page 437
End page 440
Total pages 4
Language eng
Abstract/Summary We investigate the use of recently proposed character and word sequence kernels for the task of authorship attribution and compare their performance with two probabilistic approaches based on Markov chains of characters and words. Several configurations of the sequence kernels are studied using a relatively large dataset, where each author covered several topics. Utilising Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and is better than that of word sequence kernels. The results further suggest that when using a realistic setup that takes into account the case of texts which are not written by any hypothesised authors, about 5000 reference words are required to obtain good discrimination performance.
Subjects 080107 Natural Language Processing
010401 Applied Statistics
080109 Pattern Recognition and Data Mining
Keyword Markov chains
Authorship attribution
Character sequence kernels
Q-Index Code E1

 
Citation counts Thomson Reuters Web of Science: cited 4 times
Scopus: cited 4 times
Created: Thu, 02 Apr 2009, 13:05:08 EST by Mary-Anne Marrington on behalf of School of Information Technol and Elec Engineering