How few is too few? Determining the minimum acceptable number of LSA dimensions to visualise text cohesion with Lex

McKinnon, Caroline, Baazeem, Ibtehal and Angus, Daniel (2015). How few is too few? Determining the minimum acceptable number of LSA dimensions to visualise text cohesion with Lex. In: Ben Hachey and Kellie Webster, Australasian Language Technology Association Workshop 2015: Proceedings of the Workshop. Australasian Language Technology Association Workshop, Parramatta, NSW, Australia, (75-83). 8-9 December 2015.

Author McKinnon, Caroline; Baazeem, Ibtehal; Angus, Daniel
Title of paper How few is too few? Determining the minimum acceptable number of LSA dimensions to visualise text cohesion with Lex
Conference name Australasian Language Technology Association Workshop
Conference location Parramatta, NSW, Australia
Conference dates 8-9 December 2015
Proceedings title Australasian Language Technology Association Workshop 2015: Proceedings of the Workshop
Publisher ALTA
Publication Year 2015
Sub-type Fully published paper
ISSN 1834-7037
Editor Ben Hachey; Kellie Webster
Volume 13
Start page 75
End page 83
Total pages 9
Collection year 2016
Language eng
Formatted Abstract/Summary
Building comprehensive language models using latent semantic analysis (LSA) requires substantial processing power. At the ideal parameters suggested in the literature (for an overview, see Bradford, 2008) it can take up to several hours, or even days, to complete. For linguistic researchers, this extensive processing time is inconvenient but tolerated, but when LSA is deployed in commercial software targeted at non-specialists, these processing times become untenable. One way to reduce processing time is to reduce the number of dimensions used to build the model. While the existing research has found that the model’s reliability starts to degrade as dimensions are reduced, the point at which reliability becomes unacceptably poor varies greatly depending on the application. Therefore, in this paper, we set out to determine the lowest number of LSA dimensions that can still produce an acceptably reliable language model for our particular application: Lex, a visual cohesion analysis tool. We found that, across all three texts that we analysed, the cohesion-relevant visual motifs created by Lex start to become apparent and consistent at 50 retained dimensions.
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status UQ
Additional Notes http://www.alta.asn.au/events/alta2015/proceedings/

 
Created: Thu, 14 Jan 2016, 11:00:26 EST by Jennifer Yared on behalf of School of Communication and Arts