Multiple Versions and Overlap in Digital Text

Desmond Schmidt (2009). Multiple Versions and Overlap in Digital Text. PhD Thesis, School of ITEE, The University of Queensland.

       
Author Desmond Schmidt
Thesis Title Multiple Versions and Overlap in Digital Text
School, Centre or Institute School of ITEE
Institution The University of Queensland
Publication date 2009-03
Thesis type PhD Thesis
Supervisor Prof. Xiaofang Zhou
Prof. Robert Colomb
Total pages 165
Subjects 08 Information and Computing Sciences
Abstract/Summary This thesis is unusual in that it tries to solve a problem that exists between two widely separated disciplines: the humanities (and to some extent also linguistics) on the one hand and information science on the other. Chapter 1 explains why it is essential to strike a balance between study of the solution and problem domains. Chapter 2 surveys the various models of cultural heritage text, starting in the remote past, through the coming of the digital era to the present. It establishes why current models are outdated and need to be revised, and also what significance such a revision would have. Chapter 3 examines the history of markup in an attempt to trace how inadequacies of representation arose. It then examines two major problems in cultural heritage and linguistics digital texts: overlapping hierarchies and textual variation. It assesses previously proposed solutions to both problems and explains why they are all inadequate. It argues that the overlapping hierarchies problem is a subset of the textual variation problem, and explains why markup cannot be the solution to either problem. Chapter 4 develops a new data model for representing cultural heritage and linguistics texts, called a ‘variant graph’, which separates the natural overlapping structures from the content. It develops a simplified list-form of the graph that scales well as the number of versions increases. It also describes the main operations that need to be performed on the graph and explores their algorithmic complexities. Chapter 5 draws on research in bioinformatics and text processing to develop a greedy algorithm that aligns n versions with non-overlapping block transpositions in O(MN) time in the worst case, where M is the size of the graph and N is the length of the new version being added or updated. It shows how this algorithm can be applied to texts in corpus linguistics and the humanities, and tests an implementation of the algorithm on a variety of real-world texts.
Keyword overlapping hierarchies
textual variation
multiple sequence alignment
textual criticism
Additional Notes 18,63,64,95,105,109,110,147

 
Created: Tue, 05 Jan 2010, 20:58:59 EST by Dr Desmond Schmidt on behalf of Library - Information Access Service