Determiners and number in English contrasted with Japanese, as exemplified in machine translation

Bond, Francis, 1967- (2001). Determiners and number in English contrasted with Japanese, as exemplified in machine translation. PhD Thesis, Centre for Language Teaching and Research, The University of Queensland.

       
Attached Files
Name Description MIMEType Size Downloads
THE16257.pdf Full text application/pdf 11.00MB 3
Author Bond, Francis, 1967-
Thesis Title Determiners and number in English contrasted with Japanese, as exemplified in machine translation
School, Centre or Institute Centre for Language Teaching and Research
Institution The University of Queensland
Publication date 2001
Thesis type PhD Thesis
Supervisor Roland Sussex
Rodney Huddleston
Total pages 217
Collection year 2001
Language eng
Subjects L
380202 Computational Linguistics
751005 Communication across languages and cultures
Formatted abstract
The fact that concepts are grammaticalized differently in different languages is a major problem for translation, especially for machine translation. Two major examples of this are syntactic number, and the use of (in)definite articles (a, some, the). In languages such as English, nouns are marked for number and the choice of article (or of no article) must be made for every noun phrase. In contrast, for languages such as Japanese, number distinctions are not normally made, and there are no articles. This means that whenever a noun phrase is translated from Japanese to English, even if the denotation is perfectly understood and a good translation equivalent found, generating the noun phrase still requires two difficult choices: should the head noun be singular or plural, and which article, if any, should be generated.

This thesis proposes a semantic representation and a series of three heuristic algorithms that make possible the appropriate generation of articles and number when translating from Japanese to English. The semantic representation provides a tractable set of features to represent (1) the referential use of a noun phrase, as either referential, generic, ascriptive or idiomatic; (2) the interpretation of the noun phrase's referent as either a countable individual or a mass, with seven detailed subtypes; (3) the definiteness of the noun phrase, as either definite, indefinite, definite and extensive, or possessed. The three algorithms automatically acquire values for these features from the analysis of the Japanese text and the lexical properties of the English translation equivalents, and then use them to generate English. The first algorithm determines the referential use of Japanese noun phrases, based on a defeasible hierarchy of pragmatic rules that are applied top-down, from the clause to the noun phrase. The second algorithm determines the appropriate interpretation for English noun phrases, while the third determines which determiner, if any, should be generated. These algorithms use rules based on the different referential uses of the noun phrase.
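The three feature groups named in the abstract can be pictured as a small data structure. What follows is a minimal sketch under stated assumptions, not the thesis's implementation: the class names, the `plural` field, and the `choose_determiner` rule are hypothetical, and the seven interpretation subtypes are omitted because the abstract does not list them.

```python
from dataclasses import dataclass
from enum import Enum


class ReferentialUse(Enum):
    REFERENTIAL = "referential"
    GENERIC = "generic"
    ASCRIPTIVE = "ascriptive"
    IDIOMATIC = "idiomatic"


class Interpretation(Enum):
    # The abstract mentions seven detailed subtypes without naming them,
    # so only the two top-level values are modelled here.
    COUNTABLE_INDIVIDUAL = "countable individual"
    MASS = "mass"


class Definiteness(Enum):
    DEFINITE = "definite"
    INDEFINITE = "indefinite"
    DEFINITE_AND_EXTENSIVE = "definite and extensive"
    POSSESSED = "possessed"


@dataclass
class NounPhraseFeatures:
    """Hypothetical container for the features attached to one noun phrase."""
    head: str
    referential_use: ReferentialUse
    interpretation: Interpretation
    definiteness: Definiteness
    plural: bool = False  # assumed flag for the number decision


def choose_determiner(phrase: NounPhraseFeatures) -> str:
    """Toy rule for illustration only, NOT the thesis's third algorithm.

    Returns "the", "a", or "" (no article). It ignores a/an, "some",
    possessives, and every referential use other than the default.
    """
    if phrase.definiteness is Definiteness.DEFINITE:
        return "the"
    if (
        phrase.definiteness is Definiteness.INDEFINITE
        and phrase.interpretation is Interpretation.COUNTABLE_INDIVIDUAL
        and not phrase.plural
    ):
        return "a"
    return ""


dog = NounPhraseFeatures(
    head="dog",
    referential_use=ReferentialUse.REFERENTIAL,
    interpretation=Interpretation.COUNTABLE_INDIVIDUAL,
    definiteness=Definiteness.INDEFINITE,
)
print(f"{choose_determiner(dog)} {dog.head}".strip())
```

Keeping the features in one record mirrors the abstract's description of values being acquired first and then used for generation; the actual rules in the thesis depend on the referential use and are considerably richer than this toy function.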

The proposed algorithms are implemented in a Japanese-to-English machine translation system, and the detailed lexical information is entered into its lexicon. The use of the algorithms improves the percentage of noun phrases generated with correct use of articles and number from 65% to 85%.
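The reported change from 65% to 85% can also be read as a reduction in errors. The short calculation below is only arithmetic on the two figures given in the abstract; the function name is hypothetical and no other evaluation data is assumed.

```python
def relative_error_reduction(before: float, after: float) -> float:
    """Fraction of the original errors removed when accuracy rises."""
    error_before = 1.0 - before
    error_after = 1.0 - after
    return (error_before - error_after) / error_before


gain = 0.85 - 0.65
reduction = relative_error_reduction(0.65, 0.85)
print(f"{gain * 100:.0f} percentage points; {reduction:.0%} of errors removed")
# prints "20 percentage points; 57% of errors removed"
```

In other words, the share of noun phrases with an article or number error falls from 35% to 15%.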
Keyword Japanese language -- Translating into English.

Document type: Thesis
Collection: UQ Theses (RHD) - UQ staff and students only
 
Created: Fri, 24 Aug 2007, 17:45:06 EST