An acoustic-to-articulatory transformation for vowel-like sounds and its applications in a speech training aid for hearing-impaired speakers

Chalker, Dean Andrew (1987). An acoustic-to-articulatory transformation for vowel-like sounds and its applications in a speech training aid for hearing-impaired speakers. PhD Thesis, School of Computer Science and Electrical Engineering, The University of Queensland. doi:10.14264/uql.2015.188

Attached Files
Name Description MIMEType Size Downloads
THE6063.pdf Thesis full text application/pdf 12.64MB 1

Author Chalker, Dean Andrew
Thesis Title An acoustic-to-articulatory transformation for vowel-like sounds and its applications in a speech training aid for hearing-impaired speakers
School, Centre or Institute School of Computer Science and Electrical Engineering
Institution The University of Queensland
DOI 10.14264/uql.2015.188
Publication date 1987
Thesis type PhD Thesis
Supervisor Unknown
Total pages 386
Language eng
Subjects 020301 Acoustics and Acoustical Devices; Waves
Formatted abstract
The primary objective of this dissertation is to document the development of an acoustic-to-articulatory transformation for non-nasalized vowel-like speech samples. The implementation of this transformation relies on a database of 11385 synthetic speech samples, generated using a version of Mermelstein's articulatory model (DOI:10.1121/1.1913427) that represents an average adult male speaker. This model was used to determine an area function for each of the synthetic articulatory states. The speech samples were synthesized by applying the transmission line analogue for the human vocal tract. Both time- and frequency-domain applications of this representation were investigated. The frequency domain was selected for the final implementation to permit accurate representation of the distributed vocal tract losses and radiation load effects. No glottal termination was used. The peaks of the synthesized power spectra were then determined. As a result, each of the synthetic states has known acoustic and articulatory attributes, and these data are stored in the database.
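
As a rough illustration of the frequency-domain transmission line calculation described above, the following Python sketch chains uniform tube sections and picks the peaks of the resulting spectrum. It is not the thesis implementation: the function names, the physical constants, the 1% loss factor standing in for distributed losses, and the ideal open-lip termination (no radiation load) are all assumptions made for illustration.

```python
import cmath
import math

C = 350.0     # speed of sound in m/s (assumed value)
RHO = 1.14    # air density in kg/m^3 (assumed value)
LOSS = 0.01   # crude loss factor standing in for distributed losses (assumption)


def section_matrix(area, length, freq):
    """Chain (ABCD) matrix of one uniform tube section at frequency freq (Hz)."""
    k = (2.0 * math.pi * freq / C) * (1.0 - 1j * LOSS)  # complex wavenumber
    z = RHO * C / area                                   # characteristic impedance
    kl = k * length
    return ((cmath.cos(kl), 1j * z * cmath.sin(kl)),
            (1j * cmath.sin(kl) / z, cmath.cos(kl)))


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return ((a[0][0] * b[0][0] + a[0][1] * b[1][0],
             a[0][0] * b[0][1] + a[0][1] * b[1][1]),
            (a[1][0] * b[0][0] + a[1][1] * b[1][0],
             a[1][0] * b[0][1] + a[1][1] * b[1][1]))


def transfer_magnitude(areas, lengths, freq):
    """|U_lips / U_glottis| with the lips treated as an ideal open end (P = 0)."""
    m = ((1.0, 0.0), (0.0, 1.0))
    for area, length in zip(areas, lengths):  # sections ordered glottis to lips
        m = matmul2(m, section_matrix(area, length, freq))
    return 1.0 / abs(m[1][1])


def spectral_peaks(areas, lengths, f_max=4000, step=10):
    """Frequencies (Hz) at which the sampled magnitude spectrum has a local maximum."""
    freqs = [step * i for i in range(1, f_max // step + 1)]
    mags = [transfer_magnitude(areas, lengths, f) for f in freqs]
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]]


# Hypothetical area function: a uniform 17.5 cm tube split into 10 sections.
print(spectral_peaks([3e-4] * 10, [0.0175] * 10))
```

For the uniform tube used here, the peaks should fall close to the quarter-wavelength resonances at odd multiples of c/4L = 500 Hz. A database entry in the sense of the abstract would pair such peak frequencies with the articulatory state that produced the area function.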

To perform the transformation from a natural speech sample to the articulatory state, formant information is estimated from segments of vowel-like speech using single-pitch pitch-synchronous linear predictive analyses. The estimated formants are then used to access the database of synthetic speech samples, resulting in a candidate list from which the best estimate state is chosen. This final selection employs a distance function, which not only considers acoustic distance, but also factors relating to articulatory continuity and neutrality.
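
The selection step can be pictured with a small Python sketch. The abstract states only that the distance function considers acoustic distance together with articulatory continuity and neutrality; the weighted-sum form, the weights, the relative formant error, and the representation of an articulatory state as a tuple of parameters are all assumptions introduced here for illustration.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length parameter tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def acoustic_distance(measured, candidate):
    """Root-sum-square of relative formant errors (an assumed metric)."""
    return math.sqrt(sum(((m - c) / m) ** 2 for m, c in zip(measured, candidate)))


def select_state(measured, candidates, previous_state, neutral_state,
                 w_cont=0.1, w_neut=0.05):
    """Pick the candidate (formants, state) with the lowest combined cost.

    w_cont penalises movement away from the previous articulatory state and
    w_neut penalises distance from a neutral state; both weights are hypothetical.
    """
    def cost(entry):
        formants, state = entry
        return (acoustic_distance(measured, formants)
                + w_cont * euclid(state, previous_state)
                + w_neut * euclid(state, neutral_state))
    return min(candidates, key=cost)


# Hypothetical candidate list returned by a database lookup.
candidates = [
    ((500.0, 1500.0, 2500.0), (0.0, 0.0)),
    ((510.0, 1480.0, 2520.0), (1.0, 1.0)),
    ((700.0, 1200.0, 2600.0), (0.2, 0.1)),
]
best = select_state((505.0, 1490.0, 2510.0), candidates,
                    previous_state=(0.1, 0.0), neutral_state=(0.0, 0.0))
print(best)
```

In this example the first two candidates are acoustically equally close to the measured formants, so the continuity and neutrality terms decide between them, which is the role the abstract assigns to those factors.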

The results of a number of experimental tests that validate the operation of the acoustic-to-articulatory transformation are presented. The database storage and access techniques have been verified using a number of examples based on published formant data and synthetically generated vowels. The capabilities of the complete transformation are demonstrated using a selection of natural speech samples produced by an adult male Australian English speaker. Specifically, these samples consist of isolated "h-d" words containing both vowels and diphthongs, as well as words, taken from passages of continuous speech, that demonstrate consonant-vowel and vowel-consonant co-articulation effects.

As a secondary objective of this dissertation, the design of a speech therapy aid, based on this acoustic-to-articulatory transformation, is outlined. This system is intended for use by speech therapists and teachers, and provides the speaker with acoustic and articulatory information to supplement the aural feedback normally used in learning to speak. The aid is expected to be especially valuable for hearing-impaired speakers who lack effective aural feedback. The suggested implementation of the aid utilizes state-of-the-art 16-bit microprocessor and VLSI digital signal processing technology. Particular attention is paid to the development of a low-cost multi-feature aid that operates in near-real-time, is easy to use, and is capable of addressing many of the instrumentation problems of modern speech therapy and training.

A number of other topics are examined, as they are required in the implementation of the proposed speech therapy aid. These topics include the segmentation of natural speech into basic phonetic classes, the validation of the raw formant data by tracking the formants through successive pitch periods, and the normalization of speaker-dependent features. This latter topic is especially important for a speech therapy aid, where the speakers are likely to range from young children to adults. Algorithms for each of these topics are selected on the basis of maximum return for minimum effort.

The segmentation procedure used is an enhanced version of a previously published algorithm (DOI:10.1109/ICASSP.1982.1171793), and is computationally efficient, requiring only measures of signal amplitude, zero-crossing rate, and wide-band energy. The enhanced procedure is capable of accurately delineating segments of natural speech to reflect the four basic speech classes: voiced, unvoiced, voiced-unvoiced, and silence. Plosives are also identified, although with an accuracy somewhat less than for the basic classes.
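
To show the kind of low-cost measures involved, the Python sketch below computes a mean absolute amplitude and a zero-crossing rate per frame and applies two thresholds. It is a simplified stand-in, not the enhanced algorithm of the thesis: the thresholds and function names are hypothetical, and the wide-band energy measure, the mixed voiced-unvoiced class, and plosive detection are omitted.

```python
import math
import random


def frame_features(frame):
    """Return (mean absolute amplitude, zero-crossing rate) for one frame."""
    mean_abs = sum(abs(x) for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return mean_abs, crossings / (len(frame) - 1)


def classify_frame(frame, silence_amp=0.01, zcr_split=0.25):
    """Label a frame as silence, voiced, or unvoiced (thresholds are assumptions)."""
    mean_abs, zcr = frame_features(frame)
    if mean_abs < silence_amp:
        return "silence"
    # Noise-like (unvoiced) frames change sign far more often than voiced ones.
    return "unvoiced" if zcr > zcr_split else "voiced"


# Synthetic 20 ms test frames at an assumed 8 kHz sampling rate.
FS = 8000
rng = random.Random(0)
voiced = [0.5 * math.sin(2 * math.pi * 100 * n / FS) for n in range(160)]
unvoiced = [rng.uniform(-0.1, 0.1) for _ in range(160)]
silence = [0.0] * 160

for name, frame in (("voiced", voiced), ("unvoiced", unvoiced), ("silence", silence)):
    print(name, classify_frame(frame))
```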

A novel formant tracking algorithm was developed, in order to track the formants accurately through consecutive pitch periods. The ability of this new procedure to track formant transitions during diphthongs and co-articulation effects is demonstrated by a number of examples. As with the segmentation algorithm, the developed formant tracking algorithm is computationally efficient, and only requires a small amount of a priori speaker-dependent information in addition to the formant data. Within this dissertation, this algorithm is applied to vowel-like sounds only, but is not limited to this, and could easily be extended to the general formant-tracking task.
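
The abstract does not describe the tracking algorithm itself, so the following Python sketch shows only a generic continuity-based tracker of the same general kind: each track takes the nearest raw candidate from the current pitch period, provided the jump is small enough, and otherwise holds its previous value. The function name, the `max_jump` limit, and the use of nominal starting formants as the a priori speaker-dependent information are assumptions.

```python
def track_formants(frames, initial, max_jump=300.0):
    """Assign raw formant candidates to tracks, one frame per pitch period.

    frames   -- list of candidate frequency lists (Hz), one per pitch period
    initial  -- nominal starting frequency for each track (assumed a priori data)
    max_jump -- largest accepted change between consecutive periods (hypothetical)
    """
    tracks = []
    previous = list(initial)
    for candidates in frames:
        available = sorted(candidates)
        current = []
        for prev in previous:
            best = min(available, key=lambda f: abs(f - prev), default=None)
            if best is not None and abs(best - prev) <= max_jump:
                current.append(best)
                available.remove(best)   # each candidate feeds at most one track
            else:
                current.append(prev)     # no plausible candidate: hold the old value
        tracks.append(current)
        previous = current
    return tracks


# Hypothetical raw data: a spurious 90 Hz peak in the second period and
# a missing second formant in the third.
raw = [[510.0, 1480.0, 2500.0],
       [560.0, 1400.0, 90.0, 2520.0],
       [620.0, 2540.0]]
for row in track_formants(raw, initial=[500.0, 1500.0, 2500.0]):
    print(row)
```

A greedy nearest-candidate assignment like this is cheap, in keeping with the emphasis on computational efficiency, but it can mislabel tracks when two formants approach each other; a fuller tracker would need to handle that case.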

A survey of published speaker normalizations is presented, and from this, representative examples of speech production- and perception-based normalizations are chosen. Experimental results for a speaker-normalized version of the acoustic-to-articulatory transformation, using each of the selected procedures, are provided.
Keyword Speech -- Research; Deaf -- Means of communication; Speech therapy
Additional Notes Other Title: Acoustic-to-articulatory transformation.

Document type: Thesis
Collection: UQ Theses (RHD) - UQ staff and students only
Created: Fri, 23 Jan 2015, 11:43:13 EST by Mary-Anne Marrington on behalf of Scholarly Communication and Digitisation Service