The process of hearing involves converting the varying pressure values detected at each ear into a continuously updated set of representations of sound sources. The mechanisms in the brain that assign components of sound to probable sources are complex and not yet well understood. The present enquiry contributes to the investigation of auditory perception by assessing and extending one relatively simple computational model of an element of auditory processing. Neurobiological studies of audition measure responses of individual cells, but cannot form a picture of the general mechanisms that underlie audition. Conversely, psychological studies of audition, measuring the response of the whole organism to sound, can suggest organizing principles, but have not yielded a description of audition that explains how those principles might arise in the brain. Computational modelling of the neural mechanisms that perform auditory grouping can provide an understanding of audition at an intermediate level of description that may be reconciled with both biological and behavioural accounts. The process by which the components of simultaneous complex sounds are assigned to different sources is called auditory stream segregation.
The subject of this thesis is a computational model of auditory stream segregation: Wang's Segregation Network (SegNet). SegNet is a network of oscillating units representing neurons that organizes a representation of sounds into streams according to the main grouping principle: grouping by relative time-frequency proximity of component sounds. SegNet was chosen for this study rather than competing models because its components directly represent features of the human auditory system. For example, SegNet's oscillating units communicate by the relative timing of identical spikes, as do real neurons, rather than by continuous values. However, the human auditory system groups sounds into streams by more complex criteria than just time-frequency proximity. Many other auditory grouping principles have been identified. An accurate computational model of auditory stream segregation would be able to simulate those psychophysical experiments that demonstrate each of the known auditory grouping principles. A review of auditory psychophysical experiments and theories, auditory physiology and computational models of audition is given in an introductory chapter.
To assess SegNet's usefulness as a model of auditory streaming, it was first used to simulate those psychophysical experiments that demonstrate grouping by relative time-frequency proximity, which it was designed to implement. Then two modifications to the model were proposed to allow it to simulate two more of the known auditory grouping principles - grouping by onset synchrony, and stream bias adaptation. A simplified implementation of SegNet based on the Singular Limit Theorem algorithm was used for simulations to avoid the complicated "shortcuts" used in the original published description of SegNet [Wang 1996]. Also, a new measure of synchrony, based on the time a pair of oscillators is active relative to the time they are simultaneously active, was defined to provide a means of directly comparing the behaviour of simulations to human psychophysical data. Psychophysical experiments that demonstrate grouping by relative time-frequency proximity were simulated to determine first whether the modified implementation of SegNet could reproduce Wang's original experiments, and second how accurately SegNet simulates these phenomena. The Singular Limit Theorem version of SegNet, given identical data, formed much the same groupings as were reported for the original SegNet. A large suite of simulations of alternating tone stimuli was used to map out the combinations of frequency ratios and inter-onset intervals for which one stream or two streams are formed. Although the pattern of behaviours did not correspond closely to data from human auditory streaming experiments, the shapes of the boundaries between regions were found to be surprisingly similar. The similarity suggests that the Gaussian distribution of weights used in SegNet, which determines the pattern of grouping, is an appropriate choice of function to represent a mechanism that relates components of a sound in time and frequency.
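The two quantities named above, a Gaussian weighting over time-frequency separation and a synchrony measure for a pair of oscillators, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the parameters `sigma_t` and `sigma_f`, and the discrete-time representation of activity are all assumptions made for illustration. In particular, the synchrony ratio here is taken as co-active time divided by the time at least one oscillator is active, so that it lies between 0 and 1; the exact definition used in the thesis may differ.

```python
import math

def gaussian_weight(dt, df, sigma_t=1.0, sigma_f=1.0):
    """Hypothetical connection weight between two units separated by
    dt in time and df in frequency. The weight is 1 at zero separation
    and falls off as a Gaussian in both dimensions. sigma_t and sigma_f
    are assumed width parameters, not values from the thesis."""
    return math.exp(-(dt ** 2) / (2 * sigma_t ** 2)
                    - (df ** 2) / (2 * sigma_f ** 2))

def synchrony(active_a, active_b):
    """Assumed synchrony measure for two oscillators.

    active_a and active_b are equal-length sequences of booleans, one
    entry per time step, marking when each oscillator is active.
    Returns (steps both active) / (steps at least one active), or 0.0
    if neither oscillator is ever active."""
    if len(active_a) != len(active_b):
        raise ValueError("activity traces must have the same length")
    both = sum(1 for a, b in zip(active_a, active_b) if a and b)
    either = sum(1 for a, b in zip(active_a, active_b) if a or b)
    return both / either if either else 0.0
```

Under this assumed definition, two oscillators that are always active together score 1.0 and two that never overlap score 0.0, which gives a single number per tone pair that could be set against the proportion of listeners reporting one stream.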
To determine whether SegNet is a useful model for exploring possible mechanisms of auditory streaming, it was extended to simulate two different auditory grouping principles in addition to grouping by time-frequency proximity: grouping by onset synchrony, and stream bias adaptation. These two target phenomena were chosen because they are well documented, because the original SegNet clearly does not simulate them, and because they are examples of different types of grouping: onset synchrony is a simultaneous cue, whereas stream bias adaptation is based on sequential cues.
The first extension to SegNet implements grouping by onset synchrony, whereby sounds that begin at the same time tend to group more strongly than sounds with asynchronous onsets. The extension to SegNet involves adding an extra set of excitatory connections between the existing set of oscillatory units. Tone onsets were assumed to have been detected and were passed to SegNet as extra inputs. Simulations revealed that the extended network can group simultaneous components more strongly than would the original network; however, the effect of varying onset synchrony between two tones on the strength of grouping with a third captor tone was not convincingly simulated. Some more complex mechanism may be needed to implement grouping by simultaneous onsets.
The second extension to SegNet implements stream bias adaptation. The organization of sounds into groups adapts gradually over several seconds, so the organization of sounds heard after a silence of a second or two is biased by what was heard before the silence. The extension to SegNet to add stream bias adaptation involves adding an extra set of connections between the existing oscillatory units, as well as adding a value to each unit that represents the bias. Again, tone onsets were passed to SegNet as extra inputs. Simulations revealed that the extended network qualitatively simulates both the accumulation and the decay of streaming; however, the pattern of decay over time does not closely match human psychophysical data. Although the extended model fell short of accurately simulating stream bias adaptation, the proposed mechanism warrants further research because, unlike competing models that implement stream bias adaptation, it can represent more than two simultaneous streams and does not assume any one of them to be the foreground.
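The qualitative behaviour described here, a per-unit bias value that builds up while sound is present and fades during silence, can be pictured with a simple leaky accumulator. This is only an illustrative sketch and is not claimed to be the mechanism proposed in the thesis: the function name `bias_trace`, the update rule, and the `rise` and `decay` rates are all assumptions.

```python
def bias_trace(stimulus_on, rise=0.1, decay=0.05, start=0.0):
    """Hypothetical bias value for one unit over discrete time steps.

    stimulus_on is a sequence of booleans, one per time step, marking
    whether the unit's tone is present. While the tone is present the
    bias moves towards 1 at rate `rise`; during silence it moves
    towards 0 at rate `decay`. Both rates are assumed values."""
    bias = start
    trace = []
    for on in stimulus_on:
        if on:
            bias += rise * (1.0 - bias)   # accumulation towards 1
        else:
            bias -= decay * bias          # decay towards 0
        trace.append(bias)
    return trace

# Example: 20 steps of tone followed by 20 steps of silence.
example = bias_trace([True] * 20 + [False] * 20)
```

With a slower decay than rise, some bias survives a short silence, which is the kind of carry-over the paragraph above describes. Matching the time course of decay seen in human data would require a different rule or fitted rates.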
Whether multiple auditory streams are organized by primitive grouping processes remains controversial. Concurrently with this work, SegNet has been used as a component of more complex models of audition. However, SegNet as a simple stand-alone system has some advantages, despite its quantitative inaccuracies. It is easy to suggest and implement possible mechanisms to account for particular streaming phenomena, and testing proposed models is made easy by the ability to directly compare the model's behaviour to human psychophysical data.