This thesis is about the neocognitron, a neural network that was proposed by Fukushima in 1979. Inspired by Hubel and Wiesel's serial model of processing in the visual cortex, the neocognitron was initially intended as a self-organizing model of vision; however, we are concerned with the supervised version of the network, put forward by Fukushima in 1983. Through "training with a teacher", Fukushima hoped to obtain a character recognition system that was tolerant of shifts and deformations in input images. Until now, however, it has not been clear whether Fukushima's approach has resulted in a network that can rival the performance of other recognition systems.
In the first three chapters of this thesis, the biological basis, operational principles and mathematical implementation of the supervised neocognitron are presented in detail. At the end of this thorough introduction, we consider a number of important issues that have not previously been addressed. How should S-cell selectivity and other parameters be chosen so as to maximize the network's performance? How sensitive is the network's classification ability to the supervisor's choice of training patterns? Can the neocognitron achieve state-of-the-art recognition rates and, if not, what is preventing it from doing so?
Chapter 4 looks at the Optimal Closed-Form Training (OCFT) algorithm, a method for adjusting S-cell selectivity, suggested by Hildebrandt in 1991. Experiments reveal flaws in the assumptions behind OCFT and provide motivation for the development and testing (in Chapter 5) of three new algorithms for selectivity adjustment: SOFT, SLOG and SHOP. Of these methods, SHOP is shown to be the most effective, determining appropriate selectivity values through the use of a validation set of handwritten characters.
SHOP serves as a method for probing the behaviour of the neocognitron and is used to investigate the effect of cell masks, skeletonization of input data and choice of training patterns on the network's performance. Even though SHOP is the best selectivity adjustment algorithm to be described to date, the system's peak correct recognition rate (for isolated ZIP code digits from the CEDAR database) is around 75% (with 75% reliability) after SHOP training. It is clear that the neocognitron, as originally described by Fukushima, is unable to match the performance of today's most accurate digit recognition systems, which typically achieve 90% correct recognition with near 100% reliability.
After observing the neocognitron's failure to exploit the distinguishing features of different kinds of digits in its classification of images, Chapter 6 proposes modifications to enhance the network's ability in this regard. Using this new architecture, a correct classification rate of 84.62% (with 96.36% reliability) was obtained on CEDAR ZIP codes: a substantial improvement, but still a level of performance somewhat below state-of-the-art recognition rates. Chapter 6 concludes with a critical review of the hierarchical feature extraction paradigm.
The final chapter summarizes the material presented in this thesis and draws the significant findings together in a series of conclusions. In addition to the investigation of the neocognitron, this thesis also contains a derivation of statistical bounds on the errors that arise in multilayer feedforward networks as a result of weight perturbation (Appendix E).