Boltzmann machine learning : analysis and improvements

Wood, Ian Andrew. (2004). Boltzmann machine learning : analysis and improvements PhD Thesis, School of Information Technology and Electrical Engineering, The University of Queensland.

Attached Files (Some files may be inaccessible until you login with your UQ eSpace credentials)
Name Description MIMEType Size Downloads
THE17753.pdf Full text application/pdf 22.78MB 2
Author Wood, Ian Andrew.
Thesis Title Boltzmann machine learning : analysis and improvements
School, Centre or Institute School of Information Technology and Electrical Engineering
Institution The University of Queensland
Publication date 2004
Thesis type PhD Thesis
Supervisor Prof Tom Downs
Dr. Marcus Gallagher
Total pages 312
Collection year 2004
Language eng
Subjects L
280212 Neural Networks, Genetic Alogrithms and Fuzzy Logic
700101 Application packages
Formatted abstract
The Boltzmann machine (BM) is an artificial neural network capable of modelling high-dimensional joint binary probability distributions. It has not been widely applied since it is generally considered to be too computationally intensive. This thesis examines the BM and the pre-sampling, sampling and learning algorithms used with it, aiming to improve the effectiveness and the computational efficiency of each. The equilibrium distribution of BM states is the Boltzmann distribution, which must be approximated for larger architectures. The Markov chain Monte Carlo (MCMC) sampling approach is advocated for approximation of the Boltzmann distribution due to the shortcomings of the competing mathematical approximation and tractable architecture approaches, both of which are reviewed.

Pre-sampling methods such as simulated annealing (SA) can be used to find a suitable state in which to begin MCMC sampling. SA has traditionally been used for BM pre-sampling, but takes up a significant proportion of the overall computation time in BM simulation. Four optimising demon algorithms are proposed for use in optimisation and pre-sampling and are expected to be faster than SA. Methods for automatically choosing their free parameters are described. The optimising demon algorithms are tested against SA on 200- and 442-city travelling salesperson problems (TSPs) and found to obtain similar near-optimal results. Computation time analysis of all these algorithms reveals that the two non-random optimising demon algorithms are twice as fast as SA in TSP optimisation.

Nine pre-sampling algorithms, including SA and the optimising demon algorithms are then experimentally evaluated on the BM. They are tested on four weight sets taken from different stages of BM learning on a distribution version of the 8-bit encoder problem. On three weight sets, pre-sampling offers no benefit. The last weight set produced the most challenging sampling environment and optimising demon and SA pre-sampling offer significant advantages here. Only 10% as many pre-sampling trials as sampling trials are needed for near-maximum benefit from pre-sampling.

Computation time analysis of pre-sampling, sampling and sample processing in BM learning and testing makes clear that at the 10% level, any of the pre-sampling algorithms could be used with little effect on overall computation time. The random annealed demon produced the best performance of the nine algorithms so is recommended for BM pre-sampling. Other sampling-related issues are discussed, including the role of pre-sampling, the use of correlation times to predict required numbers of samples and the use of penalty terms in KL-divergence measures.

Gradient descent learning for the BM is examined using detailed measurements from two runs of 3000 epochs. The first uses exact gradient descent and the second uses a sample-based estimate. This comparison allows the effect of sample error on learning to be examined. Both learning runs perform similarly and become captured by the same type of sub-optimal solution. This implies that the limitations of the gradient descent learning algorithm are a more significant problem for BM learning than sampling inaccuracy. 'Bad' sampling runs which can disrupt learning are found in the later stages of all training runs. Monitoring mean sample energy can allow bad sampling runs to be detected and re-run more successfully.

A range of potential improvements to BM learning are then proposed, including Gest optimisation, conjugate gradient, online learning, the free probability method and a principled method of choosing the initial weights. It is hoped that further research to test these methods will allow the BM to become a useful tool for distribution modelling and density estimation.
Keyword Neural networks (Computer science)

Document type: Thesis
Collection: UQ Theses (RHD) - UQ staff and students only
Citation counts: Google Scholar Search Google Scholar
Created: Fri, 24 Aug 2007, 18:26:06 EST