The Boltzmann machine (BM) is an artificial neural network capable of modelling high-dimensional joint binary probability distributions. It has not been widely applied since it is generally considered to be too computationally intensive. This thesis examines the BM and the pre-sampling, sampling and learning algorithms used with it, aiming to improve the effectiveness and the computational efficiency of each. The equilibrium distribution of BM states is the Boltzmann distribution, which must be approximated for larger architectures. The Markov chain Monte Carlo (MCMC) sampling approach is advocated for approximation of the Boltzmann distribution due to the shortcomings of the competing mathematical approximation and tractable architecture approaches, both of which are reviewed.
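The MCMC approach mentioned above can be illustrated with a minimal Gibbs-sampling sketch: each binary unit is updated in turn with a sigmoid acceptance probability, so the chain's equilibrium distribution is the Boltzmann distribution. The function name, toy weight set and sweep count here are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

def gibbs_sample(W, b, s, n_sweeps, rng):
    """Run one MCMC chain over binary BM states s in {0, 1}^n.

    Each sweep updates every unit in turn: unit i is set to 1 with
    probability sigmoid(W[i] @ s + b[i]), so the chain's equilibrium
    distribution is the Boltzmann distribution
    P(s) proportional to exp(-E(s)), E(s) = -0.5 s'Ws - b's.
    """
    n = len(s)
    for _ in range(n_sweeps):
        for i in range(n):
            activation = W[i] @ s + b[i]
            p_on = 1.0 / (1.0 + np.exp(-activation))
            s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

# Toy network: 8 units with random symmetric weights.
rng = np.random.default_rng(0)
n = 8
W = rng.normal(scale=0.5, size=(n, n))
W = (W + W.T) / 2          # symmetric weights, as the BM requires
np.fill_diagonal(W, 0.0)   # no self-connections
b = rng.normal(scale=0.1, size=n)
s = rng.integers(0, 2, size=n).astype(float)
s = gibbs_sample(W, b, s, n_sweeps=100, rng=rng)
```

For larger architectures this per-state sampling is exactly what replaces exhaustive enumeration of the Boltzmann distribution.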
Pre-sampling methods such as simulated annealing (SA) can be used to find a suitable state in which to begin MCMC sampling. SA has traditionally been used for BM pre-sampling, but takes up a significant proportion of the overall computation time in BM simulation. Four optimising demon algorithms are proposed for use in optimisation and pre-sampling and are expected to be faster than SA. Methods for automatically choosing their free parameters are described. The optimising demon algorithms are tested against SA on 200- and 442-city travelling salesperson problems (TSPs) and found to obtain similar near-optimal results. Computation time analysis of all these algorithms reveals that the two non-random optimising demon algorithms are twice as fast as SA in TSP optimisation.
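The demon idea underlying these algorithms can be sketched as follows. This is a minimal version of Creutz's microcanonical demon with a gradually drained energy reserve, which gives an annealing-like optimiser without random acceptance tests; the thesis's four optimising demon variants differ in detail, and the toy problem and parameter values below are assumptions for illustration only.

```python
import random

def demon_minimise(energy, neighbour, x0, steps, demon_max, drain):
    """Microcanonical demon search (after Creutz, 1983).

    A 'demon' carries a bounded energy reserve. A proposed move is
    accepted iff the demon can pay for any energy increase; energy
    released by downhill moves is absorbed up to demon_max.
    Draining the reserve each step biases the search towards lower
    energies, analogous to lowering the temperature in SA.
    """
    x, e, demon = x0, energy(x0), 0.0
    best_x, best_e = x, e
    for _ in range(steps):
        y = neighbour(x)
        dE = energy(y) - e
        if dE <= demon:                      # demon can pay: accept
            demon = min(demon - dE, demon_max)
            x, e = y, e + dE
            if e < best_e:
                best_x, best_e = x, e
        demon = max(0.0, demon - drain)      # anneal the reserve
    return best_x, best_e

# Toy problem: minimise a quadratic over the integers.
random.seed(1)
energy = lambda x: (x - 7) ** 2
neighbour = lambda x: x + random.choice([-1, 1])
best_x, best_e = demon_minimise(energy, neighbour, x0=50,
                                steps=5000, demon_max=10.0, drain=0.01)
```

The acceptance test involves no random number generation, which is one reason non-random demon variants can run faster than SA per move.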
Nine pre-sampling algorithms, including SA and the optimising demon algorithms, are then experimentally evaluated on the BM. They are tested on four weight sets taken from different stages of BM learning on a distribution version of the 8-bit encoder problem. On three of the weight sets, pre-sampling offers no benefit. The fourth weight set produces the most challenging sampling environment, and here optimising demon and SA pre-sampling offer significant advantages. Only 10% as many pre-sampling trials as sampling trials are needed to obtain near-maximum benefit from pre-sampling.
Computation time analysis of pre-sampling, sampling and sample processing in BM learning and testing makes clear that, at the 10% level, any of the pre-sampling algorithms could be used with little effect on overall computation time. The random annealed demon produced the best performance of the nine algorithms, so it is recommended for BM pre-sampling. Other sampling-related issues are discussed, including the role of pre-sampling, the use of correlation times to predict the required number of samples and the use of penalty terms in KL-divergence measures.
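The KL-divergence measure referred to here compares a target distribution with a sample-based estimate; when some states are never sampled, the estimate assigns them zero probability and the divergence becomes infinite, which is one motivation for a penalty term. The flooring scheme below is an illustrative assumption, not the thesis's specific penalty.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-6):
    """KL(p || q) = sum_i p_i * log(p_i / q_i).

    A finite sample can give q_i = 0 for states with p_i > 0,
    making the divergence infinite. Flooring q at eps (one possible
    penalty scheme) keeps the measure finite; q is renormalised
    afterwards so it remains a probability distribution.
    """
    q = np.maximum(q, eps)
    q = q / q.sum()
    mask = p > 0                    # 0 * log(0) terms contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.25, 0.25, 0.0])   # target distribution
q = np.array([0.4, 0.3, 0.3, 0.0])     # sample-based estimate
d = kl_divergence(p, q)
```

The divergence is zero only when the two distributions match, so it serves as a natural error measure for distribution modelling.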
Gradient descent learning for the BM is examined using detailed measurements from two runs of 3000 epochs: the first uses exact gradient descent and the second a sample-based estimate, allowing the effect of sampling error on learning to be examined. Both runs perform similarly and become captured by the same type of sub-optimal solution, implying that the limitations of the gradient descent learning algorithm are a more significant problem for BM learning than sampling inaccuracy. 'Bad' sampling runs, which can disrupt learning, are found in the later stages of all training runs; monitoring the mean sample energy allows them to be detected and re-run more successfully.
A range of potential improvements to BM learning are then proposed, including Gest optimisation, conjugate gradient, online learning, the free probability method and a principled method of choosing the initial weights. It is hoped that further research to test these methods will allow the BM to become a useful tool for distribution modelling and density estimation.