Abstract for Bayesian Learning for Neural Networks

Bayesian Learning for Neural Networks

Radford M. Neal, Dept. of Computer Science, University of Toronto

Two features distinguish the Bayesian approach to learning models from data. First, beliefs derived from background knowledge are used to select a prior probability distribution for the model parameters. Second, predictions of future observations are made by integrating the model's predictions with respect to the posterior parameter distribution obtained by updating this prior to take account of the data. For neural network models, both aspects present difficulties: the prior over network parameters has no obvious relation to our prior knowledge, and integration over the posterior is computationally very demanding.
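
Written out in symbols, the two steps look as follows. The notation is chosen here for illustration and may not match the thesis exactly: D = {(x^(1), y^(1)), ..., (x^(n), y^(n))} is the training data, θ the network parameters, and θ^(1), ..., θ^(N) are draws from the posterior. The final approximation replaces the intractable integral by an average over such draws, which is the role Markov chain Monte Carlo plays.

```latex
P(\theta \mid D) \;\propto\; P(\theta) \prod_{i=1}^{n} P\big(y^{(i)} \mid x^{(i)}, \theta\big)

P\big(y^{(n+1)} \mid x^{(n+1)}, D\big)
  = \int P\big(y^{(n+1)} \mid x^{(n+1)}, \theta\big)\, P(\theta \mid D)\, d\theta
  \;\approx\; \frac{1}{N} \sum_{t=1}^{N} P\big(y^{(n+1)} \mid x^{(n+1)}, \theta^{(t)}\big)
```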

I address the first problem by defining classes of prior distributions for network parameters that reach sensible limits as the size of the network goes to infinity. In this limit, the properties of these priors can be elucidated. Some priors converge to Gaussian processes, in which functions computed by the network may be smooth, Brownian, or fractionally Brownian. Other priors converge to non-Gaussian stable processes. Interesting effects are obtained by combining priors of both sorts in networks with more than one hidden layer.
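
The key idea can be sketched for a network with one input, one layer of tanh hidden units, and one output. This is a minimal sketch under stated assumptions: all weights and biases have independent zero-mean Gaussian priors, and the standard deviation of the hidden-to-output weights is scaled as omega_v / sqrt(n_hidden). Under this scaling the prior variance of the output stays fixed as the number of hidden units grows, and by the central limit theorem the prior over functions approaches a Gaussian process. The function name and the particular standard deviations (sigma_u, sigma_a, sigma_b, omega_v) are hypothetical choices for illustration.

```python
import numpy as np


def sample_prior_outputs(x, n_hidden, n_samples=2000, sigma_u=5.0, sigma_a=5.0,
                         sigma_b=1.0, omega_v=1.0, seed=0):
    """Draw network outputs f(x) at the points x from a one-hidden-layer tanh prior.

    Returns an array of shape (n_samples, len(x)); each row is one network
    drawn from the prior, evaluated at every input point.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)                                 # (n_points,)
    u = rng.normal(0.0, sigma_u, size=(n_samples, n_hidden))       # input-to-hidden weights
    a = rng.normal(0.0, sigma_a, size=(n_samples, n_hidden))       # hidden-unit biases
    # Hidden-to-output weights: std shrinks as 1/sqrt(H), so Var[f(x)] does not grow with H.
    v = rng.normal(0.0, omega_v / np.sqrt(n_hidden), size=(n_samples, n_hidden))
    b = rng.normal(0.0, sigma_b, size=(n_samples, 1))              # output bias
    h = np.tanh(a[:, :, None] + u[:, :, None] * x[None, None, :])  # (n_samples, H, n_points)
    return b + np.einsum("sh,shp->sp", v, h)                       # sum over hidden units
```

Comparing `sample_prior_outputs([0.0, 0.5], 10).var(axis=0)` with the same call for 500 hidden units should give similar variances. Without the 1/sqrt(n_hidden) factor the variance would instead grow linearly with network size. Replacing the Gaussian hidden-to-output weights with Cauchy ones whose scale shrinks as 1/n_hidden gives, by the stability of the Cauchy distribution, a non-Gaussian (Cauchy) limit of the kind the abstract mentions.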

The problem of integrating over the posterior can be solved using Markov chain Monte Carlo methods. I demonstrate that the hybrid Monte Carlo algorithm, which is based on dynamical simulation, is superior to methods based on simple random walks.
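
The following is a minimal sketch of the basic hybrid (Hamiltonian) Monte Carlo update, assuming Gaussian momentum with unit mass and a fixed step size and trajectory length. The version used in the thesis may differ in its details. Each iteration draws a fresh momentum, simulates the dynamics with potential energy -log_prob using the leapfrog method, and then accepts or rejects the endpoint with a Metropolis test on the change in total energy. The names `hmc_sample`, `step_size`, and `n_leapfrog` are hypothetical.

```python
import numpy as np


def hmc_sample(log_prob, grad_log_prob, theta0, n_samples=1000,
               step_size=0.1, n_leapfrog=20, seed=0):
    """Basic hybrid Monte Carlo for a 1-D parameter vector theta.

    Returns an (n_samples, dim) array of states and the acceptance rate.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    samples = np.empty((n_samples, theta.size))
    n_accept = 0
    for i in range(n_samples):
        p0 = rng.standard_normal(theta.size)      # resample momentum each iteration
        q, p = theta.copy(), p0.copy()
        # Leapfrog integration of Hamiltonian dynamics with U(q) = -log_prob(q).
        p += 0.5 * step_size * grad_log_prob(q)   # initial half step for momentum
        for step in range(n_leapfrog):
            q += step_size * p                    # full step for position
            if step < n_leapfrog - 1:
                p += step_size * grad_log_prob(q) # full step for momentum
        p += 0.5 * step_size * grad_log_prob(q)   # final half step for momentum
        # Metropolis test on the total energy H = U(q) + |p|^2 / 2.
        h_old = -log_prob(theta) + 0.5 * p0 @ p0
        h_new = -log_prob(q) + 0.5 * p @ p
        if np.log(rng.uniform()) < h_old - h_new:
            theta = q
            n_accept += 1
        samples[i] = theta
    return samples, n_accept / n_samples
```

For a standard Gaussian target, pass `log_prob = lambda q: -0.5 * q @ q` and `grad_log_prob = lambda q: -q`. The advantage over a random walk comes from the momentum. A trajectory of n_leapfrog steps moves in a consistent direction for a distance proportional to the number of steps, whereas a random walk of the same number of steps typically travels only about the square root of that distance.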

I use a hybrid Monte Carlo implementation to test the performance of Bayesian neural network models on several synthetic and real data sets. Good results are obtained on small data sets when large networks are used in conjunction with priors designed to reach limits as network size increases, confirming that with Bayesian learning one need not restrict the complexity of the network based on the size of the data set. A Bayesian approach is also found to be effective in automatically determining the relevance of inputs.
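
One way to let the data determine input relevance is a hierarchical prior in which each input has its own precision hyperparameter controlling the size of the weights leaving it. Within an MCMC scheme, such a precision can be resampled from its conditional distribution given the current weights. The following is a minimal sketch of that update. It assumes a Gamma(shape=alpha, rate=beta) prior on each precision, which makes the update conjugate. The function name, this parameterization, and the default values are hypothetical, not taken from the thesis's software.

```python
import numpy as np


def gibbs_update_input_precisions(w_in, alpha=0.5, beta=0.5, rng=None):
    """Sample one precision per input from its conditional posterior given the weights.

    w_in: array of shape (n_inputs, n_hidden); row i holds the weights out of input i.
    Assumed prior: w_in[i, j] ~ N(0, 1 / tau[i]) and tau[i] ~ Gamma(shape=alpha, rate=beta).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_hidden = w_in.shape[1]
    shape = alpha + 0.5 * n_hidden                  # conjugate Gamma posterior: shape
    rate = beta + 0.5 * np.sum(w_in ** 2, axis=1)   # ... and one rate per input
    return rng.gamma(shape, 1.0 / rate)             # NumPy's gamma takes scale = 1 / rate
```

An input whose weights are all small receives a large precision. This in turn pulls its weights further towards zero, so the network effectively learns to ignore that input. In a full sampler, updates like this would alternate with hybrid Monte Carlo updates of the weights themselves.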

Ph.D. Thesis, Dept. of Computer Science, University of Toronto, 1994, 195 pages.

Associated references: A revised version of my thesis, with some new material, was published by Springer-Verlag:

Neal, R. M. (1996) Bayesian Learning for Neural Networks, Lecture Notes in Statistics No. 118, New York: Springer-Verlag.

Chapter 2 of Bayesian Learning for Neural Networks is very similar to the following technical report:

Neal, R. M. (1994) ``Priors for infinite networks'', Technical Report CRG-TR-94-1, Dept. of Computer Science, University of Toronto, 22 pages.

Chapter 3 is a further development of ideas in the following papers:

Neal, R. M. (1993) ``Bayesian learning via stochastic dynamics'', in C. L. Giles, S. J. Hanson, and J. D. Cowan (editors) Advances in Neural Information Processing Systems 5, pp. 475-482, San Mateo, California: Morgan Kaufmann.
Neal, R. M. (1992) ``Bayesian training of backpropagation networks by the hybrid Monte Carlo method'', Technical Report CRG-TR-92-1, Dept. of Computer Science, University of Toronto, 21 pages.