Bayesian Training of Backpropagation Networks by the Hybrid Monte Carlo Method

Radford M. Neal, Dept. of Computer Science, University of Toronto

It is shown that Bayesian training of backpropagation neural networks can feasibly be performed by the "Hybrid Monte Carlo" method. This approach allows the true predictive distribution for a test case given a set of training cases to be approximated arbitrarily closely, in contrast to previous approaches which approximate the posterior weight distribution by a Gaussian. In this work, the Hybrid Monte Carlo method is implemented in conjunction with simulated annealing, in order to speed relaxation to a good region of parameter space. The method has been applied to a test problem, demonstrating that it can produce good predictions, as well as an indication of the uncertainty of these predictions. Appropriate weight scaling factors are found automatically. By applying known techniques for calculation of "free energy" differences, it should also be possible to compare the merits of different network architectures. The work described here should also be applicable to a wide variety of statistical models other than neural networks.
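The Hybrid Monte Carlo method referred to in the abstract alternates momentum resampling, leapfrog simulation of Hamiltonian dynamics, and a Metropolis accept/reject step on the change in total energy. The following is a minimal illustrative sketch of one such sampler, not the report's implementation: it samples a one-dimensional toy "posterior" exp(-U(q)) rather than a network weight posterior, and all function and parameter names (step size `eps`, trajectory length `n_leapfrog`) are assumptions chosen for the example.

```python
import math
import random

def hybrid_monte_carlo(U, grad_U, q0, eps=0.1, n_leapfrog=20,
                       n_samples=5000, seed=0):
    """Sketch of Hybrid Monte Carlo for a scalar target exp(-U(q)).

    A real application to network training would replace q by the
    weight vector and U by minus the log posterior over weights.
    """
    rng = random.Random(seed)
    q = q0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)          # resample momentum from N(0, 1)
        q_new, p_new = q, p
        # Leapfrog integration: half momentum step, alternating full
        # steps, final half momentum step.
        p_new -= 0.5 * eps * grad_U(q_new)
        for i in range(n_leapfrog):
            q_new += eps * p_new
            if i < n_leapfrog - 1:
                p_new -= eps * grad_U(q_new)
        p_new -= 0.5 * eps * grad_U(q_new)
        # Metropolis test on the change in total energy H = U + p^2/2.
        dH = (U(q_new) + 0.5 * p_new ** 2) - (U(q) + 0.5 * p ** 2)
        if dH < 0 or rng.random() < math.exp(-dH):
            q = q_new                    # accept; otherwise keep old q
        samples.append(q)
    return samples

# Toy target: standard normal, U(q) = q^2 / 2, so grad_U(q) = q.
samples = hybrid_monte_carlo(lambda q: 0.5 * q * q, lambda q: q, q0=0.0)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
```

Because the leapfrog integrator conserves the Hamiltonian only approximately, the Metropolis step is what makes the sampled distribution exact; the "uncorrected stochastic dynamics" discussed in the associated conference paper omits this correction.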

Technical Report CRG-TR-92-1 (April 1992), 21 pages: postscript, pdf.

Associated references: Work related to that reported in ``Bayesian training of backpropagation networks by the hybrid Monte Carlo method'' appears in the following conference paper:
Neal, R. M. (1993) ``Bayesian learning via stochastic dynamics'', in C. L. Giles, S. J. Hanson, and J. D. Cowan (editors) Advances in Neural Information Processing Systems 5, pp. 475-482, San Mateo, California: Morgan Kaufmann: abstract.
The technical report is longer, and contains material on annealing not present in the conference paper. The conference paper contains material on uncorrected stochastic dynamics and on comparisons with standard network training that is not in the technical report.

Further developments along the same lines are reported in Chapter 3 of my thesis:

Neal, R. M. (1994) Bayesian Learning for Neural Networks, Ph.D. Thesis, Dept. of Computer Science, University of Toronto, 195 pages: abstract, postscript, pdf, associated references, associated software.
A revised version of this thesis, with some new material, was published by Springer-Verlag:
Neal, R. M. (1996) Bayesian Learning for Neural Networks, Lecture Notes in Statistics No. 118, New York: Springer-Verlag: blurb, associated references, associated software.