The attempt to find a single ``optimal'' weight vector in conventional network training can lead to overfitting and poor generalization. Bayesian methods avoid this, without the need for a validation set, by averaging the outputs of many networks with weights sampled from the posterior distribution given the training data. This sample can be obtained by simulating a stochastic dynamical system that has the posterior as its stationary distribution.
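As a rough illustration of the approach (not the implementation used in the paper), the sketch below samples the weights of a tiny one-hidden-layer network with hybrid Monte Carlo, simulating Hamiltonian dynamics whose stationary distribution is the posterior, and then averages the sampled networks' outputs to form a prediction. The network size H, the prior and noise scales SIGMA_W and SIGMA_N, the step size, and the trajectory length are all illustrative assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch only: tiny 1-hidden-layer net f(x) = w2 . tanh(w1*x + b1) + b2
H = 5              # hidden units (assumed, for illustration)
SIGMA_W = 1.0      # Gaussian prior std dev on all weights (assumed)
SIGMA_N = 0.1      # Gaussian observation noise std dev (assumed)

def unpack(theta):
    w1, b1, w2 = theta[:H], theta[H:2*H], theta[2*H:3*H]
    return w1, b1, w2, theta[3*H]

def forward(theta, x):
    w1, b1, w2, b2 = unpack(theta)
    h = np.tanh(np.outer(x, w1) + b1)       # hidden activations, (N, H)
    return h @ w2 + b2, h

def log_post_and_grad(theta, x, y):
    # Log posterior (up to a constant) and its gradient via backprop.
    f, h = forward(theta, x)
    w1, b1, w2, b2 = unpack(theta)
    r = f - y                               # residuals, (N,)
    logp = -0.5 * theta @ theta / SIGMA_W**2 - 0.5 * r @ r / SIGMA_N**2
    dh = (r[:, None] * w2) * (1.0 - h**2)   # error signal at hidden layer
    g = np.concatenate([dh.T @ x,           # d(sq. error)/dw1
                        dh.sum(axis=0),     # d/db1
                        h.T @ r,            # d/dw2
                        [r.sum()]])         # d/db2
    return logp, -theta / SIGMA_W**2 - g / SIGMA_N**2

def hmc_step(theta, x, y, eps=0.01, n_leap=50):
    # One hybrid Monte Carlo update: leapfrog trajectory + Metropolis test.
    p = rng.standard_normal(theta.size)     # fresh Gaussian momenta
    logp, g = log_post_and_grad(theta, x, y)
    H0 = 0.5 * p @ p - logp                 # Hamiltonian = kinetic + potential
    th = theta.copy()
    p = p + 0.5 * eps * g                   # half step for momenta
    for i in range(n_leap):
        th = th + eps * p                   # full step for positions
        logp, g = log_post_and_grad(th, x, y)
        if i < n_leap - 1:
            p = p + eps * g
    p = p + 0.5 * eps * g                   # final half step
    H1 = 0.5 * p @ p - logp
    if rng.random() < np.exp(min(0.0, H0 - H1)):
        return th                           # accept the trajectory endpoint
    return theta                            # reject: keep current weights

# Toy data and posterior sampling.
x = np.linspace(-1, 1, 20)
y = np.sin(3 * x) + SIGMA_N * rng.standard_normal(x.size)
theta = 0.1 * rng.standard_normal(3 * H + 1)
samples = []
for it in range(2000):
    theta = hmc_step(theta, x, y)
    if it >= 500:                           # discard burn-in
        samples.append(theta.copy())

# Bayesian prediction: average the outputs of the sampled networks.
x_test = np.array([0.5])
preds = [forward(t, x_test)[0][0] for t in samples]
print("predictive mean %.3f, std %.3f" % (np.mean(preds), np.std(preds)))

The Metropolis test at the end of each trajectory corrects for the discretization error of the simulated dynamics; the ``uncorrected'' stochastic dynamics variant discussed in the conference paper omits this accept/reject step and instead relies on the step size being small.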
In C. L. Giles, S. J. Hanson, and J. D. Cowan (editors), Advances in Neural Information Processing Systems 5 (aka NIPS*1992), pp. 475-482, San Mateo, California: Morgan Kaufmann: postscript, pdf.
Also available via the NIPS site.

Associated references: The work reported in ``Bayesian learning via stochastic dynamics'' is similar to that reported in the following technical report:
Neal, R. M. (1992) ``Bayesian training of backpropagation networks by the hybrid Monte Carlo method'', Technical Report CRG-TR-92-1, Dept. of Computer Science, University of Toronto, 21 pages: abstract, postscript, pdf.

The technical report is longer and contains material on annealing that is not present in the conference paper. The conference paper contains material on uncorrected stochastic dynamics and on comparisons with standard network training that is not in the technical report.
Further developments along the same lines are reported in Chapter 3 of my thesis:
Neal, R. M. (1994) Bayesian Learning for Neural Networks, Ph.D. Thesis, Dept. of Computer Science, University of Toronto, 195 pages: abstract, postscript, pdf, associated references, associated software.

A revised version of this thesis, with some new material, was published by Springer-Verlag:
Neal, R. M. (1996) Bayesian Learning for Neural Networks, Lecture Notes in Statistics No. 118, New York: Springer-Verlag: blurb, associated references, associated software.