## Priors for Infinite Networks

**Radford M. Neal, Dept. of Computer Science, University of Toronto**

Bayesian inference begins with a prior distribution for model
parameters that is meant to capture prior beliefs about the
relationship being modeled. For multilayer perceptron networks,
where the parameters are the connection weights, the prior lacks
any direct meaning: what matters is the prior over functions
computed by the network that is implied by this prior over
weights. In this paper, I show that priors over weights can be
defined in such a way that the corresponding priors over
functions reach reasonable limits as the number of hidden units
in the network goes to infinity. When using such priors, there
is thus no need to limit the size of the network in order to
avoid "overfitting". The infinite network limit also provides
insight into the properties of different priors. A Gaussian prior
for hidden-to-output weights results in a Gaussian process prior
for functions, which can be smooth, Brownian, or fractional
Brownian, depending on the hidden unit activation function and the
prior for input-to-hidden weights. Quite different effects can be
obtained using priors based on non-Gaussian stable distributions.
In networks with more than one hidden layer, a combination of
Gaussian and non-Gaussian priors appears most interesting.
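
As a rough illustration of the infinite-network limit with Gaussian priors, the sketch below draws functions from the prior of a network with one hidden layer of tanh units and a single real input. It assumes that the standard deviation of the hidden-to-output weights is scaled as H^(-1/2), where H is the number of hidden units, so that the prior variance of the output stays finite as H grows. The function names (`sample_prior_function`, `prior_variance_at`) and all numerical settings (`omega_v`, `sigma_b`, `sigma_u`, `sigma_a`) are hypothetical choices made for this example and are not taken from the report.

```python
import math
import random


def sample_prior_function(xs, n_hidden, rng, omega_v=1.0, sigma_b=0.1,
                          sigma_u=5.0, sigma_a=5.0):
    """Draw one function from the prior of a one-hidden-layer tanh network
    and return its values at the inputs in xs (illustrative settings)."""
    # Assumed scaling: hidden-to-output std shrinks as H^(-1/2), so the sum
    # over hidden units has a variance that does not grow with H.
    sigma_v = omega_v / math.sqrt(n_hidden)
    b = rng.gauss(0.0, sigma_b)              # output bias
    values = [b] * len(xs)
    for _ in range(n_hidden):
        u = rng.gauss(0.0, sigma_u)          # input-to-hidden weight
        a = rng.gauss(0.0, sigma_a)          # hidden unit bias
        v = rng.gauss(0.0, sigma_v)          # hidden-to-output weight
        for i, x in enumerate(xs):
            values[i] += v * math.tanh(a + u * x)
    return values


def prior_variance_at(x, n_hidden, n_draws=2000, seed=0):
    """Monte Carlo estimate of the prior variance of the output at input x."""
    rng = random.Random(seed)
    draws = [sample_prior_function([x], n_hidden, rng)[0]
             for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    return sum((d - mean) ** 2 for d in draws) / (n_draws - 1)


if __name__ == "__main__":
    # The estimated variance should stay roughly the same as H increases.
    for h in (1, 10, 100):
        print(h, round(prior_variance_at(0.0, h), 3))
```

Under this scaling, the estimated output variance should be roughly the same for small and large H, while the distribution of function values approaches a Gaussian as H grows, by the central limit theorem. Evaluating `sample_prior_function` on a grid of inputs and plotting the result gives a quick visual impression of the functions such a prior favours.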

Technical Report CRG-TR-94-1 (March 1994), 22 pages:
postscript, pdf.

**Associated reference:**
With minor changes, "Priors for infinite networks" became Chapter 2 of
my Ph.D. thesis:
Neal, R. M. (1994) *Bayesian Learning for Neural Networks*, Ph.D.
Thesis, Dept. of Computer Science, University of Toronto, 195 pages:
abstract,
postscript, pdf,
associated references,
associated software.

A revised version of this thesis, with some new material, was
published by Springer-Verlag:
Neal, R. M. (1996) *Bayesian Learning for Neural Networks*,
Lecture Notes in Statistics No. 118, New York: Springer-Verlag:
blurb,
associated references,
associated software.