FACILITIES PROVIDED BY THIS SOFTWARE

This software implements flexible Bayesian models for regression, classification, and probability or density estimation applications. The regression and classification models are based on multilayer perceptron neural networks or on Gaussian processes. The probability and probability density models are based on finite or countably infinite mixture models; the infinite models are also known as Dirichlet process mixture models. Bayesian inference for these models is done using Markov chain Monte Carlo methods. Software modules that support Markov chain sampling are included in the distribution, and may be useful in other applications.

Note that I am distributing this software to facilitate research in this area. Potential users should make note of the copyright notice at the beginning of this document (or accessible via the first hypertext link). You must obtain permission from me before using this software for purposes other than research or education. You should also note that the software may have bugs, particularly in recently added or experimental features.

The neural network models are described in my thesis, "Bayesian Learning for Neural Networks", which has now been published by Springer-Verlag (ISBN 0-387-94724-8). The neural network models implemented are essentially as described in the Appendix of this book.

The Gaussian process models are in many ways analogous to the network models. The Gaussian process models implemented in this software, and the computational methods used, are described in my technical report entitled "Monte Carlo implementation of Gaussian process models for Bayesian regression and classification", available in compressed Postscript at URL ftp://ftp.cs.utoronto.ca/pub/radford/mc-gp.ps.Z. The Gaussian process models for regression are similar to those evaluated by Carl Rasmussen in his thesis, "Evaluation of Gaussian Processes and other Methods for Non-Linear Regression", available from his home page at the URL http://www.cs.utoronto.ca/~carl/; his thesis also discusses neural network models. To understand how to use the software implementing these models, it is essential that you have read at least one of these references.

The neural network software supports Bayesian learning for regression problems, classification problems, and survival analysis (experimental), using models based on networks with any number of hidden layers, with a wide variety of prior distributions for network parameters and hyperparameters. The Gaussian process software supports regression and classification models that are similar to neural network models with an infinite number of hidden units, using Gaussian priors. The advantages of Bayesian learning for both types of model include the automatic determination of "regularization" hyperparameters, without the need for a validation set, the avoidance of overfitting when using large networks, and the quantification of uncertainty in predictions. The software implements the Automatic Relevance Determination (ARD) approach to handling inputs that may turn out to be irrelevant (developed with David MacKay); a conceptual sketch of an ARD prior is given below.

For problems and networks of moderate size (e.g., 200 training cases, 10 inputs, 20 hidden units), fully training a neural network model (to the point where one can be reasonably sure that the correct Bayesian answer has been found) typically takes several hours to a day on our SGI machine. However, quite good results, competitive with other methods, are often obtained after training for under an hour.
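To illustrate the idea behind ARD: each input is given its own hyperparameter controlling the width of the prior for the weights out of that input, so that sampling can drive the hyperparameter for an irrelevant input toward zero, shrinking all of that input's weights. The following minimal C sketch of such a prior's log density is my own illustration of the concept, not code from this distribution; the function name and the flat layout of the weight array are assumptions made for the example.

   #include <math.h>

   /* Log prior density for input-to-hidden weights under an ARD-style
      prior (hypothetical sketch, not code from this distribution).
      The weight from input i to hidden unit j is stored in
      w[i*n_hidden+j], and has a Gaussian prior with mean zero and
      standard deviation sigma[i], shared by all weights out of input
      i.  A small sigma[i] suppresses input i entirely. */

   double ard_log_prior
   ( int n_inputs,     /* number of input units */
     int n_hidden,     /* number of hidden units */
     double *w,        /* weights, laid out as described above */
     double *sigma     /* per-input prior standard deviations */
   )
   {
     double lp;
     int i, j;

     lp = 0;
     for (i = 0; i<n_inputs; i++)
     { for (j = 0; j<n_hidden; j++)
       { double wij = w[i*n_hidden+j];
         lp += - log(sigma[i]) - 0.5*log(2*M_PI)
               - 0.5 * (wij*wij) / (sigma[i]*sigma[i]);
       }
     }

     return lp;
   }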
The time required to train the Gaussian process models depends a lot on the number of training cases, since the dominant cost is manipulating the covariance matrix over the training cases, which grows roughly as the cube of their number. For 100 cases, these models may take only a few minutes to train (again, to the point where one can be reasonably sure that convergence to the correct answer has occurred). For 1000 cases, however, training might well require a day of computation.

The finite mixture models are similar to those that have been used by many people; for example, Lavine and West (Canadian Journal of Statistics, vol. 20, pp. 451-461, 1992) fit similar models using similar Markov chain Monte Carlo methods. The countably infinite mixture models are equivalent to Dirichlet process mixtures. Markov chain sampling for these models has been described by Escobar and West (Journal of the American Statistical Association, vol. 90, pp. 577-588, 1995). Both finite and infinite mixture models (for binary data) are described in my tech report, "Bayesian mixture modeling by Monte Carlo simulation", available by anonymous ftp at the URL ftp://ftp.cs.utoronto.ca/pub/radford/bmm.ps.Z. The models and Markov chain methods used in the software are not identical to those described in any of these references, however; the details can be found only in the software documentation. For this reason, the mixture software may be a bit difficult to figure out until such time as I get around to writing up a paper describing this implementation. This part of the software is rather preliminary in other respects as well.

The software consists of a number of programs and modules. Five major components are included in this distribution, each with its own directory:

   util   Modules and programs of general utility.

   mc     Modules and programs that support sampling using Markov chain
          Monte Carlo methods, using modules from util.

   net    Modules and programs that implement Bayesian inference for
          models based on multilayer perceptrons, using the modules
          from util and mc.

   gp     Modules and programs that implement Bayesian inference for
          models based on Gaussian processes, using the modules from
          util and mc.

   mix    Modules and programs that implement Bayesian inference for
          finite and infinite mixture models, using modules from util
          and mc.

In addition, the 'bvg' directory contains modules and programs for sampling from a bivariate Gaussian distribution, as a simple demonstration of the capabilities of the Markov chain Monte Carlo facilities (a conceptual sketch of such a sampler is given at the end of this section). Other than by providing this example, and the detailed documentation on various commands, I have not attempted to document how you might go about using the Markov chain Monte Carlo modules for another application. The 'examples' directory contains the data sets that are used in the tutorial examples, along with shell scripts containing the commands used.

It is possible to use this software to do learning and prediction without any knowledge of how the programs are written (assuming that the software can be installed as described below without any problems). However, the complete source code is included so that researchers can modify the programs to try out their own ideas.

The software is written in ANSI C, and is meant to be run in a UNIX environment. Specifically, it was developed on an SGI machine running IRIX Release 5.3. It also seems to run OK on a SPARC machine running SunOS 5, using the 'gcc' C compiler, and on DEC Alpha machines.
As far as I know, the software does not depend on any peculiarities of these environments (except perhaps for the use of the drand48 pseudo-random number generator), but you may nevertheless have problems getting it to work in substantially different environments, and I can offer little or no assistance in this regard. There is no dependence on any particular graphics package or graphical user interface. (The 'xxx-plt' programs are designed to allow their output to be piped directly into the 'xgraph' plotting program, but other plotting programs can be used instead, or the numbers can be examined directly.)
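For readers curious about what the bivariate Gaussian demonstration mentioned above involves, here is a self-contained sketch of Gibbs sampling from a bivariate Gaussian. It is my own illustration of the technique, not the code in the 'bvg' directory; the correlation value, iteration count, and seed are arbitrary choices for the example. It uses the drand48 generator mentioned above, with standard normal deviates obtained by the Box-Muller method. For a bivariate Gaussian with zero means, unit variances, and correlation rho, each coordinate's full conditional given the other is Gaussian with mean rho times the other coordinate and variance 1-rho^2.

   #include <stdio.h>
   #include <stdlib.h>
   #include <math.h>

   /* Generate a standard normal deviate by the Box-Muller method,
      using the drand48 uniform generator. */

   static double std_normal (void)
   {
     double u1, u2;
     do { u1 = drand48(); } while (u1==0);   /* avoid log(0) */
     u2 = drand48();
     return sqrt(-2*log(u1)) * cos(2*M_PI*u2);
   }

   /* Gibbs sampling from a bivariate Gaussian with zero means, unit
      variances, and correlation rho.  Each coordinate is drawn in
      turn from its full conditional given the other, which is
      Gaussian with mean rho times the other coordinate and variance
      1-rho*rho.  A sketch of the kind of demonstration done by the
      programs in 'bvg', not the actual code from that directory. */

   int main (void)
   {
     double rho = 0.9;                /* arbitrary illustrative value */
     double sd  = sqrt(1-rho*rho);    /* conditional std. deviation   */
     double x1 = 0, x2 = 0;           /* arbitrary initial state      */
     int t;

     srand48(1);

     for (t = 0; t<1000; t++)
     { x1 = rho*x2 + sd*std_normal();
       x2 = rho*x1 + sd*std_normal();
       printf("%f %f\n", x1, x2);     /* one sampled point per line */
     }

     return 0;
   }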