This software is meant to support research and education regarding:

   * Flexible Bayesian models for regression and classification 
     based on neural networks and Gaussian processes, for
     probability density estimation using mixtures and Dirichlet
     diffusion trees, and for inferring the sources of atmospheric
     contamination.

   * Markov chain Monte Carlo methods, and their applications to
     Bayesian modeling, including implementations of Metropolis,
     hybrid Monte Carlo, slice sampling, and tempering methods.

   * Neural network training using early stopping.  This is mostly
     for purposes of comparing with Bayesian methods.

   * Molecular simulation using the Lennard-Jones potential.  This
     is mostly for testing MCMC methods in this context.

See References.doc for some references on these various topics.

The facilities provided by this software might be useful for actual
problems, but you should note that many features that might be needed
for real problems have not been implemented, and that the programs
have not been tested to the extent that would be desirable for
important applications.

The complete source code (in C) is provided, allowing researchers to
modify the program to test new ideas.  It is not necessary to know C
to use the programs (assuming you manage to install them correctly).

This software is designed for use on a Unix/Linux/macOS system, or an
MS Windows system with the Cygwin Unix emulation environment, using
commands issued to the command interpreter (shell).  No particular
window system or other GUI is required, but a plotting program will be
very useful.  My version of the 'graph' program, available from
https://github.com/radfordneal/plotutils, is suitable.  I previously
used the xgraph plot program, written by David Harrison, which can be
obtained from my web page at www.cs.utoronto.ca/~radford.  Both of
these programs allow plots to be produced by just piping data from one
of the plotting commands provided by this software.

Markov chain Monte Carlo facilities.

All the Bayesian models are implemented using Markov chains to sample
from the posterior distribution.  For the elaborate models based on
neural networks, Gaussian processes, mixtures, and Dirichlet diffusion
trees, this is done by combining general-purpose Markov chain sampling
procedures with special modules written in C.  Special modules for
other models could also be implemented, but this is a fairly major
undertaking.

To allow people to play around with the various Markov chain methods
more easily, a facility is provided for defining distributions (on
R^n) by giving a simple formula for the probability density.  Many
Markov chain sampling methods, such as the Metropolis algorithm,
hybrid (Hamiltonian) Monte Carlo, slice sampling, simulated tempering,
and annealed importance sampling may then be used to sample from this
distribution.  Bayesian posterior distributions can be defined by
giving a formula for the prior density and for the likelihood based on
each of the cases (which are assumed to be independent).

A long review paper of mine on "Probabilistic Inference Using Markov
Chain Monte Carlo Methods" can be obtained from my web page, at
www.cs.utoronto.ca/~radford.  This review discusses methods based on
Hamiltonian dynamics, including the "hybrid Monte Carlo" method.
Hamiltonian (hybrid) Monte Carlo is also discussed in my review on
"MCMC using Hamiltonian dynamics", and in my book on "Bayesian
Learning for Neural Networks" (based on my PhD thesis).  My web page
also has papers on slice sampling ("Markov chain Monte Carlo methods
based on `slicing' the density function" and "Slice sampling"),
Annealed Importance Sampling, circularly-coupled Markov chain
sampling, and non-reversible updating of the uniform variable for
acceptance decisions, all of which are implemented in this software.

Neural network and Gaussian process models.

The neural network models are described in my thesis, "Bayesian
Learning for Neural Networks", now published by Springer-Verlag (ISBN
0-387-94724-8).  The neural network models implemented are extensions
of those described in the Appendix of that book.  The models currently
implemented are described in net-models.PDF.

The Gaussian process models are in many ways analogous to the network
models.  The Gaussian process models implemented in this software, and
the computational methods used, are described in my technical report
entitled "Monte Carlo implementation of Gaussian process models for
Bayesian regression and classification", available from my web page,
and in my Valencia conference paper on "Regression and classification
using Gaussian process priors", in Bayesian Statistics 6.  The
Gaussian process regression models are similar to those that were
evaluated in Carl Rasmussen's thesis, "Evaluation of Gaussian
Processes and other Methods for Non-Linear Regression"
(mlg.eng.cam.ac.uk/pub/pdf/Ras96b.pdf); he also talks about neural
network models.  To understand how to use the software implementing
these models, it is essential for you to have read at least one of
these references, or similar material.

The neural network software supports Bayesian learning for regression
problems, classification problems, and survival analysis, using models
based on networks with any number of hidden layers, with a wide
variety of prior distributions for network parameters and
hyperparameters.  It is possible to define convolutional models, and
models with other custom patterns of connections between layers, with
or without weight sharing.

The Gaussian process software supports regression and classification
models that are similar to neural network models with an infinite
number of hidden units, using Gaussian priors.  However, convolutional
models are not currently supported.

The advantages of Bayesian learning with both neural network and
Gaussian process models include the automatic determination of
"regularization" hyperparameters, without the need for a validation
set, the avoidance of overfitting when using large networks, and the
quantification of uncertainty in predictions.  The software implements
the Automatic Relevance Determination (ARD) approach to handling
inputs that may turn out to be irrelevant (developed with David
MacKay).

For problems and networks of modest size (e.g., 200 training cases, 10
inputs, 20 hidden units), fully training a neural network model (to
the point where one can be reasonably sure that the correct Bayesian
answer has been found) typically takes only a few seconds or minutes
on a modern personal computer.  Moreover, quite good results,
competitive with other methods, are often obtained with less training.
The time required to train the Gaussian process models depends a lot
on the number of training cases.  For 100 cases, these models may take
only a few seconds or minutes to train (again, to the point where one
can be reasonably sure that convergence to the correct answer has
occurred).  For thousands of cases, however, training might well take

The software also implements neural network training using early
stopping, as described in my paper on "Assessing relevance
determination methods using DELVE", in Neural Networks and Machine
Learning, C. M. Bishop, editor, Springer-Verlag, 1998.  A similar
early stopping method is also described in Carl Rasmussen's thesis
(see above).
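
For readers unfamiliar with early stopping, the general idea can be
sketched as below.  This is a generic illustration and not the
procedure from the paper cited above or the code in this software: it
fits a one-parameter linear model y = w*x by gradient descent on
training cases, and returns the value of w that gave the lowest error
on separate validation cases.  The function names and the 'patience'
stopping rule are assumptions made for the example.

```c
#include <stddef.h>

/* Mean squared error of the model y = w*x on n cases. */
static double mse(double w, const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        double e = w * x[i] - y[i];
        s += e * e;
    }
    return s / n;
}

/* Gradient descent on the training cases (xt, yt), starting at w = 0.
   Training stops once the validation error on (xv, yv) has failed to
   improve for 'patience' consecutive iterations, or after max_iter
   iterations.  Returns the w with the lowest validation error seen;
   the number of iterations done is stored in *iters if non-NULL. */
double train_early_stop(const double *xt, const double *yt, size_t nt,
                        const double *xv, const double *yv, size_t nv,
                        double rate, int max_iter, int patience,
                        int *iters)
{
    double w = 0.0, best_w = 0.0;
    double best_err = mse(w, xv, yv, nv);
    int since_best = 0, done = 0;

    for (int it = 1; it <= max_iter; it++) {
        double g = 0.0;  /* gradient of the training MSE w.r.t. w */
        for (size_t i = 0; i < nt; i++)
            g += 2.0 * (w * xt[i] - yt[i]) * xt[i];
        w -= rate * g / nt;
        done = it;

        double err = mse(w, xv, yv, nv);
        if (err < best_err) {
            best_err = err;
            best_w = w;
            since_best = 0;
        } else if (++since_best >= patience) {
            break;
        }
    }
    if (iters) *iters = done;
    return best_w;
}
```

The point of the technique is that the returned parameter is chosen by
validation error rather than training error, so training stops before
the model fits the training cases as closely as it could.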

Bayesian mixture models and Dirichlet diffusion trees.

The software implements Bayesian mixture models for multivariate real
or binary data, with both finite and countably infinite numbers of
components.  The countably infinite mixture models are equivalent to
Dirichlet process mixture models.  The sampling methods that I have
implemented for these models are described in my technical report on
"Markov chain sampling methods for Dirichlet process mixture models",
which can be obtained from my web page.  See also my technical report
on "Bayesian mixture modeling by Monte Carlo simulation".

The software also implements models based on Dirichlet diffusion
trees, described in my technical report on "Defining priors for
distributions using Dirichlet diffusion trees", and in my Valencia
conference paper on "Density modeling and clustering using Dirichlet
diffusion trees", in Bayesian Statistics 7.  Dirichlet diffusion
trees can be seen both as a way of modeling distributions and as a
method for hierarchical clustering.

Models for inferring sources of atmospheric contamination.

This module implements Bayesian inference using Markov chain Monte
Carlo for a more specialized set of models than the modules described
above.

Molecular simulation using the Lennard-Jones potential.

This is a commonly-used model in the chemistry and physics literature,
which is implemented in this software to allow testing of MCMC methods
as applied to this application area.

Software components.

The software consists of a number of programs and modules.  Each major
component has its own directory, as follows:

    util    Modules and programs of general utility.

    mc      Modules and programs that support sampling using Markov 
            chain Monte Carlo methods, using modules from util.

    dist    Programs for doing Markov chain sampling on a distribution
            given by a simple formula, or by giving a Bayesian prior
            and likelihood, using the modules from util and mc.

    net     Modules and programs that implement Bayesian inference
            for models based on multilayer perceptron neural networks, 
            using the modules from util and mc.  Also implements simple
            gradient descent training, possibly with early stopping.

    gp      Modules and programs that implement Bayesian inference
            for models based on Gaussian processes, using the modules
            from util and mc.

    mix     Modules and programs that implement Bayesian inference
            for finite and infinite mixture models, using modules
            from util and mc.

    dft     Modules and programs that implement density modeling and
            clustering methods based on Dirichlet diffusion trees.

    src     Modules and programs that implement Bayesian inference
            for models of the source of atmospheric contaminants.

    mol     Modules and programs that implement molecular simulation
            using the Lennard-Jones potential.

In addition, the 'bvg' directory contains modules and programs for
sampling from a bivariate Gaussian distribution, as a simple
demonstration of how the Markov chain Monte Carlo facilities can be
used from a special module written in C.  Other than by providing this
example, and the detailed documentation on various commands, I have
not attempted to document how you might go about using the Markov
chain Monte Carlo modules for another application written in C.

The following directories contain examples of how these programs can
be used, many of which are discussed in the documentation:

    ex-dist   Examples of Markov chain sampling on distributions
              specified by simple formulas.

    ex-circ   Examples of circularly-coupled Markov chain sampling.

    ex-bayes  Examples of Markov chain sampling for Bayesian models
              specified using formulas for the prior and likelihood.

    ex-netgp  Examples of Bayesian regression and classification 
              models based on neural networks and Gaussian processes.

    ex-image  Examples of neural network models for image classification.

    ex-surv   Examples of neural network survival models.

    ex-mixdft Examples of Bayesian mixture models and Dirichlet diffusion
              tree models.  Includes command files for the test in my 
              paper on "Markov chain sampling methods for Dirichlet process 
              mixture models" and for one of the tests in "Density modeling 
              and clustering using Dirichlet diffusion trees".

    ex-gdes   Examples of neural network learning using gradient 
              descent and early stopping.

    ex-src    Examples of source location models.

    ex-ais    Contains command and data files used for the tests in
              my paper on "Annealed importance sampling".

    ex-mol    Examples of molecular simulation.

You should note, however, that these examples do not constitute
"recipes" that can be used unchanged for new problems.  They are
intended to help you understand the models, priors, and computational
methods, so that you can devise an appropriate way of handling
whatever problem you are interested in.

Portability of the software.

The software is written in C, following the C99 standard.  It is meant
to be run in a Unix/Linux/macOS environment.  The various components
of the software are invoked as Unix/Linux shell commands, which may be
run via shell scripts.

There is no dependence on any particular graphics package or graphical
user interface.  The 'xxx-plt' programs are designed to allow their
output to be piped directly into the 'graph' or 'xgraph' plotting
programs, but other suitable plotting programs can be used instead, or
the numbers can be examined directly.  The 'xxx-tbl' programs output
the same information in a different format, which is useful when
plotting or analysing data with R, since this format is convenient for
the R read.table command.

Reproducibility of the results.

The results of running any of the programs should be reproducible if
the program is run again, with the same random number seed, on the
same machine, with the software compiled with the same options, using
the same versions of the compiler and standard library routines, using
the same CPU, or using the same GPU with the same driver software.

As soon as any of these things change, differences in results may
become possible.  For example, roundoff errors from floating-point
summation may change depending on summation order, which may differ
for code written to use various SIMD facilities (such as SSE2, versus
AVX, versus AVX2, for Intel/AMD processors).  Different system
libraries for mathematical functions (such as tanh) may produce
slightly different results.  Any small differences may subsequently be
amplified by chaotic dynamics, or by a small difference changing
whether or not a proposal is accepted.
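
The effect of summation order can be seen in a small example, given
here as a sketch rather than as code from this software.  The first
function below adds the elements in sequence.  The second keeps two
partial sums, one for even-indexed and one for odd-indexed elements,
which is the sort of reordering that SIMD code typically performs.
The function names are hypothetical, and IEEE double-precision
arithmetic without extended-precision intermediates is assumed.

```c
#include <stddef.h>

/* Sum a[0..n-1] in index order. */
double sum_sequential(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Sum a[0..n-1] using two partial sums ("lanes"), combined at the end,
   much as a two-wide SIMD loop would. */
double sum_two_lanes(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)
        s0 += a[i];  /* leftover element when n is odd */
    return s0 + s1;
}
```

For the array { 1e17, 1.0, -1e17, 1.0 }, the exact sum is 2.  The
sequential sum loses the first 1.0 when it is added to 1e17, since
adjacent doubles near 1e17 are 16 apart, so the two functions return
different values even though both are reasonable ways to compute the
same sum.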