OVERVIEW OF THE SOFTWARE

The software is organized in a modular fashion.  The 'util' directory
provides a number of general facilities, some of which may be of use
for other purposes.  The 'mc' module provides support for Markov chain
Monte Carlo methods, which the 'dist' module allows you to use for a
distribution specified by a formula.  The 'net', 'gp', and 'mix' are
more specialized modules that use the Markov chain methods to support
Bayesian inference for neural networks, Gaussian process models, and
finite and infinite mixture models.  This section provides an overview
of the various components of the software.


Program usage.

The software is designed to be usable interactively by typing commands
to run the various programs to the Unix shell.  These programs may
create or modify files that hold the results.  You will usually want
to create a directory for each project you are working on, and change
to that directory before issuing these commands, so that the files
pertaining to that project will be separate.

Since I've gotten e-mail from people who were quite confused by the
simplicity of this concept, let me say it again.  YOU USE THE SOFTWARE
BY TYPING UNIX COMMANDS TO RUN THE VARIOUS PROGRAMS.  There is NO
master program that you run and type commands to - or to put it
another way, that master program is the Unix command interpreter (the
shell).

If you're doing serious work, you will probably want to create Unix
command files holding the main commands, so that you won't forget what
you did.


Log files.

Most of the programs make use of a "log file" facility supported by
modules and programs in 'util'.  A log file records all the
information pertaining to a "run" of an iterative program.  The first
few records of the log file (with "indexes" of -1) contain the
specifications for the run (such as the network architecture and the
source of training data).  These records are written by "spec"
programs (eg, 'net-spec' and 'data-spec') that the user invokes at the
beginning of the run.  Once the run has been specified, the program
that performs iterations is invoked (eg, 'net-mc').  This program will
append further records to the log file, one for each iteration for
which the user has asked the state to be saved (this might be every
iteration, or might be every n'th if minimizing disk usage is a
concern).  Each record written has the iteration number as its index,
and contains the complete state of the program at that time (eg, all
the parameters and hyperparameters of the network being trained).

Note that log files contain binary data; they are not human-readable.
They may also not be readable on a different type of machine from the
one on which they were written.

After an iterative program finishes, the user may decide to let the
run continue for more iterations.  This is easily done by just
invoking the program again with a larger iteration limit, whereupon it
restarts using the last state stored in the log file, and then appends
records to the log file for further iterations.

The information about iterations that is stored in the log file can be
examined using various programs both during and after a run.  In
particular, the user can plot the progress of various quantities
during the course of the run, without having to decide beforehand
which quantities will be of interest.  The states saved at various
iterations are also the basis for making Monte Carlo estimates, and in
particular for making Bayesian predictions based on a sample from the
posterior distribution.


Models and data.

The 'util' directory also contains modules and programs that specify
the final portion of a probabilistic model (which is independent of
the details of networks or other functional schemes), that support
reading of numeric input from data files or other sources, and that
specify sets of training and test cases.

The models supported include those for regression, classification,
probability density estimation, and survival analysis.  The survival
analysis models should be regarded as especially experimental.  See
model-spec.doc for details.

The data files used must contain numbers in standard ASCII form, with
one line per case, but there is considerable freedom regarding
separators and in the ordering of items.  "Input" and "target" items
that pertain to a case may come from the same file, or different
files, and the position within a line of each item may be specified
independently.  The set of cases (lines) to be used for training or
testing can be specified to be a subset of all the lines in a file.
The data source can also be specified to be the output of a program,
rather than a data file.

Specifications for where the training and test data comes from are
written to a log file by the 'data-spec' program, which also allows
the user to specify that certain transformations are to be done to the
data items before they are used.  In particular, the data can be
translated and re-scaled in a user-specified way, or by amounts that
are automatically determined from the training data.

The source of "test" data can also be specified explicitly by
arguments to the relevant commands, allowing the final results of
learning to be applied to any data set for which predictions are
desired.

Note that mixture models can presently be used only for data in which
the number of "input" items is zero.  The test data is currently
ignored for mixture models, since no prediction program has yet been
implemented.

See data-spec.doc for further details on how to specify the source and
format of the data.


Random number generation.

A scheme for combining real and pseudo random numbers is implemented
by modules in the 'util' directory, along with procedures for sampling
from various standard distributions, and for saving the state of the
random number generator.

The 'rand-seed' program is used to specify a random number seed to use
for a run.  The state of the random number generator is saved with
each iteration in the log file in order to ensure that resuming a run
produces the same results as if the run had continued without
stopping.


Markov chain Monte Carlo.

The 'mc' directory contains modules and programs that support the use
of Markov chain Monte Carlo methods.  These methods can be applied to
a distribution specified by a formula using the 'dist' programs (see
below).  More elaborate Markov chain Monte Carlo applications can be
created by adding modules in C that compute certain application
specific quantities, of which the most central is the probability
distribution to sample from.  For example, the neural network
application provides a procedure for computing the posterior
probability density of the network parameters.  An application may
also provide implementations of specialized sampling procedures, such
as the procedures for doing Gibbs sampling for hyperparameters in the
neural network application.

For the user of neural network, Gaussian process, and mixture models,
the most important 'mc' program is 'mc-spec', which is used to specify
how the Markov chain sampling is to be done.  There are a large number
of reasonable ways of sampling for neural networks or Gaussian
processes.  The best way is still the subject of research.  Good
results can be obtained using several standard approaches, however, as
described in the examples in other sections of this documentation.
You can also read all about the various methods in mc-spec.doc.  Note,
however, that for each model there are also sampling methods specific
to that model alone, which are documented in net-mc.doc, gp-mc.doc,
and mix-mc.doc.  At present, sampling for mixture models is done only
by the specific procedures described in mix-mc.doc.

Tempering methods and Annealed Importance Sampling are currently
supported for neural network models, but not for Gaussian processes
and mixture models.


Sampling from a specified distribution.

The 'dist' directory contains programs for sampling from a
distribution specified by a formula for its "energy" (minus the log
probability density, plus an arbitrary constant), or by formulas for
the prior and likelihood for a Bayesian model.  The full range of
Markov chain sampling methods implemented by the 'mc' module can be
used for these distributions, including the tempering and Annealed
Importance Sampling facilities.

The 'dist-spec' program is used to specify the distribution, along
with 'data-spec' to say where the data comes from, if the distribution
is the posterior for a Bayesian model.  The 'mc-spec' program is then
used to specify the Markov chain updates, after which 'dist-mc' does
the actual sampling.  The 'dist-display' and 'dist-plt' programs can
be used to monitor the runs, and 'dist-est' can be used to estimate
the expectation of some function with respect to the distribution.


Neural network models.

The 'net' directory contains the modules and programs that implement
Bayesian learning for models based on multilayer perceptron networks,
making use of the modules in the 'util' and 'mc' directories.  The
networks and data models supported are as described in my book,
Bayesian Learning for Neural Networks, with the addition of
experimental models for survival analysis.

A network training run is started with the 'net-spec' program, which
creates a log file to which it writes specifications for the network
architecture and priors.  In a simple run, the 'model-spec',
'data-spec' and 'mc-spec' programs would then be used to specify the
way the outputs of the network are used to model the targets in the
dataset, what data makes up the training set (and perhaps the test
set), and the way the sampling should be done.  The 'net-mc' program
(a specialization of the generic 'xxx-mc' program) would then be
invoked to do the actual sampling.  Finally, the 'net-pred' program
would be used to make predictions for test cases based on the networks
saved in the log file.

Usually, one would want to see how the run had gone before making
predictions.  The 'net-display' program allows one to examine the
network parameters and hyperparameters at any specified iteration.
The 'net-plt' program can be used to obtain the values of various
quantities, such as the training set error, for some range of
iterations.  The output of 'net-plt' would usually be piped to a
suitable plot program for visual examination, though it is also
possible to directly look at the numbers.

Several other programs are also present in the 'net' directory.  Some
of these will probably not be of interest to the ordinary user, as
they were written for debugging purposes, or to do specialized tasks
relating to my thesis.


Gaussian process models.

The 'gp' directory contains the modules and programs that implement
Bayesian inference for Gaussian process models, making use of the
modules in the 'util' and 'mc' directories.  These Gaussian process
programs are analogous to the neural network programs.  The models
based on Gaussian processes are also similar to models based on large
neural networks using Gaussian priors (or other priors with finite
variance).

To start, the 'gp-spec' program is used to specify a Gaussian process
model - that is, to specify the form of the covariance function, and
the priors on the hyperparameters that control this covariance
function.  The 'model-spec' and 'data-spec' programs are then used to
specify how the Gaussian process is used to model data, and the source
of the training data (and possibly test data).  The Markov chain
sampling method is then specified using 'mc-spec', and sampling is
done using 'gp-mc'.  Finally, 'gp-pred' is used to make predictions
for test cases using the Gaussian processes that were saved in the log
file by 'gp-mc'.

The 'gp-display' and 'gp-plt' programs can be used to view the
parameters of the Gaussian processes generated by 'gp-mc', both during
and after the run.  Several other programs in the 'gp' directory may
also be of interest.


Mixture models.

The 'mix' directory contains the modules and programs that implement
Bayesian inference for finite and infinite mixture models, making use
of the modules in the 'util' and 'mc' directories.  These models are
used to model the probabilities for vectors of binary data, or the
probability densities for real vectors.  The infinite mixture models
are equivalent to what are called Dirichlet process mixtures.

The 'mix-spec' program is used to specify a mixture model - that is,
to say how many components there are in the mixture (perhaps countably
infinite), and to specify the priors on the parameters and
hyperparameters.  The 'model-spec' and 'data-spec' programs are then
used to complete the specification of the data model and data source.
For mixture models, the data items are all considered to be "targets",
with the number of "inputs" being zero.  The Markov chain sampling
procedures to use are then specified using 'mix-spec', and the
sampling is done with 'mix-mc'.  At present, there is not all that
much that can be done with the results - in particular, there is no
prediction program.  One can look at the parameter values for states
drawn from the posterior with 'mix-display', however, and generate
future datasets using 'mix-cases'.


Quantities obtainable from log files.

The 'xxx-plt' programs (eg, 'dist-plt', 'net-plt', 'gp-plt', and
'mix-plt') are the principal means by which simulation runs are
monitored.  These programs allow one to see the values of various
"quantities", evaluated for each iteration stored in a log file within
some range.  Some other programs (eg, 'xxx-hist') also use the same
set of quantities.

A quantity is specified by an identifying character, perhaps with a
numeric modifier.  Some quantities are single numeric values
(scalars); others are arrays of values, in which case the desired
range of values is also specified following an "@" sign.  Some
quantities can be either scalars or arrays, depending on whether a
range specification is included.

There is a hierarchy of quantities, as defined by modules at different
levels.  A few quantities are universally defined - principally 't',
the index of the current iteration.  Many more are defined for any
Markov chain Monte Carlo application - such as 'r', the rejection rate
for Metropolis or Hybrid Monte Carlo updates.  The 'dist' module also
defines some quantities, and a large number of quantities are defined
for neural networks and Gaussian processes - for example, 'b', the
average squared error on the training set, and 'n', the current value
of the noise standard deviation (for a regression model) - and for
mixture models - for example, 'Cn', the total probability for the n
largest components in the mixture.  For details, see quantities.doc,
mc-quantities.doc, dist-quantities.doc, net-quantities.doc,
gp-quantities.doc, and mix-quantities.doc.