OVERVIEW OF THE SOFTWARE This software is being distributed primarily to further research in Bayesian learning for models based on neural networks and Gaussian processes. The software is designed for potentially wider use, however. In particular, the programs and modules in the 'util' directory are of general utility, and those in the 'mc' directory provide generic support for Markov chain Monte Carlo methods. These facilities are specialized to neural network learning by the modules and programs in the 'net' directory, to Gaussian process models by the modules and programs in the 'gp' directory, and to finite and infinite mixture modesls by the modules and programs in the 'mix' directory. The 'bvg' directory demonstrates in a simple context how the generic facilities in 'util' and 'mc' can be specialized for other tasks, but users interested only in the neural network, Gaussian process, or mixture models need not concern themselves with this. This section provides an overview of the facilities offered by these various components of the software. Log files All the programs make use of a "log file" facility supported by modules and programs in 'util'. A log file records all the information pertaining to a "run" of an iterative program. The first few records of the log file (with "indexes" of -1) contain the specifications for the run (such as the network architecture and the source of training data). These records are written by "spec" programs (eg, 'net-spec' and 'data-spec') that the user invokes at the beginning of the run. Once the run has been specified, the program that performs iterations is invoked (eg, 'net-mc'). This program will append further records to the log file, one for each iteration for which the user has asked the state to be saved, which will usually be every iteration, unless minimizing disk usage is a concern. Each record written has the iteration number as its index, and contains the complete state of the program at that time (eg, all the parameters and hyperparameters of the network being trained). Note that log files contain binary data; they are not human-readable. After an iterative program finishes, the user may decide to let the run continue for more iterations. This is easily done by just invoking the program again with a larger iteration limit, whereupon it restarts using the last state stored in the log file, and then appends records to the log file for further iterations. The information about iterations that is stored in the log file can be examined using various programs both during and after a run. In particular, the user can plot the progress of various quantities during the course of the run, without having to decide beforehand which quantities will be of interest. The states saved at various iterations are also the basis for making Monte Carlo estimates, and in particular, for making Bayesian predictions based on a sample of networks or Gaussian processes from the posterior distribution. Models and data The 'util' directory also contains modules and programs that specify the final portion of a probabilistic model (which is independent of the details of networks or other functional schemes), that support reading of numeric input from data files or other sources, and that specify sets of training and test cases for supervised learning procedures (such as those based on multilayer perceptron networks). The models supported include those for regression, classification, probability density estimation, and survival analysis. The survial analysis models were recently implemented, and should be regarded as experimental. See model-spec.doc for details The data files used must contain numbers in standard ASCII form, with one line per case, but there is considerable freedom regarding separators and in the ordering of items. "Input" and "target" items that pertain to a case may come from the same file, or different files, and the position within a line of each item may be specified independently. The set of cases (lines) to be used for training or testing can be specified to be a subset of all the lines in a file. The data source can also be specified to be the output of a program, rather than a data file. Specifications for where the training and test data comes from are written to a log file by the 'data-spec' program, which also allows the user to specify that certain transformations are to be done to the data items before they are used. In particular, the data can be translated and re-scaled in a user-specified way, or by amounts that are automatically determined from the training data. The source of "test" data can also be specified explicitly by arguments to the relevant commands, allowing the final results of learning to be applied to any data set for which predictions are desired. Note that mixture models can presently be used only for data in which the number of "input" items is zero. The test data is currently ignored for mixture models, since no prediction program has yet been implemented. See data-spec.doc for futher details on how to specify the data format. Random number generation A scheme for combining real and pseudo random numbers is implemented by modules in the 'util' directory, along with procedures for sampling from various standard distributions, and for saving the state of the random number generator. The 'rand-seed' program is used to specify a random number seed to use for a run. The state of the random number generator is saved with each iteration in the log file in order to ensure that resuming a run produces the same results as if the run had continued without stopping. Markov chain Monte Carlo The 'mc' directory contains modules and programs that support the use of Markov chain Monte Carlo methods. A Markov chain Monte Carlo application is created by adding modules that compute certain application-specific quantities, of which the most central is the probability distribution to sample from. For example, the neural network application provides a procedure for computing the posterior probability density of the network parameters. An application may also provide implementations of specialized sampling procedures, such as the procedures for doing Gibbs sampling for hyperparameters in the neural network application. A variety of Markov chain methods are supported by the 'mc' system, including some that are not of much use in the neural network and Gaussian process applications. In particular, the "tempering" methods are not currently implemented for the neural networks or Gaussian processes, though they may be in future. Users interested only in neural networks or Gaussian processes should therefore ignore the tempering facilities (such as the 'mc-temp-sched' and 'mc-temp-filter' programs). For the user of neural network, Gaussian process, and mixture models, the most important 'mc' program is 'mc-spec', which is used to specify how the Markov chain sampling is to be done. There are a large number of reasonable ways of sampling for neural networks or Gaussian processes. The best way is still the subject of research. Good results can be obtained using several standard approaches, however, as described in the examples in other sections of this documentation. You can also read all about the various methods in mc-spec.doc. Note, however, that for each model there are also sampling methods specific to that model alone, which are documented in net-mc.doc, gp-mc.doc, and mix-mc.doc. At present, sampling for mixture models is done only by the specific procedures described in mix-mc.doc. Neural network models The 'net' directory contains the modules and programs that implement Bayesian learning for models based on multilayer perceptron networks, making use of the modules in the 'util' and 'mc' directories. The networks and data models supported are as described in my book, Bayesian Learning for Neural Networks, with the addition of experimental models for survival analysis. A network training run is started with the 'net-spec' program, which creates a log file to which it writes specifications for the network architecture and priors. In a simple run, the 'model-spec', 'data-spec' and 'mc-spec' programs would then be used to specify the way the outputs of the network are used to model the targets in the dataset, what data makes up the training set (and perhaps the test set), and the way the sampling should be done. The 'net-mc' program (a specialization of the generic 'xxx-mc' program) would then be invoked to do the actual sampling. Finally, the 'net-pred' program would be used to make predictions for test cases based on the networks saved in the log file. Usually, one would want to see how the run had gone before making predictions. The 'net-display' program allows one to examine the network parameters and hyperparameters at any specified iteration. The 'net-plt' program can be used to obtain the values of various quantities, such as the training set error, for some range of iterations. The output of 'net-plt' would usually be piped to a suitable plot program for visual examination, though it is also possible to directly look at the numbers. Several other programs are also present in the 'net' directory. Some of these will probably not be of interest to the ordinary user, as they were written for debugging purposes, or to do specialized tasks relating to my thesis. Gaussian process models The 'gp' directory contains the modules and programs that implement Bayesian inference for Gaussian process models, making use of the modules in the 'util' and 'mc' directories. These Gaussian process programs are analogous to the neural network programs. The models based on Gaussian processes are also similar to models based on large neural networks using Gaussian priors (or other priors with finite variance). To start, the 'gp-spec' program is used to specify a Gaussian process model - that is, to specify the form of the covariance function, and the priors on the hyperparameters that control this covariance function. The 'model-spec' and 'data-spec' programs are then be used to specify how the Gaussian process is used to model data, and the source of the training data (and possibly test data). The Markov chain sampling method is then specified using 'mc-spec', and sampling is done using 'gp-mc'. Finally, 'gp-pred' is used to make predictions for test cases using the Gaussian processes that were saved in the log file by 'gp-mc'. The 'gp-display' and 'gp-plt' programs can be used to view the Gaussian processes generated by 'gp-mc', both during and after the run. Several other programs in the 'gp' directory may also be of interest. Mixture models. The 'mix' directory contains the modules and programs that implement Bayesian inference for finite and infinite mixture models, making use of the modules in the 'util' and 'mc' directories. These models are used to model the probabilities for vectors of binary data, or the probability densities for real vectors. The infinite mixture models are equivalant to what are called Dirichlet process mixtures. The 'mix-spec' program is used to specify a mixture model - that is, to say how many components there are in the mixture (perhaps countably infinite), and to specify the priors on the parameters and hyperparameters. The 'model-spec' and 'data-spec' programs are then used to complete the specification of the data model and data source. For mixture models, the data items are all considered to be "targets", with the number of "inputs" being zero. The Markov chain sampling procedures to used are then specified using 'mix-spec', and the sampling is done with 'mix-mc'. At present, there is not all that much that can be done with the results - in particular, there is no prediction program. One can look at the parameter values for states drawn from the posterior with 'mix-display', however, and generate future datasets using 'mix-cases'. Quantities obtainable from log files The 'xxx-plt' programs (eg, 'net-plt', 'gp-plt', and 'mix-plt') are the principal means by which simulation runs are monitored. These programs allow one to see the values of various "quantities", evaluated for each iteration stored in a log file within some range. Some other programs (eg, 'xxx-hist') also use the same set of quantities. A quantity is specified by an identifying character, perhaps with a numeric modifier. Some quantities are single numeric values (scalars); others are arrays of values, in which case the desired range of values is also specified following an "@" sign. Some quantities can be either scalars or arrays, depending on whether a range specification is included. There is a hierarchy of quantities, as defined by modules at different levels. A few quantities are universally defined - principally 't', the index of the current iteration. Many more are defined for any Markov chain Monte Carlo application - such as 'r', the rejection rate for Metropolis or Hybrid Monte Carlo updates. A large number of quantities specific to neural networks or to Gaussian processes are also defined - for example, 'b', the average squared error on the training set, and 'n', the current value of the noise standard deviation (for a regression model). Quantities specific to mixture models are also defined, such as 'Cn', the total probability for the n largest components in the mixture. For details, see quantities.doc, mc-quantities.doc, net-quantities.doc, gp-quantities.doc, and mix-quantities.doc.