NET-MC:  Do Markov chain simulation to sample networks.

The net-mc program is the specialization of xxx-mc to the task of
sampling from the posterior distribution for a neural network model,
or from the prior distribution, if no training set is specified.  See
xxx-mc.doc for the generic features of this program.

Computation of the network model log likelihood and its gradient may
be done on a GPU (except for survival models), if a version of net-mc
compiled for a GPU is used.

The following application-specific sampling procedures are
implemented:

   sample-hyper [ group ]

       Does Gibbs sampling for the hyperparameters controlling the
       distributions of parameters (weights, biases, etc.).  If a
       group is specified, only the hyperparameters pertaining to
       that group of parameters are updated.  Groups are numbered
       from 1, as in the output of net-display.

   sample-noise

       Does Gibbs sampling for the noise variances.

   sample-lower-hyper

       Does Gibbs sampling for all lower-level hyperparameters.

   rgrid-upper-hyper [ stepsize ]

       Does random-grid Metropolis updates (one at a time) for the
       logs of all upper-level hyperparameters, in "precision" form.
       The default stepsize is 0.1.  (Does nothing for uppermost
       hyperparameters that don't control hyperparameters one level
       down.)

   sample-lower-noise

       Does Gibbs sampling for all lower-level noise variances.

   rgrid-upper-noise [ stepsize ]

       Does random-grid Metropolis updates (one at a time) for the
       logs of all upper-level hyperparameters controlling noise
       variances, in "precision" form.  The default stepsize is 0.1.
       (Does nothing if the uppermost noise hyperparameter doesn't
       control noise hyperparameters one level down.)

   sample-sigmas

       Does the equivalent of both sample-hyper and sample-noise.

   sample-lower-sigmas

       Does the equivalent of both sample-lower-hyper and
       sample-lower-noise.

   rgrid-upper-sigmas

       Does the equivalent of both rgrid-upper-hyper and
       rgrid-upper-noise.
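As an illustration of the kind of Gibbs step that sample-hyper performs
for a lower-level hyperparameter, here is a minimal Python sketch.  The
setup (group size, prior shape and rate, the function name
gibbs_precision) is an assumption for illustration, not taken from
net-mc: a group of weights w_i ~ N(0, 1/tau) with a conjugate
Gamma(a0, b0) prior on the precision tau has the Gamma full conditional
shown below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one group of 50 weights w_i ~ N(0, 1/tau),
# with an assumed Gamma(a0, b0) prior (shape a0, rate b0) on the
# precision tau of that group.
a0, b0 = 0.5, 0.5
w = rng.normal(0.0, 0.5, size=50)

def gibbs_precision(w, a0, b0, rng):
    """One Gibbs draw: tau | w ~ Gamma(a0 + n/2, b0 + sum(w^2)/2)."""
    n = len(w)
    shape = a0 + 0.5 * n
    rate = b0 + 0.5 * np.sum(w ** 2)
    # numpy's gamma takes a scale parameter, which is 1/rate
    return rng.gamma(shape, 1.0 / rate)

tau = gibbs_precision(w, a0, b0, rng)
```

In a full sampler, a draw like this alternates with updates of the
weights themselves, which is why lower-level hyperparameters coalesce
once the quantities above and below them in the hierarchy have
coalesced.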
An "upper-level" hyperparameter is one that controls the distribution
of lower-level hyperparameters or noise variances (which may be either
explicit or implicit).  The "lower-level" hyperparameters directly
control the distributions of weights.  Looked at another way, the
lower-level hyperparameters are the ones at the bottom level of the
hierarchy, or for which all lower-level hyperparameters have
degenerate distributions concentrated on the value of the higher-level
hyperparameter.

The random-grid Metropolis updates done with the above commands record
information (eg, rejection rate) for later display in the same way as
generic rgrid-met-1 updates.

When coupling is being done, upper-level hyperparameters should be
updated only with random-grid updates; Gibbs sampling for these
upper-level hyperparameters will desynchronize the random number
streams (because of the way it is implemented using ARS), preventing
coalescence.  Lower-level hyperparameters can be updated with Gibbs
sampling, and they will exactly coalesce once the parameters they
control and the upper-level hyperparameters that control them have
exactly coalesced.

Default stepsizes for updates of parameters (weights, biases, etc.) by
the generic Markov chain operations are set by a heuristic procedure
that is described in net-models.PDF.

Tempering methods and Annealed Importance Sampling are supported.  The
effect of running at an inverse temperature other than one is to
multiply the likelihood part of the energy by that amount.  At inverse
temperature zero, the distribution is simply the prior for the
hyperparameters and weights.  The marginal likelihood for a model can
be found using Annealed Importance Sampling, since the log likelihood
part of the energy has all the appropriate normalizing constants.

            Copyright (c) 1995-2022 by Radford M. Neal
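The marginal-likelihood computation described above can be sketched on
a toy conjugate model where the exact answer is known in closed form.
Everything in this sketch (the model, the linear temperature schedule,
the number of chains, the Metropolis stepsize) is an illustrative
assumption, not net-mc's implementation: intermediate distributions
multiply the log likelihood by an inverse temperature beta, and the
accumulated importance weights estimate the marginal likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (assumed for illustration): theta ~ N(0,1) prior,
# y_i ~ N(theta, 1).  The marginal likelihood is then available
# exactly, since y ~ N(0, I + 11').
y = rng.normal(0.5, 1.0, size=5)
n = len(y)

def log_lik(theta):
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - theta) ** 2)

def log_prior(theta):
    return -0.5 * np.log(2 * np.pi) - 0.5 * theta ** 2

betas = np.linspace(0.0, 1.0, 101)   # inverse-temperature schedule
K = 100                              # independent AIS runs
log_w = np.zeros(K)

for k in range(K):
    theta = rng.normal()             # exact draw at inverse temperature 0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # weight increment for moving the likelihood's power up by (b - b_prev)
        log_w[k] += (b - b_prev) * log_lik(theta)
        # one Metropolis update leaving prior * lik^b invariant
        prop = theta + 0.5 * rng.normal()
        log_acc = (log_prior(prop) + b * log_lik(prop)
                   - log_prior(theta) - b * log_lik(theta))
        if np.log(rng.uniform()) < log_acc:
            theta = prop

# AIS estimate of the log marginal likelihood: log of the mean weight
m = log_w.max()
log_ml_est = np.log(np.mean(np.exp(log_w - m))) + m

# Exact value for comparison
Sigma = np.eye(n) + np.ones((n, n))
_, logdet = np.linalg.slogdet(Sigma)
log_ml_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * logdet
                - 0.5 * y @ np.linalg.solve(Sigma, y))
```

The estimate is unbiased for the marginal likelihood itself (not its
log), which is why the weights are averaged before taking the log.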