NET: Bayesian inference for neural networks using Markov chain Monte Carlo.

The 'net' programs implement Bayesian inference for models based on
multilayer perceptron networks using Markov chain Monte Carlo methods.
For full details, see net-models.PDF.  Here is a briefer summary.

The networks handled have connections from a set of real-valued input
units to each of zero or more layers of real-valued hidden units.  Each
hidden layer (except the last) has connections to the next hidden layer.
The output layer has connections from the input layer and from the
hidden layers.  Non-sequential connections, between hidden layers that
aren't adjacent, are also possible.  The number of hidden layers is
currently limited to fifteen.  This architecture is diagrammed below,
for a network with three hidden layers:

             -----------------------
            |      Input Units      |
             -----------------------
               |            |
               |            -----------------------------
               |              |             |           |
               v              |             |           |
      ------------------      |             |           |
     |  Hidden layer 0  |     |             |           |
      ------------------      |             |           |
        |   |        |        |             |           |
        |   |        v        v             |           |
        |   |       ------------------      |           |
        |   |      |  Hidden layer 1  |     |           |
        |   |       ------------------      |           |
        |   |             |        |        |           |
        |   |             |        v        v           |
        |   |             |       ------------------    |
        |   --------------+----->|  Hidden layer 2  |   |
        |                 |       ------------------    |
        |                 |         |                   |
        ---------------   |         |   -----------------
                      v   v         v   v
                     -----------------------
                    |     Output Units      |
                     -----------------------

Any of the connection groups shown above may be absent, which is the
same as their weights all being zero.  The number of non-sequential
connections between hidden layers, such as the connection from hidden
layer 0 to hidden layer 2 above, is limited (to sixteen at present).

Layers are by default fully connected to units in the layers feeding
into them.  However, connections between layers may have their weight
configurations specified by a configuration file, as described in
net-config.doc.  This allows for sparse connections, for connections
with shared weights, and in particular for convolutional connections.

The hidden units may use the 'tanh' activation function, the 'softplus'
activation function, or the identity activation function.  Nominally,
the output units are real-valued and use the identity activation
function, but discrete outputs and non-linearities may be obtained in
effect with some data models (see below).

Each hidden and output unit has a "bias" that is added to its other
inputs before the activation function is applied.  Each input and
hidden unit has an "offset" that is added to its output after the
activation function is applied (or just to the specified input value,
for input units).  Like connections, biases and offsets may also be
absent if desired.  Biases may also have a configuration (eg, with
sharing) specified by a configuration file.

A hierarchical scheme of prior distributions is used for the weights,
biases, offsets and gains in a network, in which the priors for all
parameters of one class can be coupled.  For fully-connected layers,
these priors can also be scaled in accord with the number of units in
the source layer, in a way that is intended to produce a reasonable
limit as the number of units in each hidden layer goes to infinity.
Networks with this architecture can also be defined that behave
reasonably as the number of hidden layers goes to infinity.
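
To make the above concrete, the following is a minimal illustrative
sketch, in Python with numpy, of how outputs could be computed for a
network of this kind.  It is not the actual implementation used by the
'net' programs (which is written in C); the parameter names and the
layout of the 'net' dictionary are hypothetical, a single activation
function is assumed for all hidden layers, biases and offsets are
assumed present, and gains, weight configurations, priors, and the
Markov chain sampling are all omitted.

    import numpy as np

    def softplus(x):
        return np.log1p(np.exp(x))

    ACT = {"tanh": np.tanh, "softplus": softplus, "identity": lambda v: v}

    def forward(x, net, act="tanh"):
        """Outputs for input vector x, given hypothetical arrays in 'net'.

        Any connection group stored as None is absent, which is the same
        as its weights all being zero:
          net["in_offset"]    offsets added to the input values
          net["w_in_h"][l]    weights, inputs -> hidden layer l
          net["w_h_h"][k][l]  weights, hidden layer k -> hidden layer l (k < l)
          net["b_h"][l]       biases of hidden layer l
          net["h_offset"][l]  offsets added to hidden layer l's values
          net["w_in_out"]     weights, inputs -> outputs
          net["w_h_out"][l]   weights, hidden layer l -> outputs
          net["b_out"]        biases of the output units
        """
        L = len(net["b_h"])                    # number of hidden layers
        x = x + net["in_offset"]               # offsets added to the inputs

        h = []                                 # values of each hidden layer
        for l in range(L):
            s = net["b_h"][l].copy()           # bias, added before activation
            if net["w_in_h"][l] is not None:   # connections from the inputs
                s = s + x @ net["w_in_h"][l]
            for k in range(l):                 # sequential and non-sequential
                w = net["w_h_h"][k][l]         # connections from earlier layers
                if w is not None:
                    s = s + h[k] @ w
            h.append(ACT[act](s) + net["h_offset"][l])  # offset added after

        out = net["b_out"].copy()              # identity activation at outputs
        if net["w_in_out"] is not None:        # connections from the inputs
            out = out + x @ net["w_in_out"]
        for l in range(L):                     # connections from every hidden
            if net["w_h_out"][l] is not None:  # layer to the outputs
                out = out + h[l] @ net["w_h_out"][l]
        return out

In this sketch, net["w_h_h"][0][2], if present, would play the role of
the non-sequential connection from hidden layer 0 to hidden layer 2
shown in the diagram above.
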
A data model may be defined that relates the values of the output units
for given inputs to the probability distribution of the data observed
in conjunction with these inputs in a training or test case.  Targets
may be missing for some cases (written as "?"), in which case they are
ignored when computing the likelihood (as is appropriate if they are
"missing at random").

Copyright (c) 1995-2021 by Radford M. Neal
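
As an illustration of the treatment of missing targets just described,
the following sketch (again Python with numpy, rather than the
package's C code) computes the log-likelihood for one case under a
hypothetical Gaussian data model with a fixed noise standard deviation,
leaving out any targets recorded as missing:

    import numpy as np

    def case_log_likelihood(outputs, targets, noise_sd):
        """Log-likelihood of one case under a simple Gaussian data model.

        Entries of 'targets' that are np.nan stand for values written as
        "?" in the data; they are left out of the likelihood, as is
        appropriate when they are missing at random.
        """
        observed = ~np.isnan(targets)               # non-missing targets
        d = targets[observed] - outputs[observed]   # residuals for those
        return np.sum(-0.5 * (d / noise_sd) ** 2
                      - np.log(noise_sd) - 0.5 * np.log(2 * np.pi))

    # Example: the second target is missing, so only the first and third
    # contribute to the log-likelihood.
    print(case_log_likelihood(np.array([0.1, 0.5, -0.2]),
                              np.array([0.0, np.nan, -0.3]),
                              noise_sd=0.1))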