NOTES ON THE VERSION OF 2022-04-21

This version has some substantial new features for neural network models, and
some important performance improvements for neural networks, including support
for computation on GPUs.  The tutorial examples for neural network models have
also been extended.

Note: Log files produced by earlier versions cannot be read by this version of
the software.  There are also some incompatible changes in command syntax (see
feature changes (3), (13), and (14) below).

Documentation changes in this version.

1) Detailed documentation on the neural network models and their
   implementation is now provided in doc/net-models.PDF.  Some references to
   the models and Markov chain methods used are now listed in References.doc.

2) All the tutorial examples have been updated, with timing results now given
   for a modern processor.  Some examples are now accompanied by plots,
   supplied as PNG files.

3) The neural network examples for a simple regression model, in
   Ex-netgp-r.doc, have been extended to illustrate some general issues with
   Markov chain Monte Carlo methods and Bayesian inference.

4) There is a new example of image classification with neural network models,
   including use of a convolutional layer, in Ex-image.doc.

Feature changes in this version:

1) For neural network models, the configuration of the connections and their
   weights into a hidden or output layer, from the input layer or a previous
   hidden layer, as well as the hidden or output unit biases, may now be set
   up explicitly, rather than (as previously) layers always being
   fully-connected, without weight sharing.  In particular, this allows for
   specification of models with convolutional layers.  See net-spec.doc,
   net-config.doc, and net-config-check.doc for details, as well as the
   tutorial example in Ex-image.doc.  (A sketch of the weight-sharing idea
   appears after this list.)

2) Non-sequential hidden layer connections are now supported - ie, hidden
   layers can now connect to later hidden layers other than their immediate
   successor.  The number of such non-sequential connections is limited to
   16.  (Note that non-sequential connections from inputs and to outputs were
   already allowed.)

3) Hidden layers may now use the 'softplus' activation function, given by
   h(u) = log(1+exp(u)), in addition to the previous options of 'tanh' or
   'identity'.  There is also a 'softplus0' activation function, which is
   'softplus' shifted down to have output zero for input zero.  However, the
   previous option of 'sin' as an activation function has been removed, since
   the implementation no longer saves the summed input to hidden units
   (before the activation function is applied), so the activation function
   must have a derivative computable from its value, which sin does not (see
   the sketch after this list).  See net-spec.doc for how to specify the
   activation function for a hidden layer.

4) The sample-hyper operation for neural network models can now take an
   argument that restricts the updates to the hyperparameters controlling a
   single group of parameters.  This may be useful in the initial stages of
   sampling, to avoid some hyperparameters taking on bad values when the data
   has not yet been fit well.

5) The net-eval program can now optionally display the values of the hidden
   units in some layer, instead of the final output.

6) The data-spec command now has an optional -e argument that causes the
   training and test inputs and targets read to be echoed to standard output.
   This can be useful for checking that the data source is specified
   correctly.

7) New C0, C1, and Cn (for n>1) quantities are now defined, to help assess
   how well Metropolis and hybrid updates are exploring the distribution.
   (But note that these are masked for mixture models, where they have
   another meaning.)

8) The net-gen program has been extended to allow more flexibility in how
   parameters are set, including from standard input, rather than to zero or
   randomly.  See net-gen.doc for details.

9) The xxx-tbl commands can now take a -f argument, which causes them to
   continually follow iterations being added to the last log file specified,
   rather than finishing when EOF is encountered.  This is useful if the
   output is piped to a plotting program that is capable of following
   continuing input, updating the plot in real time, and also when the output
   is just shown in a terminal window in order to monitor the run.

10) The net-spec program now has a "sizes" option for displaying the number
    of parameters in each group.  See net-spec.doc for details.

11) The xxx-grad-test programs (eg, net-grad-test) can now optionally display
    only the computed gradient, or only the energy, or can be restricted to
    displaying, and checking against the result from finite differences, only
    a single component of the gradient.  (This is useful since checking the
    full gradient can be very slow for high-dimensional models; see the
    sketch after this list.)  See xxx-grad-test.doc for details.

12) The net-display program now allows a -P or -H option, for displaying
    unadorned high-precision parameter or hyperparameter values.

13) The "omit" option in net-spec should now be placed after the
    corresponding prior on connections from inputs, rather than after the
    size of the hidden or output layer they connect to (as before).

14) The old (positional) syntax for net-spec is no longer allowed.  It would
    not have supported the new feature of non-sequential hidden layer
    connections.

15) The maximum number of hidden layers in a neural network is now 15 (up
    from 7 before).

16) The maximum number of iterations that can be used when making predictions
    with the median has been increased from 200 to 1000.

17) Specifying use of a symmetric sequence of approximations in a leapfrog
    trajectory specification, by giving a negative value for N-approx, is now
    documented in mc-spec.doc.  (It had previously been implemented but not
    documented.)
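To illustrate the weight-sharing idea behind feature change (1), here is a
minimal sketch in generic C - not FBM's actual net-config format or data
structures, which are described in net-config.doc; the names conn, N_IN, etc.
are made up for this example.  A one-dimensional convolutional layer is just
an explicit list of connections in which connections at the same filter
offset refer to the same shared weight:

    #include <stdio.h>

    #define N_IN  8                       /* input units */
    #define WIDTH 3                       /* filter width */
    #define N_OUT (N_IN - WIDTH + 1)      /* output units, no padding */

    typedef struct { int src, dst, w; } conn;  /* src unit -> dst unit,
                                                  using shared weight w */
    int main (void)
    { conn conns[N_OUT*WIDTH];
      int n = 0;

      /* Each destination unit sees WIDTH consecutive inputs; connections
         at the same offset k share weight index k, so there are only
         WIDTH weights in total, not N_OUT*WIDTH. */

      for (int d = 0; d < N_OUT; d++)
      { for (int k = 0; k < WIDTH; k++)
        { conns[n++] = (conn) { d+k, d, k };
        }
      }

      double w[WIDTH]   = { 0.5, -1.0, 0.25 };        /* shared weights */
      double in[N_IN]   = { 1, 2, 3, 4, 5, 6, 7, 8 };
      double out[N_OUT] = { 0 };

      for (int i = 0; i < n; i++)
      { out[conns[i].dst] += w[conns[i].w] * in[conns[i].src];
      }

      for (int d = 0; d < N_OUT; d++)
      { printf ("out[%d] = %g\n", d, out[d]);
      }

      return 0;
    }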
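And to illustrate the point in feature change (3) about derivatives being
computable from the activation function's value: for softplus,
exp(h) = 1 + exp(u), so the derivative sigma(u) = exp(u)/(1+exp(u)) can be
recovered from the value h alone as 1 - exp(-h).  For sin, the derivative
cos(u) = +-sqrt(1-sin(u)^2) has an ambiguous sign, so it cannot be recovered
from the value.  A small numerical check, in generic C (not FBM source):

    #include <stdio.h>
    #include <math.h>

    static double softplus (double u) { return log1p(exp(u)); }

    /* Derivative of softplus computed from its value h, using
       1 - exp(-h) = exp(u)/(1+exp(u)) = sigma(u). */

    static double deriv_from_value (double h) { return 1 - exp(-h); }

    int main (void)
    { for (double u = -2; u <= 2; u += 1)
      { double h = softplus(u);
        printf ("u=%5.1f  h=%8.5f  sigma(u)=%8.5f  from value=%8.5f\n",
                u, h, 1/(1+exp(-u)), deriv_from_value(h));
      }
      return 0;
    }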
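Finally, regarding feature change (11): a central finite-difference check
needs two energy evaluations per gradient component, so checking all
components of a high-dimensional model is expensive, while checking one
component costs only two evaluations.  Here is a sketch of such a
single-component check, in generic C with a toy energy function (this is not
the actual xxx-grad-test code):

    #include <stdio.h>
    #include <math.h>

    #define N 3

    static double energy (const double *p)       /* toy energy function */
    { return 0.5 * (p[0]*p[0] + 2*p[1]*p[1]) + exp(p[2]);
    }

    static double grad (const double *p, int i)  /* its exact gradient */
    { return i==0 ? p[0] : i==1 ? 2*p[1] : exp(p[2]);
    }

    int main (void)
    { double p[N] = { 0.3, -0.7, 0.1 };
      double eps = 1e-6;
      int i = 1;                          /* check only component i */

      double save = p[i], ep, em;
      p[i] = save + eps;  ep = energy(p);
      p[i] = save - eps;  em = energy(p);
      p[i] = save;

      printf ("component %d: analytic %.8f, finite difference %.8f\n",
              i, grad(p,i), (ep-em)/(2*eps));
      return 0;
    }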
Performance changes in this version:

1) Forward / backward / gradient computations for neural networks have been
   sped up, by rewriting the portable code to encourage the compiler to use
   vector instructions, and by (optionally) using specially-written code to
   exploit SIMD and FMA instructions, when available (SSE2, SSE3, SSE4.2,
   AVX, and AVX2 are supported).  (A sketch of the sort of rewriting involved
   appears after this list.)

2) Computations for neural networks have also been sped up by using the SLEEF
   library for vectorized mathematical functions (eg, tanh).

3) The precisions of parameter and unit values in neural network models may
   now be either double-precision (FP64), as previously, or single-precision
   (float, FP32).  The default is single-precision, unless this is changed in
   the make-all script.  Using lower precision typically speeds up the
   computations (but with some effect on the results).  Note that for models
   other than neural networks, arithmetic is still always done in double
   precision.  (A sketch of how such a compile-time choice can be made
   appears after this list.)

4) Computations for neural network models may now be (partially) done on a
   GPU (one that supports CUDA, with compute capability 3.5 or later).  See
   Install.doc for how to do this.  Note that for networks with few
   parameters, or that are trained on a small data set, the GPU version will
   not necessarily be faster.  Many GPUs are also slow at double-precision
   computations.  Note that GPU computation is not done for survival models.
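As an illustration of the kind of rewriting mentioned in performance change
(1), the sketch below (generic C, not FBM's actual source) computes a dot
product - the core of the forward and backward passes - in a form that
compilers' auto-vectorizers handle well: a fixed-stride loop over
restrict-qualified pointers, with several independent accumulators to hide
floating-point latency.  Compiled with, eg, gcc -O3 -mavx2 -mfma, a loop like
this will typically use SIMD and FMA instructions:

    static double dot (const double * restrict a,
                       const double * restrict b, int n)
    { double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
      int i;

      for (i = 0; i+3 < n; i += 4)        /* four independent accumulators */
      { s0 += a[i]   * b[i];
        s1 += a[i+1] * b[i+1];
        s2 += a[i+2] * b[i+2];
        s3 += a[i+3] * b[i+3];
      }

      for ( ; i < n; i++)                 /* remainder of fewer than four */
      { s0 += a[i] * b[i];
      }

      return (s0+s1) + (s2+s3);
    }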
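Regarding the selectable precision in performance change (3), a compile-time
choice like this is commonly made with a single typedef controlled by a
macro, which is presumably the sort of thing the make-all script's setting
selects.  The sketch below is only illustrative - the macro and type names
(USE_FP64, net_value) are made up for this example:

    #ifdef USE_FP64
    typedef double net_value;    /* FP64: as in previous versions */
    #else
    typedef float net_value;     /* FP32: the default in this version */
    #endif

    /* All parameter and unit values use net_value, so the whole
       computation switches precision with one compile-time flag. */

    static net_value sum_values (const net_value *v, int n)
    { net_value s = 0;
      for (int i = 0; i < n; i++) s += v[i];
      return s;
    }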
Bug fixes.

1) Fixed a bug in network function evaluation when input offsets are present
   and connections from some inputs are omitted.

2) Documentation for the 'plot' mc operation has been corrected.

3) An incorrect prior specification (not matching ccmds.net) has been
   corrected in Ex-netgp-c.doc (and Ex-netgp-c.html).

4) Fixed a bug in which an mc-spec command with no trajectory specification
   would not reset to the default trajectory specification if an earlier
   mc-spec command had had one.

Known bugs and other deficiencies.

1) The facility for plotting quantities using "plot" operations in xxx-mc
   doesn't always work for the first run of xxx-mc (before any iterations
   exist in the log file).  A work-around is to do a run of xxx-mc to produce
   just one iteration before attempting a run of xxx-mc that does any "plot"
   operations.

2) The CPU time features (eg, the "k" quantity) will not work correctly if a
   single iteration takes more than about 71 minutes.  (See the note after
   this list for a plausible explanation of this limit.)

3) The latent value update operations for Gaussian processes may recompute
   the inverse covariance matrix even when an up-to-date version was computed
   for the previous Monte Carlo operation.

4) Covariance matrices are stored in full, even though they are symmetric,
   which sometimes costs a factor of two in memory usage.

5) Giving net-pred several log files that have different network
   architectures doesn't work, but an error message is not always produced
   (the results may just be nonsense).

6) Some Markov chain updates for Dirichlet diffusion tree models in which
   there is no data model (ie, no noise) are not implemented when some of the
   data is missing.  An error message is produced in such cases.
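A note on the roughly 71-minute limit in known bug (2): this is consistent
with processor time being accumulated as a 32-bit count of microseconds (as
with a 32-bit clock_t and CLOCKS_PER_SEC of 1000000), which wraps around
after 2^32 microseconds.  This is a plausible explanation, not a statement
about the actual implementation.  The arithmetic, in C:

    #include <stdio.h>

    int main (void)
    { double wrap = 4294967296.0 / 1e6;   /* 2^32 microseconds, in seconds */
      printf ("wraps after %.1f seconds = %.1f minutes\n", wrap, wrap/60);
      return 0;                           /* about 71.6 minutes */
    }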