```

GP-SPEC:  Specify a Gaussian process model, or display existing spec.

Gp-spec creates a log file containing a specification of a Gaussian
process model and the associated priors over hyperparameters.  When
invoked with just a log file as argument, it displays the
specifications of the Gaussian process model stored in that log file.

Usage:

gp-spec log-file N-inputs N-outputs
    [ const-part [ linear-part [ jitter-part ] ] ] { flag } [ spread ]
    { / scale-prior relevance-prior [ power ] { flag } }

or:

gp-spec log-file

N-inputs and N-outputs are the numbers of input variables and output
variables in the model - i.e., the dimensionality of the domain and the
range of the functions that the Gaussian process defines a
distribution over.  For the Gaussian processes allowed here, the
functions from the inputs to the various output variables are
independent, given particular values for the hyperparameters.  The
covariance functions for each output all have the same form, and
share the same hyperparameters.

The const-part argument gives the prior for the hyperparameter
controlling the constant part of the covariance function, in the form
described in prior.doc.  A value of c for this hyperparameter adds a
term of c^2 to the covariance between all pairs of input points.  The
prior for this hyperparameter can have only one level.  If this
argument is "-", or missing altogether, the covariance function will
not have a constant part.

The linear-part argument gives the prior for the hyperparameters, s_i,
controlling the linear part of the covariance function.  This part of
the covariance between inputs x and x' is

SUM_i x_i x'_i s_i^2

This prior can have up to two levels, allowing for a common
hyperparameter that controls the priors for the coefficients, s_i,
associated with the various inputs.  The "x" option may be used in
order to have the width of this prior scale with the number of inputs,
as described in prior.doc.  The linear part may be "-", or missing, in
which case the covariance function will not have a linear part.

The jitter-part argument gives the prior for a hyperparameter whose
square is added to the covariance of a training or test case with
itself.  Such a contribution may be desirable for modeling reasons, or
in order to improve the numerical stability of the computations.  If a
regression model with
noise is being used, it is usually not necessary to include such a
contribution to the covariance; if jitter-part is included, it
effectively adds to the noise level (this could be useful if you want
to constrain the noise to be at least some amount).  For a
classification model, a jitter-part should usually be included, as
otherwise the updates of the underlying function values may be very
slow, and numerical problems can arise.  If the jitter-part is "-" or
missing, it is taken to be zero.

Zero or more additional terms in the covariance function may be
specified using further groups of arguments.  A group for which
"power" is absent or positive results in a term in the expression for
the covariance between a particular output at inputs x and x' that has
the form:

v^2 * exp( - SUM_i (w_i |x_i - x'_i|)^R )

The power, R, in this expression must be in the interval (0,2].  It is
given by the last argument in a group, with the default being R=2.
The first argument in the group gives the prior for the scale
hyperparameter, v, which must have only one level.  The prior for the
w_i, which determine the relevance of the various inputs, can have up
to two levels, allowing for a common hyperparameter and for
hyperparameters for each input, i.  Here again, the "x" option may be
used to automatically scale the prior.

A group can also have a "power" of -1, in which case it produces a
term in the covariance function of the form

v^2 / PROD_i (1 + w_i^2 (x_i - x'_i)^2 )

The v and w hyperparameters play the same roles as described above.

Optional flags may be appended to the specifications of the linear
and other parts of the covariance.  They have the form:

<flag>[:[-]<number>{,<number>}]

where <flag> is one of the flag names below, and <number> is the
number of an input or output (starting with 1) that the flag applies
to.  If the "-" is present, the flag applies to all inputs EXCEPT
those mentioned.  If only the flag name is given, it applies to all
inputs.  The possible flags are as follows:

delta     Use a "delta" distance for these inputs, in which the
          distance is 0 if x_i=x'_i and 1 otherwise.  Not allowed
          for the linear part of the covariance.

omit      Ignore these inputs when computing the covariance
          (i.e., don't include them in the sum above).

drop      Do not include this term of the covariance for the
          listed outputs.

spread    Spread out the relevance parameters for these inputs.
          This is presently allowed only for the linear part, and
          causes this term in the covariance function to become

              SUM_i x_i x'_i SUM_j s_j^2

          The sum over j includes only i when i is not marked for
          spreading.  When i is being spread, j includes all
          indexes from i-spread to i+spread that are marked for
          spreading.  The "spread" argument giving this width
          must be specified after the flags.

mulprod   For each input listed, multiply this term in the covariance
          for two cases by the product of the values of that input
          for the two cases.  Note that this is done even for inputs
          that are omitted, as long as they are in the mulprod list.
          Not allowed for the linear part of the covariance.

The "delta" flag is useful when the inputs are categorical.  The
"omit" flag is useful in setting up additive models, in which
different covariance parts correspond to additive components, with
each component looking at only a subset of the inputs.  The "drop"
flag is mostly useful when setting up models in which one target is
explained as the log of the sum of the exponentials of several
outputs, for which one may wish to have different covariance functions.
When "drop" appears after the const/linear/jitter arguments, it
applies to all of these terms.  The "spread" flag is useful for data
such as spectra where nearby inputs are likely to be of similar
relevance.  The "mulprod" flag has the effect of producing a
quasi-linear term in the model for which the regression coefficient is
not a constant, but instead varies with the inputs according to a
Gaussian process with the specified covariance function.

Note that the prior for the const-part corresponds closely to the
prior for output biases in neural network models, and that the prior
for the linear-part corresponds to the prior for input-output weights
in a network.  The priors for v and w_i correspond, respectively, to
the priors in neural network models on the hidden-output and the
input-hidden weights.  (The neural network priors can go to more
levels, however.)

To use a Gaussian process to model data, additional information must
be specified as well, as described in model-spec.doc.  See also gp.doc
for general information on how the outputs of the Gaussian process are
used to define the model likelihood.  Depending on the model used,
state information in addition to the hyperparameter values for the
Gaussian process may need to be kept (case-specific function values
and/or case-specific noise variances).

Copyright (c) 1995-2004 by Radford M. Neal
```
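
The covariance terms described in the document above (constant, linear, jitter, and the two kinds of additional term) can be combined as in the following minimal Python sketch. It is an illustration of the formulas only, not the software's actual implementation: the function name `gp_covariance`, its parameters, and the `same_case` switch are hypothetical, and the hyperparameters are passed in as fixed numbers rather than being sampled from their priors.

```python
import math

def gp_covariance(x, y, same_case, const=0.0, linear=None, jitter=0.0, terms=()):
    """Covariance between one output at input vectors x and y (illustrative only).

    same_case -- True when x and y belong to the same training or test case,
                 which is when the jitter contribution applies.
    const     -- hyperparameter c of the constant part (adds c^2).
    linear    -- list of s_i for the linear part, or None for no linear part.
    jitter    -- jitter hyperparameter (its square is added for same_case).
    terms     -- iterable of (v, w, power); w is the list of w_i, and power
                 is either in (0, 2] or equal to -1.
    """
    cov = const ** 2
    if linear is not None:
        # SUM_i x_i x'_i s_i^2
        cov += sum(xi * yi * si ** 2 for xi, yi, si in zip(x, y, linear))
    if same_case:
        cov += jitter ** 2
    for v, w, power in terms:
        if power == -1:
            # v^2 / PROD_i (1 + w_i^2 (x_i - x'_i)^2)
            prod = 1.0
            for xi, yi, wi in zip(x, y, w):
                prod *= 1.0 + wi ** 2 * (xi - yi) ** 2
            cov += v ** 2 / prod
        elif 0 < power <= 2:
            # v^2 * exp(- SUM_i (w_i |x_i - x'_i|)^R)
            total = sum((wi * abs(xi - yi)) ** power for xi, yi, wi in zip(x, y, w))
            cov += v ** 2 * math.exp(-total)
        else:
            raise ValueError("power must be in (0, 2] or equal to -1")
    return cov

# Example: constant part c=2, linear part s=(1, 0.5), jitter 0.1, and one
# exponential term with v=3, w=(1, 1), R=2, evaluated for a case with itself.
example = gp_covariance([1.0, 2.0], [1.0, 2.0], True,
                        const=2.0, linear=[1.0, 0.5], jitter=0.1,
                        terms=[(3.0, [1.0, 1.0], 2)])
```

In the example, the four contributions are 4 (constant), 2 (linear), 0.01 (jitter), and 9 (exponential term at zero distance), so the result is close to 15.01.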
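
The flag syntax `<flag>[:[-]<number>{,<number>}]` described in the document above can be read as in the following rough sketch. The helper `parse_flag` is hypothetical and is not part of the software; it only mirrors the stated rules: numbers start at 1, a leading "-" selects everything except the listed numbers, and a bare flag name selects everything. The argument `n` is assumed to be the number of inputs (or the number of outputs, for "drop").

```python
FLAG_NAMES = {"delta", "omit", "drop", "spread", "mulprod"}

def parse_flag(arg, n):
    """Return (flag name, sorted list of 1-based indexes the flag applies to)."""
    name, sep, rest = arg.partition(":")
    if name not in FLAG_NAMES:
        raise ValueError("unknown flag: " + name)
    if not sep:
        # Only the flag name was given: it applies to every index.
        return name, list(range(1, n + 1))
    exclude = rest.startswith("-")
    if exclude:
        rest = rest[1:]
    listed = {int(tok) for tok in rest.split(",")}
    if any(i < 1 or i > n for i in listed):
        raise ValueError("index out of range in flag: " + arg)
    if exclude:
        # A leading "-" means: all indexes EXCEPT those mentioned.
        return name, [i for i in range(1, n + 1) if i not in listed]
    return name, sorted(listed)

print(parse_flag("omit:-1,3", 4))   # ('omit', [2, 4])
print(parse_flag("delta", 3))       # ('delta', [1, 2, 3])
```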
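
The effect of the "spread" flag on the linear part, SUM_i x_i x'_i SUM_j s_j^2, can be illustrated with the sketch below. It assumes that the window for a spread input i contains only those indexes from i-spread to i+spread that are themselves marked for spreading, and that the window is clipped at the ends of the input range; the function name `spread_linear_term` is hypothetical, and indexes here are 0-based, unlike the 1-based numbers used with flags.

```python
def spread_linear_term(x, y, s, marked, spread):
    """Linear covariance term with relevance parameters spread over neighbours.

    x, y   -- input vectors for the two cases
    s      -- list of hyperparameters s_j, one per input
    marked -- set of 0-based input indexes marked for spreading
    spread -- non-negative integer width of the window
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        if i in marked:
            lo, hi = max(0, i - spread), min(n - 1, i + spread)
            # Only neighbours that are also marked for spreading contribute.
            js = [j for j in range(lo, hi + 1) if j in marked]
        else:
            # An input not marked for spreading uses only its own s_i.
            js = [i]
        total += x[i] * y[i] * sum(s[j] ** 2 for j in js)
    return total

# With s = (1, 2, 3), inputs 0 and 1 marked, and spread = 1, inputs 0 and 1
# each use s_0^2 + s_1^2 = 5, while input 2 uses s_2^2 = 9.
value = spread_linear_term([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 3.0], {0, 1}, 1)
```

With no inputs marked, the function reduces to the ordinary linear term SUM_i x_i x'_i s_i^2.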