GP-SPEC: Specify a Gaussian process model, or display existing spec. Gp-spec creates a log file containing a specification of a Gaussian process model and the associated priors over hyperparameters. When invoked with just a log file as argument, it displays the specifications of the Gaussian process model stored in that log file. Usage: gp-spec log-file N-inputs N-outputs [ const-part [ linear-part [ jitter-part ] ] ] { flag } [ spread ] { / scale-prior relevance-prior [ power ] { flag } } or: gp-spec log-file N-inputs and N-outputs are the numbers of input variables and output variables in the model - ie, the dimensionality of the domain and the range of the functions that the Gaussian process defines a distribution over. For the Gaussian processes allowed here, the functions from the inputs to the various output variables are independent, given particular values for the hyperparameters. The covariance functions for the each output all have the same form, and share the same hyperparameters. The const-part argument gives the prior for the hyperparameter controlling the constant part of the covariance function, in the form described in prior.doc. A value of c for this hyperparameter adds a term of c^2 to the covariance between all pairs of input points. The prior for this hyperparameter can have only one level. If this argument is "-", or missing altogether, the covariance function will not have a constant part. The linear-part argument gives the prior for the hyperparameters, s_i, controlling the linear part of the covariance function. This part of the covariance between inputs x and x' is SUM_i x_i x'_i s_i^2 This prior can have up to two levels, allowing for a common hyperparameter that controls the priors for the coefficients, s_i, associated with the various inputs. The "x" option may be used in order to have the width of this prior scale with the number of inputs, as described in prior.doc. The linear part may be "-", or missing, in which case the covariance function will not have a linear part. The jitter-part argument gives the prior for a hyperparameter whose square is added to the covariance of a training or test case with itself. Such a contribution may be desirable for modeling reasons, or in order to improve the numerical methods. If a regression model with noise is being used, it is usually not necessary to include such a contribution to the covariance; if jitter-part is included, it effectively adds to the noise level (this could be useful if you want to constrain the noise to be at least some amount). For a classification model, a jitter-part should usually be included, as otherwise the updates of the underlying function values may be very slow, and numerical problems can arise. If the jitter-part is "-" or missing, it is taken to be zero. Zero or more additional terms in the covariance function may be specified using further groups of arguments. A group for which "power" is absent or positive results in a term in the expression for the covariance between a particular output at inputs x and x' that has the form: v^2 * exp( - SUM_i (w_i |x_i - x'_i|)^R ) The power, R, in this expression must be in the interval (0,2]. It is given by the last argument in a group, with the default being R=2. The first argument in the group gives the prior for the scale hyperparameter, v, which must have only one level. The prior for the w_i, which determine the relevance of the various inputs, can have up to two levels, allowing for a common hyperparameter and for hyperparameters for each input, i. Here again, the "x" option may be used to automatically scale the prior. A group can also have a "power" of -1, in which case it produces a term in the covariance function of the form v^2 / PROD_i (1 + w_i^2 (x_i - x'_i)^2 ) The v and w hyperparameters play the same roles as described above. Optional flags may be appended to the specifications of the linear and other parts of the covariance. They have the form: <flag>[:[-]<number>{,<number>}] where <flag> is one of the flag names below, and <number> is the number of an input or output (starting with 1) that the flag applies to. If the "-" is present, the flag applies to all inputs EXCEPT those mentioned. If only the flag name is given, it applies to all inputs. The possible flags are as follows: delta Use a "delta" distance for these inputs, in which the distance is 0 if x_i=x'_i and 1 otherwise. Not allowed for the linear part of the covariance. omit Ignore these inputs when computing the covariance (ie, don't include them in the sum above). drop Do not include this term of the covariance for the listed outputs. spread Spread out the relevance parameters for these inputs. This is presently allowed only for the linear part, and causes this term in the covariance function to become SUM_i x_i x'_i SUM_j s_j^2 The sum over j includes only i when i is not marked for spreading. When i is being spread, j includes all indexes from i-spread to i+spread that are marked for spreading. Here, "spread" is the spreading width, which must be specified after the flags. mulprod For each input listed, multiply this term in the covariance for two cases by the product of the values of that input for the two cases. Note that this is done even for inputs that are omitted, as long as they are in the mulprod list. Not allowed for the linear part of the covariance. The "delta" flag is useful when the inputs are categorical. The "omit" flag is useful in setting up additive models, in which different covariance parts correspond to additive components, with each component looking at only a subset of the inputs. The "drop" flag is mostly useful when setting up models in which one target is explained as the log of the sum of the exponentials of several outputs, which one may wish to have different covariance functions. When "drop" appears after the const/linear/jitter arguments, it applies to all of these terms. The "spread" flag is useful for data such as spectra where nearby inputs are likely to be of similar relevance. The "mulprod" flag has the effect of producing a quasi-linear term in the model for which the regression coefficient is not a constant, but instead varies with the inputs according to a Gaussian process with the specified covariance function. Note that the prior for the const-part corresponds closely to the prior for output biases in neural network models, and that the prior for the linear-part corresponds to the prior for input-output weights in a network. The priors for v and w_i correspond to the priors in neural network models on the hidden-output and the input-hidden weights. (The neural network priors can go to more levels, however.) To use a Gaussian process to model data, additional information must be specified as well, as described in model-spec.doc. See also gp.doc for general information on how the outputs of the Gaussian process are used to define the model likelihood. Depending on the model used, state information in addition to the hyperparameter values for the Gaussian process may need to be kept (case-specific function values and/or case-specific noise variances). Copyright (c) 1995-2004 by Radford M. Neal