```

SAMPLING FROM A UNIVARIATE NORMAL DISTRIBUTION

Markov chain method can be used to sample from a univariate normal
distribution.  This will illustrate the basic facilities for
specifying distributions, for specifying the Markov chain operations
to use, for looking at log files, and for estimating the expectations
of functions.

Specifying the distribution.

We begin by specifying the distribution we want to sample from.  This
is done by giving a formula for the "energy function", which is minus
the log of the probability density, plus any arbitrary constant.  To
sample from a univariate normal distribution for a variable called
"x", with mean 10 and variance 1, we can type the following command to
the Unix command interpreter (the shell):

> dist-spec nlog "(x-10)^2/2"

Here, "nlog" is the name of a "log file", in which this specification
is saved, and in which the results of sampling will later be stored.
The energy formula "(x-10)^2/2" is minus the log of the probability
density for the desired distribution over "x", with the constant term
Log(2*Pi)/2 omitted, as it is not necessary for most purposes.  Such
formulas must usually be put in quotes, since some characters such as
parentheses will otherwise have special meaning to the shell.  See
formula.doc for the syntax of formulas, which is fairly conventional,
except perhaps that function names must be capitalized (eg, "Sin").

Since only a single variable, "x", is mentioned in the specification,
this is a univariate distribution.  We could have mentioned other
"state variables" in the specification, as in the example in the next
one of "u", "v", "w", "x", "y", or "z", which may optionally be
followed by a single digit.  See dist-spec.doc for further details.

The density functions for some distributions, including the normal,
are pre-defined.  We could have used the following command instead of
the one above:

> dist-spec nlog "Normal(x,10,1)"

The same result can also be obtained as follows:

> dist-spec nlog "x ~ Normal(10,1)"

This last form is particularly useful for specifying Bayesian models
(see Ex-bayes.doc).

After specifying the distribution, we specify what Markov chain
operations should be used to sample from this distribution, using a
command such as:

> mc-spec nlog metropolis 1

This specifies that each Markov chain iteration should consist of a
single Metropolis operation, with a "stepsize" of 1.  In a Metropolis
operation, a new state is proposed by randomly drawing from the normal
distribution centred at the current state, with standard deviation
given by the stepsize.  This proposed state is then accepted or
rejected based on the change in the energy (ie, on the change in
probability density).  If the proposed state is rejected, the new
state is the same as the old state.

This Markov chain specification is saved in the log file, after the
distribution specification.  We could also specify an initial state
for the Markov chain (see dist-initial.doc), but here we will let the
initial state default to x=0.  To actually do some Markov chain
iterations, we use a command such as:

> dist-mc nlog 1000

This performs 1000 Markov chain updates, as specified by the last
mc-spec command, and saves the state after each update in the log
file.  This takes only a fraction of a second, but Markov chain
sampling runs for more difficult problems can take much longer, so one
would often wish to run the 'dist-mc' command in the background, by
putting an "&" at the end of the command line.

If we later decided that we wanted to continue Markov chain sampling
for more iterations, we could just use another 'dist-mc' command with
a larger iteration limit.  For example, the command

> dist-mc nlog 10000

would produce another 9000 iterations, for a total of 10000.  We could
issue another mc-spec command before this, if we wished to use
different Markov chain operations for these further iterations.

Checking how well the sampling worked.

After or during the Markov chain sampling, we can look at how the
state has changing during the run, and at certain properties of the
Markov chain methods.

The 'dist-display' command lets us see the state at any given
iteration.  For example, the state at iteration 10 of the Markov chain
run above might look like this:

> dist-display nlog 10

STATE VARIABLES IN FILE "nlog" WITH INDEX 10

x = 2.53664

However, what you see might not be exactly the same as this, due to
differences in random number generators or in floating-point roundoff
errors.  If the iteration number is omitted, 'dist-display' shows the
last iteration.

The 'dist-plt' command is usually more useful in getting a picture of
how well the chain is sampling.  The following command will show how
the state changes over the course of the run:

> dist-plt t x nlog | plot

If 'plot' is an appropriate plotting program, this will display a
graph of the state variable "x" vs. the iteration number (the "t"
quantity).  Such a plot (produced with my version of 'graph') from a
run of this sort can be seen in nplt-x.png.  From this plot, you will
likely see that up to about iteration 50, the values of "x" are not
typical of those seen later in the run.  These early iterations should
be discarded when estimating functions of state (see below).

After iteration 50, the chain seems to move around the region that has
high probability fairly rapidly.  It should therefore be possible to
estimate expectations of functions of the state with reasonable
accuracy.  If instead, the chain moved very slowly, it would be
necessary to run it for many more iterations, or to use better Markov
chain operations.

Other interesting quantities can also be plotted using 'dist-plt'.
For instance, the energy can be monitored with the following command:

> dist-plt t E nlog | plot

The change in energy on the basis of which the Metropolis proposals
were accepted or rejected can be examined as follows:

> dist-plt t D nlog | plot-points

Here, it is best if the plot program used is set up to plot individual
points, rather than lines.  For this chain, the energy difference is
often less than one, so many proposals will be accepted.  If instead
the energy change was almost always large, it would be necessary to
reduce the "stepsize" argument following the "metropolis" operation in
'mc-spec'.  On the other hand, if the energy change is usually very
close to zero, a larger stepsize would produce better sampling.

Other quantities that can be plotted are documented in quantities.doc,
mc-quantities.doc, and dist-quantities.doc.

Estimating expectations of functions.

Finally, we can use the states from the Markov chain to estimate the
expectations of functions of this state.  Since we decided above that
the first 50 iterations were not typical of the chain's equilibrium
distribution, we will use only states after these when estimating
expectations.

The 'dist-est' command is one way of estimating expectations.  The
following command estimates the expectation of the "x" itself:

> dist-est x nlog 51:

Number of sample points: 950

Estimates for x:

Mean:    9.81553  (standard error 0.0315052)
Std.dev: 0.971056

NOTE: The standard errors assume points are independent

The arguments to 'dist-est' are a formula for what we want to
estimate, the log file for the run, and the range of iterations to use
from that log file (here, from 51 on).  The output (which might be a
bit different when you run the programs) gives the estimated
expectation (mean) for the requested function of state, along with its
estimated standard deviation.

A standard error is also given for the estimated mean, but it is valid
only if the points are independent, which is generally not true if
they were obtained using Markov chain sampling.  In the output above,
the estimate differs from the true mean of 10 by almost six times the
standard error, which illustrates that the standard error cannot be
trusted when the points are not independent.

Ideally, the 'dist-est' program would automatically compensate for
this lack of independence, and give correct standard errors, but it
doesn't yet.  If you are interested in the expectation of a state
variable, however, you can get correct standard errors using the
'series' program, which adjusts the standard errors based on estimated
autocorrelations, if told how far out to look.  The following command
illustrates the procedure:

> dist-tbl x nlog 51: | series msac 15

Number of realizations: 1  Total points: 950

Mean: 9.815530  S.E. from correlations: 0.091114

Standard deviation: 0.971056

Lag  Autocorr.  Cum. Corr.

1   0.736905    2.473811
2   0.542432    3.558675
3   0.415767    4.390208
4   0.317866    5.025941
5   0.248658    5.523257
6   0.225521    5.974299
7   0.195526    6.365350
8   0.186829    6.739008
9   0.160685    7.060378
10   0.159264    7.378906
11   0.148583    7.676073
12   0.134523    7.945119
13   0.115941    8.177001
14   0.072718    8.322437
15   0.040891    8.404219

The 'dist-tbl' command produces a list of values for "x" at iterations
from 51 on.  This list is piped into the 'series' program.  With the
options "msac 15", the 'series' program calculates estimates for the
mean and standard deviation, which match those calculated above by
'dist-est'.  It also estimates the autocorrelations for "x" at lags up
to 15.  From these autocorrelation estimates, 'series' finds the
"cumulative correlations" at the various lags, which are defined as
one plus twice the sum of the autocorrelations up to that lag.  If the
autocorrelations from that point on are approximately zero, as seems
to be the case above, the cumulative correlations to that point will
approximate the autocorrelation time for the quantity, which is the
factor by which the effective sample size is less than the number of
points used.  In calculating the standard error for the mean, 'series'
assumes that this is the case, and it therefore divides the sample
size by the cumulative correlation for the last lag requested when
computing the standard error.

In the example above, the standard error calculated by 'series'
appears to be realistic.  This will be true only if the maximum lag
given as an argument to series is appropriate, however.  This maximum
lag should be large enough that the autocorrelations at larger lags
are close to zero, but not too much larger than this, since including
autocorrelations at many lags introduces extra noise into the estimate
of the autocorrelation time.

A different way to get a standard error that accounts for correlations
is to divide the run (once equilibrium is reached) into "batches" of
iterations, and look at the variance of the batch means.  If the size
of a batch is several times the lag past which autocorrelations are
nearly zero, the batch means will be almost independent, and the
standard errors with this approach will be realistic (provided there
are enough batches to estimate the variance of batch means well).  The
'mean' program can estimate expectations using batch means.  For this
example, we could estimate the expectation of x as follows:

> dist-tbl x nlog 51: | mean 50

Number of batches: 19
Number of lines per batch: 50

Mean        Standard error

9.8155298       0.0902573

The standard error found using this method is similar to that found
above using autocorrelations.

Finally, another approach is to use 'series' to find out the lag past
which the autocorrelations are almost zero, and then use 'dist-est' to
find the expectation, telling it to look at iterations separated by
that lag.  This is somewhat wasteful of data, but is at present the
only easy way to get correct standard errors when estimating some
function of state rather than a state variable itself.  For example,
from the output of 'series' above, it seems that autocorrelations for
"x" are almost zero at about lag 15.  This is consistent with states
at that lag being almost independent (although lack of correlation
does not guarantee independence).  On that basis, we might estimate
the expectation of Sin(x) with the following command:

> dist-est "Sin(x)" nlog 51:%15

Number of sample points: 63

Estimates for Sin(x):

Mean:    -0.235189  (standard error 0.0778972)
Std.dev: 0.61829

NOTE: The standard errors assume points are independent

These results are consistent with the true expectation of Sin(x) with
respect to the Normal(10,1) distribution, which is -0.330.
```