MODELING REAL-VALUED DATA WITH A T-DISTRIBUTION

One can also define models for the unconditional distribution of one
or more target values, without reference to any input values.  Here, 
I will show how one can sample from the posterior distribution for the
parameters of a univariate t-distribution.  This example also shows
how to specify different stepsizes for different state variables.


Specifying the model.

Here is a specification for a model in which the degrees of freedom of
the t-distribution are fixed (at two), but the location and scale
parameters are unknown:

    > dist-spec tlog1 d=2 "u ~ Normal(0,10^2) + w ~ Normal(0,1)" \
                          "w + [(d+1)/2] Log { 1 + [(t-u)/Exp(w)]^2/d }"

The "d=2" argument defines a constant (the degrees of freedom), which
may be used in the formulas for the prior and the likelihood.  The
location parameter, u, is given a normal prior with mean zero and
standard deviation 10.  The log of the scale parameter, w, is given a
normal prior with mean zero and standard deviation 1.  The software
does not have a built-in t-distribution, so the likelihood has to be
specified by explicitly writing a formula for minus the log of the
probability of the target value (t) for given values of the model
parameters.  This formula must include all terms that depend on the
model parameters, but for sampling from the posterior it is not
necessary to include constant terms.  (However, it is necessary to
include all terms in the likelihood if the marginal likelihood for the
model is to be found using Annealed Importance Sampling.)


Specifying the data source.

We can specify that the data comes from the file "tdata" as follows:

    > data-spec tlog1 0 1 / tdata .

This says that there are no input values and one target value, and
that the data comes from the file "tdata" (note that one must say
where the input values come from even though there are zero of them,
but that one can then say that the targets come from the same place
using the "." specification).

The contents of "tdata" are as follows:

    0.9
    1.0
    1.1
    6.9
    7.0
    7.1


Sampling with the Metropolis algorithm.

We can now try sampling using the Metropolis algorithm with a stepsize
of 1, repeated 10 times each iteration:

    > mc-spec tlog1 repeat 10 metropolis 1
    > dist-mc tlog1 5000

This takes about 12 seconds on our machine.  We can look at plots such
as the following to assess convergence:

    > dist-plt t uw tlog1 | plot

Equilibrium appears to have been reached within 50 iterations.  We can
now see a picture of the posterior distribution using a command such as

    > dist-plt u w tlog1 50:%3 | plot-points

This produces a scatterplot for the values of the two state variables
at those iterations from 50 on that are divisible by three.  Looking
at only every third state reduces the number of points that will be
superimposed, due to rejections of the Metropolis proposals.  (If some
points are superimposed, the scatterplot might be misleading.)

If you produce such a plot, you should see a tooth-shaped distribution
whose main mass is around u=3.9 and w=1, with two roots descending at
u=1 and u=7.


Varying the stepsizes.

The rejection rate of the metropolis operations with stepsize of 1 can
be estimated as follows:

    > dist-tbl r tlog1 | series m

    Number of realizations: 1  Total points: 5000

    Mean: 0.620760  

The efficiency of this chain at estimating the mean of u can be
assessed as follows:

    > dist-tbl u tlog1 50: | series mac 10

    Number of realizations: 1  Total points: 4951
    
    Mean: 3.963551  S.E. from correlations: 0.055950
    
      Lag  Autocorr.  Cum. Corr.
    
        1   0.625358    2.250717
        2   0.388528    3.027772
        3   0.254716    3.537205
        4   0.168077    3.873358
        5   0.106168    4.085695
        6   0.062758    4.211211
        7   0.031260    4.273732
        8   0.009645    4.293023
        9   0.001954    4.296931
       10  -0.006025    4.284882

An inefficiency factor of 4.2 is not bad, but we still might try to
improve sampling by using a different stepsize.  We can use a stepsize
twice as big by changing the 'mc-spec' command as follows:

    > mc-spec tlog2 repeat 10 metropolis 2

This produces a higher rejection rate:

    > dist-tbl r tlog2 | series m

    Number of realizations: 1  Total points: 5000

    Mean: 0.813800  

Despite this, the efficiency of sampling is improved:

    > dist-tbl u tlog2 50: | series mac 10
    
    Number of realizations: 1  Total points: 4951
    
    Mean: 3.916702  S.E. from correlations: 0.048513
    
      Lag  Autocorr.  Cum. Corr.
    
        1   0.502745    2.005491
        2   0.246632    2.498755
        3   0.133819    2.766394
        4   0.081335    2.929063
        5   0.053313    3.035689
        6   0.028491    3.092671
        7   0.013611    3.119892
        8   0.024985    3.169861
        9   0.013700    3.197261
       10   0.018007    3.233274

From the scatterplot of the posterior distribution, the larger
stepsize seems appropriate for u, but it seems too big for w.  We can
specify different stepsizes for u and w as follows:

    > dist-stepsizes tlog12 w=1 u=2
    > mc-spec tlog12 repeat 10 metropolis 1

The 'dist-stepsizes' command lets you specify individual stepsizes for
the state variables.  These values are multiplied by the stepsize
specified for the "metropolis" operation before being used, so you can
change the overall size of the steps by changing this argument, while
keeping the relative stepsizes the same.  

Using twice as big a stepsize for u as for w seems to work well, as 
seen by looking at the autocorrelations for u:

    > dist-tbl u tlog12 50: | series mac 10
    
    Number of realizations: 1  Total points: 4951
    
    Mean: 3.876082  S.E. from correlations: 0.036181
    
      Lag  Autocorr.  Cum. Corr.
    
        1   0.282051    1.564102
        2   0.061686    1.687474
        3   0.019975    1.727424
        4   0.010014    1.747453
        5  -0.004439    1.738575
        6   0.000983    1.740540
        7   0.000207    1.740954
        8   0.025230    1.791413
        9   0.004726    1.800866
       10  -0.001624    1.797619

The autocorrelations have been reduced to the point where the estimate
based on the points from this chain are a factor of only about 1.7
times less efficient than an estimate based on independent points.