STA 414/2104 Assignment 3 discussion
Data set 1
For this dataset, I tried K=10 and K=15, with s=0.001 (essentially no
penalty, so no one-dimensional structure is encouraged) and s=1. Other
values for s didn't seem better. I tried three random number seeds
for each combination of K and s, for a total of 12 runs. The plots
show the ordering of component means, along with the average log
probability of the test cases with the parameters estimated.
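As an aside, the average test log probability reported throughout could
be computed along the following lines. This is a minimal sketch assuming
a mixture of spherical Gaussians with a common variance; the function
and argument names are illustrative, not the assignment's actual code.

```python
import numpy as np

def avg_test_log_prob(X, weights, means, var):
    """Average log probability of test cases X (n x d) under a mixture
    of K spherical Gaussians with mixing proportions `weights` (K,),
    component means `means` (K x d), and common variance `var`.
    Illustrative only, not the assignment's actual code."""
    d = X.shape[1]
    # squared distance from every test case to every component mean
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)   # n x K
    # log N(x | mu_k, var * I) for each case/component pair
    log_dens = -0.5 * sq / var - 0.5 * d * np.log(2 * np.pi * var)
    # log sum_k w_k N(x | mu_k, var * I), via a stable log-sum-exp
    a = np.log(weights)[None, :] + log_dens
    m = a.max(axis=1, keepdims=True)
    log_mix = m[:, 0] + np.log(np.exp(a - m).sum(axis=1))
    return log_mix.mean()
```

Averaging log probabilities (rather than probabilities) is what makes
figures like -0.326 directly comparable across runs and values of K.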
With s=0.001, there is no ordering to the means (as expected), and the
average log probability of test cases ranges from -0.326 to -0.289,
with no obvious difference for K=10 and K=15.
With s=1, a good one-dimensional ordering is obtained for five of the
six runs. When a good ordering is obtained, the average log
probability of the test cases ranged from -0.301 to -0.250. The
results with K=15 seem a bit better, with test results ranging from
-0.266 to -0.250. The one run that did not produce a good ordering
(K=10, seed=3) had a very bad average log probability for test cases
of -0.539.
So using the GP penalty on component means seems to help a bit for
this dataset, as long as one avoids a bad run that ends in a poor
local maximum.
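For concreteness, one plausible form for a GP penalty on component
means, in which larger s pushes harder toward a smooth one-dimensional
arrangement, is sketched below. The squared-exponential covariance,
the equally spaced latent positions, and the way s enters are all
assumptions made for illustration; the assignment's actual penalty may
be defined differently.

```python
import numpy as np

def gp_penalty(means, s, length_scale=0.3, jitter=1e-8):
    """Illustrative GP-style smoothness penalty on component means.

    Components are imagined as lying at equally spaced latent positions
    t_k = k/(K-1); the penalty is s times the GP-prior quadratic form
    (the negative log density, up to a constant) of each coordinate of
    the means under a zero-mean GP with squared-exponential covariance
    over those positions. This is a guess at the penalty's form, not
    its actual definition in the assignment."""
    K, d = means.shape
    t = np.linspace(0.0, 1.0, K)
    # squared-exponential covariance over the latent positions,
    # with a small jitter for numerical stability
    C = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / length_scale ** 2)
    C += jitter * np.eye(K)
    Cinv = np.linalg.inv(C)
    # 0.5 * s * sum over dimensions of mu_d^T C^{-1} mu_d
    return 0.5 * s * np.einsum('kd,kj,jd->', means, Cinv, means)
```

With s=0.001 a term like this is negligible, matching the "essentially
no penalty" behaviour described above, while s=1 makes smoothness of
the means along the latent ordering matter substantially.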
Data set 2
This data set resembles data set 1, except that the one-dimensional
structure is a bit more complicated, and less smooth.
I again tried K=10 and K=15, this time with s=0.001 and s=0.6. The
smaller value of s (compared to the s=1 used for data set 1) seems
among the best I tried, and makes sense in that this data requires a
less smooth function.
This time, results with s=0.001 were clearly better for K=15 than for
K=10. The average test log probability for K=15 with s=0.001 was
either -1.490 or -1.489 for all three seeds.
With s=0.6, only one of the six runs found what seems to be the correct
one-dimensional structure (the run with K=15 and seed=2). For that
run, the average log probability for test cases was -1.490,
essentially identical to the results with s=0.001. For the other 5
runs, results were much worse, with average test log probability
ranging from -1.762 to -1.553 (with worse results for K=10 than K=15).
So for this data set, there was no advantage to using the GP penalty,
even when something close to the right structure was found. Looking
closely at the results, it seems that even in the one "good" run, the
one-dimensional structure found is a bit too smooth, which may degrade
performance by about as much as the reduced overfitting improves it.
Data set 3
For this data set, I used K=15 and K=20 with s=0.001 and s=0.6. I
tried only two seeds for each combination, for a total of 8 runs. The
six pairwise scatterplots are shown for each run.
With s=0.001, the average test log probability was -0.473 and -0.480
for the runs with K=15, and -0.536 (for both seeds) with K=20, so it
seems that there is overfitting with s=0.001 and K=20.
With s=0.6 and K=15, neither run produces the desired one-dimensional
structure, as one can see that some of the components are placed in
areas with few or no data points. However, the average log probability
of test cases was -0.498 and -0.407 for these runs, so the results are
about the same as or a bit better than with s=0.001.
Much better results are obtained with s=0.6 and K=20. For both random
seeds, the one-dimensional structure produced seems reasonable. The
average log probability of the test cases for these runs was -0.298
and -0.287, much better than any of the runs with s=0.001.
So the GP penalty was very helpful with this data set, provided that
good values of K and s are used.
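The experimental procedure used throughout, trying several
combinations of K, s, and random seed and comparing them by average
held-out log probability, can be sketched as a simple driver loop.
Both callables here (`fit` and `avg_test_log_prob`) are hypothetical
placeholders for the actual estimation and evaluation code, not the
assignment's own functions.

```python
def pick_best_setting(fit, avg_test_log_prob, X_train, X_test,
                      Ks=(10, 15), ss=(0.001, 1.0), seeds=(1, 2, 3)):
    """Fit every (K, s, seed) combination and keep the setting whose
    fitted model gives the highest average log probability on the
    held-out cases. `fit` is assumed to return a tuple of mixture
    parameters accepted by `avg_test_log_prob`; both are illustrative
    stand-ins for the real code."""
    best = None
    for K in Ks:
        for s in ss:
            for seed in seeds:
                params = fit(X_train, K=K, s=s, seed=seed)
                score = avg_test_log_prob(X_test, *params)
                if best is None or score > best[0]:
                    best = (score, K, s, seed)
    return best  # (best score, K, s, seed)
```

Running several seeds per setting, as above, is what guards against
the occasional bad run that ends in a poor local maximum.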