DISCUSSION FOR ASSIGNMENT #1
(Note that your results when using cross validation might be slightly
different than below if you split the data into 10 folds differently.)
For dataset #1 using penalized least squares, cross validation
selected lambda1=2 and lambda2=32 as the best penalty magnitudes,
though any value for lambda1 between 0 and 4 together with lambda2=32
gave almost as small cross-validation error. The test error
(displayed as the square root of the average squared error) with the
selected values for lambda1 and lambda2 was 1.91. Smaller values for
lambda1 together with lambda2=32 gave a slightly smaller test error of
1.89. Other values for lambda1 and lambda2 gave worse test errors.
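As a concrete sketch of this procedure (illustrative only; the function and variable names here are assumptions, not the assignment's actual code), the penalized least squares estimate with two penalty magnitudes and its 10-fold cross-validation error can be computed as:

```python
import numpy as np

def penalized_ls(X1, X2, y, lam1, lam2):
    """Penalized least squares with separate penalty magnitudes for the
    coefficients of the two input groups X1 and X2."""
    X = np.hstack([X1, X2])
    # Penalty matrix: lam1 for the first group's coefficients, lam2 for the second's.
    P = np.diag([lam1] * X1.shape[1] + [lam2] * X2.shape[1])
    return np.linalg.solve(X.T @ X + P, X.T @ y)  # solves (X'X + P) w = X'y

def rmse(y, yhat):
    """Error displayed as the square root of the average squared error."""
    return np.sqrt(np.mean((y - yhat) ** 2))

def cv_error(X1, X2, y, lam1, lam2, folds=10):
    """Cross-validation error (square root of average squared error)
    over the given number of folds."""
    idx = np.arange(len(y))
    sq_errs = []
    for fold in np.array_split(idx, folds):
        train = np.setdiff1d(idx, fold)
        w = penalized_ls(X1[train], X2[train], y[train], lam1, lam2)
        pred = np.hstack([X1[fold], X2[fold]]) @ w
        sq_errs.append(np.mean((y[fold] - pred) ** 2))
    return np.sqrt(np.mean(sq_errs))
```

Candidate values of lam1 and lam2 would be compared with cv_error, and the pair minimizing it used for the final fit and test predictions.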
For dataset #1 using the posterior mean, the marginal likelihood was
highest for omega1=0.8, omega2=0.2, and sigma=1.4, which gave a test
error of 1.91. Slightly smaller test errors, down to 1.85, were
obtained with some other combinations of omega1, omega2, and sigma
(all of which have omega2 equal to 0.2 or less).
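The posterior mean here is for a Bayesian linear model in which the coefficients of the two input groups get independent zero-mean Gaussian priors with standard deviations omega1 and omega2, and the noise standard deviation is sigma. A minimal sketch of computing it, along with the log marginal likelihood used to select the hyperparameters (names and structure are assumptions, not the assignment's code):

```python
import numpy as np

def posterior_mean(X1, X2, y, omega1, omega2, sigma):
    """Posterior mean of the coefficients under independent N(0, omega1^2)
    and N(0, omega2^2) priors for the two input groups, with N(0, sigma^2)
    noise.  Found by solving the regularized normal equations."""
    X = np.hstack([X1, X2])
    prior_prec = np.diag([1/omega1**2] * X1.shape[1] + [1/omega2**2] * X2.shape[1])
    A = X.T @ X / sigma**2 + prior_prec
    return np.linalg.solve(A, X.T @ y / sigma**2)

def log_marginal_likelihood(X1, X2, y, omega1, omega2, sigma):
    """Log marginal likelihood of y given the hyperparameters: under this
    model, y is multivariate normal with mean zero and covariance
    sigma^2 I + X S X', where S holds the prior variances."""
    X = np.hstack([X1, X2])
    S = np.diag([omega1**2] * X1.shape[1] + [omega2**2] * X2.shape[1])
    C = sigma**2 * np.eye(len(y)) + X @ S @ X.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))
```

A grid of (omega1, omega2, sigma) values would be compared with log_marginal_likelihood, and posterior_mean at the best setting used for prediction.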
The same final test error of 1.91 was therefore obtained by using
penalized least squares with penalty selected by cross-validation and
by using the posterior mean with hyperparameters selected by marginal
likelihood. (This is somewhat coincidental - there's no reason for
the two approaches to give exactly the same result, even though they
may give very similar results.) As discussed in the lecture notes,
the posterior mean with a given omega and sigma corresponds to the
penalized least squares estimate with lambda = sigma^2/omega^2. This
gives the equivalent penalties of lambda1 = 1.4^2/0.8^2 = 3.06 and
lambda2 = 1.4^2/0.2^2 = 49, which are similar to the values of
lambda1=2 and lambda2=32 selected by cross validation, so in this
respect also the two approaches gave similar results.
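The arithmetic for these equivalent penalties is just the formula applied to the hyperparameters selected by marginal likelihood:

```python
# Equivalent penalties, lambda = sigma^2/omega^2, for the hyperparameters
# selected by marginal likelihood on dataset #1.
sigma, omega1, omega2 = 1.4, 0.8, 0.2
lam1 = sigma**2 / omega1**2   # 1.96 / 0.64 = 3.0625
lam2 = sigma**2 / omega2**2   # 1.96 / 0.04 = 49.0
print(round(lam1, 2), round(lam2, 2))  # 3.06 49.0
```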
If we force lambda1 to be the same as lambda2, cross validation would
select lambda1=lambda2=8, which gives a test error of 2.43, so using
two penalty magnitudes is a significant benefit for this dataset.
Similarly, if we force omega1 to be the same as omega2, the best value
according to the marginal likelihood is omega1=omega2=0.8 (with
sigma=1.4), which gives a test error of 2.35, worse than when two
hyperparameters are used.
For this dataset, we see that both approaches worked well at selecting
good values for the penalties or hyperparameters. We also see that
the best cross-validation error (2.00) was close to the final test
error (1.91).
For dataset #2 using penalized least squares, cross validation
selected lambda1=lambda2=1 as the best penalty magnitudes, though
lambda1=lambda2=2 was almost as good. The test error with the
selected values was 1.02. A slightly lower test error of 1.00 would
have been obtained with lambda1=lambda2=2.
For dataset #2 using the posterior mean, the marginal likelihood was
highest for omega1=omega2=0.4 and sigma=0.5, which gave a test error
of 1.00. This is as good a test error as for any other combination of
hyperparameters. However, there are other equally good (or nearly as
good) combinations, as is expected, since the posterior mean is the
same as a penalized least squares estimate with penalties that depend
only on functions of omega1, omega2, and sigma (so varying these three
hyperparameters can produce the same two penalties in more than one
way).
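For example, scaling omega1, omega2, and sigma all by the same factor leaves the ratios sigma^2/omega^2, and hence the penalties and the posterior mean, unchanged (the second setting below is illustrative, not one actually tried):

```python
# Two hyperparameter settings (with omega1 = omega2 = omega) that imply
# the same penalty lambda = sigma^2/omega^2, and so the same posterior mean.
lams = [sigma**2 / omega**2 for omega, sigma in [(0.4, 0.5), (0.8, 1.0)]]
# Both values equal 1.5625, up to floating-point rounding.
```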
We again see that the two approaches give almost the same results.
The equivalent penalties for the hyperparameters chosen by marginal
likelihood are lambda1=lambda2 = 0.5^2/0.4^2 = 1.56, similar to the
value of lambda1=lambda2=1 selected by cross validation (for which
lambda1=lambda2=2 was almost as good).  Since the
methods chose lambda1 to be the same as lambda2 or omega1 to be the
same as omega2, there was clearly no benefit to considering different
penalties or prior standard deviations for the two groups of inputs
with this dataset. For this dataset, the test error was also best
when lambda1=lambda2 or omega1=omega2. The test error for this
dataset was somewhat larger than would be expected from the cross
validation error (for the chosen lambda1=lambda2=1, cross-validation
error was 0.83, versus 1.02 for test error).
Overall, both the cross validation and the marginal likelihood
approaches worked well on these datasets. For dataset #1, using two
penalty magnitudes or prior standard deviations improved the results
substantially. For dataset #2, there was no improvement, but neither
did it hurt. So, based just on these datasets, it seems that using
more than one penalty magnitude or prior standard deviation may be
generally useful when inputs come in two or more groups.