Discussion for Assignment 2.
Read this discussion in conjunction with the text output and plots
available from the course webpage.
For both the forms of the covariance function (differing in how
temperature is treated), estimates for the hyperparameters that look
reasonable were found when optimization was done starting with eta1
set to 0.5 and the other hyperparameters set to 0.1.
For both forms of the covariance function, the estimate for the eta1
hyperparameter was very close to 0.5, which is approximately the
standard deviation expected for the square root of a Poisson count.
This indicates that the Poisson assumption seems to be good enough,
and that the square root transformation of the number of deaths has
worked as expected.
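
The value 0.5 can be checked by simulation. The following is a minimal
sketch, assuming NumPy is available; the daily death rates tried here
(10, 50, 200) are made up for illustration and are not taken from the
assignment data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sqrt_poisson_sd(mean, n_draws=200_000):
    """Empirical standard deviation of sqrt(N) for N ~ Poisson(mean)."""
    counts = rng.poisson(mean, size=n_draws)
    return np.sqrt(counts).std()

# Hypothetical daily death rates, chosen only for illustration.
for mean in (10, 50, 200):
    print(f"mean={mean:4d}  sd of sqrt(count) = {sqrt_poisson_sd(mean):.3f}")
```

The standard deviation should come out near 0.5 regardless of the mean,
as long as the mean is not too small, which is why the square root is a
convenient variance-stabilizing transformation for counts.
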
For both forms of the covariance function, the estimate for the eta2
hyperparameter was non-zero, so a seasonal effect on deaths seems to be
present.
For both forms of the covariance function, the eta3 hyperparameter was
estimated to be very close to zero, indicating that the models did not
see temporary changes in the number of deaths with a time scale of
around a month. However, the eta4 hyperparameter was estimated to be
non-zero, so the models do see temporary changes with a time scale of
around a week.
For both forms of the covariance function, the eta5 hyperparameter was
estimated to be non-zero, but the estimate for eta5 was about 3.5
times larger with the model that uses the cube of temperature. This
is one indication that temperature extremes are what matters most.
Also, the log marginal likelihood of the model using the cube of
temperature is bigger than the log marginal likelihood of the model
using unmodified temperatures by 0.6674, whose exponential is 1.95.
So if we consider the two models to be equally likely a priori, then
given the data, the model using the cube of temperature is about
twice as likely as the model using the unmodified temperature.
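
The arithmetic behind this comparison can be sketched as follows. The
only number taken from the output is the difference 0.6674; the
variable names are just illustrative, and the equal prior odds are the
assumption stated above.

```python
import math

# Log marginal likelihood of the cube-of-temperature model minus that of
# the unmodified-temperature model, as reported in the text output.
log_ml_difference = 0.6674

bayes_factor = math.exp(log_ml_difference)

# Assumption: the two models are equally likely a priori.
prior_odds = 1.0
posterior_odds = bayes_factor * prior_odds
posterior_prob_cube = posterior_odds / (1.0 + posterior_odds)

print(f"Bayes factor:                    {bayes_factor:.2f}")
print(f"Posterior probability (cube):    {posterior_prob_cube:.3f}")
```

Posterior odds of about 2 to 1 correspond to a posterior probability of
roughly two thirds for the cube model, which is a preference but not an
overwhelming one.
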
Furthermore, the average squared error of predictions for the square
root of the number of deaths for the fourth year is 0.2674 for the
model using unmodified temperature and 0.2645 for the model using the
cube of temperature. The difference may seem small, but remember that
for a Poisson model, the standard deviation of the square root of the
number of deaths would be approximately 0.5, and hence the variance
would be approximately 0.25.
The excess squared error beyond this inevitable Poisson-derived
variation is thus 0.0174 for the model using unmodified temperature
versus 0.0145 for the model using the cube of temperature. This
difference seems less trivial.
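
The subtraction can be laid out explicitly. This is only a sketch of
the arithmetic above, using the two average squared errors from the
output and the approximate Poisson-derived variance of 0.25; the
dictionary keys are illustrative labels.

```python
poisson_variance = 0.5 ** 2  # approximate variance of sqrt of a Poisson count

# Average squared prediction errors for the fourth year, from the output.
mse = {"unmodified temperature": 0.2674, "cube of temperature": 0.2645}

# Error in excess of the irreducible Poisson variation.
excess = {name: value - poisson_variance for name, value in mse.items()}

raw_reduction = 1 - mse["cube of temperature"] / mse["unmodified temperature"]
excess_reduction = 1 - excess["cube of temperature"] / excess["unmodified temperature"]

for name, value in excess.items():
    print(f"{name:24s} excess squared error = {value:.4f}")
print(f"reduction in raw error:    {100 * raw_reduction:.1f}%")
print(f"reduction in excess error: {100 * excess_reduction:.1f}%")
```

Measured against the raw error the improvement is about 1%, but
measured against the excess error it is about one sixth.
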
I looked in more detail at the differences in predictions of the two
models when the temperatures are extreme - above 20, or below -5. The
average squared error with the model using unmodified temperatures is
0.2581 on days with temperature above 20, versus 0.2402 for the model
using the cube of temperature. There is no difference in average
squared error on days with temperature below -5. For extreme high
temperatures, the predictions of the model using the cube of
temperature therefore do seem to be better.
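
A comparison of this kind can be sketched as below. The function name
and the toy arrays are hypothetical and are not the code or data used
for the assignment; only the thresholds of 20 and -5 come from the
text.

```python
import numpy as np

def squared_error_by_temperature(y_true, y_pred, temperature,
                                 low=-5.0, high=20.0):
    """Average squared error on cold days, hot days, and all days."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    sq_err = (y_true - y_pred) ** 2
    masks = {
        "below low": temperature < low,
        "above high": temperature > high,
        "all days": np.ones_like(temperature, dtype=bool),
    }
    # NaN is returned for a group that contains no days.
    return {name: float(sq_err[mask].mean()) if mask.any() else float("nan")
            for name, mask in masks.items()}

# Toy values for illustration only (square roots of made-up death counts).
result = squared_error_by_temperature(
    y_true=[7.0, 7.0, 7.0, 7.0],
    y_pred=[7.0, 8.0, 9.0, 7.0],
    temperature=[-10.0, 0.0, 25.0, 30.0],
)
print(result)
```

Calling the function once per model on the same test days gives the
pairs of numbers being compared above.
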
I plotted the predictions using each model, along with the data
points, for both the three years that the model was fitted to and the
fourth year that follows. (The top plot is for the model using
unmodified temperature, the bottom plot for the model using the cube
of temperature.) The "predictions" for the years that were actually
fitted show more variability, because they include the temporary
variation on the time scale of about a week. For the fourth year, the
model has no idea what these variations will be, and they are
therefore effectively set to zero (except for a few days at the
beginning of the fourth year for which there is still significant
covariance with the end of the third year). The two models produce
noticeably different predictions for some days in the summer of the
fourth year, when extremely high temperatures may occur.
To see how these models are actually using the temperature, I also
made predictions for the square root of the number of deaths with the
data modified to increase the temperature by one degree on each of the
days, and subtracted the original predictions from these predictions
in order to see what effect a one-degree increase in temperature has
with each model. The results are plotted separately for the training
data (first three years, top) and the test data (fourth year, bottom).
In these plots, one can see that the effect of a one-degree rise in
temperature is greater for the model using the cube of temperature
(plots on the right). The effect in the model using the cube of
temperature is, as expected, positive for high temperatures and
negative for low temperatures, but within the range of temperatures
experienced in Toronto, the effect is greater for high temperatures
than for low temperatures. (The decline in the size of the effect for
very high temperatures is probably due to there being very few days
with such high temperatures, so the model has little information on
the effect of very high temperatures, and accordingly shrinks the
effect size towards zero.)
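
The perturbation procedure itself is simple and works with any
predictor. Below is a minimal sketch; `one_degree_effect` and
`toy_predict` are hypothetical names, and `toy_predict` is a made-up
stand-in that is flat below 20 degrees and rises above it, not the
fitted Gaussian process model.

```python
import numpy as np

def one_degree_effect(predict, inputs, temperature_column, delta=1.0):
    """Change in prediction when temperature is raised by delta on every day."""
    inputs = np.asarray(inputs, dtype=float)
    shifted = inputs.copy()
    shifted[:, temperature_column] += delta
    return predict(shifted) - predict(inputs)

def toy_predict(inputs):
    # Hypothetical predictor: column 0 is the day, column 1 the temperature.
    temperature = inputs[:, 1]
    return 7.0 + 0.01 * np.maximum(temperature - 20.0, 0.0) ** 2

# Two made-up days: a mild one (10 degrees) and a hot one (25 degrees).
days = np.array([[100.0, 10.0], [200.0, 25.0]])
effect = one_degree_effect(toy_predict, days, temperature_column=1)
print(effect)
```

With a Gaussian process, `predict` would be the function returning the
predictive mean for given inputs, with the training data held fixed.
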
A simple scatterplot (not shown) clearly displays variation in number
of deaths with season of the year, which is confirmed by the Gaussian
process models. A simple scatterplot of deaths versus temperature
shows a negative correlation. However, in the Gaussian process model,
we see that the only large effect is that extremely high temperatures
are associated with a larger number of deaths - the reverse of what
one might conclude from the simple scatterplot. This is possible
because temperature of course varies with season. It appears that
there is a large effect of season on number of deaths - with more
deaths in winter - that is not due to short-term effects of cold
temperatures. This seasonal effect could be due to the change in
amount of sunlight (perhaps via its effect on vitamin D production),
to different social behaviour (perhaps the effect of the school year),
or to long-term effects of temperature that would not be visible in a
model such as the one used here, which looks only at the temperature
on the day that a death occurred, not on the days preceding the death.
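
How a negative correlation in a scatterplot can coexist with a positive
effect of extreme heat can be illustrated with a toy simulation. All of
the numbers below (the seasonal amplitude, the 22-degree threshold, the
size of the heat effect) are made up for illustration and are not
estimates from the Toronto data, and the regression used here is
ordinary least squares rather than a Gaussian process.

```python
import numpy as np

rng = np.random.default_rng(1)

days = np.arange(3 * 365)
season = np.cos(2 * np.pi * days / 365)   # +1 in midwinter, -1 in midsummer
temperature = 8.0 - 13.0 * season + rng.normal(0.0, 4.0, size=days.size)
heat_excess = np.maximum(temperature - 22.0, 0.0)

# Made-up generating process: more deaths in winter regardless of the
# day's temperature, plus a same-day effect of extreme heat only.
mean_deaths = 50.0 + 8.0 * season + 2.0 * heat_excess
deaths = rng.poisson(mean_deaths).astype(float)

# What a simple scatterplot of deaths versus temperature would show.
marginal_corr = np.corrcoef(temperature, deaths)[0, 1]

# Regression that accounts for season separately from extreme heat.
design = np.column_stack([np.ones(days.size), season, heat_excess])
coef, *_ = np.linalg.lstsq(design, deaths, rcond=None)
heat_coef = coef[2]

print(f"correlation of deaths with temperature: {marginal_corr:.2f}")
print(f"estimated effect of extreme heat:       {heat_coef:.2f}")
```

In this toy setting the overall correlation is negative because cold
days are mostly winter days, even though the only direct temperature
effect built into the simulation is an increase in deaths on very hot
days.
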