## A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants

**Radford M. Neal and Geoffrey E. Hinton**

Dept. of Computer Science, University of Toronto

The EM algorithm performs maximum likelihood estimation for data
in which some variables are unobserved. We present a function
that resembles negative free energy and show that the M step maximizes
this function with respect to the model parameters and the E step maximizes
it with respect to the distribution over the unobserved variables.
From this perspective, it is easy to justify an incremental variant of
the EM algorithm in which the distribution for only one of the unobserved
variables is recalculated in each E step. This variant is shown
empirically to give faster convergence in a mixture estimation
problem. A variant of the algorithm that exploits sparse conditional
distributions is also described, and a wide range of other variants
is shown to be possible.
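The incremental variant described in the abstract can be illustrated for a one-dimensional Gaussian mixture. The sketch below is our own illustration, not the authors' implementation: the function name, initialization, and update schedule are assumptions. Each E step recomputes the responsibilities of a single data point and immediately folds the change into the sufficient statistics, so the parameters (the implicit M step, computed in `params`) always reflect the latest statistics rather than waiting for a full pass over the data.

```python
import math
import random

def incremental_em_gmm(xs, k=2, n_sweeps=20, seed=0):
    """Incremental EM for a 1-D Gaussian mixture (illustrative sketch).

    Unlike batch EM, the sufficient statistics are updated after each
    single-point E step, so every update uses current parameters.
    """
    rng = random.Random(seed)
    n = len(xs)
    # Randomly initialize responsibilities (the distribution over the
    # unobserved component indicators) and normalize per point.
    resp = [[rng.random() + 1e-3 for _ in range(k)] for _ in range(n)]
    for r in resp:
        s = sum(r)
        for j in range(k):
            r[j] /= s
    # Sufficient statistics per component: soft counts, sums, sums of squares.
    cnt = [sum(resp[i][j] for i in range(n)) for j in range(k)]
    sm = [sum(resp[i][j] * xs[i] for i in range(n)) for j in range(k)]
    sq = [sum(resp[i][j] * xs[i] ** 2 for i in range(n)) for j in range(k)]

    def params():
        # M step: parameters are a deterministic function of the statistics.
        pi = [cnt[j] / n for j in range(k)]
        mu = [sm[j] / cnt[j] for j in range(k)]
        var = [max(sq[j] / cnt[j] - mu[j] ** 2, 1e-6) for j in range(k)]
        return pi, mu, var

    for _ in range(n_sweeps):
        for i in range(n):
            pi, mu, var = params()
            # Subtract point i's old contribution from the statistics.
            for j in range(k):
                cnt[j] -= resp[i][j]
                sm[j] -= resp[i][j] * xs[i]
                sq[j] -= resp[i][j] * xs[i] ** 2
            # E step for point i only: recompute its responsibilities
            # under the current parameters.
            w = [pi[j] * math.exp(-(xs[i] - mu[j]) ** 2 / (2 * var[j]))
                 / math.sqrt(2 * math.pi * var[j]) for j in range(k)]
            s = sum(w)
            resp[i] = [wj / s for wj in w]
            # Add the new contribution back in; params() now reflects it.
            for j in range(k):
                cnt[j] += resp[i][j]
                sm[j] += resp[i][j] * xs[i]
                sq[j] += resp[i][j] * xs[i] ** 2
    return params()
```

Because the statistics are maintained by subtracting a point's old contribution and adding its new one, each single-point update costs O(k) regardless of the dataset size, which is the source of the faster convergence reported for the mixture estimation experiment.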

In M. I. Jordan (editor), *Learning in Graphical Models*,
pp. 355-368. Dordrecht: Kluwer Academic Publishers, 1998.

This is a revision of an earlier draft from February 1993.