MIX-MC:  Use Markov chain sampling for a mixture model.

The mix-mc program is the specialization of xxx-mc to the task of
sampling from the posterior distribution of the hyperparameters,
parameters, and component indicators that are associated with a
mixture model.  If no training data is specified, sampling is done
from the prior for the hyperparameters, in which case only the
gibbs-hypers operation below is valid.

The generic features of this program are described in xxx-mc.doc.
However, at present, none of the pre-defined Markov chain operations
can be used with mixture models, only the special operations described
below.

The state of the simulation has three parts: the hyperparameter values
common to all mixture components, the indicators of which component is
associated with each training case, and the parameter values for those
components that are associated with some training case.  Any parts
that are not present when sampling begins are set as follows:  the
hyperparameters to their prior means, the parameters of components to
their means given the hyperparameters, and the indicators so that all
training cases are associated with a single component.
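
As an illustration only, the default initialization of this state
might be sketched in Python for a hypothetical one-parameter Normal
mixture (names here are made up, and this is not the program's actual
internal representation):

    import numpy as np

    n_cases = 10
    hypers = {"mu0": 0.0}      # hyperparameters start at their prior means
    indicators = np.zeros(n_cases, dtype=int)   # all cases in component 0
    # Parameters exist only for occupied components; the single initial
    # component's parameter is its mean given the hyperparameters.
    params = {0: hypers["mu0"]}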

The three parts of the state can be updated with the following
application specific sampling operations:

   gibbs-hypers

       Does a Gibbs sampling scan over the hyperparameter values.

   gibbs-params

       Does a Gibbs sampling scan over the parameters for the mixture
       components that are currently associated with at least one
       training case.
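
        As an illustration only (not the program's actual code), such
        a scan might be sketched in Python for a hypothetical
        univariate Normal mixture with component means theta_c, known
        standard deviation sigma, and prior theta_c ~ N(mu0, tau0^2):

            import numpy as np

            def gibbs_params(y, z, params, mu0, tau0, sigma, rng):
                # Scan only components currently associated with a case.
                for c in set(z):
                    yc = y[z == c]
                    prec = 1/tau0**2 + len(yc)/sigma**2
                    mean = (mu0/tau0**2 + yc.sum()/sigma**2) / prec
                    # Draw theta_c from its full conditional (conjugate).
                    params[c] = rng.normal(mean, np.sqrt(1/prec))
                return params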

   gibbs-indicators

       Does a Gibbs sampling scan over the indicators for which 
       component is currently associated with each training case.
       This operation is not allowed for models with an infinite
       number of components.  
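
        A Python sketch for the illustrative Normal model above, with
        a finite number K of components; it assumes (a guess, not
        documented here) that fresh prior draws stand in for the
        parameters of currently unoccupied components:

            import numpy as np
            from scipy.stats import norm

            def gibbs_indicators(y, z, params, K, alpha,
                                 mu0, tau0, sigma, rng):
                for i in range(len(y)):
                    counts = np.bincount(np.delete(z, i), minlength=K)
                    theta = np.array([params[c] if c in params
                                      else rng.normal(mu0, tau0)
                                      for c in range(K)])
                    # p(z_i=c|...) propto (n_{-i,c} + alpha/K) F(y_i|theta_c)
                    w = (counts + alpha/K) * norm.pdf(y[i], theta, sigma)
                    z[i] = rng.choice(K, p=w/w.sum())
                    params[z[i]] = theta[z[i]]
                    # Drop parameters of components left with no cases.
                    for c in list(params):
                        if not np.any(z == c):
                            params.pop(c)
                return z, params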

   gibbs-ext-indicators [ N ]

       Applies the method of Gibbs sampling with extra components 
       to update the indicators of which component is currently
       associated with each training case.  A new indicator for a
       case is chosen from among the components associated with
       other cases along with N (default one) components not
       associated with any case except perhaps the present one.  If
       the component associated with the present case is also
       associated with other cases, these N components are drawn from
       the prior.  If the current case is a singleton, one of these N
       is the current component and N-1 are drawn from the prior.

        As a special fudge, the "no gaps" method of MacEachern and
        Mueller is used instead if N is -1.

       This operation is possible with both finite and infinite
       models.
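
        A Python sketch of this scheme for the illustrative Normal
        model above, in its infinite (Dirichlet process) form with
        concentration parameter alpha; this corresponds to Algorithm 8
        of the tech report cited below:

            import numpy as np
            from scipy.stats import norm

            def gibbs_ext_indicators(y, z, params, N, alpha,
                                     mu0, tau0, sigma, rng):
                for i in range(len(y)):
                    others = np.delete(z, i)
                    occ = sorted(set(others))
                    # N extra components: a singleton case keeps its own
                    # parameters as one of them; the rest come from the
                    # prior.
                    extra = [rng.normal(mu0, tau0) for _ in range(N)]
                    if z[i] not in others:
                        extra[0] = params.pop(z[i])
                    # Weights: n_{-i,c} F(y_i|theta_c) for occupied c,
                    # (alpha/N) F(y_i|phi) for each extra component phi.
                    w = [np.sum(others == c)
                         * norm.pdf(y[i], params[c], sigma) for c in occ]
                    w += [alpha/N * norm.pdf(y[i], phi, sigma)
                          for phi in extra]
                    w = np.array(w)
                    k = rng.choice(len(w), p=w/w.sum())
                    if k < len(occ):
                        z[i] = occ[k]
                    else:   # an extra component comes into use
                        z[i] = max(params, default=-1) + 1
                        params[z[i]] = extra[k - len(occ)]
                return z, params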

   gibbs1-indicators

       Does a Gibbs sampling scan over the indicators for which
       component is currently associated with each training case,
       except that the component for a training case is not updated
       if this training case is currently associated with a singleton
       component - ie, if no other training case is associated with
       the same component as this one.  This operation is possible
        for both finite and infinite models.  Note that to produce an
        ergodic Markov chain, at least one of gibbs-indicators,
        gibbs-ext-indicators, met-indicators, or met1-indicators must
        be used as well.
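
        A Python sketch for the illustrative Normal model above, in
        the infinite case, where a non-singleton case chooses only
        among components associated with other cases (the partial
        Gibbs step of Algorithm 7 in the tech report cited below):

            import numpy as np
            from scipy.stats import norm

            def gibbs1_indicators(y, z, params, sigma, rng):
                for i in range(len(y)):
                    others = np.delete(z, i)
                    if z[i] not in others:
                        continue            # singleton: leave unchanged
                    occ = sorted(set(others))
                    # p(z_i=c|...) propto n_{-i,c} F(y_i|theta_c)
                    w = np.array([np.sum(others == c)
                                  * norm.pdf(y[i], params[c], sigma)
                                  for c in occ])
                    z[i] = occ[rng.choice(len(occ), p=w/w.sum())]
                return z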

   met-indicators [ N ]

       For each training case in turn, does N Metropolis-Hastings
       updates for the indicator that specifies which component is 
       associated with that case.  For each update, a new component is
       proposed, selected according to the predictive prior probability
       based on the current component frequencies and the concentration 
       parameter.  This component is then accepted or rejected based 
       on the relative probability of the case with respect to the new 
       and the old components.  This operation is possible with both
       finite and infinite models.  N defaults to one.
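
        A Python sketch for the illustrative Normal model above, in
        the infinite case with concentration parameter alpha; because
        the proposal is the conditional prior, the prior and proposal
        terms cancel, leaving just the likelihood ratio:

            import numpy as np
            from scipy.stats import norm

            def met_indicators(y, z, params, alpha,
                               mu0, tau0, sigma, rng, N=1):
                for i in range(len(y)):
                    for _ in range(N):
                        others = np.delete(z, i)
                        occ = sorted(set(others))
                        # Propose existing c with prob n_{-i,c}/(n-1+alpha),
                        # or a fresh prior draw with prob alpha/(n-1+alpha).
                        w = np.array([np.sum(others == c) for c in occ]
                                     + [alpha], dtype=float)
                        k = rng.choice(len(w), p=w/w.sum())
                        fresh = (k == len(occ))
                        theta = (rng.normal(mu0, tau0) if fresh
                                 else params[occ[k]])
                        a = (norm.pdf(y[i], theta, sigma)
                             / norm.pdf(y[i], params[z[i]], sigma))
                        if rng.random() < a:
                            old = z[i]
                            if fresh:
                                z[i] = max(params, default=-1) + 1
                                params[z[i]] = theta
                            else:
                                z[i] = occ[k]
                            if old not in z:
                                params.pop(old)
                return z, params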

   met1-indicators [ N ]

       This operation is like met-indicators, except that the new
       component proposed for a training case depends on whether or 
       not the case is a singleton - ie, whether no other case has
       the same component.  For singleton cases, only those components
       associated with other training cases are proposed, according to
       the predictive prior probabilities.  For cases that are not
       singletons, only components that are not currently associated
       with any training case are proposed, with equal probabilities.
       The acceptance criterion is adjusted to make these proposals
       valid.  This operation is possible with both finite and
       infinite models.  N defaults to one.
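
        A Python sketch for the illustrative Normal model above, in
        the infinite case (the Metropolis-Hastings part of Algorithm 7
        in the tech report cited below); the factors (n-1)/alpha and
        alpha/(n-1) are the adjustments that make the restricted
        proposals valid:

            import numpy as np
            from scipy.stats import norm

            def met1_indicators(y, z, params, alpha,
                                mu0, tau0, sigma, rng, N=1):
                n = len(y)
                for i in range(n):
                    for _ in range(N):
                        others = np.delete(z, i)
                        lik_old = norm.pdf(y[i], params[z[i]], sigma)
                        if z[i] not in others:   # singleton case
                            occ = sorted(set(others))
                            if not occ:
                                continue
                            # Propose existing c with prob n_{-i,c}/(n-1).
                            w = np.array([np.sum(others == c) for c in occ],
                                         dtype=float)
                            c = occ[rng.choice(len(occ), p=w/w.sum())]
                            a = ((n-1)/alpha
                                 * norm.pdf(y[i], params[c], sigma) / lik_old)
                            if rng.random() < a:
                                params.pop(z[i])
                                z[i] = c
                        else:                    # non-singleton case
                            # Propose a fresh prior draw, equally likely
                            # among unused components in the infinite limit.
                            theta = rng.normal(mu0, tau0)
                            a = (alpha/(n-1)
                                 * norm.pdf(y[i], theta, sigma) / lik_old)
                            if rng.random() < a:
                                z[i] = max(params, default=-1) + 1
                                params[z[i]] = theta
                return z, params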

Note that the gibbs-indicators, gibbs-ext-indicators, met-indicators,
and met1-indicators operations may change the set of components that
are associated with some training case.  Components are removed from
the current list when they are no longer associated with any training
case.

These operations can be combined to produce the various algorithms for
non-conjugate models described in my tech report on "Markov chain
sampling methods for Dirichlet process mixture models", as follows:

    Algorithm 4:  gibbs-ext-indicators -1 gibbs-params
    Algorithm 5:  met-indicators <R> gibbs-params
    Algorithm 6:  met-indicators <R>
    Algorithm 7:  met1-indicators gibbs1-indicators gibbs-params
    Algorithm 8:  gibbs-ext-indicators <m> gibbs-params

A "gibbs-hypers" operations would be added to these if the model
includes unknown hyperparameters.
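
For example, assuming the usual conventions for specifying operations
(see mc-spec.doc), Algorithm 8 with two extra components, plus
hyperparameter updates, might be specified and run along the following
lines (the log file name is hypothetical):

    mc-spec log.mix repeat 10 gibbs-ext-indicators 2 gibbs-params gibbs-hypers
    mix-mc log.mix 100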

Statistics regarding the met-indicators and met1-indicators operations
can be accessed via the 'r', 'm', and 'D' quantities, as documented in
mc-quantities.doc.  The values of the 'm' and 'D' quantities are for
the update of the component associated with one training case,
selected at random each time.

            Copyright (c) 1995-2003 by Radford M. Neal