XXX-MIS: Do Metropolis importance sampling. There is a version of this program for each 'mc' application that supports Metropolis importance sampling. These programs implement an importance sampling scheme in which states are randomly sampled from an infinite-temperature distribution (the prior, or a uniform distribution), and then modified by applying semi-deterministic Metropolis updates, in which proposals are random, but their acceptance is determined using a "thermostat" variable with exponential distribution (except that acceptance with respect to the prior, if any, is determined randomly in the usual way). The thermostat variable is initialized at a high temperature but then decayed to move the system to lower temperatures. This process produces a distribution over states which we hope is close to the desired distribution (corresponding to temperature one), and which we can use to estimate expectations with respect to the desired distribution by means of importance sampling. Usage: xxx-mis log-file n-runs min-steps max-steps [ [-]modulus ] / blocksize { stepsize count } stepsize [ count ] / inv-temp decay Here 'xxx' is a prefix identifying the particular incarnation of this program (eg, dist-mis or mol-mis). The log file must contain appropriate specifications for the model before xxx-mis is run. One or more states, with appropriate weights, are appended to this log file when each of n-runs runs are simulated using the parameters specified after the two "/" arguments. These states can be used to estimate expectations of various quantities (eg, using dist-est), as well as the normalizing constant for the distribution. After xxx-mis has finished, additional runs can be simulated by simply running the program again. The importance sampling distribution is an equal mixture of the distributions defined by first sampling the ordinary state variables from the infinite-temperature distribution and the special "thermostat" variable from the exponential distribution with mean equal to the specified temperature (shown above in terms of its inverse, inv-temp), and then applying between min-steps and max-steps iterations of a semi-deterministic Metropolis dynamics, in conjunction with decay for the thermostat variable. The inverse temperature may be specified directly, or as "/T", where T is the temperature. Conceptually, each of the runs simulated by xxx-mis gives rise to (max-steps - min-steps + 1) states from this importance sampling distribution, produced by overlapping portions of the run. These states are written to the log file with appropriate importance weights, unless modulus in specified, in which case only those whose indexes (from 0) are multiples of modulus are written. To find the appropriate importance weights, the total probability of producing each state under the importance sampling distribution must be found. This involves a sum over all possible numbers of steps (from min-steps to max-steps) that might have been used to produce the state. Finding this sum requires simulating a backwards portion of the run for (max-steps - min-steps) steps. The states from this backward portion of the trajectory, and from the portion before min-steps have been done, are not saved in the log file, unless "-" is included before modulus, in which case they are saved with weights that are effectively zero (which works OK for estimating expectations, though not for estimating normalizing constants). Each iteration of the Metropolis dynamics consists of one or more semi-deterministic Metropolis updates. Updates are applied to successive blocks of consecutive variables (not including the thermostat variable), with the size of a block being as specified on the command line after the first "/". A blocksize of "all" means all variables are updated in one block. If the number of variables is not a multiple of blocksize, the last block will have fewer than blocksize variables. Each update begins with the random selection of a proposed new state for the variables in the block, found by adding Gaussian noise to all these variables. The standard deviations for the noise are given by the default stepsizes set by the application or by the user (see mc-stepsizes.doc) times a stepsize specified in the command line. The stepsize specifications follow the blocksize, and each is followed by a count of how many times it is to be repeated (the count for the last stepsize can be omitted, and defaults to one). Each repetition of each stepsize is used for a full scan through all the variable blocks. The total number of Metropolis updates in an iteration is therefore equal to the number of blocks times the sum of the counts for all the stepsizes. The proposed new state for a Metropolis update is accepted or rejected according to two criteria. First, if we are sampling from a Bayesian posterior, the ratio of the prior probability of the proposed state to the old state is computed, and the new state is rejected if a uniform (0,1) random number is not less than this ratio, in which case the second step is skipped. In the second step, the difference in the log probabilities of the new and the old states due to factors other than the prior (ie, the energy difference for a physical system) is computed, and the new state is rejected if this difference is greater than the value of the "thermostat" variable. (If the new state has lower energy / higher probability, then it will always be accepted.) If the new state is accepted, the value of the thermostat variable is adjusted by the amount of the energy difference, which keeps the energy constant, since the energy for the thermostat is just equal to its value. Following each complete Metropolis iteration (which may consist of many updates), the thermostat variable is multiplied by the specified decay factor, which should be slightly less than one. These steps are reversed (with division by decay rather than multiplication) when simulating backwards. For good results, inv-temp should be chosen to be small enough that the canonical distribution at this temperature is almost the same as the prior (or almost uniform). The range min-steps to max-steps should be chosen so that the system cools to approximately a temperature of one somewhere within this range of steps. The stepsizes need to be small enough that reasonable acceptance rates are obtained, but not too small. (The ideal size may vary with temperature, which is why several stepsizes are allowed, so that at least one will be appropriate at any time.) Finally, the decay needs to be sufficienly slow (ie, the decay parameter needs to be only slightly less than one) if the importance weights are not to be highly variable. The value of the thermostat variable is accessible as the "h" quantity. The rate of rejections of Metropolis updates is also recorded, and is available as the "r" quantity, with the difference in energy level available as the "D" quantity. (There is no access to the difference in log priors.) The position of each saved state in the overall run is recorded as the "m" quantity. The overall time taken so far is recorded as the "k" quantity. These are more-or-less analogous to the quantities documented in mc-quantities.doc. Copyright (c) 1995-2004 by Radford M. Neal