DFT: Models based on Dirichlet diffusion trees.

The 'dft' programs implement Bayesian models for multivariate probability or probability density estimation that are based on Dirichlet diffusion trees. The results of fitting such a model to data (a set of training cases) can be used to make predictions for future observations (test cases), or they can be interpreted to produce a hierarchical clustering of the training cases.

A 'dft' model consists of one or more Dirichlet diffusion trees, whose parameters may be fixed or may be given prior distributions. Each tree produces a real-valued vector for each training case; these are added together to produce the real-valued "latent" vector associated with each training case. The latent vector for a case is used to define a probability distribution for the case's data.

The data vector for a case can consist entirely of real values, entirely of binary values, or of all real values except for the last, which is binary. This last option allows binary classification problems to be solved using a model for the joint distribution of the binary class and the real-valued features, from which the conditional distribution of the class given the features can be found (albeit somewhat clumsily with the present version of the software).

The model for binary data is that the probability of a data item being 1 is found by applying the logistic function to the corresponding latent value. Real data is modeled as being Gaussian distributed with mean given by the latent vector, or as being t-distributed with location parameter given by the latent vector. A t-distribution for the noise is obtained using a hierarchical prior specification for Gaussian noise variances that includes a level allowing for different noise variances for each variable and for each training case; this produces a t-distribution once the case-by-case variances are integrated over.
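As a concrete illustration of the observation model, the following is a minimal sketch (not the software's actual interface; the function names and the scaled inverse chi-squared parameterization of the per-case variances are illustrative assumptions) of how a latent value yields a binary probability, and how integrating a per-case Gaussian variance over its prior yields t-distributed noise:

```python
import numpy as np

def binary_prob(latent):
    """Probability that a binary data item is 1: the logistic
    function applied to the corresponding latent value."""
    return 1.0 / (1.0 + np.exp(-latent))

def sample_real(latent, sigma, rng, df=None):
    """Sample a real data item given its latent value.

    With df=None, the noise is Gaussian with mean `latent` and
    standard deviation `sigma`.  With df set, a case-by-case variance
    is first drawn from a scaled inverse chi-squared distribution
    (an assumed parameterization, for illustration); marginally, the
    data item is then t-distributed with `df` degrees of freedom,
    location `latent`, and scale `sigma`.
    """
    if df is None:
        return rng.normal(latent, sigma)
    var = df * sigma**2 / rng.chisquare(df)   # case-by-case variance
    return rng.normal(latent, np.sqrt(var))

rng = np.random.default_rng(0)
print(binary_prob(0.0))   # 0.5: a latent value of zero gives even odds
```

Drawing the per-case variance explicitly, rather than sampling from a t-distribution directly, mirrors the hierarchical prior described above: the heavy tails arise purely from averaging Gaussians over the variance prior.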
Targets may be missing for some cases (written as "?"), in which case they are ignored when computing the likelihood (as is appropriate if they are "missing at random").

The Markov chain used for sampling from the posterior distribution over trees has as its state the structures of the trees, the divergence times for each node in each tree, and any variable hyperparameters for the trees or the noise distribution. The latent vectors for the training cases and the locations of non-terminal nodes may also be present in the state (and must be in some circumstances).

Copyright (c) 1995-2004 by Radford M. Neal
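To make the treatment of missing targets concrete, here is a minimal sketch (in Python, with illustrative names; the actual 'dft' programs are written in C) of a Gaussian log-likelihood for one training case in which values recorded as "?" contribute nothing, as is appropriate when they are missing at random:

```python
import math

def gaussian_loglik(data_row, latent_row, sigma):
    """Log-likelihood of one training case under Gaussian noise with
    mean given by the latent vector and standard deviation `sigma`.
    Targets recorded as "?" are missing and are simply skipped."""
    total = 0.0
    for x, mu in zip(data_row, latent_row):
        if x == "?":          # missing at random: ignore this target
            continue
        x = float(x)
        total += (-0.5 * math.log(2 * math.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))
    return total

# A case with a missing second target has the same likelihood as a
# case consisting of only its observed targets.
print(gaussian_loglik(["1.2", "?", "0.3"], [1.0, 0.0, 0.5], 1.0))
```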