GENERATING TEST DATA FOR SOURCE LOCATION MODELS.

This example shows how a source model can be specified, and how to set its parameters to specific values. Measurements made for a grid of detectors can then be generated, with random noise. This data will be used in the later examples of fitting models.

Three 'spec' commands are used to specify the structure and prior distributions for a source model. The first command to use (see src-spec.doc) creates a log file, in which it stores the specifications for how many sources there are and the priors on the locations and intensities of these sources. Here is an example:

    > src-spec logg 3 0:5 / -10:10 -1:1 0:1

This creates a log file called 'logg' (wiping out any previous file of that name), and stores in it specifications for a model with 3 sources, with intensities in the range 0 to 5, and with x, y, and z coordinates in the ranges -10 to 10, -1 to 1, and 0 to 1, respectively. The prior distributions for the intensities and coordinates are uniform over these ranges.

Following this command, the properties of the detectors can be specified (see det-spec.doc). The simplest option is for a detector with Gaussian noise, with a fixed standard deviation, which can be specified using a command such as the following:

    > det-spec logg 0.1

This appends a record to the log file 'logg' specifying that the detector noise standard deviation is 0.1.

Finally, the model for how contaminants flow through the atmosphere must be specified (see flow-spec.doc). The 'flow-spec' command is designed to allow for various such models, but at present only a 'test' model is implemented. We can specify that this model be used as follows:

    > flow-spec logg test 1 0.08 0.0001 0.06 0.00015

This specifies use of the test model, with wind speed of 1, and other parameters as specified. Note that this is a steady-state model, in which time is irrelevant.
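
To make the meaning of the src-spec prior concrete, the following Python sketch draws one set of source parameters from uniform distributions over the ranges given above. It is only an illustration and not how the software itself works: the names PRIOR_RANGES and draw_sources are made up for this example, and drawing each parameter independently is an assumption.

```python
import random

# Ranges copied from the src-spec command above: intensity, then x, y, z.
# (All names in this sketch are hypothetical, not part of the software.)
N_SOURCES = 3
PRIOR_RANGES = {
    "intensity": (0.0, 5.0),
    "x": (-10.0, 10.0),
    "y": (-1.0, 1.0),
    "z": (0.0, 1.0),
}

def draw_sources(rng):
    """Draw every parameter of every source uniformly over its range.

    Independence between parameters is assumed here for illustration.
    """
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in PRIOR_RANGES.items()}
        for _ in range(N_SOURCES)
    ]

sources = draw_sources(random.Random(1))
for s in sources:
    print(s)
```

Each printed dictionary is one candidate source, with an intensity and an x, y, z location inside the specified ranges.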

To generate data from this model, we need to fix the locations and intensities of the three sources that the model specifies. Before we can do this, however, the following command is needed:

    > data-spec logg 3 1 / /dev/null .

The data-spec command is meant for more general usage (see data-spec.doc). Here, it is a bit redundant, but is nevertheless required by the software. The arguments after the name of the log file are the number of "inputs" in data files - in this case, 3, giving the x, y, z coordinates of sources - and the number of "targets" - which for source models is always 1, representing a measurement by a detector. The remaining arguments just indicate that no data is available yet.

We can now specify the locations and intensities of the sources with the src-initial command (see src-initial.doc), which can also be used to initialize MCMC runs. Here is an example:

    > src-initial logg / / 0.5 4.5 -0.4 0.2 / 0.8 -6 0.7 0.85 / 1.8 9 0.1 0.45

For this model, the detector and flow specifications have no variable parts, so the first two groups of arguments after the name of the log file are missing (nothing before the first two "/" arguments). What follows are the intensities and locations of the three sources, each given as four numbers with intensity first, followed by x, y, and z coordinates. Note that these numbers lie within the ranges allowed by the src-spec command above. These values for the model parameters are stored in the log file, with an "index" of 0.

We can now actually generate the data. To start, we can specify a random number seed to use:

    > rand-seed logg 1

This specifies seed 1, which would have been the default in any case.

We need a grid of x, y, z locations at which we want measurements (i.e., locations where detectors are assumed to exist).
The 'grid' program (see grid.doc) is useful for this:

    > grid -10:10%0.1 -1:1%0.1 0.3:0.9%0.3 >grid1

This creates a file called 'grid1' containing a grid of x, y, z coordinates spanning the ranges -10 to 10, -1 to 1, and 0.3 to 0.9, including all values in those ranges that are multiples of 0.1, 0.1, and 0.3, respectively. Here are the first ten lines of grid1:

    -1.000000e+01 -1.000000e+00 +3.000000e-01
    -1.000000e+01 -1.000000e+00 +6.000000e-01
    -1.000000e+01 -1.000000e+00 +9.000000e-01
    -1.000000e+01 -9.000000e-01 +3.000000e-01
    -1.000000e+01 -9.000000e-01 +6.000000e-01
    -1.000000e+01 -9.000000e-01 +9.000000e-01
    -1.000000e+01 -8.000000e-01 +3.000000e-01
    -1.000000e+01 -8.000000e-01 +6.000000e-01
    -1.000000e+01 -8.000000e-01 +9.000000e-01
    -1.000000e+01 -7.000000e-01 +3.000000e-01

This file will be used as the input file when fitting models to the data generated.

The file of measurements taken at these grid locations is generated by the src-dgen command (see src-dgen.doc), an example of which follows:

    > src-dgen logg 0 / grid1 data-grid1-0.1-1

This uses the parameters for the model stored in the log file under index 0 (produced with the src-initial command above) to generate measurement values for detectors at all the locations in the file 'grid1', with random noise added. These noisy measurements are stored in the file data-grid1-0.1-1 (named for the grid, the noise level, and the random seed). This will be used as the file of target measurements when fitting a model to this data. Here are the first ten lines of this file:

    3.38862e-01
    3.26606e-01
    2.16727e-01
    2.05567e-01
    4.46061e-01
    1.95426e-01
    2.67355e-01
    3.25556e-01
    2.47019e-01
    2.79804e-01

Negative measurements are possible, even though actual concentrations are non-negative, since Gaussian measurement error is assumed. Note that slight differences from one machine to another are possible for this and other output.
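
The layout of the grid1 file shown earlier can be mimicked with a short Python sketch. This is not the 'grid' program itself, only an illustration of the description given for it: every multiple of the step within each range is kept, and the last coordinate varies fastest, as in the listing. The function names axis_values and make_grid, and the small rounding tolerance, are assumptions made for this example.

```python
import itertools
import math

def axis_values(lo, hi, step):
    """Multiples of step lying in [lo, hi], with a small rounding tolerance."""
    first = math.ceil(lo / step - 1e-9)
    last = math.floor(hi / step + 1e-9)
    return [k * step for k in range(first, last + 1)]

def make_grid(specs):
    """Cartesian product of the axes; the last coordinate varies fastest."""
    return list(itertools.product(*(axis_values(*s) for s in specs)))

# Ranges and steps taken from the grid command: x, y, z.
points = make_grid([(-10, 10, 0.1), (-1, 1, 0.1), (0.3, 0.9, 0.3)])

# Signed exponential format, matching the style of the grid1 listing.
lines = ["%+.6e %+.6e %+.6e" % p for p in points]
print(len(lines))
print("\n".join(lines[:4]))
```

Under this reading of the ranges, the sketch produces 201 x 21 x 3 = 12663 grid points.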

We can instead specify a much smaller noise level (e.g., 1e-30) in the det-spec command, and thereby generate effectively noise-free measurements of concentrations at the grid points. This is useful for checking results. Here are the first ten lines of the file data-grid1-0-1, generated in this way:

    3.56170e-01
    3.17061e-01
    2.61963e-01
    3.72152e-01
    3.31319e-01
    2.73783e-01
    3.86745e-01
    3.44371e-01
    2.84644e-01
    3.99744e-01

The same log file can be used to generate the data at coarser grids, in order to test our ability to infer source locations with varying amounts of data. Data files data-grid2-0.1-1 (with 126 detectors), data-grid3-0.1-1 (with 63 detectors), and data-grid4-0.1-1 (with 15 detectors) were generated for testing and use in later examples. For grid3 and grid4, all measurements have the same z coordinate, which leads to ambiguous inferences that test the ability of the MCMC methods to sample a multimodal posterior distribution.

Some R functions to plot data are in data-plot.r. A script to use these functions to plot data on the four grids mentioned above is in plt-data.r.
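
As one way of using the noise-free file for checking, the Python sketch below compares the first ten values listed earlier for data-grid1-0.1-1 and data-grid1-0-1. It assumes that both files list measurements for the same grid1 locations in the same order. With only ten values this is a rough sanity check and not a proper test.

```python
import math

# First ten values of data-grid1-0.1-1 (noisy) and data-grid1-0-1
# (effectively noise-free), copied from the listings in this example.
noisy = [3.38862e-01, 3.26606e-01, 2.16727e-01, 2.05567e-01, 4.46061e-01,
         1.95426e-01, 2.67355e-01, 3.25556e-01, 2.47019e-01, 2.79804e-01]
clean = [3.56170e-01, 3.17061e-01, 2.61963e-01, 3.72152e-01, 3.31319e-01,
         2.73783e-01, 3.86745e-01, 3.44371e-01, 2.84644e-01, 3.99744e-01]

# Residuals are the noise added to each measurement, assuming the two
# files are aligned line by line.
residuals = [n - c for n, c in zip(noisy, clean)]
rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))

print("residuals:", ["%+.3f" % r for r in residuals])
print("RMS residual: %.3f (det-spec noise sd was 0.1)" % rms)
```

The root-mean-square residual should be of the same order as the standard deviation of 0.1 given to det-spec.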