This course will look at how statistical computations are done, and how to write programs for statistical problems that aren't handled by standard packages. Students will program in the R language (a free and improved variant of S), which will be introduced at the start of the course. Topics will include the use of simulation to investigate the properties of statistical methods, matrix computations used to implement linear models, optimization methods used for maximum likelihood estimation, and numerical integration methods for Bayesian and frequentist inference. The course will conclude with a look at some more specialized statistical algorithms, such as the EM algorithm for handling missing data and latent variables, and Markov chain Monte Carlo methods for Bayesian inference.

**Instructor:**
Radford Neal, **Phone:** (416) 978-4970,
**Email:**
radford@stat.utoronto.ca

**Office Hours:** Mondays 1:10-2:00 and Thursdays 2:30-3:30, in SS 6016A

**Lectures:**

Tuesday, Thursday, Friday from 1:10pm to 2:00pm, in SS 2128.

From January 8 to April 12, except for Reading Week (February 18-22) and Good Friday (March 29).

**Course Text:**

J. F. Monahan, Numerical Methods of Statistics.

Also of possible use: R. A. Thisted, Elements of Statistical Computing.

**Computing:**

Assignments will be done in R. Graduate students will use the Statistics/Biostatistics computer system. Undergraduates will use CQUEST. You can also use R on your home computer by downloading it for free from http://lib.stat.cmu.edu/R/CRAN/.

**Marking scheme:**

Four assignments, worth 16%, 18%, 18%, and 18%.

Three one-hour in-class tests, each worth 10%.

The tests are tentatively scheduled for February 8, March 15, and April 12.

**Please read the important course policies**

**Running R**

To run R on photon.utstat or fisher.utstat, just type "R" in a Unix command window. To run R on a CQUEST workstation, type /u/radford/R in a Unix command window. You can't run R on a CQUEST server machine.

Here is some documentation on R:

- Introduction to R
- Frequently Asked Questions on R
- R Data Import/Export
- Writing R Extensions
- R Language Definition
- R Installation and Administration

**What to read in the text:**

Chapter 2 (Computer arithmetic)

Sections 10.5 and 11.2(A) (Random number generation)

Chapter 3 (Matrices and linear equations)

Chapter 5 (Regression computations)

Chapter 8 (Optimization)

Chapter 9 (Maximum likelihood)

Chapter 10 (Numerical integration)

Section 12.6 (Laplace approximation)

Section 12.4 (Importance sampling)

**Tests:**

Here are some practice questions for test 2: Postscript, PDF.

Here are some practice questions for test 3: Postscript, PDF.

**Assignments:**

Assignment 1: Postscript, PDF

Here is a solution: program, plots, output and discussion.

Assignment 2: Postscript, PDF

Corrections: All occurrences of "-beta_1" should be "+beta_1", and "gradient of likelihood" should be "gradient of log likelihood".

Here are the two data sets for testing: ass2-a, ass2-b.

Here is a solution: program, tests run, results and discussion.

Assignment 3: Postscript, PDF

Here is a solution: program, tests run, results and discussion.

Assignment 4: Postscript, PDF

Here is the data: ass4-data

Here is a solution: program, results and discussion, a plot in postscript or pdf.

**Example programs:**

Testing how well t tests work: program, plots: Postscript, PDF.

Permutation test for correlation: program, output.
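
For readers who want a feel for the idea before opening the linked program, here is a minimal sketch of a permutation test for correlation. It is not the course's program: the function name `perm_cor_test`, the default of 999 permutations, and the simulated data are all assumptions made for illustration.

```r
# Illustrative sketch only; names and defaults are hypothetical.
perm_cor_test <- function(x, y, n_perm = 999) {
  observed <- cor(x, y)
  # Shuffling y breaks any association with x but keeps both marginals.
  permuted <- replicate(n_perm, cor(x, sample(y)))
  # Two-sided p-value, counting the observed value as one of the permutations.
  p_value <- (1 + sum(abs(permuted) >= abs(observed))) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

set.seed(1)                      # simulated data, for illustration only
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)
result <- perm_cor_test(x, y)
print(result)
```

Adding 1 to both the numerator and the denominator keeps the estimated p-value strictly positive, which is one common convention for Monte Carlo permutation tests.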

Computing a Cholesky decomposition: program.
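
As a rough illustration of what such a program can look like, the sketch below computes the lower-triangular Cholesky factor column by column. It is an assumed textbook-style version, not the linked program; the function name `chol_lower` and the test matrix are made up for the example.

```r
# Illustrative sketch only: returns lower-triangular L with L %*% t(L) equal to A.
chol_lower <- function(A) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (j in 1:n) {
    prev <- seq_len(j - 1)       # columns already computed (empty when j = 1)
    # Diagonal entry: what is left of A[j, j] after removing earlier columns.
    d <- A[j, j] - sum(L[j, prev]^2)
    if (d <= 0) stop("matrix is not positive definite")
    L[j, j] <- sqrt(d)
    if (j < n) {
      for (i in (j + 1):n) {
        L[i, j] <- (A[i, j] - sum(L[i, prev] * L[j, prev])) / L[j, j]
      }
    }
  }
  L
}

A <- matrix(c(4, 2, 2,
              2, 3, 1,
              2, 1, 3), 3, 3)    # a small symmetric positive definite matrix
L <- chol_lower(A)
print(L)
print(max(abs(L %*% t(L) - A)))  # reconstruction error
```

R's built-in `chol` returns the upper-triangular factor, so `t(chol(A))` can be used to check the result.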

Least-squares regression with Householder transformations: program, plus a test program and its output.

Maximum likelihood for no-zero Poisson data: program.

Maximum likelihood for a simple model with two variances: program.

An old maximum likelihood assignment: Handout, Solution to part 1, Solution to part 2.

EM algorithm for censored Poisson data: program, output of two tests.

An old EM assignment: Handout, Solution.

Symbolic computation and minimization in R: examples

Evaluating a double integral with the midpoint rule: program, output.
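
The midpoint rule for a double integral can be sketched in a few lines of R. This is an assumed minimal version, not the linked program: the function name `midpoint_2d`, the grid size, and the example integrand are chosen only for illustration, and the integrand is assumed to be vectorized in both arguments.

```r
# Illustrative sketch only: midpoint rule on an n-by-n grid over [ax, bx] x [ay, by].
midpoint_2d <- function(f, ax, bx, ay, by, n = 100) {
  hx <- (bx - ax) / n
  hy <- (by - ay) / n
  # Midpoints of the n subintervals along each axis.
  xm <- ax + hx * (seq_len(n) - 0.5)
  ym <- ay + hy * (seq_len(n) - 0.5)
  # outer() evaluates f at every (x, y) pair of midpoints; f must be vectorized.
  sum(outer(xm, ym, f)) * hx * hy
}

# Unnormalized bivariate standard normal density; the integral over the whole
# plane is 2 * pi, and almost all of the mass lies inside [-5, 5] x [-5, 5].
approx <- midpoint_2d(function(x, y) exp(-(x^2 + y^2) / 2), -5, 5, -5, 5)
print(c(approx = approx, two_pi = 2 * pi))
```

The cost grows as n squared for two dimensions, which is why grid-based rules like this are practical only for low-dimensional integrals.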

Midpoint rule used for Bayesian inference for two Poisson models: program, output.

An old Bayesian inference assignment: handout, program, output.