STA 437 / 1005 - Methods for Multivariate Data (Sep-Dec 2010)

The solutions for Assignments 1 and 2 are available below, along with two past final exams.

STA 437 is the undergraduate version of this course. STA 1005 is the graduate version, which may be taken for credit only by graduate students who are not in Statistics.


Instructor:

Radford Neal
Phone: (416) 978-4970
Office: SS6026A


Lectures:

Mondays 6:10pm to 9:00pm, from September 13 to December 6, except October 11 (Thanksgiving) and November 8 (term break), plus Wednesday December 8, 6:10pm to 9:00pm. The December 8 lecture will not cover new material, so students unable to come that Wednesday evening can skip it.

Lectures are in Sidney Smith Hall, 100 St. George Street, room 1070.

Office Hours:

I will hold office hours Tuesdays, from 4:30pm to 5:30pm, starting immediately. My office is in Sidney Smith Hall, 6th floor, room 6026A. Come see me if you have any administrative issues, or if you just want to discuss the course material.

The TA, L. Xu, will be providing help in the stat aid centre (SS 2133) Wednesdays from 3:10 to 4:00.


Textbook:

B. S. Everitt and G. Dunn, Applied Multivariate Data Analysis, 2nd edition.


Computing:

Assignments will require use of the R statistics package. You can use this package on the CQUEST computer system, or install it for free on your own computer (running Microsoft Windows, Macintosh OS, or Linux).

You'll be able to get a CQUEST account once classes start at

The R package and documentation are at Here are some direct links to things available there:


Marking scheme:

45% Final exam, scheduled by the Faculty during the exam period.
20% Two assignments, each worth 10%, tentatively due Nov. 1 and Dec. 6.
10% Two in-class quizzes, each worth 5%, on Sep. 27 and Nov. 15.
25% Mid-term test, tentatively scheduled for Oct. 18, 6:10-8:00, followed by discussion of answers (we will discuss in the first class whether this is a suitable time).

Late assignments will be accepted only for legitimate reasons, such as illness. Answers to the quizzes and the mid-term test will be discussed immediately afterward. If anyone misses a quiz or the mid-term test for a legitimate reason, that part of their course mark will be taken from other work.


Assignment #1: Handout. Due November 15 (changed from original date). Here are the data files:
Part I: wind speeds on Sundays, for Unix/Linux/Mac, or Microsoft Windows
Part I: wind speeds on Mondays, for Unix/Linux/Mac, or Microsoft Windows
Part II: gene expression data, for Unix/Linux/Mac, or Microsoft Windows
Part II: patient information, for Unix/Linux/Mac, or Microsoft Windows
Here are the R hints for assignment 1. Here are the solutions to Part I and Part II. Note that these are just example solutions. You don't necessarily have to have done everything that I did, and at some points, it would be reasonable to make different analysis decisions than I did.

Assignment #2: Handout. Due December 6. Here are the R hints for assignment 2. A model solution is here.

Quizzes & mid-term test:

Quiz #1: Held Sept. 27. Here is the quiz paper with answers.

Mid-term test: Held Oct. 18. Here is the test paper.

Final Examination:

The final exam schedule is here.

Here are two old exams: 2008, 2009.

Other useful information:

Lecture notes. Updated 2010-11-26. Note that not all the material we've covered is mentioned in these notes, which mostly just summarize definitions and some theory.

Lattice graphics in R.

Paper on visual significance tests by Buja et al. The R function I used for the class demo is here.

Air pollution data, from Table 2.5 in the textbook: Unix/Linux/Mac, or Microsoft Windows. Figure 2.13 can be approximately reproduced in R with

library(lattice)  # provides xyplot() and equal.count()
air <- read.table("air.dat", header=TRUE)
xyplot(SO2 ~ wind | equal.count(temp,number=6), data=air)

Glucose data, from Table 8.8 in the textbook: Unix/Linux/Mac, or Microsoft Windows.

R functions for multivariate confidence intervals: Unix/Linux/Mac, or Microsoft Windows.

Web pages for previous versions of this course:

Fall 2009
Fall 2008