STA 437 / 1005 - Methods for Multivariate Data (Sep-Dec 2008)

STA 437 is the undergraduate version of this course. STA 1005 is the graduate version, which may be taken for credit only by graduate students who are not in Statistics.

THE FINAL EXAM has been scheduled by the faculty for December 8, from 7pm to 10pm, in EX310 (Central Exams Facility, 255 McCaul St., south of College St.). Some exercises from the textbook to help in studying are listed below.

AN EXTRA OFFICE HOUR will be held Friday, December 5, from 11:10 to 12:00.

Instructor: Radford Neal, Phone: (416) 978-4970, Office: SS6016A, Email: radford@stat.utoronto.ca

Lectures:

Mondays 6:10pm to 9:00pm, from September 8 to December 1, except for October 13 (Thanksgiving). Lectures are held in Sidney Smith Hall, room 2110.

Office Hours: Wednesdays from 4:45 to 6:00, in Sidney Smith Hall, room 6016A.

Evaluation:

30% Three assignments (10% each), tentatively due Oct. 6, Oct. 20, and Nov. 24.
25% Mid-term test, Oct. 27, 6:10-8:00pm, in RW 117.
45% Final exam, scheduled by the Faculty during the exam period.

Textbook:

R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, 6th edition. You can get the datasets used as examples in the text, plus some proofs omitted from the book, from this web page. Click on "Take a closer look".

How much of the textbook we'll cover will depend on how fast things go, but we'll probably look at Chapters 1-9 plus Chapter 11, with some sections omitted, and some material not in the textbook added.

Computing:

Some assignment questions will require use of the R statistics package. You can use this package on the CQUEST computer system, or install it for free on your own computer (MS Windows, Macintosh, or Linux).

To get a CQUEST account, go to www.cquest.utoronto.ca.

The R package and documentation are at www.r-project.org.

Assignments:

Assignment 1: handout, data.
Solution: writeup, plots, R commands.

Assignment 2: handout, R functions to use (mvn.r), data for Q1 (ret.txt), data for Q2 (twins.txt), hints on using R.
Solution to Question 1: discussion, R commands, R output, plots.
Solution to Question 2: discussion, R commands, R output, plots.

Assignment 3: handout, R functions to use (pca.r), data (gene.txt), web site, hints on using R.
Solution: discussion, R commands, plots.

Lecture topics:

Note: This list of lecture topics and sections of the text covered may not be complete.

Sep 8: Sample statistics; scatterplots; meaning of a random sample. Demo of R. R scripts used in demo are here and here. Text: 1.1-1.4.

Sep 15: Review of sample statistics; means, covariances, correlations for random vectors; estimation of mean, covariance, etc. from sample statistics; effects of linear transformations; multivariate normal distribution; its density function; positive definite matrices. Text: 2.5-2.6, 3.3, 3.6, 4.1-4.2.

Sep 22: Eigenvectors and eigenvalues, especially of covariance matrices; properties of multivariate normal; Central Limit Theorem; R Demo. Text: 2.3, 4.5, 4.2.

Sep 29: Sampling distributions of sample mean and covariance; maximum likelihood estimation; statistical distance; assessing normality and finding outliers, QQ plots; transformations to make data closer to being normally distributed. Text: 1.5, 4.2-4.8.

Oct 6: Testing hypotheses about a multivariate normal mean; confidence intervals (univariate, simultaneous, Bonferroni-corrected); paired observations. Text: 5.1-5.4, 6.2.

Oct 13: No lecture (Thanksgiving).

Oct 20: Tests and C.I. when the sample is large, T² test as a likelihood ratio test, prediction of future observations, control charts, False Discovery Rate (not in text, not on test). Text: 5.3, 5.5, 5.6.

Oct 27: Mid-term test. Some exercises from the text to help in studying: 2.16, 2.17, 2.23, 2.24, 2.27, 3.7, 4.1, 4.3, 4.6, 4.7, 4.18, 4.19, 5.1, 5.2, 5.5.

Nov 3: Principal Component Analysis (PCA), intro to factor analysis. Text: 8.1-8.4, 9.1-9.2.

Nov 10: More on factor analysis. Text: 9.3-9.6.

Nov 17: Comparing means: review of paired data, repeated measures designs, two samples with equal and unequal covariance. Text: 6.1-6.3.

Nov 24: Multivariate Analysis of Variance (MANOVA), regression with a multivariate response, introduction to classification. Text: 6.4, bits of Chapter 7, 11.1-11.2.

Dec 1: Classification when class distributions are normal, logistic regression. Text: 11.3, 11.7. (Logistic regression won't be on the final exam.)

Dec 8: FINAL EXAM. Scheduled by the faculty for December 8, from 7pm to 10pm, in EX310 (Central Exams Facility, 255 McCaul St., south of College St.). Will cover the whole course, but with more emphasis on material since the mid-term. Some exercises from the second half to help in studying: 6.6, 6.7, 6.8, 8.1, 8.2, 8.3, 8.6, 9.1, 9.7, 9.8, 11.1, 11.4, 11.5.
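
Review sketch: For reviewing the sample statistics from the Sep 8 and Sep 15 lectures, here is a minimal sketch of computing a sample mean vector, a sample covariance matrix (with divisor n - 1), and a sample correlation matrix. It is written in plain Python rather than R only so that it is self-contained. The data values and the function names (sample_mean, sample_cov, sample_corr) are made up for illustration and do not come from the textbook or the assignments.

```python
import math

def sample_mean(X):
    """Mean vector of the rows of X (a list of equal-length lists)."""
    n = len(X)
    p = len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]

def sample_cov(X):
    """p x p sample covariance matrix of the rows of X, using divisor n - 1."""
    n = len(X)
    p = len(X[0])
    xbar = sample_mean(X)
    return [
        [
            sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in X) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]

def sample_corr(X):
    """Sample correlation matrix: each covariance divided by the product
    of the two corresponding sample standard deviations."""
    S = sample_cov(X)
    p = len(S)
    return [
        [S[j][k] / math.sqrt(S[j][j] * S[k][k]) for k in range(p)]
        for j in range(p)
    ]

# Hypothetical data: 4 observations (rows) on 2 variables (columns).
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 5.0], [4.0, 9.0]]

xbar = sample_mean(X)
S = sample_cov(X)
corr = sample_corr(X)

print("mean vector:", xbar)
print("covariance matrix:", S)
print("correlation matrix:", corr)
```

In R, the built-in functions colMeans, cov, and cor compute the same quantities for a data matrix, so this sketch is useful mainly for checking a hand calculation.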