STA 414/2104: Statistical Methods for Machine Learning and Data Mining (Jan-Apr 2011)

You can pick up uncollected assignments Thursday, May 12, from 3:30-4:30pm, in SS 6026A.


Instructor:

Radford Neal, Office: SS 6026A, Phone: (416) 978-4970, Email:
Office hours: Thursdays 2:10-3:00, in SS 6026A.


Lectures:

Tuesdays 12:10pm to 2:00pm and Thursdays 12:10pm to 1:00pm, in SS 2105. The first lecture is January 11. The last lecture is April 7. There are no lectures on February 22 and 24 (Reading Week).


Evaluation:

35% Final exam, scheduled by the Faculty during the exam period.
45% Three assignments, each worth 15%.
20% Two in-class tests, each worth 10%, on February 10 and March 17.

The assignments are to be done by each student individually. Any discussion of the assignments with other students should be about general issues only, and should not involve giving or receiving written or typed notes.

The final exam times and locations are here.


Textbook:

Christopher M. Bishop (2006) Pattern Recognition and Machine Learning, Springer. There's a webpage for the book here (includes errata).

Sections to read: Ch. 1 except 1.6, Ch. 2 except 2.3.8, Ch. 3.


Computing:

Assignments will be done in R. Statistics graduate students will use the Statistics research computing system. Undergraduates and graduate students from other departments will use CQUEST. You can request an account on CQUEST if you're an undergraduate student in this course (you need to fill out a form if you're a grad student).

You can also use R on your home computer; it is a free download. The download site also has the Introduction to R.

Some useful on-line references:

Proceedings of the annual conference on Neural Information Processing Systems (NIPS)

Information Theory, Inference, and Learning Algorithms, by David MacKay

My tutorial on Bayesian methods for machine learning: Postscript or PDF.

UCI repository of machine learning datasets

Lecture slides:

          Tuesday   Thursday
Week 1:   Slides    Slides
Week 2:   Slides    -
Week 3:   Slides    Slides
Week 4:   Slides    -
Week 5:   -         -
Week 6:   Slides    -
Week 7:   Slides    -
Week 8:   -         Slides
Week 9:   Slides    -
Week 10:  Slides    -
Week 11:  -         Slides
Week 12:  Slides    -


Assignments:

Assignment 1: handout (a typo has now been fixed). Programs to modify are below. Data: ass1-train1.txt, ass1-test1.txt, ass1-train2.txt, ass1-test2.txt.
A model solution: discussion, modified functions, script for first part and its output, script for second part and its output.

Assignment 2: handout.
Neural network programs to modify are below. The R commands for reading the data are here: MS Windows, Mac/Unix/Linux.
Data: est.txt, val.txt, tst.txt.
The data was derived from the original data files available here.

A model solution:

Modified MLP functions: MS Windows, Mac/Unix/Linux
R Script: MS Windows, Mac/Unix/Linux
Output: MS Windows, Mac/Unix/Linux
Plots: PDF
Discussion: MS Windows, Mac/Unix/Linux
Note: I see now that in my model solution I assume a penalty of 1/2 times lambda times the sum of squares of weights, though the penalty in the handout doesn't include the factor of 1/2! This just changes the optimal value of lambda. Either definition will be OK for marking.
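To spell out why only the optimal lambda changes, here is a short sketch. The symbols are assumptions introduced for illustration, not notation from the handout: E(w) is the unpenalized training error, w_i are the network weights, and lambda_sol and lambda_hand are the penalty magnitudes under the model solution's and the handout's definitions.

```latex
% Model solution's objective (penalty includes the factor of 1/2):
\[
  J_{\text{sol}}(w) \;=\; E(w) \;+\; \tfrac{1}{2}\,\lambda_{\text{sol}} \sum_i w_i^2
\]
% Handout's objective (no factor of 1/2):
\[
  J_{\text{hand}}(w) \;=\; E(w) \;+\; \lambda_{\text{hand}} \sum_i w_i^2
\]
% The two objectives are the same function of w whenever
\[
  \lambda_{\text{hand}} \;=\; \tfrac{1}{2}\,\lambda_{\text{sol}}
\]
```

So the two definitions lead to the same fitted weights, and the best lambda found under the handout's definition should be half the best lambda found under the model solution's definition.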

Assignment 3: handout.
EM program to modify is below.
Data: training data, test data.

A model solution:

Modified mixture functions: MS Windows, Mac/Unix/Linux
R Script: MS Windows, Mac/Unix/Linux
Output: MS Windows, Mac/Unix/Linux
Plots: PDF
Discussion: MS Windows, Mac/Unix/Linux

R programs:

Functions for fitting a 1D Gaussian basis function model, with cross-validation to select width and penalty, or Bayesian fitting using marginal likelihood: MS Windows, Mac/Unix/Linux. Also, the script to try these functions out: MS Windows, Mac/Unix/Linux. The plot that is produced is here. (Note that these have been revised to include the Bayesian method, and I've put in the missing minus sign noted in lecture.)

Functions for neural network training (slightly revised): MS Windows, Mac/Unix/Linux. Also, the data file used for the demo: MS Windows, Mac/Unix/Linux, and a script that uses this data: MS Windows, Mac/Unix/Linux.

Function for fitting a simple mixture with EM: MS Windows, Mac/Unix/Linux.

Example of PCA applied to yeast cell cycle data: writeup, data, PCA functions, script, discussion, plots.

Web pages for past related courses:

STA 414/2104 (Spring 2007)
STA 414/2104 (Spring 2006)
CSC 411 (Fall 2006)
STA 410/2102 (Spring 2004) - has many examples of R programs