## Project 1: Multilayer perceptron

### Overview

**Task**: implement a general multilayer perceptron classifier (supporting at least one hidden layer), trained by the backpropagation algorithm. Employ this model on a task of classifying points on a plane into three categories. Use a validation technique to select the best performing model, then perform final testing.

**Deadline**: March 31st, 23:59 CEST

### Specifics

#### Model

- Multi-Layer Perceptron, having at least one *non-linear* hidden layer
- (Stochastic) Gradient Descent via Back-Propagation (online, “true” or mini-batch)
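As a starting point, the core of such a model can be sketched as below. This is a minimal illustration, not the required implementation: the tanh hidden layer, softmax output, cross-entropy loss and single-sample (online) updates are all choices you are expected to experiment with.

```python
import numpy as np

def init_mlp(n_in, n_hid, n_out, rng):
    """Small random weights for a 1-hidden-layer MLP (bias folded in as an extra column)."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_hid, n_in + 1)),
        "W2": rng.normal(scale=0.1, size=(n_out, n_hid + 1)),
    }

def forward(params, x):
    """x: (n_in,) input. Returns input-with-bias, hidden activations and softmax output."""
    a = np.append(x, 1.0)                       # append constant bias input
    h = np.tanh(params["W1"] @ a)               # non-linear hidden layer
    hb = np.append(h, 1.0)
    z = params["W2"] @ hb
    y = np.exp(z - z.max()); y /= y.sum()       # numerically stable softmax
    return a, h, hb, y

def backprop_step(params, x, target, lr=0.1):
    """One online SGD step with cross-entropy loss; target is a one-hot vector."""
    a, h, hb, y = forward(params, x)
    d_out = y - target                                      # softmax + cross-entropy gradient
    d_hid = (params["W2"][:, :-1].T @ d_out) * (1 - h**2)   # back through tanh
    params["W2"] -= lr * np.outer(d_out, hb)
    params["W1"] -= lr * np.outer(d_hid, a)
```

The bias is handled by appending a constant 1 to each layer's input, so no separate bias vectors are needed.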

#### Data

- one-line header, then one sample per line
- points in a 2D plane (2 real-valued inputs)
- three output classes (`A`, `B` and `C`)
- train set – `2d.trn.dat`, 8000 samples – training data (estimation and validation)
- test set – `2d.tst.dat`, 2000 samples – testing data
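Loading can be a one-liner with `np.loadtxt`. The sketch below assumes each data line holds the two coordinates followed by the class letter; check the actual header of `2d.trn.dat` and adjust the column layout if it differs.

```python
import numpy as np

def load_2d_data(path):
    """Read a '2d.*.dat' file: skip the one-line header; assume columns x, y, class letter."""
    raw = np.loadtxt(path, skiprows=1, dtype=str)
    X = raw[:, :2].astype(float)                               # 2 real-valued inputs
    labels = np.array([ord(c) - ord("A") for c in raw[:, 2]])  # 'A'/'B'/'C' -> 0/1/2
    return X, labels
```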

#### Training

Split the training data set into a bigger *estimation* subset and a smaller *validation* subset.^{1} Use this split to perform *model selection*, i.e. find the best-performing combination of *hyper-parameters* (*model architecture*: number of hidden layers, neuron counts, …; *training parameters*: learning rate, …):

- train the model on the *estimation* subset
- test the model on the *validation* subset (not the *test* set! don’t touch that yet!)
- remember the hyper-parameters of the best performing model
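The split itself can look like the following sketch; the 80/20 ratio and the fixed seed are illustrative choices, not prescribed by the assignment.

```python
import numpy as np

def split_estimation_validation(X, y, val_fraction=0.2, seed=0):
    """Shuffle the training set, then hold out `val_fraction` as the validation subset."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    est, val = idx[n_val:], idx[:n_val]
    return X[est], y[est], X[val], y[val]
```

Shuffling before splitting matters if the file happens to list samples grouped by class.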

(*Sanity check*: a properly working network should reach a classification accuracy of \(\geq 95\%\))

Also try experimenting with some of the following:

- input preprocessing (e.g. normalization/rescaling)
- activation functions (*logsig*, *tanh*, *softmax*, …)
- output encoding (*one-hot encoding*, *ordinal*)
- training length and/or *early-stopping*
- learning rate schedule
- weight initialization type (*uniform*, *Gaussian*, *sparse*, *orthogonal*) and scale
- momentum type (*none*, *classic*, *Nesterov’s accelerated gradient*) and strength
- regularization:
  - implicit (weight decay, …)
  - explicit (\(L_1\), \(L_2\), …)
- regression error metric used for training (*square error*, *categorical cross-entropy*/*log-loss*, …)
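Two of the simplest items above, input normalization and one-hot output encoding, could be sketched like this (z-scoring is just one reasonable rescaling choice):

```python
import numpy as np

def normalize(X, mean=None, std=None):
    """Z-score the inputs; statistics should come from the estimation subset only,
    then be reused for validation and test data."""
    if mean is None:
        mean, std = X.mean(axis=0), X.std(axis=0)
    return (X - mean) / std, mean, std

def one_hot(labels, n_classes=3):
    """Encode integer class labels 0..n_classes-1 as one-hot rows."""
    return np.eye(n_classes)[labels]
```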

#### Testing

Using the best performing set of hyper-parameters (on the validation set), train a *new* model on the full *training set*, then perform final testing on the *test set*. Report classification accuracy, regression error and calculate a confusion matrix.
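Computing the confusion matrix is straightforward; a minimal sketch, with rows = actual classes and columns = predicted classes as the report requires, plus a helper for the column-normalized (100% per column) view:

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes=3):
    """Counts: rows = actual classes, columns = predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

def column_percentages(cm):
    """Normalize so that each column sums to 100%."""
    return 100.0 * cm / cm.sum(axis=0, keepdims=True)
```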

#### Bonus

Train the model using a more sophisticated method, such as:

- Scaled Conjugate Gradient [2 pt]
- A newer method (published after 2010): Adagrad, RMSprop, Adam, … [1 pt]
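For orientation, an Adam update for a single weight array can be sketched as follows; the default hyper-parameters shown are the commonly cited ones from the original paper, and in your network you would keep one `state` dict per weight matrix.

```python
import numpy as np

def adam_update(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step for one weight array.
    state holds the first/second moment estimates and the step count."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad       # first-moment estimate
    state["v"] = b2 * state["v"] + (1 - b2) * grad**2    # second-moment estimate
    m_hat = state["m"] / (1 - b1 ** state["t"])          # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```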

### Submission

Submit your code and report for this project as a single archive (e.g. `.zip`).

#### Code

Projects should be written in Python; use of previously finished labs is strongly encouraged. All the “interesting” bits should be identifiable in the code (especially all the relevant equations). You’ll probably need no additional libraries other than the standard `numpy`/`scipy`/`matplotlib` combo. Don’t reinvent the wheel: use `np.loadtxt`/`np.savetxt`/`np.load`/`np.save` and `plt.savefig` where necessary.

Model selection should not be performed by hand; rather, the project should include a runnable program^{2} that goes through the various combinations of hyper-parameters, selects the best-performing model, runs the final test and produces the final outputs.
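The selection loop itself can be as simple as the sketch below; `train_eval` stands for your own training routine, which is assumed to return validation accuracy for a given parameter combination.

```python
import itertools

def grid_search(train_eval, grid):
    """Exhaustively try every hyper-parameter combination.
    grid: dict mapping parameter name -> list of candidate values.
    train_eval: callable taking those parameters, returning validation accuracy."""
    best_acc, best_params = -1.0, None
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        acc = train_eval(**params)
        if acc > best_acc:
            best_acc, best_params = acc, params
    return best_params, best_acc
```

With a small grid this exhaustive search is perfectly adequate; remember to log each combination's estimation and validation error for the report table.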

(If you’d desperately prefer to use another language, write us an e-mail.)

#### Report

Create a report – briefly describing model selection, training and testing – in `.pdf` format. The report should be sufficiently detailed that one can read the description, reimplement your project and, using the provided parameters, arrive at the *same results* (reproducibility). (Assume prior knowledge of neural network algorithms, so for example don’t explain how backpropagation works. But do include details such as whether training was online/mini-batch/batch, and whether and with what strength momentum was used.)

- for each examined model (hyper-parameter combination), report estimation and validation error **[table]**
- for the best model:
  - error vs. time (at least one instance) **[plot]**
  - outputs in 2D **[plot]**
  - confusion matrix **[table]**
    - rows = actual classes
    - columns = predicted classes
    - sum of each column = 100%
- correct submissions with highest (testing) accuracies will be awarded bonus points
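The “outputs in 2D” plot can be produced with `plt.scatter` and saved with `plt.savefig`, for example along these lines (the colours, marker size and file name are arbitrary choices; the `Agg` backend just lets the script run without a display):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, safe for scripted runs
import matplotlib.pyplot as plt

def plot_outputs_2d(X, predicted, path="outputs_2d.png"):
    """Scatter the test points coloured by predicted class (0/1/2 -> A/B/C)."""
    for cls, colour in zip(range(3), ["tab:red", "tab:green", "tab:blue"]):
        mask = predicted == cls
        plt.scatter(X[mask, 0], X[mask, 1], s=5, c=colour, label="ABC"[cls])
    plt.legend()
    plt.savefig(path)
    plt.close()
```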