Tuning Hyperparameters

Thursday, December 14, 2017 1:50 PM

Our last Learning Machines assignment is to calibrate the hyperparameters for a Multilayer Perceptron. Patrick gave us a working model using the MNIST database of handwritten digits. The model uses a Restricted Boltzmann Machine to reduce the dimensionality of the data and then a Multilayer Perceptron to classify the digits.

I was able to achieve an out-of-sample accuracy of almost 96%. This is in line with the results of other researchers.

I made some changes to Patrick's code to help me better understand the model performance but not in a way that alters the assignment.

In [1]:

%matplotlib inline

from Mnist import *
from Supervised import *
from Unsupervised import *
import numpy as np

np.set_printoptions(precision=3, suppress=True)

np.random.seed(0)

First I load the data and divide up the data into training, validation, and test datasets.

In [2]:

mnist_use_threshold = False

mnist_num_training_examples = 10000
mnist_num_validation_examples = 5000
mnist_num_testing_examples = 5000

mnist = Mnist(mnist_use_threshold)

training_digits, training_labels = mnist.getTrainingData(mnist_num_training_examples)
validation_digits, validation_labels = mnist.getValidationData(mnist_num_validation_examples)
testing_digits, testing_labels = mnist.getTestingData(mnist_num_testing_examples)

I completed this assignment in two phases. The first phase was to calibrate the parameters of the Restricted Boltzmann Machine. I wanted to reduce the dimensionality of the data as I've done in the past with PCA. I didn't want to compromise the data and undermine the Multilayer Perceptron, so I wanted to keep the error rate as low as possible.

The parameters I used are:

In [3]:

# RBM hyperparameters:
rbm_is_continuous = True
rbm_visible_size = 784
rbm_hidden_size = 200
rbm_batch_size = 250
rbm_learn_rate = 0.25
rbm_cd_steps = 3
rbm_training_epochs = 500
rbm_report_freq = 1
rbm_report_buffer = rbm_training_epochs

Using these parameters I can create and train a Restricted Boltzmann Machine.

In [4]:

rbm_name = 'rbm_' + str(rbm_visible_size) + '_' + str(rbm_hidden_size)
rbm = Rbm(rbm_name, rbm_visible_size, rbm_hidden_size, rbm_is_continuous)
rbm.train(training_digits, validation_digits, rbm_learn_rate, rbm_cd_steps,
          rbm_training_epochs, rbm_batch_size, rbm_report_freq, rbm_report_buffer)

Two figures. The first figure shows the decline of training and validation error as model training takes place. The x axis is numbered from 0 to 500 epochs and the y axis shows the base 10 log of the error. The 2 plotted lines show both errors decline quickly during the first 50 epochs and slowly during the remaining 450. The validation error is consistently lower than the training error. The lower figure shows a sampling of 10 handwritten digits input to the model and their outputs, which look very similar to the inputs.

I changed Patrick's code to plot the log error to better see improvements in the error rate. I see that after a few hundred epochs the error rate barely diminishes. Interestingly, the validation error is smaller than the training error.

The display of several handwritten digits is helpful. The top layer are original database digits. The bottom layer are digits that have been encoded and decoded. They look almost the same as the original. This means I was able to reduce the dimensionality of the data without compromising the integrity of the handwritten digits.

Next I use the Restricted Boltzmann Machine to encode the dataset. This effectively reduces the dimensionality of the data to 200 data points. These encodings will become the input to the Multilayer Perceptron.

In [5]:

_, training_encodings = rbm.getHiddenSample( training_digits )
_, validation_encodings = rbm.getHiddenSample( validation_digits )
_, testing_encodings = rbm.getHiddenSample( testing_digits )

Below are the parameters I used for the Multilayer perceptron. There is only one hidden layer with 150 nodes. This is somewhat smaller than the 200-value input layer.

In [6]:

mlp_layer_sizes = [ rbm_hidden_size, 150, 10 ]
mlp_batch_size = 250
mlp_learn_rate = 0.5
mlp_training_epochs = 2000
mlp_report_freq = 1
mlp_report_buffer = mlp_training_epochs

Next I create and train the Multilayer Perceptron.

In [7]:

mlp_name = 'mlp_' + '_'.join( str(i) for i in mlp_layer_sizes )
mlp = Mlp( mlp_name, mlp_layer_sizes, 'sigmoid' )
mlp.train( training_encodings, training_labels, validation_encodings,
          validation_labels, mlp_learn_rate, mlp_training_epochs,
          mlp_batch_size, mlp_report_freq, mlp_report_buffer )

Two figures. The first figure shows the decline of training and validation error as model training takes place. The x axis is numbered from 0 to 2000 epochs and the y axis shows the base 10 log of the error. The 2 plotted lines show both errors decline quickly during the first 100 epochs and slower after that. The validation error flattens out after 250 epochs but the training error continues to decline closer to zero. The lower figure is a bar chart showing the percent accuracy of each of the 10 digits. The x axis is numbered from 0 to 9 and the y axis is from 0 to 100 percent. Each bar in the char is above 95%.

After 500 or so iterations the training error declines while the validation error remains steady. It seems that the model is starting to overfit the data.

How well does the model work on an out-of-sample dataset? I can measure to find out.

In [8]:

testing_guesses = mlp.predict( testing_encodings )
testing_error = mlp.getErrorRate( testing_labels, testing_guesses )
testing_accuracy = mnist_get_accuracy( testing_labels, testing_guesses )
print ('Final Testing Error Rate: %f' % ( testing_error ))
print ('Final Testing Accuracy: %f' % ( testing_accuracy ))

Final Testing Error Rate: 0.007759
Final Testing Accuracy: 0.956400

Almost 96%. That's reasonably good for this dataset, although I recall fitting neural networks to this in the past and getting closer to 99%. To achieve that here I think I'd need to do more than adjust the model hyperparameters. There are regularization techiques for neural networks such as dropout that can help with overfitting, and I think they'd be useful here.

Comments