Backpropagation Lab

Review all requirements before starting to ensure your implementation will support all of them. Submit your report as a single PDF file with appendices. Where MSE calculations are required, you can calculate MSE from either the raw output (from the activation function) or the one-hot representation of the output. Be sure to document what you did in your report. Using the raw outputs will give you a more interesting graph and will probably train slightly longer (more fine-tuning), which I think is probably better. To calculate MSE on a test dataset, my suggestion would be to keep some static class members that accumulate the error on the test set as the predictInstanceLabelsFromFeatures function gets repeatedly called, and to print the SSE and MSE after each call, so that the last values printed give you the MSE on the test set. Where reasonably possible, your report should have some discussion, table, or graph related to each of the points below:

  1. Implementation (40%) Implement the backpropagation algorithm. This is probably the most intense lab, so start early! This and the remaining models are implementations you should be able to use on real world problems in your careers, group projects, etc. Your implementation should include:
    • the ability to create a network structure with at least one hidden layer and an arbitrary number of nodes;
    • random weight initialization (small random weights with 0 mean);
    • on-line/stochastic weight update;
    • a reasonable stopping criterion;
    • training set randomization at each epoch; and
    • an option to include a momentum term.
  2. Iris.arff (15%) Use your backpropagation learner, with stochastic weight updates, for the iris classification problem.
    • Use one layer of hidden nodes with the number of hidden nodes being twice the number of inputs.
    • Always use bias weights to each hidden and output node.
    • Use a random 75/25 split of the data for the training/test set and a learning rate of 0.1.
    • Use a validation set (VS) for your stopping criteria for this and the remaining experiments.
    • With a VS, do not stop at the first epoch in which the VS accuracy fails to improve. Rather, keep track of the best solution so far (BSSF) on the VS and consider a window of epochs (e.g. 5); stop when there has been no improvement over the BSSF in terms of VS MSE for the length of the window.
    • Create one graph with the MSE (mean squared error) on the training set, the MSE on the VS, and the classification accuracy (% classified correctly) of the VS on the y-axis, and the number of epochs on the x-axis. (Note that this requires two scales on the y-axis.) Results for different measurables should be shown with a different color, line type, etc. (Showing this all in one graph is best, but if you need to use two graphs, that is OK.)
    • Typical backpropagation accuracies for the Iris data set are 85-95%.
  3. Vowel.arff (15%) For parts 3-5 you will use the vowel dataset, which is a more difficult task (what would the baseline accuracy be?). A more complete description of this dataset is available here.
    • Typical backpropagation accuracies for the Vowel data set are ~60%.
    • Consider carefully which of the given input features you should actually use (Train/test, speaker, and gender?) and discuss why you chose the ones you did.
    • Use one layer of hidden nodes with the number of hidden nodes being twice the number of inputs.
    • Use random 75/25 splits of the data for the training/test set.
    • Try some different learning rates (LR). For each LR find the best VS solution (in terms of VS MSE). Note that each LR will probably require a different number of epochs to learn. Also note that the proper approach in this case would be to average the results of multiple random initial conditions (splits and initial weight settings) for each learning rate. To minimize work you may just do each learning rate once with the same initial conditions. If you would like you may average the results of multiple initial conditions (e.g. 3) per LR, and that obviously would give more accurate results. The same applies for parts 4 and 5.
    • Create one graph with MSE for the training set, VS, and test set (each measured at your chosen VS stopping spot) on the y-axis and each tested learning rate on the x-axis.
    • Create another graph showing the number of epochs needed to get to the best VS solution on the y-axis for each tested learning rate on the x-axis.
    • In general, whenever you are testing a parameter such as LR, # of hidden nodes, etc., test values until no more improvement is found. For example, if 20 hidden nodes did better than 10, you would not stop at 20, but would try 40, etc., until you saw that you no longer got improvement.
  4. Hidden Node Count (15%) Using the best LR you discovered, experiment with different numbers of hidden nodes.
    • Start with 1 hidden node, then 2, and then double the number for each test until you get no more improvement in accuracy.
    • For each number of hidden nodes find the best VS solution (in terms of VS MSE).
    • Create one graph with MSE for the training set, VS, and test set, on the y-axis and # of hidden nodes on the x-axis.
  5. Momentum (15%) Try some different momentum terms in the learning equation using the best number of hidden nodes and LR from your earlier experiments.
    • Graph as in step 4 but with momentum on the x-axis and number of epochs until VS convergence on the y-axis. You are trying to see how much momentum speeds up learning.
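
The validation-set stopping rule from part 2 (track the BSSF and stop after a window of epochs with no improvement in VS MSE) could be sketched roughly as follows. This is only one possible reading of the rule, not a required design: the class name EarlyStopper, its method and attribute names, the made-up VS MSE curve, and the placeholder weights are all assumptions made for illustration.

```python
import copy


class EarlyStopper:
    """Tracks the best solution so far (BSSF) on the validation set (VS)."""

    def __init__(self, window=5):
        self.window = window          # epochs allowed without improvement
        self.epoch = 0                # epochs seen so far
        self.best_epoch = 0           # epoch at which the BSSF was found
        self.best_mse = float("inf")  # VS MSE of the BSSF
        self.best_weights = None      # copy of the weights at the BSSF

    def update(self, vs_mse, weights):
        """Record one epoch; return True when training should stop."""
        self.epoch += 1
        if vs_mse < self.best_mse:
            # New BSSF: remember its MSE, epoch, and a copy of the weights.
            self.best_mse = vs_mse
            self.best_epoch = self.epoch
            self.best_weights = copy.deepcopy(weights)
        # Stop once `window` epochs have passed since the BSSF was found.
        return self.epoch - self.best_epoch >= self.window


# Made-up VS MSE curve standing in for real training epochs (hypothetical data).
vs_curve = [0.50, 0.40, 0.45, 0.39, 0.41, 0.42, 0.40, 0.43, 0.44, 0.30]
stopper = EarlyStopper(window=5)
for epoch, vs_mse in enumerate(vs_curve, start=1):
    fake_weights = [float(epoch)]  # placeholder for the network's weights
    if stopper.update(vs_mse, fake_weights):
        break

print(stopper.best_epoch, stopper.best_mse, stopper.epoch)
```

In this sketch the weights are copied at each new BSSF so that the reported training, VS, and test results can all come from the best VS solution rather than from the last epoch trained. Note that training stops before the final 0.30 value in the made-up curve is ever seen, which is the trade-off a finite window makes.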

Note: Don't forget the small examples for debugging and other hints! You may also discuss and compare results with classmates.
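
A small hand-checkable case may also help when debugging the MSE bookkeeping described at the top of this lab. The sketch below compares the two permitted variants (raw outputs versus one-hot outputs) on two made-up instances. The names one_hot, instance_sse, and MSETracker are hypothetical, and MSE is assumed here to be the total SSE divided by the number of instances; if you normalize differently, say so in your report.

```python
def one_hot(index, n_classes):
    """Return a one-hot list with 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(n_classes)]


def instance_sse(outputs, target_index, use_raw=True):
    """Sum squared error of one instance against its one-hot target."""
    target = one_hot(target_index, len(outputs))
    if not use_raw:
        # Replace the raw outputs with the one-hot vector of the winning node.
        winner = max(range(len(outputs)), key=lambda i: outputs[i])
        outputs = one_hot(winner, len(outputs))
    return sum((t - o) ** 2 for t, o in zip(target, outputs))


class MSETracker:
    """Accumulates SSE across repeated prediction calls."""

    def __init__(self):
        self.sse = 0.0
        self.count = 0

    def add(self, outputs, target_index, use_raw=True):
        self.sse += instance_sse(outputs, target_index, use_raw)
        self.count += 1

    @property
    def mse(self):
        # Assumption: MSE = total SSE / number of instances seen.
        return self.sse / self.count


# Two made-up instances, both with true class 0 (hypothetical data).
raw_outputs = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1]]
targets = [0, 0]

raw_tracker, hot_tracker = MSETracker(), MSETracker()
for out, tgt in zip(raw_outputs, targets):
    raw_tracker.add(out, tgt, use_raw=True)
    hot_tracker.add(out, tgt, use_raw=False)

print(raw_tracker.sse, raw_tracker.mse)  # SSE and MSE from raw outputs
print(hot_tracker.sse, hot_tracker.mse)  # SSE and MSE from one-hot outputs
```

Working the two instances by hand gives a raw-output SSE of 0.06 + 0.86 = 0.92 and a one-hot SSE of 0 + 2 = 2, which makes it easy to confirm that the accumulator behaves as intended. The one-hot variant only changes when a prediction flips, which is why the raw-output curve tends to be smoother.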

Deliverables (zipped as a single file):

Acknowledgments

Thanks to Dr. Tony Martinez for help in designing the projects and requirements for this course.