Project Requirements

Individual project specifications are accessed through links in the schedule.

Project reports are to be done on a word processor and should be neat and professional. Good writing, grammar, punctuation, etc. are important, and points will be taken off if these are lacking. Each written project report should be 3-5 pages, single spaced, including all graphs and figures. You should rarely need to go longer than that (the Backprop project may need an extra page). Number each section of your paper to match the numbering in the assignment directions. You must have some discussion for each section, though some may be brief (e.g., for a section that just asks you to code up the algorithm, you should still write a sentence or two about it). You do not need the standard introduction of “Machine learning is an exciting new field...,” etc. Assume that your audience already knows the basics but wants to hear about what you learned, your analysis, etc. Figures must not be hand-drawn and should be large enough to be easily legible. Report numbers with a consistent number of significant digits (3 or 4). Read the directions carefully, and complete and discuss each subtask. Label all axes on graphs! Be careful to use the right type of graph for the results you are reporting.
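One way to keep reported numbers at a consistent number of significant digits is to format them in code rather than by hand. The sketch below is a hypothetical helper (the class name SigFigs is made up and is not part of the provided toolkit). It relies on Java's standard %g format specifier, which rounds to the requested number of significant digits and keeps trailing zeros.

```java
import java.util.Locale;

/** Hypothetical helper, not part of the toolkit: formats numbers to a fixed number of significant digits. */
public class SigFigs {

    /**
     * Rounds x to the given number of significant digits using %g.
     * Locale.ROOT forces '.' as the decimal separator. Very large or very
     * small values (>= 10^digits or < 10^-4) come out in scientific notation.
     */
    public static String format(double x, int digits) {
        return String.format(Locale.ROOT, "%." + digits + "g", x);
    }

    public static void main(String[] args) {
        System.out.println(format(0.873456, 3)); // accuracy to 3 significant digits
        System.out.println(format(12.34567, 4)); // a loss value to 4 significant digits
        System.out.println(format(0.5, 3));      // trailing zeros are kept
    }
}
```

Because %g keeps trailing zeros, a value like 0.5 is printed as 0.500. This lets a column of results in a table line up at the same precision.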

All assignments are submitted through Moodle. Upload a single .zip file containing two things: your write-up and your code. The write-up should be a PDF file (look at your PDF before submitting and check that it is clear and readable, including readable fonts on graphs, etc.). The code can be a single text file or, if there are several files, a zip or tar archive. Make sure you submit both for each project. Once your project is graded, you can access it on Moodle along with its feedback. I expect you to review your mistakes and learn from them, since mistakes that are consistently repeated will be penalized more heavily.

You should code up your projects in Java. The learning algorithms are relatively short, and you must create them from scratch. Do not use (or start from) code found on the web or in the book. We provide a toolkit that contains the boilerplate code (I/O, accuracy calculation, etc.), and you are required to use it. Programming projects will be graded primarily on completion, correctness of results, technical accuracy in the discussion, and perceived effort and understanding. Most parts of the projects require some clear, measurable results (e.g., a working program, graphs of results, etc.) and some questions where you analyze your results. If your results or answers differ significantly from appropriate norms, you will lose points. A few questions are more open-ended and subjective, and in these cases you are graded based on perceived effort and thought. If you make a thoughtful, conscientious effort, learn the model well, and try to create a quality write-up demonstrating your learning, you will do well. Have fun and learn while doing the projects!

Instructions and source code for the toolkit can be found here.
A collection of data sets already in the ARFF format can be found here.
Details on the ARFF data format are found here.
The complete and continually updated UC Irvine Machine Learning Repository is here. This is a good place to learn more about the problems used in your assignments and to find new problems.
To help you debug your projects, we have included some small examples and other hints, along with actual learned hypotheses, so that you can compare them against the results of your code and make sure it is working properly.
Here is a set of example graphs to give you a sense of what the graphs should look like.
For a summary of evaluation methods (i.e., ways to measure accuracy), see this.

The main goal of these projects is to help you gain a better understanding of how the algorithms work and of their potential for real tasks. Coding up the projects and doing the required subtasks is an important part of this. However, your learning will be diminished if you just do the rote assignment without thinking hard about what is going on. That is why we also ask you to discuss and analyze each subtask. This will help you a) better understand the algorithms and their potential (the main goal), and b) better demonstrate to the grader that you understand them. You may make your report flow like a paper would, or you may treat each subtask independently. Reporting on a subtask is an opportunity for you to explain (teach) what you did and to demonstrate that you understand (or are trying your best to understand) what the algorithm is doing. Describe the subtask and your results, and then discuss what happened, including any interesting or unexpected observations. For some subtasks the observations may seem obvious. Good! You understand it. But still demonstrate your understanding to us by explaining it. For other subtasks you may still be somewhat confused by the results, but by doing your best to explain what is happening, you learn more. Grading is generous if it is obvious that you are trying your best to understand and explain. Coding up and doing the project takes a certain amount of time no matter what. Taking a little more time to think through and explain what you just did significantly increases your understanding and retention. As you strive to find interesting observations and insights in the experiments, your learning and enjoyment of the project will increase.

You should include some discussion for every subtask, particularly discussion of results, graphs, etc. The report for each subtask might go something like this:

In this task I randomly shuffled the data set and divided it into a training set (70%) and a test set (30%). I tried values of parameter x from n to m and reported training and test set accuracy in Table y. I kept testing until m because... The accuracy on the training set was highest for values from 2 to 4, while on the test set the accuracy was highest for values 5 and 6. (Note: Just reporting results without discussion will not receive full credit. Thus, your discussion should also include sentences like the following.) This is because... Note that the... I was surprised to observe... I am not sure why..., but my hypothesis is that...
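The shuffle-and-split step in the example report above can be sketched in plain Java as follows. This is a minimal, stand-alone illustration under stated assumptions: the class name HoldoutSplit and its methods are hypothetical, and the code does not use the provided toolkit, which may have its own way of shuffling and splitting data. A fixed random seed is used so that the same split is reproduced on every run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of the toolkit: shuffles rows and splits them into training and test sets. */
public class HoldoutSplit {

    /** The two parts of a split. */
    public static final class Split<T> {
        public final List<T> train;
        public final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /**
     * Shuffles a copy of the rows with a seeded RNG (so runs are reproducible)
     * and puts the first trainFraction of the shuffled rows in the training set.
     */
    public static <T> Split<T> split(List<T> rows, double trainFraction, long seed) {
        if (trainFraction <= 0.0 || trainFraction >= 1.0) {
            throw new IllegalArgumentException("trainFraction must be in (0, 1)");
        }
        List<T> shuffled = new ArrayList<>(rows); // leave the caller's list untouched
        Collections.shuffle(shuffled, new Random(seed));
        int trainSize = (int) Math.round(trainFraction * shuffled.size());
        List<T> train = new ArrayList<>(shuffled.subList(0, trainSize));
        List<T> test = new ArrayList<>(shuffled.subList(trainSize, shuffled.size()));
        return new Split<>(train, test);
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i); // stand-in for ten instances
        }
        Split<Integer> s = split(rows, 0.7, 42L); // 70% training, 30% test
        System.out.println("train size = " + s.train.size() + ", test size = " + s.test.size());
    }
}
```

The split shuffles a copy of the data, so the original ordering is preserved for other experiments. Changing the seed gives a different random split, which is useful for checking whether a result depends on one particular split.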

Acknowledgments

Thanks to Dr. Tony Martinez for help in designing the projects and requirements for this course.