General Project Guidelines

Individual project specifications are accessed through links in the schedule.

Implementation Requirements

You should code up your projects in a language for which the ML toolkit has been provided.
The learning algorithms are relatively short, and you must create them from scratch. Do not use (or start from) code found on the web or in the book.
We provide a toolkit that contains the boilerplate code (I/O, accuracy calculation, etc.). You are required to use the provided toolkit.
Programming projects will be graded primarily on positive completion, correctness of results, technical accuracy in the discussion, and perceived effort and understanding.
Most parts of the projects require some clear, measurable results (e.g., a working program or graphs of results) and some questions where you analyze your results. If your results or answers differ significantly from appropriate norms, points will be deducted.
A few questions are more open-ended and subjective, and in these cases you are graded based on perceived effort and thought. If you make a thoughtful, conscientious effort, learn the model well, and try to create a quality write-up demonstrating your learning, you will do well.
Have fun and learn while doing the projects!

Data and Other Resources

ML Toolkit
ARFF Datasets
ARFF Format Description
UC Irvine Dataset (more info about the problems used in your assignments and also useful for finding new problems)
Small examples for debugging and other hints
Evaluation Methods

Reporting Format

Below are expectations and guidelines regarding project reports, many of which are also good tips for academic writing in general:

Project reports are to be done on a word processor and be neat and professional. Good writing, grammar, punctuation, etc. are important and points will be taken off if these things are lacking.
Each written project report should be 3-5 pages, single spaced, including all graphs and figures. You should rarely need to go longer than that (the Backprop project may need an extra page).
Number each section of your paper to match the numbering in the assignment directions.
You must have some discussion for each section, though some may be brief (e.g., for a section that just asks you to code up the algorithm, you should still write a sentence or two about it).
You do not need the standard intro of "Machine learning is an exciting new field...", etc. Assume that your audience knows the basics but wants to hear about what you learned, your analysis, etc.
Figures are not to be hand-drawn and should be large enough to be easily legible.
Label all axes on graphs!
Be careful to use the right type of graph in reporting results.
Example graphs (yours should look like these)
Keep reported numbers to a consistent number of significant digits (3 or 4).
Read the directions carefully, and complete and discuss each subtask.
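
As an illustration of the significant-digits guideline above, here is a minimal Python sketch. The helper name format_sig is hypothetical and is not part of the provided toolkit; it assumes the values being reported are ordinary floats such as accuracies.

```python
def format_sig(value: float, digits: int = 3) -> str:
    """Format a number to a fixed count of significant digits.

    The '#' flag keeps trailing zeros, so 0.85 is shown as 0.850
    and every entry in a table has the same number of digits.
    """
    return f"{value:#.{digits}g}"


# Hypothetical accuracies, used only to show the formatting.
accuracies = [0.85, 0.912345, 1.0]
formatted = [format_sig(a) for a in accuracies]
print(formatted)
```

Very large or very small values will come out in scientific notation with this format, which may or may not suit a given table.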

Submission and Feedback

The following are guidelines and expectations regarding submission:

All assignments are submitted through Moodle.
You should upload a .zip file containing two things: your write-up and your code.

The write-up should be a PDF file (make sure you look at your PDF before submitting and check that it is clear and readable, including readable fonts on graphs, etc.).
The code can be submitted as a text file if it is just one file, or otherwise as a zip or tar file.

When your projects are graded, you can go to Moodle to access feedback. I expect you to review your mistakes and learn from them, as mistakes that are consistently repeated will be penalized more heavily.

A Final Suggestion

The purpose of the report is to invite deeper thinking about what is going on. Be detailed in describing both what you did and your thoughts as you try to make sense of the results. You should have some discussion for all subtasks, particularly discussion of results, graphs, etc. The report for each subtask might go something like:

In this task I divided the randomly shuffled data set into a training set (70%) and a test set (30%). I tried different values of parameter x from n to m and reported training and test set accuracy in Table y. I kept testing until m because... The accuracy on the training set was highest for values between 2 and 4, while for the test set the accuracy was highest for values 5 and 6. (Note: Just reporting results without discussion will not receive full credit. Thus, your discussion should also include sentences like the following.) This is because... Note that the... I was surprised to observe... I am not sure why..., but my hypothesis is that...
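
The shuffle-and-split step mentioned in the sample paragraph above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name shuffle_split, the seed, and the list-of-rows data layout are hypothetical. The provided toolkit may offer its own way to do this, in which case use that instead.

```python
import random


def shuffle_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (train, test).

    A fixed seed makes the split repeatable, so results reported
    in the write-up can be reproduced.
    """
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder instances standing in for a real data set.
train, test = shuffle_split(range(10))
print(len(train), len(test))
```

Stating the split fraction and the seed in the report makes it easier for a reader to interpret the training and test accuracies.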

Acknowledgments

Thanks to Dr. Tony Martinez for help in designing the projects and requirements for this course.