Review all requirements before starting to ensure your implementation will support all of them. Submit your report as a single PDF file with appendices. Where reasonably possible, your report should have some discussion, table, or graph related to each of the points below:

- (40%) **Implementation**: Correctly implement the perceptron learning algorithm using the toolkit (or your own variation of the toolkit). Accuracy will be assessed based on how well your results agree with intuition and the provided sanity checks.
- (10%) **ARFF Datasets**: Create 2 datasets in ARFF format:
  - One should be linearly separable and the other should not be (5% each).
  - Each should have 8 instances using 2 real-valued inputs (ranging between -1 and 1).
  - Each should have 4 instances from each of two classes.
  - Include these two ARFF files **as appendices** in your report (they don't count against the page limit).
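For reference, a minimal ARFF file meeting these requirements might be laid out as follows (the relation name, attribute names, and data values below are placeholders; substitute your own points):

```
@relation separable
@attribute x1 real
@attribute x2 real
@attribute class {0,1}
@data
-0.9,-0.4,0
-0.6,-0.7,0
-0.3,-0.8,0
-0.7,-0.1,0
0.4,0.6,1
0.8,0.2,1
0.5,0.9,1
0.2,0.3,1
```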

- (10%) **Training**: For each of your 2 datasets, train on the entire set with the Perceptron Rule.
  - Your model should stop training when there has been no significant improvement in accuracy over a number of epochs (e.g., 5).
  - You should **not** merely stop on the first epoch with no improvement.
  - You should **not** use a fixed maximum number of training epochs.
  - Remember that weights/accuracy do not usually change monotonically.
  - **In your report, describe your specific stopping criteria.**
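One way to combine the perceptron rule with a patience-style stopping criterion of the kind described above is sketched below. This is not the toolkit's code; the function name, data representation (a list of `(inputs, target)` pairs with targets in {0, 1}), and default values are illustrative:

```python
import random

def train_perceptron(data, lr=0.1, patience=5, tol=1e-4):
    """Train a perceptron on `data`, a list of (inputs, target) pairs with
    targets in {0, 1}. Training stops only after accuracy has failed to
    improve by more than `tol` for `patience` consecutive epochs, rather
    than on the first flat epoch or at a fixed epoch cap."""
    n = len(data[0][0])
    weights = [0.0] * (n + 1)          # last weight is the bias weight

    def output(x):
        net = sum(w * xi for w, xi in zip(weights, list(x) + [1.0]))
        return 1 if net > 0 else 0

    best_acc, stall, epochs = 0.0, 0, 0
    while stall < patience:
        random.shuffle(data)           # shuffle instance order each epoch
        for x, t in data:
            out = output(x)
            # Perceptron rule: w_i += lr * (t - out) * x_i, with x = 1 for bias
            for i, xi in enumerate(list(x) + [1.0]):
                weights[i] += lr * (t - out) * xi
        epochs += 1
        acc = sum(output(x) == t for x, t in data) / len(data)
        if acc > best_acc + tol:
            best_acc, stall = acc, 0   # significant improvement: reset counter
        else:
            stall += 1                 # accuracy may still recover later
    return weights, best_acc, epochs
```

Because accuracy can dip before it improves, the counter resets on any significant improvement instead of terminating at the first plateau.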

- (10%) **Learning Rates**: Also for each of your datasets, try different learning rates.
  - Report **in tabular format** a) each learning rate you tried, b) the final accuracy with that learning rate, and c) how many epochs were required for training.
  - Note that for this lab, the learning rate will have minimal effect compared with the Backpropagation lab.

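If you generate your report in Markdown, a small helper like the following (an illustrative sketch, not part of the toolkit) can turn your sweep results into the required table:

```python
def results_table(rows):
    """Format (learning_rate, accuracy, epochs) triples as a Markdown table."""
    lines = ["| Learning rate | Final accuracy | Epochs |",
             "|---|---|---|"]
    for lr, acc, epochs in rows:
        lines.append(f"| {lr} | {acc:.2f} | {epochs} |")
    return "\n".join(lines)
```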
- (10%) **Plotting Data and Functions**: Create a graph for each of the two datasets you created above.
  - Plot the (x, y) points for each dataset on a separate graph.
  - Use different labels to represent the different output classes.
  - Using the final weights from training on each dataset (use a learning rate of 0.1 for this and all subsequent requirements), derive the equation of the learned decision boundary represented by the weights. Show your work in deriving this equation. Remember, the equation of the line comes from w_x * x + w_y * y + w_b * 1 = 0, where w_x, w_y, and w_b are your learned weights. From that you can convert to slope-intercept form and draw the line.
  - Graph this line on each graph.
  - Indicate which output label is predicted on each side of the decision boundary.
  - For all graphs, always label the axes!
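As a worked example of the derivation above: with hypothetical learned weights w_x = 2, w_y = 1, w_b = -0.5, the boundary 2x + y - 0.5 = 0 rearranges to y = -2x + 0.5. A small helper (a sketch, not part of the toolkit) that performs this conversion:

```python
def decision_boundary(w_x, w_y, w_b):
    """Rearrange w_x*x + w_y*y + w_b = 0 into slope-intercept form
    y = m*x + b. Assumes w_y != 0; when w_y == 0 the boundary is
    instead the vertical line x = -w_b / w_x."""
    return -w_x / w_y, -w_b / w_y  # (slope m, intercept b)
```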

- (20%) **The Voting Problem**: Use the perceptron rule to learn this version of the voting task.
  - FYI, this is an edited version of the standard voting set with all “don’t know” values replaced with the most common value for the particular attribute.
  - Randomly split the data into a 70% training set and a 30% test set (the toolkit has a command-line option for this).
  - Try it five times with different random 70/30 splits (as long as you don't have a fixed random seed, the toolkit should do this for you).
  - **Create a table that reports for each split a) the final training accuracy, b) the final test set accuracy, and c) the number of epochs required. Also report as a final row the average of these values over the 5 trials.**
  - You should update weights after every instance (as opposed to batch training).
  - Shuffle the data order after each epoch.
  - **Looking at the weights, explain what the model has learned and how the individual input features affect the result. Which specific features are most critical for the voting task, and which are least critical?**
  - **Do one graph of the average (over the 5 trials) misclassification rate on the training set vs. epochs (0th through final epoch).** Note that your larger epoch numbers will only average over those trials that trained for that long.
  - Our helps page has some help for doing graphs. To clarify what specific graphs should look like, find examples for this and future projects here.
  - As a rough sanity check, typical Perceptron accuracies for the voting data set are 90%-98%.
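Because the 5 trials may stop at different epochs, the averaged curve must handle ragged lengths, as the graph requirement notes. One way to compute it (a sketch, assuming each trial's curve is a list of per-epoch misclassification rates):

```python
def average_misclassification(curves):
    """Average per-epoch misclassification rates over trials that may have
    trained for different numbers of epochs. Epoch e averages only the
    trials that lasted at least e+1 epochs, matching the requirement that
    later epochs average only the trials that trained that long."""
    longest = max(len(c) for c in curves)
    avg = []
    for e in range(longest):
        vals = [c[e] for c in curves if len(c) > e]
        avg.append(sum(vals) / len(vals))
    return avg
```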

*Note: Don't forget the small examples for debugging and other hints! You may also discuss and compare results with classmates.*

Submit via Moodle (zipped as a single file):

- Written report (following the General Project Guidelines)
- The actual ARFF files for the two datasets you created.
- Your Perceptron implementation file.

**Q**: *I am a little confused on how to think about the features and labels matrix objects*

**A**: The MLSystemManager loads the whole ARFF dataset into a Matrix. It splits that Matrix into two sub-Matrix objects, one being the features and the other being the labels (labels is synonymous with targets from your homeworks). In cases where you request, say, a 70/30 split of the data, you'll see that it again splits the Matrix into further sub-Matrix objects in order to create separate train and test datasets.

A Matrix is just that: a matrix. The Matrix passed into the "train" function as "features" holds the feature values (a row for each instance, a column for each feature), and "labels" holds the target values (a row for each instance, and a column for each target feature, i.e., probably one column). Both have the same number of rows, and the ith row of one corresponds to the ith row of the other. You should not split these Matrices further; that is all taken care of for you by the MLSystemManager.

If you are still lost, I would suggest starting from the "main" function in the MLSystemManager class and walking through the code line by line, drilling down into functions and classes where necessary, to understand what the program is doing. You will eventually find where it creates the initial Matrix object from the ARFF dataset and then where it splits that Matrix into the two sub-Matrix objects.

Note that by contrast, the "predict" function (in the Perceptron class) is passed only a single instance (i.e., row) of feature values together with an array in which you should "load" your prediction (hence why you don't need to return anything; the array is passed by reference). Your prediction should be either a continuous value (if doing regression) or an integer corresponding to the output class (if doing classification).
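A minimal sketch of the calling convention described in this answer (the class name and method bodies here are illustrative, not the toolkit's actual API):

```python
class SketchLearner:
    """Illustrates the convention described above: `features` and `labels`
    are row-aligned matrices in train(), while predict() receives a single
    instance and loads its prediction into a passed-in array."""

    def train(self, features, labels):
        # Row i of `features` and row i of `labels` describe the same
        # instance, so zip keeps them aligned while iterating.
        for x, y in zip(features, labels):
            pass  # apply one perceptron-rule weight update per instance here

    def predict(self, features, labels_out):
        # `features` is one row of feature values. Nothing is returned;
        # the prediction is "loaded" into `labels_out` by reference.
        del labels_out[:]
        labels_out.append(0)  # e.g., the predicted class index
```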