CS 4478/5578 Evaluation Methods Detail Page

You will implement the following four approaches to predict future accuracy (details below):

Training Set Method

·          Use the full dataset to train a model

·          Compute accuracy on that same dataset
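The two steps above can be sketched as follows. This is a minimal illustration, not the required implementation: the majority-class learner `train_majority`, the helper `accuracy`, and the representation of an instance as a `(features, label)` pair are all assumptions made for the example.

```python
from collections import Counter


def train_majority(instances):
    """Hypothetical stand-in learner: always predicts the most common training label."""
    labels = [label for _features, label in instances]
    majority = Counter(labels).most_common(1)[0][0]
    return lambda features: majority


def accuracy(model, instances):
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(1 for features, label in instances if model(features) == label)
    return correct / len(instances)


def training_set_accuracy(train_fn, data):
    """Train on the full dataset, then score the model on that same dataset."""
    model = train_fn(data)
    return accuracy(model, data)


# Toy data (made up for illustration): two "yes" instances and one "no".
data = [((1.0,), "yes"), ((2.0,), "yes"), ((3.0,), "no")]
print(training_set_accuracy(train_majority, data))
```

Any learner that takes a list of instances and returns a prediction function could be passed in place of `train_majority`.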

Static Split Test Set Method

Two distinct datasets (ARFF files) are made available to the machine learner: a training set and a test set.

·          The training set is used for learning/training (i.e., inducing a model), and

·          The test set is used exclusively for testing
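A possible sketch of the static split method is shown here. It assumes the two ARFF files have already been loaded into lists of `(features, label)` pairs (in Python, `scipy.io.arff.loadarff` is one option for reading them); the 1-nearest-neighbour learner `train_one_nn` and the toy data are hypothetical and only serve to make the example runnable.

```python
def train_one_nn(train_set):
    """Hypothetical stand-in learner: 1-nearest-neighbour over numeric features."""
    def predict(features):
        def sq_dist(instance):
            return sum((a - b) ** 2 for a, b in zip(instance[0], features))
        # Label of the closest training instance (first one wins on ties).
        return min(train_set, key=sq_dist)[1]
    return predict


def static_split_accuracy(train_fn, train_set, test_set):
    """Induce a model from train_set only; measure accuracy on test_set only."""
    model = train_fn(train_set)
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)


# Toy stand-ins for the contents of the two ARFF files (made up for illustration).
train_set = [((0.0,), "low"), ((1.0,), "low"), ((9.0,), "high"), ((10.0,), "high")]
test_set = [((0.5,), "low"), ((8.0,), "high"), ((4.0,), "high")]
print(static_split_accuracy(train_one_nn, train_set, test_set))
```

The test instances never reach `train_fn`, which is the point of the method: the reported accuracy is measured on data the model has not seen.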

Random Split Test Set Method

·          A single dataset is made available to the machine learner

·          The data is split (by the learner) into a training set and a test set, such that:

o         Instances are randomly assigned to either set. Do this by shuffling the dataset before the split. Stratification (keeping the distribution of instances with respect to the target class the same in both sets) is optional

o         x% of instances are used for training and the remainder for testing (x is input by the user)
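One way to realize the shuffle-then-split step is sketched below. The function name `random_split`, the optional `seed` parameter (added for reproducibility), the range check on x, and the rounding of the cut point are assumptions of this example; stratification is left out because it is optional.

```python
import random


def random_split(data, train_percent, seed=None):
    """Shuffle a copy of the data, then cut it into (train, test) at train_percent."""
    # Assumption: x must leave room for both a training set and a test set.
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    shuffled = list(data)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # random assignment of instances
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# Toy data (made up for illustration): ten instances, x = 70.
data = [((float(i),), "even" if i % 2 == 0 else "odd") for i in range(10)]
train_set, test_set = random_split(data, train_percent=70, seed=42)
print(len(train_set), len(test_set))
```

The resulting training and test sets can then be evaluated exactly as in the static split method.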

N-fold Cross-validation Method

·          Partition the dataset (call it D) into N equally sized, disjoint subsets S1, ..., SN (N is input by the user)

·          For k = 1 to N

o         Let Mk be the model induced from D - Sk (all instances of D that are not in Sk)

o         Let nk be the number of instances of Sk correctly classified by Mk

·          Return the accuracy estimate (n1 + n2 + ... + nN) / |D|
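The loop above might be sketched as follows. Several details are assumptions of this example, not requirements stated above: folds are formed by taking every N-th instance (so sizes differ by at most one when |D| is not divisible by N), no shuffling is done before partitioning, and `train_one_nn` is a hypothetical stand-in learner.

```python
def cross_validation_accuracy(train_fn, data, n_folds):
    """Return (n_1 + ... + n_N) / |D| for N-fold cross-validation."""
    data = list(data)
    if not 2 <= n_folds <= len(data):
        raise ValueError("n_folds must be between 2 and the number of instances")
    # Assumption: fold k holds every n_folds-th instance, starting at index k.
    folds = [data[k::n_folds] for k in range(n_folds)]
    total_correct = 0
    for k, held_out in enumerate(folds):
        # D - S_k: every instance that is not in the held-out fold.
        training = [inst for j, fold in enumerate(folds) if j != k for inst in fold]
        model = train_fn(training)  # M_k
        # n_k: instances of S_k that M_k classifies correctly.
        total_correct += sum(1 for features, label in held_out if model(features) == label)
    return total_correct / len(data)


def train_one_nn(train_set):
    """Hypothetical stand-in learner: 1-nearest-neighbour over numeric features."""
    def predict(features):
        def sq_dist(instance):
            return sum((a - b) ** 2 for a, b in zip(instance[0], features))
        return min(train_set, key=sq_dist)[1]
    return predict


# Toy data (made up for illustration): two well-separated classes.
data = [((0.0,), "low"), ((1.0,), "low"), ((2.0,), "low"),
        ((10.0,), "high"), ((11.0,), "high"), ((12.0,), "high")]
print(cross_validation_accuracy(train_one_nn, data, n_folds=3))
```

If the dataset is sorted by class, shuffling it before partitioning is usually advisable so that each fold contains a mix of classes.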