CS 4478/5578 Evaluation Methods Detail Page

You will implement the following four approaches to predict future accuracy (details below):

Training Set Method
Static Split Test Set (Hold-out) Method
Random Split Test Set (Hold-out) Method, with a user-specified training set size
N-fold Cross-validation Method (for arbitrary N)

Training Set Method

· Use the full data set to train a model

· Compute the model's accuracy on that same data set
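The two steps above can be sketched as follows. This is only an illustration under stated assumptions: instances are represented as (features, label) pairs, a "learner" is any function that takes a list of instances and returns a model callable on features, and train_majority is a hypothetical stand-in learner, not the one you are expected to use.

```python
from collections import Counter


def train_majority(instances):
    """Hypothetical stand-in learner: always predicts the most common label."""
    majority = Counter(label for _, label in instances).most_common(1)[0][0]
    return lambda features: majority


def accuracy(model, instances):
    """Fraction of instances whose label the model predicts correctly."""
    correct = sum(1 for features, label in instances if model(features) == label)
    return correct / len(instances)


def training_set_accuracy(train, data):
    """Train on the full data set, then score the model on that same data."""
    model = train(data)
    return accuracy(model, data)
```

Any learner with the same calling convention can be passed in place of train_majority.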

Static Split Test Set Method

Two distinct datasets (ARFF files) are made available to the machine learner: a training set and a test set.

· The training set is used for learning/training (i.e., inducing a model), and

· The test set is used exclusively for testing
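A minimal sketch of this method, assuming instances are (features, label) pairs and a learner is a function from a list of instances to a model. The helper read_arff_instances is a hypothetical, deliberately simplified reader: it assumes a dense ARFF file with no quoted values, the class as the last attribute, and it keeps every value as a string. A real ARFF parser has to handle more than this.

```python
def read_arff_instances(lines):
    """Read (features, label) pairs from the @data section of a dense ARFF file.

    Assumptions (not full ARFF): no quoted or sparse values, the class is the
    last attribute, and values are kept as uninterpreted strings.
    """
    instances = []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if not in_data:
            # Header lines (@relation, @attribute) are ignored in this sketch.
            in_data = line.lower().startswith("@data")
            continue
        values = [v.strip() for v in line.split(",")]
        instances.append((tuple(values[:-1]), values[-1]))
    return instances


def static_split_accuracy(train, training_set, test_set):
    """Induce a model from the training set only; score it on the test set only."""
    model = train(training_set)
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)
```

With two files on disk, each could be read with something like read_arff_instances(open(path)), where path is whatever file name the user supplies.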

Random Split Test Set Method

· A single data set is made available to the machine learner

· The data is split (by the learner) into a training and a test set, such that:

o Instances are randomly assigned to either set. Do this by shuffling the data set before the split. Stratification (making the class distribution the same in both sets) is optional

o x% of instances are used for training and the remainder for testing (x is input by the user)
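The split described above might look like the sketch below. It is unstratified, and the function name random_split, the seed parameter, and the choice to round the training set size to the nearest integer are all assumptions made for illustration.

```python
import random


def random_split(data, train_percent, seed=None):
    """Shuffle a copy of the data, then cut it into (training, test) lists.

    train_percent is the x from the description above. The seed argument is
    an added convenience so that a split can be reproduced.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    shuffled = list(data)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]
```

The training list is then handed to the learner and the test list is used only for scoring, exactly as in the static split method.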

N-fold Cross-validation Method

· Partition the data set (call it D) into N disjoint subsets S₁, ..., S_N of equal size, or as close to equal as possible when N does not divide |D| (N is input by the user)

· For k = 1 to N

o Let M_k be the model induced from D - S_k

o Let n_k be the number of instances of S_k correctly classified by M_k

· Return (n₁+n₂+...+n_N)/|D|
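The procedure above can be sketched as follows, again assuming (features, label) instances and a learner that maps a list of instances to a model. Shuffling before partitioning and dealing instances round-robin into folds are assumptions of this sketch; the description above only requires that the subsets be disjoint and equally sized.

```python
import random


def cross_validation_accuracy(train, data, n_folds, seed=None):
    """Return (n_1 + ... + n_N) / |D| for N-fold cross-validation."""
    if not 2 <= n_folds <= len(data):
        raise ValueError("n_folds must be between 2 and the number of instances")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # assumption: randomize fold membership
    # Round-robin slicing yields disjoint folds whose sizes differ by at most one.
    folds = [shuffled[k::n_folds] for k in range(n_folds)]
    total_correct = 0
    for k, held_out in enumerate(folds):
        # D - S_k: every instance that is not in the held-out fold.
        training = [inst for j, fold in enumerate(folds) if j != k for inst in fold]
        model = train(training)  # M_k
        # n_k: instances of S_k that M_k classifies correctly.
        total_correct += sum(
            1 for features, label in held_out if model(features) == label
        )
    return total_correct / len(data)
```

Each instance is tested exactly once, by a model that never saw it during training, which is why the total is divided by |D| rather than averaged per fold.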