Paul Bodily

Group Project Proposal

Come up with one carefully thought-out idea for a possible group machine learning project that could be done this semester. The proposal should be exactly one page long (no more, no less). Submit the page as a PDF named lastname_firstname_mlproposal.pdf. The proposal should start with a descriptive title of a few words that includes what your model is predicting, followed by your name and the extent to which you would want to be part of this project if it is chosen. Your page will then have the following 3 paragraphs:

  1. Description of the project (20%): Your description should clearly state the background necessary to understand the problem of interest. It should then state concisely what the problem of interest is (i.e., what precisely will the model be trained to predict).
  2. What features the data set would include (40%): As part of paragraph 2, give one fully specified example of a data set instance based on your proposed features, including a reasonable representation (continuous, nominal, etc.) and a value for each feature. Include the target output value as the last column of the instance. Don’t worry about normalizing for this example, and the actual values may be fictional at this time. Creating an example training instance will encourage you to consider how plausible the future data gathering and representation might actually be. Following is an example (not using well-thought-out features) of what an instance might look like if the task were heart attack diagnosis:

     Heart Rate  Pain Level  BP-systolic  BP-diastolic  Age  Gender  Color  Numb  Heart-Attack?
     96          8           120          80            54   F       Red    N     Yes

  3. How and from where would the data set be gathered and labeled (40%): This needs to be specific: provide websites for APIs, plans for how you will scrape, and how big your dataset will be. If you do not know, give your best estimate. Proposals should avoid addressing problems that require significant manual data collection and/or labeling.
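
One way to sanity-check an example instance before putting it in the proposal is to write the features down as a small schema and confirm that every value fits its declared representation. The following is only a minimal sketch built on the heart attack example above. The names `FEATURES`, `INSTANCE`, and `check_instance` are hypothetical, and the type assigned to each feature (for instance, treating Pain Level as ordinal) is an assumption rather than part of the assignment.

```python
# Hypothetical schema for the heart attack example.
# The representation chosen for each feature is an assumption.
FEATURES = [
    ("Heart Rate", "continuous"),
    ("Pain Level", "ordinal"),
    ("BP-systolic", "continuous"),
    ("BP-diastolic", "continuous"),
    ("Age", "continuous"),
    ("Gender", "nominal"),
    ("Color", "nominal"),
    ("Numb", "nominal"),
    ("Heart-Attack?", "nominal"),  # target output, kept as the last column
]

# The (fictional) example instance, in the same order as FEATURES.
INSTANCE = [96, 8, 120, 80, 54, "F", "Red", "N", "Yes"]


def check_instance(features, instance):
    """Return a list of problems; an empty list means the instance fits the schema."""
    if len(features) != len(instance):
        return [f"expected {len(features)} values, got {len(instance)}"]

    problems = []
    for (name, kind), value in zip(features, instance):
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if kind in ("continuous", "ordinal") and not is_number:
            problems.append(f"{name}: expected a number, got {value!r}")
        elif kind == "nominal" and not isinstance(value, str):
            problems.append(f"{name}: expected a category label, got {value!r}")
    return problems


if __name__ == "__main__":
    for (name, kind), value in zip(FEATURES, INSTANCE):
        print(f"{name:14} {kind:11} {value}")
    print("problems:", check_instance(FEATURES, INSTANCE))
```

Writing the schema out this way tends to surface features whose representation is unclear or whose values would be hard to collect, which is the point of including an example instance.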

Don’t choose a task where the data set has already been collected and worked through and is pretty much ready to use for training. I want you to learn by having to work through, at least to some degree, the challenging issues of feature selection and data gathering. Please submit this proposal as a PDF on Moodle by the due date. You may work together on the proposal and send in one proposal with all your names on it (and all members cc'd) if and only if you all commit to work on this project if chosen.

The grade on this will not be based on whether your proposal is chosen. It will be based on whether you appear to have put in a reasonable effort to propose a plausible project, whether the proposal is appropriate for a semester project given what we have learned in class so far, and whether you included each of the items mentioned above. Sometimes a project will not be chosen for the potential list simply because a similar one is already there, or because I feel that it may be too hard to get the data.

Immediately after the due date I will consider which proposals are most reasonable for the class and send a document containing all of them to each of you. Read them all at least briefly, as part of this assignment is simply to have you look over a set of possible tasks to get more of a feel for the types of things that could be done with machine learning.

After reading through them, each of you must submit 1) a ranked list of the top 4 projects you would like to be part of (if one of your choices is the project you proposed, note which), and 2) a statement of whether you are committed to giving full effort on the group project (regardless of which one you end up on). If you foresee that you may not be able to (e.g., because you may drop the class), I need to know so that I can put together sufficiently dependable groups.

After getting all your e-mails I will do my best to place you in teams of 3-4 on a project you are interested in. Only a subset of the proposals will be chosen for actual projects. Note that you may not all get your first choice, but I will guarantee that if a) you are the one(s) who proposed the project, b) that project is chosen, and c) you put it as your first choice, then you will be on that project.

There is no team leader; you should all work equally. Once a group is chosen to work on a project, you will all start fresh to attack the problem and create the actual set of features you will work with, which may end up being quite different from the ones in the initial proposal. You may also modify the initially proposed project somewhat as needed. If you want to make major changes, run them by me first. I choose projects to e-mail out for consideration based on their potential, assuming that the group will significantly improve the features once it gets going rather than just using the ones suggested by the initial proposer.

Here is an example of a proposal.

Acknowledgments

Thanks to Dr. Tony Martinez for help in designing the projects and requirements for this course.