Hi, all! We thought we'd put all our resources together in a central spot. I'll organize the materials by day.
First, the PRE-CAMP SURVEY!
There were some questions about why we divide by n-1 in the sample variance but by n in the population variance. This is called Bessel's correction. The Wikipedia article is really interesting, and I'll look for better explanations too!
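If a simulation helps build intuition, here's a small Python sketch (the population, sample size, and seed are made up for illustration) that draws many tiny samples and compares the two divisors. On average, dividing by n comes out too low, while dividing by n-1 lands near the true variance:

```python
import random
import statistics

# Hypothetical setup: a big synthetic population with a known variance.
random.seed(0)
population = [random.gauss(0, 1) for _ in range(50_000)]
true_var = statistics.pvariance(population)  # population variance (divides by N)

n = 5            # small samples make the bias easy to see
trials = 20_000
biased_sum = 0.0    # running total of the "/ n" estimates
unbiased_sum = 0.0  # running total of the "/ (n - 1)" estimates

for _ in range(trials):
    sample = random.sample(population, n)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
    biased_sum += ss / n
    unbiased_sum += ss / (n - 1)

print(f"true variance:          {true_var:.3f}")
print(f"average of /n:          {biased_sum / trials:.3f}")   # runs low
print(f"average of /(n-1):      {unbiased_sum / trials:.3f}") # close to true
```

The `/ n` average undershoots by roughly a factor of (n-1)/n, which is exactly the gap Bessel's correction repairs.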
First, the algorithms. We are focusing on classification, so support vector machines (the easy version) and decision trees are our primary algorithms. If there is time, we can talk about nearest neighbors or naive Bayes. Let's work out the math by hand and with Excel to get really familiar with these ideas.
Excel is kind of painful here, isn't it? There's got to be a better way! Python to the rescue!
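As a taste of what that looks like, here's the decision-tree arithmetic (Gini impurity before and after a candidate split) done in a few lines of plain Python instead of a grid of Excel cells. The tiny dataset and the split point are made up just for illustration:

```python
# Gini impurity: 1 - sum of squared class proportions.
def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# Eight hypothetical examples: (feature value, class label).
data = [(1.0, "A"), (1.5, "A"), (2.0, "A"), (2.5, "B"),
        (3.0, "B"), (3.5, "B"), (4.0, "A"), (4.5, "B")]

parent = gini([label for _, label in data])

# Try the candidate split x < 2.75 and compute the weighted child
# impurity, exactly as a decision-tree learner would.
left = [label for x, label in data if x < 2.75]
right = [label for x, label in data if x >= 2.75]
weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(data)

print(f"parent impurity:   {parent:.3f}")    # 0.500
print(f"after split:       {weighted:.3f}")  # 0.375
print(f"impurity decrease: {parent - weighted:.3f}")
```

A tree learner just repeats this computation for every candidate split and keeps the one with the biggest impurity decrease.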
End-of-day evaluation for camp
If you really want to do machine learning well, you need to understand the algorithms and their strengths and weaknesses. The math under the hood is really important, and that's what we spent time on Monday and Tuesday. But if you want to do machine learning at all, you have to leverage the strengths of modern computing -- that's why it's "machine" learning! We are using scikit-learn here to do all kinds of amazing things, and with this intro you should be able to go to the scikit-learn documentation and do SO MUCH MORE than we could ever cover in a one-week class.
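To show how little code that takes, here's a minimal scikit-learn sketch: the same decision-tree idea from earlier in the week, fit on the built-in iris dataset. The depth limit and random seed are arbitrary choices for illustration:

```python
# Fit a small decision tree on iris and check accuracy on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

Swapping in `sklearn.svm.SVC` for the classifier is a one-line change -- that interchangeability is a big part of what makes scikit-learn so useful.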
Below, we have links to all notebooks on GitHub. However, in the Windows lab, Google Drive is easier to use, so use this great link for today!
Here are project instructions. After you look at that, fill out your preferences on this Google form.
Most of the datasets are available via our Google Drive as well. Check there. For economic data, you will have to decide on some variables and gather your own -- I've got unemployment in the Drive. Also, the "labeled faces in the wild" set is really big, so find it here.
If you want a template for starting your Python code, use "Template notebook -- rename me" from the Google Drive!
Drop your presentation here!
Evaluation here.