Some of you may have seen an earlier version of this graph, which I uploaded to Facebook:
You can see in this graph that the support vector machine (SVM) worked really well, that correlation-based feature selection (CFS) gave really lousy results, and that even my best attempt only averaged about 77% accuracy. But as Arlo Guthrie said, that’s not what I came to tell you about.
I came to talk about how I learned to do that.
Generating that graph involved a bunch of different software tools. I used the Twitter API to collect the data, Python for feature extraction, WEKA for feature reduction and classification, and R for statistical analysis, not to mention LaTeX for typesetting my paper. Among these tools, I had used Python (and LaTeX) before, and a teensy bit of R. At an early stage of my data analysis, I was sitting at my computer when I realized I had no idea how to do what I was planning to do. So how did I learn it?
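To give a flavor of the "Python for feature extraction" step: it can be as simple as mapping each tweet's text onto a fixed vocabulary of word counts, which then becomes one row of the dataset handed off to WEKA. This is just a minimal sketch — the tokenizer and the tiny vocabulary here are invented for illustration, not the actual features behind the graph above.

```python
from collections import Counter
import re

def extract_features(tweet, vocabulary):
    """Turn a tweet's text into a fixed-length vector of word counts.

    Both the regex tokenizer and the vocabulary are hypothetical
    stand-ins for whatever features a real study would use.
    """
    tokens = re.findall(r"[a-z']+", tweet.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["happy", "sad", "not"]
print(extract_features("I'm not sad, I'm happy!", vocab))  # -> [1, 1, 1]
```

Each resulting vector, plus a class label, is one instance; stack them up and you have something a classifier can chew on.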
I googled it.
No, seriously. Most of what I know about LaTeX comes from Wikibooks, the Python documentation is all online, there are online R tutorials, and if I ever have a problem, it’s probably already been answered on Stack Overflow. I did get a book to learn WEKA, and I originally started learning Python from the NLTK book, but for the most part, gone are the days of paging through fat computer manuals.