LMGTFY is the new RTFM

Some of you may have seen an earlier version of this graph, which I uploaded to Facebook:

[Figure: box plot of Twitter authorship attribution results]

You can see in this graph that the support vector machine (SVM) worked really well, that correlation-based feature subset selection (CFS) gave really lousy results, and that even my best attempt only averaged about 77% accuracy. But as Arlo Guthrie said, that’s not what I came to tell you about.

I came to talk about how I learned to do that.

Generating that graph involved a bunch of different software tools. I used the Twitter API to collect the data, Python for feature extraction, WEKA for feature reduction and classification, and R for statistical analysis, not to mention LaTeX for typesetting my paper. Of these tools, I had used only Python and LaTeX before, plus a teensy bit of R. At an early stage of my data analysis, I was sitting at my computer when I realized I had no idea how to do what I was planning to do. So how did I learn it?

I googled it.

No, seriously. Most of what I know about LaTeX comes from Wikibooks, the Python documentation is all online, there are online R tutorials, and if I ever have a problem it’s probably already been answered on Stack Overflow. I did get a book to learn WEKA, and I originally started learning Python from the NLTK book, but for the most part, gone are the days of paging through fat computer manuals.

So if someone’s asking questions that they should be able to answer on their own, rather than telling them to RTFM, maybe a better response is LMGTFY.

Posted in Research process
