Thing-a-day #11: Twitter data set

Continuing my forensic linguistics Twitter project.

Rather than going with my first ten randomly selected participants in #phdchat, I took twenty randomly selected participants to see what I would get. Out of those twenty I eliminated eight, either because they were institutions rather than individual tweeters (like @GdnHigherEd, which I wrote about last time), or because for some reason my data collection script gave me fewer than 100 tweets for them. I took the other twelve tweeters, with between 120 and 190 tweets each, and saved them as a new data set.
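For anyone who wants to reproduce the selection step, here is a minimal sketch of the filtering described above. It is written in Python purely for illustration and is not my actual collection script. The function name `select_tweeters`, the dictionary layout (handle mapped to a list of tweet texts) and the hand-made `institutions` set are all assumptions. Only the sample size of twenty and the 100-tweet cutoff come from this post.

```python
import random

MIN_TWEETS = 100  # cutoff from the post: accounts with fewer tweets are dropped


def select_tweeters(tweets_by_user, institutions, sample_size=20, seed=0):
    """Randomly sample accounts, then drop institutions and sparse accounts.

    tweets_by_user: dict mapping a handle to a list of tweet texts (assumed layout).
    institutions:   set of handles judged by hand to be institutional accounts.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    candidates = sorted(tweets_by_user)  # sort first so sampling is deterministic
    sampled = rng.sample(candidates, min(sample_size, len(candidates)))
    return {
        user: tweets_by_user[user]
        for user in sampled
        if user not in institutions and len(tweets_by_user[user]) >= MIN_TWEETS
    }
```

Deciding which accounts count as institutions is still a manual judgement. The sketch only applies the decision once it has been made.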

This is the data set I’m going to use for my computational experiments, at least to begin with. So I’ve spent the early afternoon thinking about how to segment the data into training and testing samples, and reading up on how to do Principal Components Analysis in R.
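One possible way to segment the data, sketched in Python rather than R, is to hold out a fixed fraction of each tweeter's tweets for testing, so that every author appears in both samples. This is an assumption about how the split might be done, not a decision I have settled on. The function name `split_per_author` and the 20% test fraction are hypothetical.

```python
import random


def split_per_author(tweets_by_user, test_fraction=0.2, seed=0):
    """Split each author's tweets into training and testing samples.

    Returns two dicts (train, test), each mapping a handle to a list of tweets.
    The test fraction is an illustrative choice, not a value from the post.
    """
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    train, test = {}, {}
    for user in sorted(tweets_by_user):
        shuffled = list(tweets_by_user[user])  # copy so the input is untouched
        rng.shuffle(shuffled)
        n_test = max(1, round(len(shuffled) * test_fraction))
        test[user] = shuffled[:n_test]
        train[user] = shuffled[n_test:]
    return train, test
```

Splitting within each author keeps the two samples balanced across tweeters, which matters when the accounts have different numbers of tweets.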
