Thing-a-day #12: One step back

Yesterday I wrote about my #phdchat Twitter corpus, but today I got further in my reading and decided I can’t use it. My project involves replicating Antonio Rico-Sulayes’s work, so I spent the morning reading his 2012 dissertation, “Quantitative authorship attribution of users of Mexican drug dealing related online forums.” I was taking notes on methods (feature reduction techniques, classification algorithms), but what I hadn’t realized is how important it is that he used Spanish-language data. If I use English data like #phdchat, then I can’t really say I’m replicating his methods. (For example, one of the features he tags is multi-word prepositions like después de, “after.”)
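To make the language dependence concrete, here is a rough Python sketch of what counting a feature like multi-word prepositions could look like. It is not the tagging procedure from the dissertation: the phrase list is a small illustrative sample, and the function name `count_multiword_prepositions` is made up for this example.

```python
import re
from collections import Counter

# Illustrative sample only; NOT the inventory used in the dissertation.
MULTIWORD_PREPOSITIONS = [
    "después de",   # after
    "antes de",     # before
    "a pesar de",   # in spite of
    "cerca de",     # near
    "dentro de",    # inside
]

# Try longer phrases first so they win over any shorter phrase they contain.
_PATTERN = re.compile(
    r"\b("
    + "|".join(
        re.escape(p)
        for p in sorted(MULTIWORD_PREPOSITIONS, key=len, reverse=True)
    )
    + r")\b",
    re.IGNORECASE,
)


def count_multiword_prepositions(text):
    """Return a Counter of lowercased multi-word prepositions found in text."""
    return Counter(m.group(1).lower() for m in _PATTERN.finditer(text))


sample = (
    "Después de comer, salimos a pesar de la lluvia. "
    "Llegamos antes de las ocho, después de todo."
)
counts = count_multiword_prepositions(sample)
```

A list like this has no counterpart that carries over to English tweets, which is the problem: the feature set itself is tied to the language of the data.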

So I’ve been hunting around for a hashtag I could search on that would give me a list of people who tweet in Spanish about similar topics. I did a little googling, and I also asked the hive mind.

I got a lot of responses, but mainly they’re hashtag games like #SiHarryPotterFueraMexicano rather than recurring interest groups like #phdchat. Maybe tomorrow I’ll see what I can do with #mamastuiteras. I like the idea of reading a study that used drug-dealing related forum posts, and replicating it with data from mommy bloggers.
