Yesterday I wrote about my #phdchat Twitter corpus, but today I got further in my reading and decided I can’t use it. My project involves replicating Antonio Rico-Sulayes’s work, so I spent the morning reading his 2012 dissertation, “Quantitative authorship attribution of users of Mexican drug dealing related online forums.” I was taking notes on methods (feature reduction techniques, classification algorithms), but what I hadn’t realized was how much it matters that he used Spanish-language data. If I use English data like #phdchat, then I can’t really say I’m replicating his methods. (For example, one of the features he tags is multi-word prepositions like después de, “after.”)
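To make the multi-word preposition example concrete, here is a rough Python sketch of what counting such a feature might look like. The preposition list and the function name are placeholders made up for illustration; they are not taken from the dissertation, which may tag this feature quite differently.

```python
import re

# Hypothetical, incomplete list of Spanish multi-word prepositions,
# chosen only for illustration (not the dissertation's inventory).
MULTIWORD_PREPOSITIONS = [
    "después de",
    "antes de",
    "a través de",
    "en vez de",
    "cerca de",
]


def count_multiword_prepositions(text):
    """Return a dict mapping each listed preposition to its count in text."""
    lowered = text.lower()
    counts = {}
    for prep in MULTIWORD_PREPOSITIONS:
        # Word boundaries keep the phrase from matching inside longer words.
        pattern = r"\b" + re.escape(prep) + r"\b"
        counts[prep] = len(re.findall(pattern, lowered))
    return counts


sample = "Después de la clase, fuimos a casa antes de cenar. Después de eso, nada."
counts = count_multiword_prepositions(sample)
print(counts)
```

A list like this has no sensible one-to-one mapping onto English tweets, which is the point: the feature is defined over Spanish text.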
So I’ve been hunting around for a hashtag I could search on that would give me a list of people who tweet in Spanish about similar topics. I did a little googling, and I also asked the hive mind:
Anyone know a recurring twitter chat hashtag, like #phdchat, that is used by Spanish-language tweeters?
— Daniel Ginsberg (@NemaVeze) April 18, 2013
I got a lot of responses, but most of them are hashtag games like #SiHarryPotterFueraMexicano rather than recurring interest groups like #phdchat. Maybe tomorrow I’ll see what I can do with #mamastuiteras. I like the idea of reading a study that used drug-dealing-related forum posts and replicating it with data from mommy bloggers.