I figured out how to get the names of all 216 El País journalists. I just had to use a different twitter module.
To begin with, I was using python-twitter, but there’s something weird about their implementation of lists. It seems like at one point there was a GetListMembers method that would allow you to get the members of a list, but in the current version it wasn’t working for me. So I went on the website, copied down the names of the first 20 people listed, and took tweets from them.
This worked OK but I’d like to have more people in my corpus. Specifically I’m worried about having some participants with a lot of data and some with comparatively less. Even if I only work with ten twitterers in the end, I’d like them to have approximately the same number of tweets and the same number of words, so that my classifier will work just as well for all the participants. If I collect from more people, there’s a better chance that I’ll be able to pick out ten who are comparable.
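The "pick ten comparable twitterers" step could be sketched like this. This is just one way to do it, assuming the collected tweets end up in a dict mapping screen names to lists of tweet texts (the `tweets_by_user` name and the windowing approach are my own illustration, not anything from the twitter modules):

```python
# Sketch: given tweets collected per user, pick the k users whose
# tweet counts are closest to each other. `tweets_by_user` is a
# hypothetical dict of screen name -> list of tweet texts.

def pick_comparable(tweets_by_user, k=10):
    # Sort users by tweet count, slide a window of size k over the
    # sorted list, and keep the window with the smallest spread
    # (largest count minus smallest count within the window).
    ranked = sorted(tweets_by_user, key=lambda name: len(tweets_by_user[name]))
    best, best_spread = None, float('inf')
    for i in range(len(ranked) - k + 1):
        window = ranked[i:i + k]
        spread = len(tweets_by_user[window[-1]]) - len(tweets_by_user[window[0]])
        if spread < best_spread:
            best_spread, best = spread, window
    return best
```

The same idea extends to word counts instead of tweet counts by swapping the key function.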
So instead of python-twitter, I decided to give tweepy a shot. Here’s the code I used:
import tweepy

auth = tweepy.BasicAuthHandler('my_twitter_name', 'my_password')  # real credentials go here
api = tweepy.API(auth)
members = tweepy.Cursor(api.list_members, owner='el_pais', slug='el-pais').items()
names = [user.screen_name for user in members]
and it worked! Now I’ve got so much data to collect that I’m running up against Twitter’s rate limiting. The API only allows you to make 150 requests an hour, and with 216 names on my list, it’s going to take a couple of tries. So I’ve been working on code that will save my data collection in progress and let me tack on more once an hour has passed and my rate limit resets.
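The save-and-resume idea looks roughly like this, as a minimal sketch: keep everything collected so far in a JSON file keyed by screen name, and on each run only fetch the names that aren't done yet, stopping when the rate limit kicks in. Here `fetch_tweets` is a hypothetical placeholder for the actual tweepy call, and the filename is made up:

```python
# Sketch of checkpointed collection: progress is saved after every
# user, so a run killed by the rate limit loses nothing and the next
# run (an hour later) picks up where this one left off.
import json
import os

SAVE_FILE = 'tweets_in_progress.json'  # made-up filename

def load_progress():
    if os.path.exists(SAVE_FILE):
        with open(SAVE_FILE) as f:
            return json.load(f)
    return {}

def save_progress(collected):
    with open(SAVE_FILE, 'w') as f:
        json.dump(collected, f)

def collect(names, fetch_tweets):
    collected = load_progress()
    remaining = [n for n in names if n not in collected]
    for name in remaining:
        try:
            collected[name] = fetch_tweets(name)
        except Exception:      # e.g. tweepy raising on a rate-limit response
            break              # stop here; the next run resumes from the file
        save_progress(collected)
    return collected
```

With 216 names and 150 requests an hour, two runs should cover the list (plus a third if any individual user needs more than one request).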