I’m posting this in hopes that someone with experience in any/all of the above or maybe Perl, can point out that I’m doing something stupid or have overlooked something obvious. If nothing else, you might read this to see what not to try.
Here’s the issue: it’s totally obvious that I need to look at the other tweets that were sent by #agu10 tweeters (the ones not marked with the hash tag) if I want to understand how Twitter was used at the meeting. But it’s now five months later and there are 860 of them (although I would be fine with looking at the most prolific non-institutional tweeters).
I first looked at the Twitter API. By adding terms to URLs I could get the recent timeline for one user at a time, but I couldn't see a way to get a user's timeline for a set period of time (the conference dates plus a week or so on each end).
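For reference, the sort of URL I was constructing looked roughly like this (this is the old v1 REST API as I remember it, so treat the exact path and parameters as assumptions):

```
http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=cpikas&count=200
```

As far as I can tell, the user timeline method takes things like count, page, since_id, and max_id, but nothing date-based, which is exactly the problem.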
I asked two experts and they both said that you couldn’t combine the user timeline with a time period.
Darn. So my next idea was to see if I could actually access someone’s timeline that far back through the regular interface. I tried one of the more prolific tweeters and I could. Ok, so if I can pull down all of their tweets, then I could pick out the ones I wanted. Or, even better, I could also look at the evolution of the social network over time. Did people meet at the meeting and then continue to tweet at each other or are these people only connected during the actual meeting? Did the network exist in the same way before the meeting?
I was looking for ways to automate this a bit and I noticed that there were things already built for Perl and for R. I used Perl with a lot of handholding to get the commenter network for an earlier paper and I used R for both that same article and in lieu of STATA for my second semester of stats. I’m not completely comfortable with either one and I don’t always find the help helpful. I decided to start with R.
The main package is twitteR by Jeff Gentry. I updated my installation of R and installed and loaded that package and the dependencies. First thing I did was to get my own standard timeline:
testtweets <- userTimeline("cpikas")
Then I typed out the first few to see what I got (like when you're using DIALOG), and I saw my tweets in the expected format.
I checked the length of that and got 18 – the current timeline was 18 items. I tried the same thing substituting user id but that didn’t work. So then I tried to retrieve 500 items and that worked fine, too.
testlonger <- userTimeline("cpikas", n=500)
Great. Now let me see the dates so I can cut off the ones I want. Hm. OK, how do I get at the other columns? What type of object is this, anyhow? The manual is no help. I tried some things with object$field: no joy. Tried to edit it: no joy – it was upset about the < in the image URL. It was also telling me that the object was of type S4; the manual said it wasn't, but I can't argue if that's what it's reading. I somehow figured out it was a list. I tried objectname[]: NULL. Then I eventually tried inspecting the object directly.
Hrumph. It says 1 slot. So as far as I can tell, it's a deprecated object type and it didn't retrieve or keep all of the other information needed to narrow by date.
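For what it's worth, after more digging my current guess is that userTimeline() actually returns a list of S4 status objects whose slots are read with @ rather than $. If that's right, something like this sketch should pull out the dates – the slot name created and the window dates are assumptions on my part, not things I've verified against my version of twitteR:

```r
library(twitteR)

# testlonger is the list returned by userTimeline("cpikas", n=500) above.
# slotNames() lists an S4 object's slots; @ reads them (not $).
slotNames(testlonger[[1]])

# Pull the created date out of each status, keeping the POSIXct class
dates <- do.call(c, lapply(testlonger, function(s) s@created))

# Keep only tweets in the meeting window plus a week on each end
# (these dates are placeholders, not the actual AGU10 dates)
window.start <- as.POSIXct("2010-12-06")
window.end   <- as.POSIXct("2010-12-24")
agu.tweets <- testlonger[dates >= window.start & dates <= window.end]
```

If the slots really are populated, that would also give me the screen names and text for building the network over time.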
When googling around, I ran across this entry by Heuristic Andrew on text mining Twitter data with R. I didn't try his method with the XML package yet (I may). I did try the package that was listed in the comments, tm.plugin.webcorpus by Mario Annau. That does get the whole tweet and puts things in slots the right way (object$author), but it looks like you can only do a word search. Oh wait, this just worked:
testTM <- getMeta.twitter('from:cpikas')
But that's supposed to default to 100 results per page, and it only returned 7 for me. I guess the next thing to try is the XML version – unless someone reading this has a better idea?
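In case it's useful to anyone trying the same thing, here's roughly what I have in mind for the XML version. The search.twitter.com Atom feed URL and its rpp (results per page) parameter are from memory and from Heuristic Andrew's post, so treat them as assumptions:

```r
library(XML)

# Sketch: pull tweets straight from Twitter's search feed with the XML
# package. The URL and its rpp/page parameters are my assumptions about
# the search API, not tested here.
url <- "http://search.twitter.com/search.atom?q=from%3Acpikas&rpp=100&page=1"
doc <- xmlTreeParse(url, useInternalNodes = TRUE)

# Atom entries: pull out the title (tweet text) and published date of each
ns <- c(atom = "http://www.w3.org/2005/Atom")
titles <- xpathSApply(doc, "//atom:entry/atom:title", xmlValue, namespaces = ns)
dates  <- xpathSApply(doc, "//atom:entry/atom:published", xmlValue, namespaces = ns)
```

Looping over the page parameter would, in principle, page back through older results.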
Edit: I forgot the copy-and-paste approach. When I tried to just look at the tweets I wanted on screen and then copy them into a text document, it crashed Firefox. Who knows why.