Archive for the 'dissertation' category
So I said I wouldn't analyze the data from '13 because I'm already underwater, I have plenty, and I need to get done. However, I'd already figured out OAuth and using twitteR, so no harm in running a couple of commands, stashing the data somewhere, and maybe pulling it out if there's a specific question, or maybe later when turning my dissertation into an article (should I live so long!).
I thought, well, it will give me about 2 weeks' worth, but maybe I should give it a try while the conference is still going to make sure everything works ok. Well, crap. I'm getting anywhere from 99 to 1000 tweets per query... and that covers at most 3 hours... and I can't seem to fill in the rest. Bummer.
The search has a sinceID but no untilID... and it has since and until for dates, but those are full days, not down to the hour or minute. So I'm really only able to get 9pm-midnight GMT. Huh.
I watched Kim Holmberg's fabulous webinar today, so I'm going to try something he suggested to see if that helps. Otherwise, I kinda need to run the search throughout the day, which I can do if I work from home, but I will have missed the most important days of the conference. It's tapering off now. Sigh.
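If I do end up running the search several times throughout the day, the pulls will overlap, so I'd need to merge them and drop duplicate statuses. A minimal sketch of that merge, assuming each pull has already been flattened to a data frame with an `id` column (which is what `twitteR::twListToDF` produces) — the toy data frames here are stand-ins, not real pulls:

```r
# Merge repeated pulls of the same hashtag, deduplicating on status id.
merge_pulls <- function(...) {
  pulls <- list(...)
  combined <- do.call(rbind, pulls)
  # keep only the first copy of each status id
  combined[!duplicated(combined$id), ]
}

# toy data standing in for two overlapping pulls from the same day
pull_morning   <- data.frame(id = c("101", "102", "103"),
                             text = c("a", "b", "c"),
                             stringsAsFactors = FALSE)
pull_afternoon <- data.frame(id = c("103", "104"),
                             text = c("c", "d"),
                             stringsAsFactors = FALSE)

all_tweets <- merge_pulls(pull_morning, pull_afternoon)
nrow(all_tweets)  # 4 unique statuses
```

Since status ids are unique and stable, deduplicating on `id` is safer than comparing tweet text, where retweets would collide.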
I can't say I recommend listening to the whole recording, as it's well over an hour and there was a lot of (very helpful) feedback. Maybe flipping through the slides is the best approach.
BTW I think I butchered the description of arsenicscience ... please don't hold that against me!
I should be more positive: picking up steam. Or picking up steam! But it's really pretty tenuous with the busyness at home and whatnot.
Here's where I am now... still (months late) analyzing the actual tweets as we head into the next conference. I'm going to collect and save this year's even if I don't plan to analyze it.
Started interviews, and an interviewee tweeted about the interview, which I think is awesome but which Bo (*) would frown upon.
Research talk Tuesday (eek!) for CASCI. I had hoped it would be livestreamed, but I haven't heard anything about that to share. I will post the slides to SlideShare.
So anyway, I'm still alive. Lots of new papers on blogging, incidentally. Here are three (one via Jason Priem; the others... I don't remember where I found them):
- Hank, C. (2013). Communications in blogademia: An assessment of scholar blogs' attributes and functions. New Review of Information Networking, 18(2), 51-69. doi:10.1080/13614576.2013.802179
- Luzón, M. J. (2013). Public communication of science in blogs: Recontextualizing scientific discourse for a diversified audience. Written Communication, 30(4), 428-457. doi:10.1177/0741088313493610 <- reading this now, and I like it, but the research methods are sort of inadequately described? I mean, I guess I'm not used to reading rhetoric papers, maybe this is typical? worth a blog post for sure
- Mewburn, I., & Thomson, P. (2013). Why do academics blog? An analysis of audiences, purposes and challenges. Studies in Higher Education, 38(8), 1105-1119.
Oh, and this one, which discusses blogging while you research:
- Olive, R. (2013). 'Making friends with the neighbours': Blogging as a research method. International Journal of Cultural Studies, 16(1), 71-84. doi:10.1177/1367877912441438
* Kazmer, M. M., & Xie, B. (2008). Qualitative interviewing in internet studies: Playing with the media, playing with the method. Information, Communication & Society, 11(2), 257-278. doi:10.1080/13691180801946333
I'm hesitant to post this because I'm not sure how it rates against the terms of service, but here's what seems to be working.
When last I posted, I had overcome some oauth hurdles to successfully pull tweets using the API only to find that the tweets I wanted (#AGU11, #AGU2011, #AGU12, #AGU2012) were not covered by the API. Crap on a banana as my old boss would say.
I did find that if you do a search in the web interface and scroll a whole bunch, you can actually get all the tweets on the screen from that far back. So I did that and copied them into Excel.
Unfortunately I ended up with something like this:
OpenRefine to the rescue, sorta. OpenRefine used to be called Google Refine; it just helps with data munging. I didn't really get it until I tried it for this, because I had assumed I could do all of this just as easily in Excel, but that's not the case.
I'm tempted to actually post the JSON file so the steps could be repeated, but so far I haven't found that I can create a clean enough script to run from one end to the other. Nevertheless, I'm willing to post it if there's interest. Here's what I've done so far:
- Upon import, got rid of blank rows
- transposed to 3 columns
- deleted the column with "expand", "view photo", or "view conversation" in it
- split the first column at @ to get a new column with just the twitter user name
- split the twitter user name column at the first space to get just the twitter user name and a new column with the date*
- copied the tweet content to a new column
- filtered for RT OR MT and starred those - might do something with that later... probably will omit for most of the work
- split at @ for each of those columns (up to 6!)
- for each of those columns, split at the space, limited to 2 columns, then deleted the second column. I tried some GREL fanciness that would have done this in one shot, and the preview looked good, but it errored because the result wasn't an object that could end up in a cell.
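The mention-extraction steps above (split at @, then split at the first space) can be cross-checked in R with a regex over the raw tweet text. A sketch, using Twitter's handle characters (letters, digits, underscore) — the sample tweet is made up:

```r
# Pull @mentions out of tweet text, as a cross-check on the
# OpenRefine split-at-@ steps. Twitter handles are letters,
# digits, and underscores.
extract_mentions <- function(tweet) {
  m <- regmatches(tweet, gregexpr("@[A-Za-z0-9_]+", tweet))[[1]]
  sub("^@", "", m)  # drop the leading @
}

extract_mentions("RT @alice: great talk by @bob_smith at #AGU12")
# -> "alice" "bob_smith"
```

One advantage of the regex route is that it doesn't care how many mentions a tweet has, so there's no need for a fixed number of split columns.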
So here are my columns:
Name, Twitter handle, date, tweet, first@, second@, third@...
So it would be quite easy to convert this to one of the UCINET file types that has you list the nodes that are connected, as soon as I do this for the other files and combine the 2011 files and the 2012 files.
*I'm missing a lot of what I would have gotten easily with the old API, like a full date-time stamp, id, geocodes, what software was used, mentions, etc.
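Turning the handle and mention columns into node pairs is mostly a reshape. A sketch of building that edge list (one row per sender-receiver pair, which is close to what UCINET's DL node-list format wants) — the column names and toy rows here are assumptions matching the layout described above:

```r
# Toy data matching the column layout above (handle plus mention columns).
tweets <- data.frame(handle    = c("alice", "bob"),
                     first_at  = c("bob", "carol"),
                     second_at = c("carol", NA),
                     stringsAsFactors = FALSE)

# Stack each mention column into (from, to) pairs, then drop empty slots.
mention_cols <- c("first_at", "second_at")
edges <- do.call(rbind, lapply(mention_cols, function(col) {
  data.frame(from = tweets$handle, to = tweets[[col]],
             stringsAsFactors = FALSE)
}))
edges <- edges[!is.na(edges$to), ]
edges  # three edges: alice->bob, bob->carol, alice->carol
```

From here, writing the pairs out with `write.table` gives something UCINET (or igraph, for that matter) can import with little fuss.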
So I'm back again on the dissertation... finally... and of course since the last time I got Twitter data, the original service I used is gone, the API has changed (multiple times), and now OAuth authentication is required even for trivial searches. Sigh.
For reference here are some things that are helping me:
- The vignette is missing from CRAN, but there's a copy in the Wayback Machine at: http://web.archive.org/web/20130615052036/http://cran.r-project.org/web/packages/twitteR/vignettes/twitteR.pdf
- There's now a mailing list for twitteR with lots of great advice: http://lists.hexdump.org/pipermail/twitter-users-hexdump.org/
- Stack Overflow
- This page: https://sites.google.com/site/dataminingatuoc/home/data-from-twitter/r-oauth-for-twitter by Jordi Girones
- This blog post by Dave Tang: http://davetang.org/muse/2013/04/06/using-the-r_twitter-package/
I'm using RStudio, and the funny thing was that I ran the whole script first, which of course didn't work, because the handshake needs you to stop, take a URL, put it into the browser, and paste back in the PIN. So I figured out that I needed to do that... but in RStudio I couldn't copy the URL from the console... sigh... so I had to type the monstrosity by hand. Then I couldn't figure out how to enter the PIN, but duh, I just put it in the script window and hit Run. Then of course my computer had rebooted, because it came from the factory set to basically let Windows updates do whatever they wanted. Anyhoo, I finally recovered and got it all working...
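For future me, here's the shape of that PIN-based handshake as it worked with twitteR/ROAuth at the time. The keys are placeholders, and this has to be run interactively (with a network connection), so it's a sketch of the sequence rather than something to source top to bottom:

```r
# PIN-based OAuth handshake for twitteR via ROAuth (keys are placeholders).
# handshake() prints a URL to open in a browser; paste the PIN back in.
library(twitteR)
library(ROAuth)

cred <- OAuthFactory$new(
  consumerKey    = "YOUR_CONSUMER_KEY",
  consumerSecret = "YOUR_CONSUMER_SECRET",
  requestURL = "https://api.twitter.com/oauth/request_token",
  accessURL  = "https://api.twitter.com/oauth/access_token",
  authURL    = "https://api.twitter.com/oauth/authorize"
)

cred$handshake()            # stop here, visit the URL, enter the PIN
registerTwitterOAuth(cred)  # after this, searchTwitter() etc. work
```

Saving the credential object afterwards (e.g. `save(cred, file = "twitter_cred.RData")`) avoids typing the URL monstrosity again next session.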
And then I tried the hashtag from the AGU 2012 conference and got zero responses. Huh? So then I searched for "#icanhazpdf" and got 25... so then I looked at the API documentation and crap, crap, crap!!! You can only get something like 6-9 days' worth of tweets. Sigh. Time to regroup.
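Given that window, a quick sanity check before querying saves an empty result. A trivial sketch — the one-week cutoff is an assumption based on the roughly 6-9 days the docs described:

```r
# Is a date still inside the (assumed) one-week search window?
still_searchable <- function(date, window_days = 7) {
  as.Date(date) >= Sys.Date() - window_days
}

still_searchable(Sys.Date() - 2)  # TRUE: two days ago is reachable
still_searchable("2012-12-07")    # FALSE by now (AGU 2012 dates)
```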
FWIW, there's been a ton of cool literature recently and I'd like to get into discussing that but I need to make some headway on this analysis bit first instead of iterating on the lit review. Sigh.
Edited to link URLs and to remove a random link to an eBay store where you can buy a christening gown. Sigh.