Archive for the 'dissertation' category

My poster from UM iSchool's Research Day

Mar 11 2014 Published by under dissertation

Pikas research day 2014 poster from Christina Pikas

No responses yet

Keeping up with a busy conference - my tools aren't doing it

I wrote about trying to use twitteR to download AGU13 tweets. I'm getting fewer and fewer back with each call. So I was very excited to try Webometric Analyst from Wolverhampton, described by Kim Holmberg in his ASIST webinar (BIG pptx, BIG wmv).

One of the things Webometric Analyst will do is repeated searches until you tell it to stop. This was very exciting. But I tried it and, alas, I think Twitter thinks I'm abusive or something, because it was way throttled. Like I could see the tweets flying up on the screen at twitter.com but the search was retrieving like 6. I ran the R search mid-day today and got 99 tweets back which covered 5 minutes O_o. I asked for up to 2000, from the whole day, and had it set to retry if stopped.
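For the record, the R search I'm running looks more or less like this (a sketch, not my exact script - the hashtag and dates are stand-ins, and retryOnRateLimit assumes a reasonably recent twitteR):

# roughly the search described above: up to 2000 tweets from one full day,
# retrying if Twitter cuts the search off (values here are placeholders)
library(twitteR)
tweets <- searchTwitter("#AGU13", n = 2000,
                        since = "2013-12-12", until = "2013-12-13",
                        retryOnRateLimit = 120)
length(tweets)   # today: 99, covering about 5 minutes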

Sigh.

No responses yet

AGU13 - like whoa

Dec 11 2013 Published by under dissertation

So I said I wouldn't analyze the data from '13 because I'm already underwater, I have plenty, and I need to get done. However, I figured I'd already worked out oauth and twitteR, so there'd be no harm in running a couple of commands, stashing the data somewhere, and maybe pulling it out if there's a specific question, or later when turning my dissertation into an article (should I live so long!).

I thought, well, it will give me about 2 weeks' worth, but maybe I should give it a try while the conference is still going to make sure everything works ok. Well, crap. I'm getting anywhere from 99 to 1000 tweets per query... and that's covering at most like 3 hours... and I can't seem to fill in the rest. Bummer.

The search has a sinceID but no untilID... and it has since and until for dates - but those take full days, not down to the hour or minute or anything. So I'm really only able to get 9pm-midnight GMT. Huh.
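In twitteR terms, this is all the control you get (a sketch; the tweet id below is completely made up, just to show the parameter):

# sinceID pages forward from a tweet id you already have, but there's no
# equivalent for paging backward; since/until only take whole dates
newer <- searchTwitter("#AGU13", n = 1000,
                       since = "2013-12-10", until = "2013-12-11",
                       sinceID = "410500000000000000")  # hypothetical id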

I watched Kim Holmberg's fabulous webinar today, so I'm going to try something he suggested to see if that helps. Otherwise, I kinda need to run the search throughout the day, which I can do if I work from home, but I will have missed the most important days of the conference. It's tapering off now. Sigh.

No responses yet

CASCI Talk about dissertation work

Nov 27 2013 Published by under dissertation

I can't say I recommend listening to this, as it runs well over an hour and there was a lot of (very helpful) feedback. Maybe flipping through the slides is the best approach.

BTW I think I butchered the description of arsenicscience ... please don't hold that against me!

One response so far

Dissertation ... picking up steam?

Nov 23 2013 Published by under dissertation

I should be more positive: picking up steam. Or picking up steam! But it's really pretty tenuous with the busyness at home and whatnot.

Here's where I am now... still (months late) analyzing the actual tweets as we head into the next conference. I'm going to collect and save this year's tweets even if I don't plan to analyze them.

Started interviews, and an interviewee tweeted about the interview, which I think is awesome, but which Bo (*) would frown upon.

Research talk Tuesday (eek!) for CASCI. I had hoped it would be livestreamed, but I haven't heard anything about that to share the details. I will post the slides to SlideShare.

So anyway, I'm still alive. Lots of new papers on blogging, incidentally. Here are 3 (one via Jason Priem; the others... I don't remember where I found them):

  • Hank, C. (2013). Communications in blogademia: An assessment of scholar blogs' attributes and functions. New Review of Information Networking, 18(2), 51-69. doi:10.1080/13614576.2013.802179
  • Luzón, M. J. (2013). Public communication of science in blogs: Recontextualizing scientific discourse for a diversified audience. Written Communication, 30(4), 428-457. doi:10.1177/0741088313493610 <- reading this now, and I like it, but the research methods are sort of inadequately described? I mean, I guess I'm not used to reading rhetoric papers, so maybe this is typical? Worth a blog post for sure.
  • Mewburn, I., & Thomson, P. (2013). Why do academics blog? An analysis of audiences, purposes and challenges. Studies in Higher Education, 38(8), 1105-1119.

Oh, and this one, which discusses blogging while you research:

Olive, R. (2013). 'Making friends with the neighbours': Blogging as a research method. International Journal of Cultural Studies, 16(1), 71-84. doi:10.1177/1367877912441438

* Kazmer, M. M., & Xie, B. (2008). Qualitative interviewing in internet studies: Playing with the media, playing with the method. Information, Communication & Society, 11(2), 257-278. doi:10.1080/13691180801946333

No responses yet

The #agu12 and #agu2012 Twitter archive

I showed a graph of the agu10 archive here, and more recently the agu11/2011 archive here, and now here's the agu12/2012 archive. See the 2011 post for the exact methods used to get the data and to clean it.

#agu12 and #agu2012 largest component, nodes sized by degree

#agu12 and #agu2012 other components, no isolates, nodes sized by degree

I will have to review methods to show this, but from appearances, the networks are becoming more like hairballs. In the first year, half the people were connected to theAGU and the other half were connected to NASA, but very few were connected to both. The other prominent nodes were pretty much all institutional accounts. In 2011 that started to decrease, and now in 2012 you can't really see that division at all. The top three nodes are still there - two the same plus a NASA robotic mission - but then there's a large second group of individual scientists with degrees (connections to others, combined indegree and outdegree) around 40-80.

2 responses so far

An image of the #agu2011, #agu11 Twitter archive

A loooong time ago, I showed the agu10 archive as a graph; here's the same for the combination of agu11 and agu2011. I already mentioned the upper/lower case issues (Excel is oblivious but my graphing program cares) - this is all lower case (I first tried to correct by hand but kept missing things, so I just used Excel's =LOWER()). I also discussed how I got the data. I'm probably going to have to go back and do this for 2010 if I really want equivalent images, because 1) I only kept the first @ (this has all the @s), and 2) I don't believe I did both 2010 and 10, so I probably missed some. For this image I did a little bit of correcting: one Twitter name spelled wrong, and quite a few people using the_agu or agu instead of theagu. I also took out things that were like @10am or @ the convention center.
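(If I'd done that cleanup in R instead of Excel, it would have looked something like this - hypothetical, assuming the @ columns have been pulled into a character vector called handles; this also covers the punctuation fix in the ETA below.)

# hypothetical R version of the Excel cleanup
handles <- tolower(handles)                    # the =LOWER() step
handles <- sub("[[:punct:]]+$", "", handles)   # strip trailing ! or : (see ETA below)
handles[handles %in% c("the_agu", "agu")] <- "theagu"   # fix variant AGU handles
handles <- handles[!grepl("^[0-9]", handles)]  # drop things like @10am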

I made this graph by taking my Excel spreadsheet, which was nicely laid out as username first@ second@ ..., copying that into UCINET's DL editor, and saving as nodelist1. Then I visualized and did basic analysis in NetDraw.
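(In case it helps anyone, a nodelist1 file is pretty simple - here's a tiny made-up example, where the first name on each row is the tweeter and the rest are the accounts they @-mentioned:)

dl n=4 format=nodelist1
labels embedded
data:
cpikas theagu nasa
somegeologist theagu
nasa somegeologist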

agu2011 and agu11 largest component, sized by degree

The largest component is 559 nodes of 740, and this time you don't see that breakdown where the people who tweeted @NASA didn't tweet @theAGU. There were 119 isolates and other components with 2, 3, and 10 nodes:

Other components, sized by degree (no isolates)

ETA: oh yeah, one other little fix. I took out random punctuation at the end of user names, like "hi @cpikas!" or "hey @cpikas:" or... well, you get the idea.

No responses yet

Current plan - possibly a bad one - for older Twitter stuff

Aug 01 2013 Published by under dissertation, information analysis

I'm hesitant to post this because I'm not sure how it rates in the terms of service, but here's what seems to be working.

When last I posted, I had overcome some oauth hurdles to successfully pull tweets using the API, only to find that the tweets I wanted (#AGU11, #AGU2011, #AGU12, #AGU2012) were not covered by the API. Crap on a banana, as my old boss would say.

I did find that if you do a search in the web interface and scroll a whole bunch, you can actually get all the tweets on the screen from back that far. So I did that and copied them into Excel.

Unfortunately I ended up with something like this:

There are 5 lines for each tweet and only two have any text I want. I also would like fields, and I've just got two long messes. What to do?

Open Refine to the rescue, sorta. Open Refine used to be called Google Refine. It just helps with data munging. I kinda didn't really get it until I tried it for this, because I assumed I could do all this as easily in Excel, but that's not the case.

I'm tempted to actually post the json file so the steps could be repeated, but so far I haven't found that I can create a clean enough script to run from one end to the other. Nevertheless, I'm willing to post it if there's interest. Here's what I've done so far:

  • Upon import, got rid of blank rows
  • transposed to 3 columns
  • deleted the column with "expand", "view photo", or "view conversation" in it
  • split the first column at @ to get a new column with just the twitter user name
  • split the twitter user name column at the first space to get just the twitter user name and a new column with the date*
  • copied the tweet content to a new column
  • filtered for RT OR MT and starred those - might do something with that later... probably will omit for most of the work
  • split at @, for each of those columns (up to 6!)
  • for each of those columns, split at space and limited to 2 columns, then deleted the second column. I tried some GREL fanciness that would have done this in one shot, and the preview looked good, but it errored saying the result wasn't an object that could end up in a cell (or something like that).
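For what it's worth, roughly the same munging could be scripted in base R (a hypothetical sketch - it assumes the two useful lines per tweet have already been separated out into meta.txt, holding the "Name @handle date" lines, and tweets.txt, holding the tweet text):

# rough base R equivalent of the Refine steps above (file names are made up)
meta  <- readLines("meta.txt")
tweet <- readLines("tweets.txt")

name   <- sub("\\s*@.*$", "", meta)             # everything before the first @
handle <- sub("^[^@]*@(\\S+).*$", "\\1", meta)  # the twitter user name
date   <- sub("^[^@]*@\\S+\\s*", "", meta)      # what's left is the date

isRT <- grepl("\\b(RT|MT)\\b", tweet)           # flag retweets/modified tweets

# all the @mentions in each tweet (up to 6 per tweet in my data)
mentions <- regmatches(tweet, gregexpr("@\\w+", tweet))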

So here are my columns:

Name, Twitter handle, date, tweet, first@, second@, third@...

So it would be quite easy to convert this to one of the UCINET file types that has you list nodes that are connected - as soon as I do this for the other files and combine the 2011 files and the 2012 files.

*I'm missing a lot of what I would have gotten easily with the old API, like a full date/time stamp, id, geocodes, what software was used, mentions, etc.

One response so far

twitteR, oauth, et al again

Jul 25 2013 Published by under dissertation

So I'm back again on the dissertation... finally... and of course since the last time I got Twitter data, the original service I used is gone, the API has changed (multiple times), and now oauth authentication is required even for trivial searches. Sigh.

For reference, here are some things that are helping me:

I'm using RStudio, and the funny thing was that I ran the whole script first, which of course didn't work, because it needs you to stop, take a URL, put it into the browser, and paste back in the PIN. So I figured out that I needed to do that... but in RStudio you can't copy the URL from the console... sigh... so I had to type the monstrosity by hand... and then I couldn't figure out how to enter the PIN, but duh, I just put it in the script window and hit run. Then of course my computer had rebooted, because it came from the factory set to basically let Windows updates do whatever they want. Anyhoo, I finally recovered and got it all working...
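For future me, the whole dance looks something like this (a sketch of what worked for me; the key and secret are placeholders from your dev.twitter.com app registration, and the API may well have changed again by the time you read this):

library(twitteR)
library(ROAuth)

cred <- OAuthFactory$new(consumerKey = "YOUR_KEY",
                         consumerSecret = "YOUR_SECRET",
                         requestURL = "https://api.twitter.com/oauth/request_token",
                         accessURL = "https://api.twitter.com/oauth/access_token",
                         authURL = "https://api.twitter.com/oauth/authorize")

# prints the URL to open in a browser, then waits for the PIN --
# in RStudio, put the PIN in the script window and hit run
cred$handshake()

registerTwitterOAuth(cred)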

And then I tried the hashtag from the AGU2012 conference and got zero results. Huh? So then I searched for "#icanhazpdf" and got 25... so then I looked at the API documentation and crap, crap, crap!!! You can only get like 6-9 days' worth of tweets. Sigh. Time to regroup.

FWIW, there's been a ton of cool literature recently and I'd like to get into discussing that but I need to make some headway on this analysis bit first instead of iterating on the lit review. Sigh.

Edited to link URLs and to remove a random link to an eBay store where you can buy a christening gown. Sigh.

One response so far

Solution to my Twitter API - twitteR issues

With lots of help from Bob O'Hara (thank you!), I was able to solve my problems. I am looking at the tweets around #AGU10, but it occurred to me that I wanted to know what other tweets the AGU twitterers were sending while at the meeting, because some might not have had the hashtag.

Here goes:

# load the package (assumes the oauth registration is already done)
library(twitteR)

# Get the timeline; "person" is a placeholder for the actual screen name
person <- userTimeline("person", n = 500)

# Check to see how many you got (you rarely get exactly what you asked for)
length(person)

# Check to see if that is far enough back
person[[500]]$getCreated()

# Get the time each was tweeted (comes back as seconds since the epoch)
Time <- sapply(person, function(lst) lst$getCreated())

# Get screen name
SN <- sapply(person, function(lst) lst$getScreenName())

# Get any reply-to screen names (empty if the tweet wasn't a reply)
Rep2SN <- sapply(person, function(lst) lst$getReplyToSN())

# Get the text
Text <- sapply(person, function(lst) lst$getText())

# fix the date from number of seconds to a human-readable format
TimeN <- as.POSIXct(Time, origin = "1970-01-01", tz = "UTC")

# replace the blanks with NA
Rep2SN.na <- sapply(Rep2SN, function(str) ifelse(length(str) == 0, NA, str))

# combine it all into a data frame
Data.person <- data.frame(TimeN = TimeN, SN = SN, Rep2SN.na = Rep2SN.na, Text = Text)

# save it out to csv
write.csv(Data.person, file = "person.csv")

So I did this by finding and replacing "person" with the screen name in a text editor and pasting that into the script window in Rcmdr. I found that 500 was rarely enough. For some I had to request up to 3200 tweets, which is the maximum. I had to skip one person because 3200 didn't get me back to December. It's also worth noting the length() step. It turns out that when you ask for 500, you sometimes get 550 and sometimes get 450 or anywhere in between, and it's not because there aren't any more. You may also wonder why I wrote the whole thing out to a csv file. I could have had a step to cut out the more recent and older tweets to have just the set there for more operations within R. But I need to actually do qualitative content analysis on the tweets, and I plan to do that in NVivo 9.

I didn’t do this for all 860, either. I did it for the 30 or so who tweeted 15 or more times with the hashtag. I might expand that to 10 or more (17 more people). Also, I didn’t keep the organizational accounts (like theAGU).

With that said, it's very tempting to paste all of these data frames together, remove the text, and do the social network analysis using igraph. Even cooler would be to show an automated display of how the social network changes over time. Are there new connections formed at the meeting (I hope so)? Do the connections formed at the meeting continue afterward? If I succumb to the temptation, I'll let you know. There's also the text mining package and plugin for Rcmdr. This post gives an idea of what can be done with that.
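That tempting first pass would look something like this (a hypothetical sketch, assuming a few of the per-person data frames from above and a reasonably current igraph):

# paste the data frames together and build a directed reply network
library(igraph)
all.tweets <- rbind(Data.person1, Data.person2, Data.person3)  # hypothetical names

# keep only actual replies (rows with a reply-to screen name)
edges <- subset(all.tweets, !is.na(Rep2SN.na), select = c(SN, Rep2SN.na))

# each edge goes from the tweeter to the person they replied to
g <- graph_from_data_frame(edges, directed = TRUE)
degree(g, mode = "all")   # combined indegree + outdegree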

2 responses so far
