Have some Friday tidbits!
- An important biology dataset is losing NSF funding and may fold. Nor (as the article explains) is it the only one. It is impossible to overstate the desperate gravity of the data-sustainability question. Academic libraries, if we are not the white knights here—and we certainly have been in the past; witness arXiv—who is?
- On a similar theme, Yahoo pulls the plug on GeoCities. O ye researchers relying on consumer-grade web services, or new startups, have an exit strategy! Consumer-grade services die when they lose money. Jason Scott may not come charging to your rescue.
- H1N1 science depends on a public database of flu immunity data. "As the researchers acknowledge in their paper, the work couldn't have taken place if it weren't for extensive data sharing within the community of flu virus researchers." Data sharing makes possible better, faster science.
- Data and the journal article. First: if you are saving your data as PDF, stop it. Second: as I suggested to Chris on FriendFeed, there's a serious structural issue with expecting journal publishers to cope with appropriate data archiving: by the time a researcher chooses a journal to publish in, all the decisions about data gathering and representation have already been made—and they may well have been made badly. The poor journal publisher can't go back in time and fix bad decisions! In our not-yet-standardized data age, early data interventions have to happen close to the researcher, which to me means they need to happen at the institution where the research happens.
- The need for clear data licenses. I haven't talked about data licensing here, partly because the current state of intellectual-property law makes me sick at heart, but there's no question that it's an important piece of the data puzzle.
- Peer-to-peer technology used for the forces of good: BioTorrents. Datasets vary in size; for the large ones, network bandwidth becomes a sharing problem. Torrenting won't precisely solve the problem, but it certainly increases the size range within which datasets are portable.
- Fascinating data project of the week: National Center for Ecological Analysis and Synthesis. What caught my attention is that as I read the project description, it takes public data sharing for granted. NCEAS researchers are not generating data; they are mining existing data. I'm inordinately curious about the disciplinary culture that makes this a feasible thing: what price scooping?
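On the BioTorrents item above, the back-of-the-envelope arithmetic is worth seeing once. This is only a sketch: the dataset size and link speed are made-up numbers, `transfer_hours` is my own illustrative helper, and it assumes one steady link with no protocol overhead. Torrenting helps mainly because several peers can each supply a share of the data, so the effective rate can exceed what one server's uplink offers.

```python
def transfer_hours(size_gb, mbit_per_s):
    """Rough time to move a dataset over one steady link.

    Assumes decimal units (1 GB = 10**9 bytes, 1 Mbit = 10**6 bits)
    and ignores protocol overhead, congestion, and retries.
    """
    bits = size_gb * 8 * 1000**3
    seconds = bits / (mbit_per_s * 1000**2)
    return seconds / 3600


# Hypothetical example: a 500 GB dataset over a 100 Mbit/s connection.
hours = transfer_hours(500, 100)
print(f"{hours:.1f} hours")
```

Doubling the effective rate halves the time, which is the whole appeal of pulling pieces from many peers at once.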
Whew. I have a lot more, but it's Friday.
Starting off the week with some juicy tidbits:
That should keep everyone out of trouble a while…
My del.icio.us tag overfloweth…
- A challenge to libraries from an information science professor: "I wish I could say that libraries were the obvious organization to take care of data… But… they have not been ambitious, they lack the subject area knowledge, they often lack the technical skills." What say ye, librarian Trogoolies?
- Cross-disciplinary use of data shines in this account of the decline of the Maya. "Space technology is revolutionizing archeology." Who would have guessed it?
- On the tools front, take a look at the Tranche Project, aimed at securely sharing datasets among researchers.
- On the interesting collections front, Canadensys is trying to collect biodiversity information from various researcher networks. Their technical infrastructure is very much "use what you have; build only what you must."
- Why build yet another silo for data? Exploring curation micro-services is a great introduction to the simple, UNIX-y tools coming out of the California Digital Library.
- And because it's Friday, a lovely lyrical reflection from John Mark Ockerbloom on why preservation matters. Sometimes it's not all about the bottom line.
Have a good weekend!
The Book of Trogool turns another page...
- Social scientists and medical researchers, pay attention to this: "Anonymized" data really isn't—and here's why not. If informaticists aren't starting to run similar analyses on their own "anonymized" data, they should be. This is a serious concern.
- One for the humanists: the rather vaguely-named Scholarly Communication Institute Report from Virginia. The theme was using spatial data in the humanities.
- From my SciBling Christina: Anybody can code… but should you? Peer review is for more than published papers. Holding your code close to your chest probably means you're writing unnecessarily bad code. Trust me. I write a lot of bad code.
- The data tell the story. Government data in this case, but imagine what could be done with research data! Imagine!
- What is the scientific paper? A sensible outsider's view. Money quote for our purposes: "Like it or not, science increasingly depends on data being published in public machine readable formats."
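For the "anonymized" data item above, here is a minimal sketch of one common self-check: count how many records share each combination of quasi-identifiers (fields that are not names but can still single someone out). I am not claiming this is the analysis the linked piece performs. The records, field names, and helper functions below are all hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
records = [
    {"zip": "53703", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "53703", "birth_year": 1975, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "53703", "birth_year": 1982, "sex": "M", "diagnosis": "flu"},
    {"zip": "53711", "birth_year": 1990, "sex": "F", "diagnosis": "asthma"},
]


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group; k = 1 means someone is uniquely identifiable."""
    return min(group_sizes(rows, quasi_identifiers).values())


def unique_records(rows, quasi_identifiers):
    """Rows whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[q] for q in quasi_identifiers)] == 1
    ]


fields = ["zip", "birth_year", "sex"]
print("k =", k_anonymity(records, fields))
print("unique rows:", len(unique_records(records, fields)))
```

Any row that sits alone in its group can be matched against an outside source that lists the same fields alongside a name, which is why stripping names alone does not anonymize a dataset.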
Personal note: I may be a little scarce around these parts for the next little while. I have three presentations to give in the next six weeks, none of them the same, none of them finished yet. In fact, two of them are but gleams in the back of my cerebellum. This is eating most of my off-work time at present.
Hope your Hump Day was fruitful.
Happy Labor Day, US readers. Time to clean out the "toblog" tag on del.icio.us again:
Like many, I am watching the global economy with equal parts fascination and horror. That pursuit led me to this article, which I read through wholly without my librarian goggles on, until I was happily surprised by the kicker at the end for dataphiles:
"The last ten years have seen a quiet revolution in the practice of economics. For years theorists held the intellectual high ground.… The typical empirical analysis in economics utilized a few dozen, or at most a few hundred, observations transcribed by hand…
"But the IT revolution has altered the lay of the intellectual land… The data sets used in empirical economics today are enormous, with observations running into the millions… But now it is on the empirical side where the capacity to do high-quality research is expanding most dramatically, be the topic beer sales or asset pricing. And, revealingly, it is now empirically oriented graduate students who are the hot property when top doctoral programs seek to hire new faculty.
"Not surprisingly, the best students have responded. The top young economists are, increasingly, empirically oriented. They are concerned not with theoretical flights of fancy but with the facts on the ground. To the extent that their work is rooted concretely in observation of the real world, it is less likely to sway with the latest fad and fashion. Or so one hopes."
The ability to acquire and manipulate large datasets is changing the entire discipline of economics, is how I read that. That's quite a strong statement.
I have more, but they need to wait until I finish the megaposts on library classification. I promise I'm working on them!
Hello, Monday. My tidbits folder overfloweth.
Have a productive week!
I am furloughed today and going out of town, so here, have an early tidbits post.
- I won't be at the iPRES 2009 conference, but I do recommend looking over the program; it gives a pretty good overview of what digital preservationists think about and study, and what keeps them awake at night. (Midwesterners: the International Digital Curation Conference is coming to Chicago in 2010. I'll be there!)
- The strength of weak ties: why Twitter matters to scholarly communication. Spot on, and true of FriendFeed as well. This is why, privacy concerns aside, the Facebook acquisition of FriendFeed is a threat; the friends-and-family design limits or eliminates casual elbow-rubbing.
- Digital Library Services in the Information Arcade from the University of Iowa. This is an e-research service approach worth pondering. Rather than create a digital-curation or digital-humanities outfit from whole cloth, Iowa is adding consulting responsibilities (and additional services TBD, apparently) to an existing service brand whose former responsibilities have to some extent gone elsewhere. I'll be watching this, and I hope Iowa lets us know how it's going. Love the planning wiki, too.
- Research data preservation and access: the views of researchers. Seems about right to me. Would researchers care to comment in the comments?
- From SciBling John Wilbanks, Publishing science on the web. I haven't blogged yet about openness and e-research, but I will, because e-research without openness is so much technology-enhanced window-dressing. Consider John's post a sneak peek at the sort of thing I think about.
- Reading with Machines. Well-written discussion of where computers fit in textual scholarship, with which I entirely agree. Nice mini-bibliography at the end, too.
I have a few more links in the pipeline, but I think this'll do. Happy (furloughed or not) Friday!
All of today's tidbits are from one blog! Well, all but one.
- David Rosenthal on digital preservation. I had this bookmarked to blog about, but…
- Chris Rusbridge beat me to it, saying everything I would have. Yes, online-versus-offline. Yes, research data in uncommon, niche, and/or proprietary formats. Yes, metadata! And yes, thinking for ourselves.
- Semantic Web of Linked Data for Research? In all honesty, my reaction to "Linked Data" can be summed up in Chris's question mark. I am not a fan of RDF, and I remain to be convinced that even small, constrained Semantic Webs are feasible, given how slippery human reality-representations are and how fraught the attempt to render them in computer-understandable terms is. Chris makes me reconsider, though.
- My backup rant. No, not mine—Chris's again. But I have the same rant! I would add to it that I have heard many graduate students mourn that their labs push backup chores onto them without the least effort to provision them with appropriate technology. Those labs that think about backups at all, that is…
Chris, I haven't gotten around to reading the latest International Journal of Digital Curation yet; it's sneering at me from Bloglines. I will get to it, though!
Interesting and perhaps relevant:
- Jean-Claude Guédon's examination of power in science. Does e-research destabilize this situation? How? If it doesn't, should it?
- Should copyright in academic works be abolished? Makes the obvious point that journal-article authors don't use copyright for its intended purpose of filthy lucre, and extrapolates from there. What I notice is that journal-article authors use copyright as a bulwark against plagiarism, lack of credit, and (whatever they perceive to be) misappropriation. Copyright is a lousy tool for that. We need better ones. Personally, I'd prefer that those better tools bypass the legal system altogether.
- A nifty-looking modeling tool: Emergent Trails for brain-process modeling.
- A workflow tool: VisTrails. (Am I the only person who gets an Ars Magica frisson from that name? I probably am. I am such a nerd.) Tracks who did what to what when, with what result. Pluggable, written in Python.
Have a pleasant weekend!