I'm in Urbana-Champaign this weekend to teach an in-person day for my online collection-development class. I'm looking forward to it; every time I teach I am reminded that students are smarter than I am.
For now, tidbits!
- As world plus dog probably knows already, The Economist tackled the data deluge.
- Adam Christensen gives us the modest, unassuming Data. The foundation for everything on an intelligent, interconnected, instrumented planet.
- Rethinking scholarly communication from the ground up: SciBling John Dupuis asks Are computing journals too slow? and Dan Cohen muses about how best to deconstruct the humanities' reverence for the print codex, while Craig Mod brilliantly deconstructs book design in an iPad world.
- So-called "digital natives" have digital histories; the Library of Congress asks whether and what they think about preserving them. (For more on personal digital preservation, I strongly recommend Microsoft Research's Cathy Marshall. Her two D-Lib articles are wonderful; also keep an eye out for her recent presentation at the code4lib conference, video of which should be available shortly.)
- The city of Vancouver is taking digital archiving seriously. No "put floppy disks in the fridge" here (no, seriously, I've seen that hailed as innovative archival practice!). I like what I see of Archivematica.
- Stefano Costa hopes to make data in archaeology open. While there are serious and legitimate concerns about making location data on some finds and digs public—my father the anthropologist used to call himself a "grave-robber and junk-picker" in jest, but there are real robbers out there—in the main, archaeology data is a great target for open.
- Sarah Askew once again explains why the software turned loose on data should be kept and scrutinized, with astronomy as her case study. Good insight into why "one software suite fits all" doesn't work, which should give some web4science developers pause.
- Harvard's School of Engineering and Applied Science interviews Stuart Shieber, in a treatment of open access refreshingly free of hyperbole on one side and panic on the other.
As always, if there's a link I should see, comment here or tag it "trogool" on del.icio.us. Thanks!
It's Friday! Snack on some tidbits.
- In the "didn't anyone teach you to show your work in grade school?" department, we have NIWA unable to justify official temperature record, as well as the radical notion of using actual data to gauge the effectiveness of review boards in stopping unethical research.
- In the "open is not a panacea" department, we have Nat Torkington rethinking open data, or at least its funding models (hat tip to Trevor Muñoz), and JISC's Clarion project trying to convince principal investigators that sharing data is a useful thing to do.
- In the "let's kill all the lawyers" department, we have troubling privacy questions about the sharing of personal genome data, and on a happier note, the wonderful Panton Principles for making data properly open and reusable once the decision to share them has been made.
- In the first-principles department, the redoubtable Carole Palmer tells us that data need to be curated. AAAS wonders who's going to do the work, and JISC comes up with a good-practice guide for those willing to dive in.
- In the tools-and-toys department, we have Stuart Lewis building a SWORD library in PHP aimed at making quick, easy, even one-off repository-deposit tools. I am all in favor!
As always, tag a delicious link with "trogool" or leave a comment here if you have something tidbit-worthy. Thanks!
I'm home sick today, and not precisely looking forward to giving my class tonight because I really do feel wiped out. Fortunately, tidbits posts are easy…
- Denmark ponders the future of the research library. A thoughtful read for librarians; a good skim for scientists wondering how libraries will help them in future.
- Congratulations to Galaxy Zoo for its first published paper based on crowdsourced galaxy-classification data. May there be many more!
- Code is data too, says Chris Wiggins, arguing that you can't really judge results until you know what's been done to the data.
- An Economic Argument for Free Primary Data from, of all places, the digital humanities. I wish humanities-digitization projects that support themselves by selling access would consider sunset clauses.
- Archiving video games, which are worthy subjects of cultural inquiry. This is a hard problem.
- Kevin Smith asks why journal publishers think they're entitled to compensation for articles when their authors and peer-reviewers aren't.
- Microsoft to Offer U.S. Scientists Free Cloud Computing says the New York Times. Google tried this trick already, and gave up on it in short order. If I were a scientist, considering Microsoft's track record for supporting its own experiments (PlaysForSure, anyone?), I wouldn't trust this as far as I could throw it.
- Gideon Christian ponders Building a Sustainable Framework for Open Access to Research Data through Information and Communication Technologies. Mostly about intellectual-property law as it relates (or doesn't) to research data.
- Small archive? Research lab wondering how to do this digital preservation thing? The Practical Digital Preservation for Small Archives conference-session summary provides some helpful tips, and many links to follow.
As always, leave a comment here or tag something "trogool" on del.icio.us if you think it belongs in a tidbits post.
Happy Groundhog Day Eve! Or something.
If you've got a link that belongs in a Trogool tidbits roundup, drop me a comment or tag it "trogool" on del.icio.us. Thanks!
Because I scanted you on tidbits for quite some time, have a second tidbits post in a single week!
Finally, I want to call out the excellent Data Dimensions: Disciplinary Differences report from Key Perspectives. "Data management differs by discipline" is a skeletal truism; Key Perspectives puts some meat on the bones.
It also contains throwaway gems like "It is worth noting that researchers expected their own institutions to be able to provide affordable managed storage, technical support and a preservation facility – but few institutions appear to be able to offer such services at this point." (p. 10)
Incidentally, authors, this institutional-repository administrator's answer to the question "Will institutional repository administrators in a university setting be willing or able to comprehend the details of data formats and metadata schemas across a whole range of disciplines?" (p. 3) is an emphatic "You bet! Bring it on!" Metadata is my business. I'm less bad at it than you might think.
I'm a bit late with these! Sorry about that. Things are a bit busy around me just now.
- Data-sharing resolutions/requirements announced recently include: the American Naturalist and allied journals (possibly behind paywall, sorry), and the Linguistic Society of America.
- The calls for open data and data archiving redouble: from mainstream media such as New Scientist, from science bloggers like those at Bench Press, from service providers like Data Dryad.
- I try to stay out of the futurism game (sometimes unsuccessfully), but here are some eScience predictions for you from others.
- Conference reports relevant to our theme: Educause 2009 from D-Lib Magazine; one of Christina Pikas's reports from Science Online 2010, as well as Dave Munger's report on the same conference.
- More on ORCID by Martin Fenner. I described ORCID to some colleagues recently as "the biggest name-authority-control effort we'll see in our lifetimes," and I do believe that!
- Bethany Nowviskie's Monopolies of Invention is a must-read. Respect—do you speak it?
- Embeded (sic) metadata please, asks "a Hacked Librarian." Honestly, I'm so happy when I have any metadata at all that I don't tend to worry about where it lives. I agree that belt-and-suspenders metadata, internal as well as external, is ideal where possible.
- Data-mining in the real world: retailing.
- Bad data lurks everywhere, notes Jean-Claude Bradley, including in peer-reviewed, supposedly trustworthy chemistry publications! And yet exposing data as it is generated still scares collaborators away, says Pedro Beltrão.
- Data-mining humor, with a serious undercurrent: Ancient Woolworth's sites. The null hypothesis can only be discarded for good cause…
As always, you can email me or tag something "trogool" on del.icio.us to bring it to my attention for a tidbits post.
Wishing all of us a happy, prosperous, data-filled 2010.
See something that should be part of a future tidbits post? Comment here or tag it "trogool" on del.icio.us.
Every time I do a tidbits post, I think to myself, "gosh, that was a lot of tidbits; I'll never fill up the queue again."
Every time, I'm wrong.
Happy holidays to those celebrating them.
Want to help me collect tidbits? Tag them "trogool" on del.icio.us, or leave a comment in a tidbits post. Like, er, this one.
I'm at home today owing to last night's epic snowfall in Madison shutting down practically the entire university, so it's time for tidbits!
- The biggest data story of the week is the climate-data hijacking. Gavin Baker has the best roundup I've seen. I also strongly recommend Cameron Neylon's thought-provoking response.
- The Digital Curation Blog has a lengthy series of roundup posts on the just-past International Digital Curation Conference. Next year in Chicago! I will be there with bells on.
- Climate change for libraries. No, nothing to do with the climate data scandal; instead, a cogent exploration of why libraries ought to be involved earlier in the research process, and how we might go about getting involved.
- When tools aren't curated: Deepak Singh notes with irritation the shutting-down of two software projects that were useful in his work. This, too, is part of data curation: once software tools are used on data in the course of research, those tools are part of the scholarly record. (They could, after all, have been poorly coded or based on faulty assumptions; that needs to be known.)
- JISC claims that data-sharing happens more often these days. True, I suppose, but to me this article had a whiff of the wishing-it-so about it.
- With data available to all, will there be more citizen science? I hope so, not least because of the implications for general scientific literacy.
- This week's "good use of data" award goes to Matthew Wilkens for this lovely, lucid explanation of why text-mining is a necessary approach to literary criticism and analysis. (Not the necessary approach, mind you, but one among many.) His argument is even stronger in linguistics, where the aesthetics usually don't obtain.
- On the jobs front, the University of New Mexico is looking for a social sciences/humanities data librarian. (Seems an odd combo, unless UNM's social-science research is predominantly qualitative.)
Regarding that last one, would it be helpful for me to try to maintain a jobs roundup here? If you think so, drop me a comment. I'd also appreciate pointers to good places to spot such jobs. I know most of the library sources, but based on this poster helpfully pointed out to me by commenter Nic Weber, a lot of the job ads will go out in science venues.
Here's hoping my choir's dress rehearsal scheduled for tonight can actually happen… in the meantime, I raise my hot-chocolate mug to you all.
The tidbits folder is out of control, so this linklist may be a bit epic. My apologies! There's a lot of great discussion in this area of late.
- In Data repositories: the next new wave, Steve Hitchcock is sensible, as usual. The answer to "are repositories changing?" is "they already changed," if one asks Carole Palmer. What's lagging, still, is institutional recognition and approval of those changes. See also ERIS's initial thoughts about repositories for researchers.
- Free the humanities data! says Adam Crymble. Ainsworth and Meredith describe e-Science for Medievalists, but do take a look even if you're not one: a tool developed for medievalists turned out useful in other fields, including rather far-flung entomology!
- Jeni Tennison's Establishing trust by describing provenance explains an answer to a question I've often heard from potential data end-users: "How do I know I can trust this?" Mark well, data providers: if you don't clearly show your work, no one can trust what you give them, and what they can't trust, they won't reuse. Or cite.
- Scientists leading the Web 2.0 charge? Not so much. Nice to see this hitting the mainstream press.
- Eric Drexler explains how data-driven science may change how science is done. John Wilbanks explains why science isn't software, and why "open source" may not make sense as a metaphor for data sharing. (I suspect that usage is a lost battle, John.)
- Tool of the week: Scratchpads. The article, aside from describing a great-looking tool, is extraordinarily insightful about the sociocultural challenges of data sharing in the natural sciences.
- A call to shine light on dark data, including publication of negative results. One researcher's "failed" experiment may be another's goldmine. Besides, who needs or wants to replicate, out of ignorance, something that doesn't work? See also Why machine-readable data should matter to you (and follow the link therein, too).
- If you haven't seen this already from me or Christina, check out Nature asking what's wrong with chemistry that it won't share data. See also the longer report, if you are so inclined.
- Neil Saunders, showing data-mining in action! See also the vigorous and valuable discussion on FriendFeed. (I am not a little amused that the latest comment is "Normalization ... aaargh! Most definitely not a solved problem." Indeed.)
Whew. I think that brings things back under control. Happy Wednesday browsing!