About the preservation of databases

Feb 16 2011 Published by under Information Science, Uncategorized

Egon Willighagen asked on Chm-Inf why libraries aren't preserving databases. Beth Brown provided one reply.

I commented there, and hopefully my comment will show up eventually, but I seriously doubt we'll be able to help with this.

NASA, DOD, NIH, NSF, and others fund the development and first few years of hundreds if not thousands of databases. Then the database becomes less about new science and more about infrastructure or operations. Then the PI gets bored. Then maybe the users start to drop off... then the database disappears. I was just looking for information in a NASA database that was referenced all over the place. When I got there, all I found was a notice that it wasn't funded anymore, so no data for me!

We've been hearing this about data, too: it costs so much to gather but then is abandoned. Libraries are working to try to take up some of the slack, but it's hard. Look, if NASA and DOD, with their big offices for science and technology information, can't preserve their own stuff, they're not going to fund us to do so. Libraries don't have the money or the mandate.

I was at SLA whenever it was in DC and saw a presentation about yet another NASA database. Even at the time, the only thing I could think was: how close is the PI to retirement?

Funders should ask about preservation plans for these things. I don't think they do.

  • Miss MSE says:

    Allegedly, the new NSF Data Management Plan requirement is meant to address some of these concerns. It seems likely to become a throw-away section for most investigators, though, unless NSF starts enforcing follow-through somehow. Coming from a field with limited, scattered databases, I think we've gotten caught in a vicious cycle: no one submits their data to databases because no one uses them and they're poorly maintained, and databases are poorly maintained because no one uses them or submits new data. Also, there doesn't seem to be any money offered towards the costs of proper data management, just requirements.

    • Christina Pikas says:

      Too true. Plus, there are LOTS of other funders besides NSF for the databases I use at work. Even if the PIs did a great job on those data management plans, that doesn't help decades of lost work :(

  • Chris Rusbridge says:

    The annual Nucleic Acids Research issue on databases now lists over 1100 databases of relevance! Year to year, there does appear to be some consolidation, but still the number grows. I think we have a real problem coming at us. As you say, let's hope NSF (and other research funders, e.g. the UK Research Councils) start enforcing some compliance. One well-known Program Officer, then at NSF, told me that he personally did this by sitting on further proposals until earlier commitments had been met, but this obviously was not agency policy!

    That said, for many simpler databases there are approaches to preserving the content rather than the database which could be used by some research libraries. Peter Buneman at Edinburgh has an approach for "preserving" changing databases, for instance. There's the SIARD approach, and others. It's developing, if slowly, and more research than product, but there are some approaches around!

