Cowboy and centralized research IT

(by Dorothea) Feb 08 2011

The question of research-IT provisioning came up in my post on data-security horror stories. I saw some confusion from readers about it, and it's worth examining in detail for other reasons, so here goes.

So let's imagine Achaea University for a moment: immense, with a diverse research agenda across many disciplines and lots of grants coming in, but with some areas (often but hardly exclusively in the humanities) bringing in no grant money at all. How does Achaea U provision researchers with IT tools and services?

Achaea U doubtless has a central IT unit. At a minimum, it handles networking, campuswide administrative IT (payroll, HR, authentication/authorization, likely the course-management system, perhaps calendaring and email if those haven't been outsourced), and a lot of front-line student- and staff-facing IT (computer labs, campus wireless, helpdesk, webspace, basic web-accessible storage, etc). It may or may not have a learning-technology unit.

It almost certainly doesn't have a research-IT-specific unit. Such research computing services as it provides are of two types: repurposed other services (e.g. webspace), or pay-to-play services (e.g. specialized development teams). Big storage, if it exists, is almost certainly pay-to-play; you pay as long as you keep data on central IT's systems, and if you don't pay, central IT blows the data away. Such research-type services also tend to be "enterprisey" in their technical provisioning, which combined with pay-to-play means "serious sticker shock" for the average researcher, even the average well-funded researcher.

Services also tend to be lowest-common-denominator. If you have special needs, such as preservation past grant expiration or diamond-hard security? Tough noogies, chum. Central IT offers what central IT offers; you can take it or leave it. You can yell at central IT all you like that they don't know what the hell they're doing (and they may very well not; insular central IT units can and do gin up services that are convenient for them to provide, while not convenient at all to the intended user). Doesn't matter. Central IT offers what central IT offers. Take it or leave it.

Most researchers leave it, which means no economy of scale, which means these services cost central IT even more than they need to—and since central IT is pay-to-play, well…

So Achaea U has a lot of other systems running research-related IT. For example, Achaea U does a fair bit of what's called "grid computing" (which has other guises too, but let that go for now). That's not run through central IT, because central IT was too big and ponderous and lowest-common-denominator to jump on that need (it's very hard, organizationally, for central IT to greenlight a service that not everybody on campus will use). Engineering or comp sci owns the grid, or it may have spun off into its own (likely pay-to-play, depending on the status of its internal grant funding) research/service enterprise.

And then we have the other end of the scale: a poorly funded lone-wolf researcher limping along via a Linux server installed on a dusty beige consumer-grade box under his desk. If it breaks, he's humped, because it was set up years ago by a grad student who has since graduated, leaving no documentation behind, and he doesn't entirely know how it works. It hasn't broken. Yet. Is it backed up? Who the heck knows? Has it been hacked? Who the heck knows? Who the heck knows which networks it's even connected to, for that matter? The researcher sure doesn't. But he knows that his server (plus whatever free-to-him web services he tacks on to his processes) is cheaper by a factor of ten (maybe even a hundred) than equivalent computing provision from central IT! This, folks, is what I mean by "cowboy IT." Yee-ha! And there's a lot of it, scattered all over Achaea U! Yippee-ki-yi-yay!

It is, as I said, a continuum. Based on what's said in the Inside Higher Ed article, Dr. Yankaskas was very close to the cowboy-IT end. Somewhere in the middle, Achaea U has a few research-IT units that work on soft money for small or large groups of researchers. These units are more nimble, discipline-savvy, and responsive than Achaea U's central IT, and they're likely just as competent or more so (especially considering how little central IT knows about research-computing needs); the downside is that they're not as richly funded and their funding is always in danger, so they probably cut some corners. The worst among them are no better than straight-up cowboy IT; part of the problem is that their staff may be selected by researchers who don't know jack about IT (as clearly happened in Dr. Yankaskas's case).

Plenty of Achaea U researchers, it must be said, can't even muster a cowboy-IT setup, when lack of outside funding combines with lack of skill. They are utterly shut out. Neither central IT nor research-computing units want them because they have no grant money to toss in the pot. The library may do what little it can, particularly for humanities scholars, but it's not enough.

So how do researchers get away with cowboy IT? Well, honestly, nobody's ever looked. It's that simple. And nobody looks because nobody much cares—until there's a huge, embarrassing screwup like the Dr. Yankaskas affair. (If this seems to resemble the laissez-faire IT environment that used to exist for social-security numbers in US universities? Quite right. Same causes.) Classic case of externalities: cowboy IT creates risks, sometimes serious risks to the researcher or even the institution, but mitigating the risks isn't perceived as important (and is known to be expensive) until there's a sudden crisis.

I expect the NSF data-management plan process to expose a shocking amount of cowboy IT in US science research, from the Achaea Universities among us to industry all the way down to the lone-wolves. I also expect the NSF will start to indicate gently that cowboy IT is not acceptable practice… and to become rather less gentle about it over time. This means that researchers will have to internalize risks they hadn't previously worried about, or they'll wind up like Dr. Yankaskas.

I don't entirely know what campus research-IT infrastructures will emerge from this. I wouldn't be celebrating if I worked for central IT; I have serious misgivings that central IT in its ongoing ignorance can even do this right. I'd rather see a mesh of the middles, growing collaboration among research-specific IT units to expand their services, service models, and funding sources to campus cowboys and have-nots. That's a tall order, though; funding models aren't clear, and these units think of themselves as independent fiefdoms, rarely valuing collaboration because of its added process overhead. It doesn't help that central IT will often fight to keep such a mesh from emerging, viewing it as a threat.

So we'll see. The bottom-line truth is that Achaea U will have to do better at research-IT provisioning in the next decade, or it'll start losing grant dollars to universities that work out how to do it right. Yippee-ki-yi-yay.



Friday foolery: Mitigating repository risks

(by Dorothea) Feb 04 2011

I have some zombie-ish tendencies of my own, so I was most interested to read such a thorough, well-researched investigation of Zombies and Risk to Repositories.

I feel ever so much better now that I know how to protect digital assets from the undead hordes. Don't you?



Data-security horror stories

(by Dorothea) Feb 04 2011

I'm afraid we're going to see more data-security horror stories like this in the next few years. It's truly horrific for everyone involved.

Rather than point fingers, because there are multiple levels of epic fail in this situation and nobody comes out smelling like roses, I'll try to pull out some more-or-less depersonalized morals-of-the-story:

  • Knowing why confidentiality is important is not the same thing as knowing how to ensure it, particularly in a networked computing environment.
  • Cowboy research-IT installations and their staffers must soon expect a fair bit more scrutiny than they're used to with regard to many important data-management questions, data security hardly least. These risks may well swing the pendulum away from cowboy IT (widely perceived as cheaper) back to more centralized, accountable systems and staff.
  • The buck stops at the PI. This means that the practice of leaving computing to the young ’uns and part-timers is not going to cut it any more.
  • If it's this bad in biomedicine, which is well-funded… I'm scared about everything else. Really. I may never fill out a survey again. (Okay, that's just because I hate surveys and believe that much too much lazy survey research is done, not least in librarianship.)
  • Policy, policy, where is the policy around data issues? It's years behind where it needs to be, that's where. And don't talk to me about IRBs (or NSF grant reviewers, for that matter; this is a serious and I hope temporary weakness in the NSF data-management plan model). IRBs are made of PIs, not the necessary gimlet-eyed informaticists and IT-security pros. If you've ever been on an IRB, be honest: would you have thought to ask about IT staff competencies?
  • Anybody who reduces research data management to "storage and backup" needs repeated applications of cold water and horror stories like the above one until they come to their senses. It's more complicated than hardware, people. Much more.
  • Ditto anybody (hello, librarians! hello, OAIS model!) who thinks that data management starts when the data are final.

Data security is serious business, especially now that reidentification risks have entered the picture. If you do human-subjects research, or work with any other sensitive data in digital form, take security seriously before you get caught flatfooted.




(by Dorothea) Feb 02 2011

So I go to a perfectly nice conference last year and get trapped by a volcano.

I try to get to OLA Superconference, and there's an epic blizzard. (I'm still going to try to make Top Tech Trends, but my Friday talk is cancelled. Pity. I put a lot of work into that talk.)

I dunno if it's safe to go on the conference circuit any more!



The One Schema

(by Dorothea) Jan 31 2011

I grumbled on FriendFeed today that I wish folks (IT folks in particular) would understand that there is no single metadata schema that works for every kind of data in every form in every situation. If you're building a data repository intending to store many kinds of data from many disciplines, it had better have a metadata model that accommodates many different vocabularies.
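For the code-minded, here is a minimal sketch of what "accommodates many different vocabularies" might look like, assuming nothing fancier than a record that keeps each vocabulary's fields in their own bucket. The class name `MetadataRecord`, the vocabulary labels, and the field names are all made up for illustration; this is not any real repository's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MetadataRecord:
    """One repository item, described by any number of vocabularies (hypothetical)."""

    item_id: str
    # Maps a vocabulary name to that vocabulary's own field/value pairs.
    descriptions: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def describe(self, vocabulary: str, **fields: str) -> None:
        """Add or update fields under one vocabulary without touching the others."""
        self.descriptions.setdefault(vocabulary, {}).update(fields)

    def get(self, vocabulary: str, name: str) -> Optional[str]:
        """Look up a field within one vocabulary; None if it is absent."""
        return self.descriptions.get(vocabulary, {}).get(name)


# Invented example values: a general description plus a discipline-specific one.
record = MetadataRecord("dataset-001")
record.describe("general", title="Nebula survey images")
record.describe("astronomy", right_ascension="05h 35m", declination="-05d 23m")

print(record.get("astronomy", "declination"))
print(record.get("general", "declination"))  # the general vocabulary has no such field
```

The point of the design is only that no vocabulary gets to dictate the fields of another; a real repository would also need validation, versioning, and a way to say which vocabularies apply to which kinds of data.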

Bill Hooker promptly stepped up to the plate with the following dictum (slightly edited by yours truly):

Three schemas for the astronomers under the sky;
   Seven for the urban planners in their halls of stone;
      Nine with which biologists comply;
and ONE for the Librarian on hir Dark Throne:
In the Land of Library, where the metadata lies.
   One schema to rule them all,
   One schema to find them;
   One schema to bring them all;
      And in the repository bind them.
In the Land of Library, where the metadata lies.

I just named my Aeron chair the Dark Throne, y'all.



Friday foolery: Hawking the Library of Congress

(by Dorothea) Jan 28 2011

I've had a week, folks; as a reward, next week I'm off to a conference with a brand-new talk. (Slides up soon, I hope; they're getting close to done, but they're not there yet, and I still have plenty of patter to write. I'm trying a new presentation technique, which adds a lot to my prep time.)

But my week is as nothing to the week the Library of Congress has had, wrangling a stray Cooper's hawk that wandered in and didn't particularly want to leave. They did safely capture her, and now she's off to rehab despite saying "no, no, no" and will eventually be rereleased into the wild.

It's hard to get useful good press for libraries. What's more typical is this kind of nonsense with the subtext "that library stuff, it's sooooooo obscure, and aren't librarians just weirdoes?" Worse are the lazy buns-and-shushing stereotypes (George Lucas, my lazergaze is on you, man); sometimes even worse than those are the "look, aren't librarians so hip? and isn't that cute?" stories.

So I'm super-impressed with how well the Library of Congress handled public relations on this one. Their blog shines with good humor and good information. They made very clear that they were handling the situation well and responsibly. They took note of useful input from blog commenters, and responded publicly to it. They deserve every iota of the attention this odd little episode has garnered.



Link, don't pass around files

(by Dorothea) Jan 25 2011

So I heard an interesting question the other day, one that's worth thinking out loud about. Someone asked whether it was legal, copyrightly-speaking, to post a legally open-access article to a public server or service (such as Facebook or FriendFeed), or if one should link instead.

The answer, as with most copyright questions, is "it depends." The other answer is "I am not a lawyer; if you have a copyright question, go ask a lawyer." But even when reposting is probably safe, I think it's better to link, and I'll try to explain why.

First, there's a pragmatic argument: it's usually just plain easier to drop in a link than to download and reupload (and if it isn't easier, the hosting archive is broken). I'm all in favor of easy.

Second, in many cases, reposting articles publicly may well infringe copyright. If there's a CC-BY license on the article, I would guess public reposting with credit to be an acceptable reuse. If there's a CC non-commercial or share-alike license, I'd personally think twice. If there's no CC license at all, which is the usual case? By reposting, you're making a copy, and yes, an author or copyright-owning publisher could bring a lawsuit over that. Would they have much of a case? Who knows? I don't. But who needs the hassle?

Can I, as a digital-archive manager, give you permission to repost items from the archive I run? Actually, no, I usually can't (the few CC-BY items in the archive aside). The license that archive depositors give the archive lets the archive disseminate materials via its own website. That license emphatically does not let the archive give other people permission to disseminate (except perhaps under the specific circumstance of the archive shutting down and transferring the entirety of its assets elsewhere). It's a subtle point, but important.

Third, there's an impact question to consider. As alternative impact metrics take hold in journal publishing, view and download numbers take on new importance for authors. If you repost an article instead of linking to it, are you going to count views and downloads? Probably not. Publishers and archives, though, they're counting and reporting. So anybody who downloads your copy robs the author of a countable download. Maybe that doesn't matter much today… but it might matter a lot tomorrow.

Fourth, authors aren't the only folks counting views and downloads. Digital archives aren't magically free to run, and we digital archivists don't work entirely out of the goodness of our hearts. One of the ways we justify our work and our archives' existence is through view-and-download counts. When you repost, you dilute the impact that we can report to our funders. I speak as one whose service has been threatened with closure: any impact dilution can be a true threat.

Link, don't repost, even when reposting is legal. The author you benefit may be your colleague, or even yourself. The open-access archive or publisher you benefit is fighting against the paywall-bounded darkness.



Library Day in the Life 6

(by Dorothea) Jan 24 2011

Once a year, librarians get together to tell people what we do all day, because we know that many people have stale, stereotyped, or just plain wrong ideas about what that is. I don't usually talk about my job here, because I've landed myself in hot water over that before, but for Library Day in the Life I'll make an exception.

For the record, what I do: My job split three ways as of last August. One-quarter of me belongs to the institutional repository, at least until such time as that enterprise is absorbed into the grand new digital-library infrastructure currently being built. Another quarter of me belongs to the library school, allowing me to teach two courses per year. The remaining half co-manages Research Data Services, pitches in on various projects relating to scholarly communication, (usually digital) preservation, and research-data management, and has some other irons in the fire that aren't yet ready for prime time.

Today's doings:

  • 7:30 am: Arrive at my office. Start up the iMac; tidy up a couple of (physical) things while it boots.
  • 7:32 am: Start up email, RSS feedreader, IM client (among other things, it's how colleagues know where I am), calendar app, Evernote (with my to-do list).
  • 7:33 am: Chug through email-related to-dos, while chugging the morning's Diet Coke:
    • Send an old RDS announcement to a committee member for revision.
    • Send a researcher at another institution a copy of a closed-because-of-copyright thesis that they found in the institutional repository. (We treat such requests just like interlibrary loan, by policy.)
    • I have a green light to talk briefly about RDS at Thursday's all-staff meeting, yay! Quickly write up what I want to say in Evernote (so that I can have it in my iPod Touch on the day).
    • Do a requirements writeup for a projected new RDS website feature for the folks who manage the RDS website.
    • Check the scholarly-communication questions email address. Nothing there but journal spam. Delete journal spam.
    • Hack through stuff in inbox that doesn't need to be there. Make it to Inbox 6. Note a few things that have been waiting to get done; give them a lick and a promise, because it's a busy day.
  • Intermittently: skim feeds, FriendFeed, and Twitter. Not much I actually have to stop and read through today, which is good.
  • 8:45 am: Set IM client away notification to "At a meeting, sorry!" Chug across the quad to catch part of a SLIS health-informatics class. They're demoing a new campus research facility whose PI is interested in help figuring out how to manage the digital and analog data that the facility will produce; this is the best chance we'll likely have to get a read on the project, given how hard the PI is to reach. Demo is useful; I ask questions related to what data they will be recording during experiments, and whether/how installations like this one are sharing their work. From the sound of things, we'll be in on the ground floor as they think through these questions—which is wonderful.
  • 9:30 am: Get back to office, slightly lightheaded from the glue smell in the library entryway (they're replacing the absorbent padding on the entryway floor). Set IM status back to "available." Revise ("put a screenshot on the back!" sounds so easy, yet isn't) and start printing flyers and handouts for afternoon meet-some-faculty event for RDS.
  • 9:34 am: Notice from @joan_starr on Twitter that DataCite Metadata Scheme is published. Grimace, bookmark it in Pinboard (tags "datacitation" and digital-curation class number), leave it open in a tab to look at later.
  • 10:04 am: Flyers merrily printing (they're both two-sided, and it's a communal printer, which means a lot of running-back-and-forth), take the opportunity to check the course-management system for any student SOSes. First-week homework is coming in; good. Had some drops, which is unsurprising and probably positive (if they can't handle the first week's homework, they don't belong in my class, and I deliberately set up the first week's homework so that students would find that decision easy).
  • 10:13 am: Try to sort out issues with RDS-related email list. Send SOS to library helpdesk. Get prompt, helpful response. Email internal RDS list to start discussion about future of related list.
  • 10:18 am: Skim the published DataCite scheme, paying special attention to the XML instance listed. Realize that DSpace can neither create nor do anything useful with such an XML instance. Sigh. Wish once again that we were off DSpace, or that DSpace would finally recognize that metadata stoppeth not with key-value pairs.
  • 10:20 am: Squeeze in some work on OLA Superconference slides ("Turning Collection Development Inside-Out"). For me, this means hunting for CC-BY licensed photos and art and sorting out the typography and general aesthetic I want. Save new Keynote theme in case I want this particular combo again.
  • 11:25 am: Everything's printed, yay! And I have some finished slides and a lot of presentation outline done. Heading down to meet health-informatics professor for discussion over lunch.
  • 12:30 pm: Back in office, with game plan for health-informatics professor's project. Send requested email to professor. Churn through email that's piled up over the morning. Figure out where RDS-related meeting is; realize that I'll have to leave in 25 minutes to make it there on time. Sigh. Try to move slides a wee bit further along.
  • 12:50 pm: Answer an emailed question for an institutional-repository contact on a different UW System campus about copyright clearances for graduate theses.
  • 12:55 pm: Shut down iMac; there won't be much point in returning to my office after meeting, so I'll just go home and do the rest of my workday from there. Depart for meeting, grabbing up folder with flyers on the way out; chug up and over Bascom Hill to the center of campus.
  • 2:35 pm: Having listened to many cheerful facts about the square of the hypotenuse, er, conflict-of-interest reporting and watched a librarian colleague knock her RDS presentation out of the park, head out to the bus stop to catch a bus home. While waiting, check email and RSS feeds via iPod Touch. Star one RSS item related to OLA Superconference talk for more in-depth perusal later.
  • 3:10 pm: Arrive home, boot up Buffle the MacBook. Log onto the course management system, deal with group-project assignments, post some administrivia, grade first-week homework (yes, I am mean and cruel). Note with pleasure that students new to XML (which isn't all of them by any means, and I can tell the difference!) are figuring out for themselves that you can "make up your own tags" in XML that make sense in context.
  • 4:00 pm: Done grading. A successful assignment; I feel good about it (which is an important datum in a first-time course). Four assignments missing, but they have another hour's grace; I'll get the stragglers tomorrow morning. One last email check, one small, non-serious fire to put out.
  • 4:10 pm: Realize I forgot to water the philodendron in my office. Sigh. It's a forgiving plant; it'll live until tomorrow. Start expanding the little one-or-two-word notes-to-self in Evernote into this post. Not that this is part of my workday, of course; I thought folks might wonder, that's all.
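The 10:18 grumble about key-value pairs may be easier to see in code. What follows is a rough sketch under loud assumptions: the XML uses made-up tags (in the spirit of the 3:10 entry) and is not an instance of the DataCite scheme, the people and institutions are invented, and the "flat store" is a toy stand-in, not how DSpace actually stores anything. It only shows the general way nesting gets lost when a tree is boiled down to keys and values.

```python
import xml.etree.ElementTree as ET

# Made-up tags and invented people; NOT the DataCite scheme.
SAMPLE = """
<dataset>
  <creator>
    <name>Ada Example</name>
    <affiliation>Zeta Institute</affiliation>
  </creator>
  <creator>
    <name>Bo Sample</name>
    <affiliation>Achaea University</affiliation>
  </creator>
</dataset>
"""


def creators(root):
    """Read each creator as a unit, so name and affiliation stay paired."""
    return [(c.findtext("name"), c.findtext("affiliation"))
            for c in root.findall("creator")]


def flatten(root):
    """Toy flat store: each tag maps to a sorted list of values, nesting discarded."""
    flat = {}
    for element in root.iter():
        text = (element.text or "").strip()
        if text:
            flat.setdefault(element.tag, []).append(text)
    return {tag: sorted(values) for tag, values in flat.items()}


root = ET.fromstring(SAMPLE)
print(creators(root))  # pairs survive because the tree is still intact
print(flatten(root))   # same values, but nothing says who belongs where
```

In the flattened version the names and affiliations are all present, yet lining them up by position would put Ada Example at Achaea University, which the original tree never said. That is the gap between a nested record and a bag of key-value pairs.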

Over the course of the evening, I'll probably look in on email and the course-management system a couple more times, but I won't answer anything unless it's an immediate problem (which it hardly ever is).



Friday foolery: have some more Dui!

(by Dorothea) Jan 21 2011

It's very cold where I am, so what could be better than some dude in shades and aloha shirts rapping about Dui, er, Dewey Decimal?

Okay, maybe some things could be better than that. But this isn't all bad!



Can it be? A metadata standard that makes sense?

(by Dorothea) Jan 19 2011

I am notorious for hating library metadata standards and standard-like objects. Hate MARC. Hate Dublin Core with a great and wonderful hate. Hate OpenURL. Hate EAD. Hate OAI-PMH and OAI-ORE. Bring me a metadata standard, I'll usually find something to hate.

What does it mean that I like the DataCite Metadata Scheme? Am I losing my edge? Going over the edge? What?

Or it could just be that the DCMS is a sensible minimum that solves the problem at hand (identifying and citing digital datasets) without gobs of cruft or gobs of oversimplification. They've also acknowledged the need to revisit and change the scheme over time, and are working on how that will happen (Open Archives Initiative, I am training laser-eyes on you).

DCMS is not perfect; in my opinion, they'll need to go beyond DOIs to handles and ARKs and PURLs. (Yes, I know all DOIs are handles; not all handles are DOIs.) But for a first cut, it's pretty darn good, and it'll stay that way if they can resist the temptation to cruft it up. Good job, standardistas!


