Since the end of the year is a fairly quiet time for my particular professional niche, I've taken the opportunity to do some basic name authority control on author name-strings in the repository.
Some basic what on what, now? Welcome back to my series on library information management and jargon.
The problem is simple to understand. Consider me as an author. I took my husband's surname upon marriage; fortunately, I hadn't published anything previously, but I might have done—and if I had, how would you go about finding everything I've written, if it was published under two different names? "Dorothea" is a fairly distinctive given name, especially in my age cohort, but I do share it with other creators.
Now consider creators whose names are not written in Roman characters. The many and varied romanizations of the composer Tchaikovsky may give pause, though my personal favorite example is a certain Libyan leader who wrote a book or two. (Click over and then hit the plus beside "400's: Alternate Name Forms.")
Libraries confronted this problem when the search technology of choice was the card catalogue. The outline of a solution emerges: to avoid wasteful duplication of cards, all the cards representing titles by a given author should be in one place under one name, but it should also be possible to pop in a single card for each additional name variant so that searchers know which variant is hiding the good stuff. ("Chaikowsky, Peter Ilich: see Tchaikovski, Piotr Ilyich, 1840-1893.")
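For the programmers reading along, the "see" card amounts to a lookup table. Here is a minimal sketch, assuming nothing about how any real catalogue stores its data: `SEE_REFERENCES` and `resolve_heading` are names I made up for illustration, and the two headings are simply the ones from the example card above.

```python
# Hypothetical sketch of card-catalogue "see" references.
# The preferred heading is where all the title cards actually live.
PREFERRED = "Tchaikovski, Piotr Ilyich, 1840-1893"

# One entry per variant card, each pointing at the preferred heading.
SEE_REFERENCES = {
    "Chaikowsky, Peter Ilich": PREFERRED,
}


def resolve_heading(name: str) -> str:
    """Follow a "see" card if one exists; otherwise keep the name as given."""
    return SEE_REFERENCES.get(name, name)


print(resolve_heading("Chaikowsky, Peter Ilich"))
```

A searcher who arrives with the variant spelling is sent to the preferred heading; a name with no "see" card is looked up as-is.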
This means choosing a preferred name variant, of course. Ideally, we'd like this to be consistent across libraries, so that the devotee of Russian music who learns the preferred variant in her home library will easily find what she needs at any other library.
There are additional wrinkles as well: it does happen that different authors wind up with the same name, and for library purposes, that's no good. My husband David, for example, shares his name with a book-writing swimming coach. Libraries chose to use birth years—and, only if necessary, death years—to disambiguate.
Aha, you say. This is why not all author names in library catalogues have attached dates. This is why not all authors with listed birth dates have death dates, even when they'd have to be older than Methuselah to be living still. Yes, this is why. Dates in author headings started strictly as a disambiguation measure; the swim coach didn't have his birth year beside his name until my husband turned up and wrote a book. Of late, there have been raucous arguments among cataloguers in libraryland about adding death dates as a matter of course.
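The dates-only-when-needed rule can be sketched in a few lines. This is an illustration under my own assumptions, not cataloguing practice rendered faithfully: the `Author` type, the `assign_headings` function, and the names and birth years are all invented.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Author(NamedTuple):
    name: str
    birth_year: Optional[int] = None


def assign_headings(authors: List[Author]) -> List[str]:
    """Append a birth year to a heading only when the bare name is shared."""
    name_counts = Counter(author.name for author in authors)
    headings = []
    for author in authors:
        if name_counts[author.name] > 1 and author.birth_year is not None:
            # Open-ended date: no death year unless it is needed too.
            headings.append(f"{author.name}, {author.birth_year}-")
        else:
            headings.append(author.name)
    return headings


# Made-up people: two share a name, one does not.
people = [
    Author("Example, David", 1950),
    Author("Example, David", 1970),
    Author("Example, Ann", 1965),
]
print(assign_headings(people))
```

Only the two authors who share a name get dates; the third keeps a bare heading even though her birth year is known.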
All of this activity—choosing preferred name variants such that each name listing remains unique, listing other name variants with the preferred, organizing by-author displays accordingly, coping with name changes—is called "name authority control." (It has an analogue for subject work, sensibly enough called "subject authority control." This verges on the topic of controlled vocabularies, which is definitely one for another post. Or six.) For catalogue cards, this solution is remarkably elegant and entirely functional. For computer-based record management—well.
Relational-database experts are howling right now at the idea that a primary key (what's used to identify a particular row of information, a particular item, in a database) would ever change, which is exactly what an author heading does when it picks up a date. The whole point of a primary key is its immutability! Ask for record number 91346342, always get the same record. You never, ever, ever change that record ID. Ever. Really, not ever. If a particle of information can change, it shouldn't be used as a primary key!
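What the database people would rather see looks something like the following sketch, which uses Python's built-in sqlite3 module. The table layout, column names, and sample rows are my own invention for illustration; no real library system is being described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,  -- the identifier: never changes
        heading   TEXT NOT NULL UNIQUE  -- the display string: free to change
    );
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(author_id)
    );
    """
)
conn.execute("INSERT INTO author (author_id, heading) VALUES (1, 'Example, David')")
conn.execute("INSERT INTO book (title, author_id) VALUES ('A Made-Up Swimming Manual', 1)")

# A second author with the same name turns up, so the heading gains a date.
conn.execute("UPDATE author SET heading = 'Example, David, 1950-' WHERE author_id = 1")

# The book still finds its author, because it points at the ID, not the string.
row = conn.execute(
    "SELECT book.title, author.heading FROM book JOIN author USING (author_id)"
).fetchone()
print(row)
```

The heading is treated as a label that can be corrected at will, while everything that needs to point at the author points at a number that stays put.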
Linked-data experts are howling as well: why don't all these people have URIs? (If you remember your analogies from the SAT, database:primary key::RDF:URI. Roughly, anyway.) Well, they do, now, thanks to VIAF. Here's my VIAF URI (no, I have no idea why my birth year is included in my authority string, as my name by itself is unique in authority data; ask a cataloguer) to look at. Feel free to hunt for your own URI.
To some librarians, all this business of immutable identifiers may sound like specious wrangling, but it's not: it's actually a major disjunction among cataloguing practice, the databases underlying ILSes, and the perennially-emerging world of linked-data mashups via RDF. Inexpert programmer that I am, I find that the idea of programming around library methods of authority control makes my head hurt. This disjunction leads to real problems making online catalogues work well (never mind library systems that aren't tied into authority control, such as digital-library platforms and institutional repositories), and making library data play nicely with other people's data. When gearhead librarians and other technologists say "library data is siloed," this is exactly the sort of thing they mean.
You may, particularly if you are a hard scientist, have noticed another hole in this system: you don't get into it unless you have written a book. (Exceptions, yes, for editors and composers and book illustrators and whatnot. However.) I, for example, had two or three articles and book chapters come out before co-authoring a book published in 2008. I didn't have an authority record until the book was catalogued. If all you've published are articles, you don't have an authority record, sorry.
This is becoming a serious problem! If it were just people like me struggling with it, that wouldn't signify; as a librarian, I'm supposed to struggle with this sort of thing. I learned hotshot DIALOG-searching tricks in library school to get around article databases' lack of name authority control, for instance. Right now, I've built up a strategy for finding physicists' and engineers' first names that mostly works, though I do wish whatever weird graduate-school midnight hazing ceremony that deprives these worthy people of their given names in favor of their initials would wither away and die. (I am joking. Mostly. This phenomenon, though of course it isn't the result of hazing, can be maddeningly difficult to rectify, especially when the author in question is a graduate student who either doesn't graduate or doesn't go on to an academic career.)
No, the real problem concerns the changing nature of performance measurement in academia, mostly in the sciences to date. As journal impact factors wane in importance (not nearly fast enough for me!), the importance of measuring the impact of individual articles and other publications via citations and download counts rises. How are we to measure this anything like correctly for a given author if we can't reliably match articles to authors?
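A toy illustration of why the matching matters. Everything here is made up: the names, the citation counts, and the identifiers (which are placeholders, not the format of any real identifier scheme), along with the `citations_by` helper.

```python
from collections import Counter

# Made-up records: (name as printed on the article, hypothetical author ID, citations)
ARTICLES = [
    ("D. Example", "id-0001", 12),
    ("Dorothy Example", "id-0001", 7),  # same person, fuller name
    ("D. Example", "id-0002", 3),       # different person, same initials
]


def citations_by(key_index: int) -> Counter:
    """Total the citation counts, grouped by the chosen field of each record."""
    totals = Counter()
    for record in ARTICLES:
        totals[record[key_index]] += record[2]
    return totals


by_name = citations_by(0)  # grouping by printed name string
by_id = citations_by(1)    # grouping by author identifier

print(dict(by_name))
print(dict(by_id))
```

Grouped by name string, one person's citations are split across two spellings while a stranger's are folded into hers; grouped by identifier, each author gets exactly what is theirs.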
In an article published earlier this year, I wrote that there was a ferment of activity around the question of author authority, and what would come of it all was far from clear. I'm happy to say that clarity is emerging, in the form of ORCID: the Open Researcher and Contributor ID initiative. This effort looks to me to have critical mass and brainpower to make a difference: publishers, libraries, technologists, and research funders are all involved.
In the meantime, I plod through the repo's author listings, making what minimal order I may, very desirous of a better solution.