Fellow Scientopian Christina Pikas posted an examination of Stack Overflow's move toward a controlled tagging vocabulary. Toward the end, she made me grin:
Ok, one of my ongoing jokes is how CS keeps reinventing LIS (well indeed they’ve taken over the term “information science” in some places) – so now Stack Overflow has reinvented taxonomy (not quite a thesaurus though, right, because no BT or NT just UF and U, lol)
A lot of librarians, me not least, grumble "we told y'all so" when we see computer science reinventing our wheels. What this means, of course, is that librarians haven't done nearly a good enough job of explaining our wheels.
This is what Book of Trogool's "Jargon" category is all about. I mean to rename it to "Librariansplaining" (I'm sure Zuska or Janet will explain the coinage, if it isn't obvious already) as soon as I can sort out how to do that without borking category links.
And now I'm going to librariansplain about controlled vocabularies, and explain Christina's in-joke. It may help you to read some of the earlier posts in this category first.
In those posts, I talked about how librarians divide up the world of knowledge into teensy-tiny slivers of "aboutness" in order to help lead you from one item of interest to another. One part of dividing up knowledge is naming the slivers. When you start doing that, as Christina noted, you run into some human-language problems really quickly:
- Synonymy. Istanbul or Constantinople? It's our business, as well as the Turks'.
- Homonymy. I say "bat." Do you say "Chiroptera" or "baseball"? And if librarians decide to use the word for the baseball apparatus, what should we do so that the Chiroptera-fanciers can find what they want?
- Terminology change. Nobody calls it a "horseless carriage" any more. To make matters worse, the first name something new gets is often not the name that sticks. Social changes also loom large here; some of the cruft that can accumulate in a naming system is kyriarchical cruft.
- Granularity. Knowledge is infinitely divisible. Naming systems have to decide at what level separate names are warranted. It can also help to indicate relationships up and down the granularity chain; for example, one could call "weblogs" a subcategory of "social software." Or not.
So when librarians "control a vocabulary," we come up with a naming system that avoids the above pitfalls as much as we can manage.
Various types of controlled vocabularies exist; I don't propose to describe them all here. Instead, I'll describe the type that Christina was referring to: the thesaurus. (No, not the synonym dictionary. This is different. Hang with me while I explain.)
Thesauri cope with granularity by establishing "broader-term" and "narrower-term" relationships between terms. So in an entry for "Social software" you might see "NT: weblog, wiki, social-networking service." Likewise, in a "Weblog" entry you may well see "BT: social software." This doesn't absolve the vocabulary-builder of the responsibility to choose the granularity of terms wisely, but it does help.
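One handy property of BT and NT is that they mirror each other, so a vocabulary-builder only has to record one direction and can derive the other. Here's a minimal sketch of that idea in Python, using the social-software example above. The `BROADER` table and the `narrower_terms` and `entry` helpers are hypothetical, made up for illustration, and the sketch assumes each term has at most one broader term (real thesauri may allow more than one).

```python
from collections import defaultdict

# Hypothetical mini-thesaurus: each term records only its broader term (BT).
# Assumption: at most one BT per term, to keep the sketch small.
BROADER = {
    "Weblog": "Social software",
    "Wiki": "Social software",
    "Social-networking service": "Social software",
}


def narrower_terms(broader_map):
    """Invert BT links so each broader term lists its narrower terms (NT)."""
    nt = defaultdict(list)
    for term, bt in broader_map.items():
        nt[bt].append(term)
    return {bt: sorted(terms) for bt, terms in nt.items()}


def entry(term, broader_map):
    """Render a thesaurus-style entry with BT and NT lines where they apply."""
    lines = [term]
    if term in broader_map:
        lines.append(f"  BT: {broader_map[term]}")
    nts = narrower_terms(broader_map).get(term)
    if nts:
        lines.append("  NT: " + ", ".join(nts))
    return "\n".join(lines)


print(entry("Social software", BROADER))
print(entry("Weblog", BROADER))
```

Storing only one direction means the BT and NT sides can never drift out of sync, which is exactly the kind of bookkeeping error that creeps into hand-maintained vocabularies.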
Homonymy and synonymy are often dealt with via "use" and "use for" relationships. If a vocabulary-builder decides that Istanbul is the preferred term, the entry for it will probably include "UF: Constantinople." Likewise, Constantinople's entry will say "U: Istanbul." This can also help with terminology change sometimes: an entry for "Automobile" might contain "UF: Horseless carriage."
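U and UF mirror each other the same way, and they are what lets a search system quietly redirect someone who types the non-preferred term. A minimal sketch in Python, assuming the Istanbul and automobile examples above; the `USE` table and the `use_for` and `preferred_term` helpers are hypothetical names for illustration, not any particular database's API.

```python
# Hypothetical U ("use") references: non-preferred term -> preferred term.
USE = {
    "Constantinople": "Istanbul",
    "Horseless carriage": "Automobile",
}


def use_for(use_map):
    """Invert U references to get each preferred term's UF list."""
    uf = {}
    for non_preferred, preferred in use_map.items():
        uf.setdefault(preferred, []).append(non_preferred)
    return uf


def preferred_term(term, use_map):
    """Follow a U reference if one exists; preferred terms map to themselves."""
    return use_map.get(term, term)


print(preferred_term("Constantinople", USE))  # -> Istanbul
print(use_for(USE)["Automobile"])  # -> ['Horseless carriage']
```

A searcher who types "Constantinople" ends up at everything indexed under "Istanbul," without the indexers ever having to apply both terms.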
As for "bat," controlled vocabulary terms often have "scope notes" that help to disambiguate homonyms and explain the intended granularity for the term. A scope note would make clear that "bat" for purposes of this vocabulary means the thing you smash a homer over the left-field fence with.
The last relationship between terms that thesauri include is the "related term," which is exactly as vague as it sounds. In an entry for "bat" you might see "RT: Baseball." These have to be used sparingly and with care, or we risk sending you off on wild-goose chases; in some way or other, almost everything is related to almost everything else.
So now I have librariansplained the thesaurus, and you understand Christina's joke. The last thing I'll add is that many library journal-article databases use thesauri underneath. The user interfaces built on top of them, however, are appallingly, stunningly bad in the implementations I know. Better UI ideas would be extremely welcome.