Disciplined tagging, or how Stack Overflow plans to control its vocabulary

Carol H tweeted this blog post today from Stack Overflow, the wildly popular question-and-answer site for IT, CS, software dev, etc. Essentially, if you get stuck, you submit a question and add subject tags to help people find it. Answering questions gets you reputation points.

A collection of user-generated tags becomes a “folksonomy” (to use a worn-out term). Typically, though, on social software sites the choice of tag is completely up to the user, so you get multiple versions of the same term (US, United States, USA, usa, U.S.A., etc.), meta terms (to-do, to-read), and sometimes some unpleasant stuff. LIS researchers in information organization have written a ton of papers on these things, and people who build taxonomies for a living sometimes use folksonomies to help determine “preferred” terms.

So, according to this blog post, SO seeds new sites with a few sample terms. They started by letting everyone add new terms, then allowed moderators to merge terms, then required higher and higher reputation scores to add new terms. But the terms were getting out of control. So here's the cool part: they now have wiki scope notes and synonyms for terms.
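For anyone curious what a synonym list buys you, here's a minimal Python sketch. It assumes each variant simply points to one preferred tag and that matching ignores case and stray whitespace. The `TAG_SYNONYMS` table and its entries are hypothetical, made up for illustration; this is not Stack Overflow's actual data or code.

```python
# Hypothetical synonym table: variant tag -> preferred tag.
# Entries are illustrative only, not Stack Overflow's real data.
TAG_SYNONYMS = {
    "usa": "united-states",
    "us": "united-states",
    "u.s.a.": "united-states",
    "js": "javascript",
}


def normalize_tag(tag: str) -> str:
    """Return the preferred form of a user-supplied tag."""
    # Lowercase and trim so "USA " and "usa" count as the same variant.
    cleaned = tag.strip().lower()
    # Fall back to the cleaned tag itself when no synonym is defined.
    return TAG_SYNONYMS.get(cleaned, cleaned)


def normalize_tags(tags: list[str]) -> list[str]:
    """Normalize a question's tags, dropping duplicates but keeping order."""
    seen: set[str] = set()
    result: list[str] = []
    for tag in tags:
        preferred = normalize_tag(tag)
        if preferred not in seen:
            seen.add(preferred)
            result.append(preferred)
    return result


if __name__ == "__main__":
    print(normalize_tags(["USA", "us", "JS", "python"]))
    # -> ['united-states', 'javascript', 'python']
```

The point is that "USA" and "us" collapse into a single preferred tag, so a search on either one finds the same questions.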

My CS colleague from work (hi Jack!) gives me a hard time, generically as a librarian, for supposedly thinking that all vocabularies should be determined in advance and human-assigned, etc. He thinks these things should be emergent and machine-assigned where possible. Obviously neither of us entirely subscribes to either view. If you have the luxury of the funding and time for a good controlled vocabulary (CV) and machine-aided human indexing, your information system will be easier and better to search (better recall, better precision, more user satisfaction). However, you hardly ever have all of these things, and even if you do, user-suggested terms are important for adding to and maintaining your CV.

OK, one of my running jokes is how CS keeps reinventing LIS (indeed, they've taken over the term “information science” in some places), so now Stack Overflow has reinvented taxonomy. Not quite a thesaurus, though, right? There's no BT (broader term) or NT (narrower term), just UF (used for) and U (use), lol.
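To make the thesaurus point concrete, here's a toy Python sketch. A synonym list gives you only the equivalence relationships (UF/USE); a thesaurus adds hierarchical ones (BT/NT), which let a search on a general term pull in its narrower terms. The `Term` structure, the vocabulary, and the helper names are all hypothetical, chosen just for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A preferred term in a toy thesaurus (illustrative structure only)."""
    label: str
    used_for: set[str] = field(default_factory=set)  # UF: non-preferred synonyms
    broader: set[str] = field(default_factory=set)   # BT: broader terms
    narrower: set[str] = field(default_factory=set)  # NT: narrower terms


# Hypothetical vocabulary; a synonym-only system would stop at `used_for`.
THESAURUS = {
    "programming-languages": Term(
        "programming-languages", narrower={"python", "javascript"}
    ),
    "python": Term("python", used_for={"py"}, broader={"programming-languages"}),
    "javascript": Term(
        "javascript", used_for={"js", "ecmascript"}, broader={"programming-languages"}
    ),
}

# USE (U): the inverse of UF, mapping each non-preferred term to its preferred term.
USE = {alt: t.label for t in THESAURUS.values() for alt in t.used_for}


def preferred(term: str) -> str:
    """Follow a USE reference if the term is non-preferred."""
    return USE.get(term, term)


def expand(term: str) -> set[str]:
    """Return the preferred term plus every term reachable via NT links."""
    result: set[str] = set()
    stack = [preferred(term)]
    while stack:
        current = stack.pop()
        if current in result:
            continue
        result.add(current)
        entry = THESAURUS.get(current)
        if entry is not None:
            stack.extend(entry.narrower)
    return result


if __name__ == "__main__":
    print(preferred("js"))  # javascript
    print(sorted(expand("programming-languages")))
    # ['javascript', 'programming-languages', 'python']
```

With only UF/U, `expand` would have nothing to walk; the NT links are what let a search on the broad term also find the python and javascript questions.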

Edit 8/7: Promoting this from the comments. Joe Hourclé tells us that they've addressed some of the issues discussed here (though I doubt they read this :) ). See:

