Sunday, May 4, 2008

Summary of MDG session, 4-22-08

The article for discussion this month was:

Borbinha, José. (2004). "Authority control in the world of metadata." Cataloging & Classification Quarterly 38(3/4): 105-116.

The article provoked a lively discussion that centered largely on the future and functions of authority control. The group began by wondering what the article's tie-in to authority control, implied by the title, really was. The creator concept is very strong in the article, and authors are something we traditionally control in libraries, although archives treat creators differently. The article's explicit connection to authority control comes at the very end: it's not so much that we all use the same rules, but that we know what rules are being used. Control is not as important as interoperability. Is this a good conclusion? It's a practical one.

The discussion in this article is most useful to practitioners, in that it helps us think about why we do authority control in the first place. Some concern was expressed about the very general statements being made in what was perceived as overly technical language.

At this point, there was a bit of confusion in the room, as two participants realized they'd read the wrong article in preparation for the session. This article:

Vitali, Stefano. (2004). "Authority Control of Creators and the Second Edition of ISAAR(CPF), International Standard Archival Authority Record for Corporate Bodies, Persons, and Families." Cataloging & Classification Quarterly 38(3/4): 185-199.

...is an interesting one, discussing in depth the motivations and methods for authority control in archives. It's well worth a read.

That being settled, we returned to the Borbinha article. The effectiveness of crossing institutional borders was questioned: we don't do this well, but Amazon seems to. Perhaps we're still too focused on our methodology rather than the goal.

The moderator asked whether the conceptual vs. context (etc.) perspective was useful. The group was uncertain on this issue, and the gulf between theory and practice emerged as a discussion topic. Practitioners mostly know the records one sees in the cataloging interface, and in this mode the distinction between, say, structure standards and content standards can be confusing. AACR, MARC, and the data entry system are all taught together; the distinction between them is not generally made in training or in daily life. Practitioners tend to move through the learning curve with all of these integrated in their minds. So it's hard to conceive of making an AACR record in Dublin Core; overall, it's very hard to talk about one without the other. Most never see the under-the-hood record coding at all. But in some cases it is useful to keep these distinctions in mind. What do we gain from thinking of them as different things? It's likely not going to be effective to teach the conceptual first and then the practical.

From a public services perspective, the functions shown in Figure 3 are a black box that searches go into and come out of. How useful are these distinctions to that community?

Discussion then turned to the Fellini example in the paper: how would a system bring these name forms together without authority control? Can we live with a system that isn't perfect? Can we trust a secret and proprietary algorithm? What about the model of a human-generated Wikipedia page with a disambiguation step? Is it better to do the matching up of names ahead of time, or at search time?

Can Google connect Mark Twain and Samuel Clemens, as well as just handling a misspelling of Mark? Our authority records handle forms of names found in items; Google handles common misspellings. Are these things different? Authority control serves lots of purposes: disambiguating people who share a name, collocating the same person under multiple names, etc. But there's room for both approaches. Search systems could have an authority list running underneath. Google works differently, pulling data from the Web rather than from an authority file. Ranganathan's principle is "every book its reader"; can we say every search term its hit? OCLC WorldCat: can we guess they're employing both the authority-based and Google-style methodologies? Maybe not; it doesn't seem like they're stemming, and their searches seem to be very literal.

The dual approach seems promising, as authority files have different purposes than the Google-style work. Maybe we could pull in data from 670 references (or from the more structured data proposed by FRAD and showing up in RDA drafts)? Ask the user: "Do you want the chemist or the social scientist?"
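To make the idea of an authority list "running underneath" a search a bit more concrete, here is a minimal, purely illustrative sketch in Python. It assumes a toy authority file mapping variant name forms to authorized headings; the data, function names, and disambiguation prompt are invented for this example and do not describe any particular system's implementation.

    # Hypothetical sketch: a search layer with an authority file underneath.
    # Variant name forms map to authorized headings; the query is resolved
    # through the authority file before matching, and ambiguous names trigger
    # a "do you want the chemist or the social scientist?" style prompt.

    # Toy authority file: variant form -> list of (authorized heading, note)
    AUTHORITY = {
        "samuel clemens": [("Twain, Mark, 1835-1910", "American author")],
        "mark twain":     [("Twain, Mark, 1835-1910", "American author")],
        "j. smith":       [("Smith, John (chemist)", "chemist"),
                           ("Smith, John (sociologist)", "social scientist")],
    }

    def resolve(name: str) -> list[tuple[str, str]]:
        """Return candidate authorized headings for a query name."""
        return AUTHORITY.get(name.strip().lower(), [(name, "uncontrolled")])

    def search(query_name: str, records: list[dict]) -> list[dict]:
        """Expand the query through the authority file, then match records."""
        candidates = resolve(query_name)
        if len(candidates) > 1:
            # Ambiguous name: surface the choices instead of guessing.
            print("Did you mean:")
            for heading, note in candidates:
                print(f"  - {heading} ({note})")
            return []  # in a real interface the user would pick one
        heading = candidates[0][0]
        return [r for r in records if r.get("creator") == heading]

    if __name__ == "__main__":
        records = [
            {"title": "Adventures of Huckleberry Finn", "creator": "Twain, Mark, 1835-1910"},
            {"title": "Life on the Mississippi", "creator": "Twain, Mark, 1835-1910"},
        ]
        # "Samuel Clemens" and "Mark Twain" collocate under the same heading.
        for hit in search("Samuel Clemens", records):
            print(hit["title"])
        # An ambiguous name produces a disambiguation prompt instead of results.
        search("J. Smith", records)

The point of the sketch is only that the matching can happen ahead of time (building the authority file) while the disambiguation question is still asked at search time, which is the trade-off the group was weighing.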

Heterogeneity is part of our life, as the article mentions. We simply have to deal with it. We should find small models that can deal with it, and build on those. What about heterogeneity of thesauri? In some cases it's clear which thesaurus to use; in others it's not. Using different vocabularies is a barrier to interoperability. How do we overcome this? This is the tension, which has come up in our discussions before, between doing one thing well and using the same standards for everything but doing each thing less well. Yet Google and Amazon aren't worrying about this. Google connects Twain and Clemens because somebody made the connection on a web page.

This is one of the drivers behind the "OPAC sucks" movement, for the audience that just wants something, not everything. There's a mismatch between this goal and the one OPACs are designed around. But maybe users actually want something good (not everything good). Our systems don't put the most useful stuff at the top. WorldCat tries to do this by ranking by holdings. We're doing something wrong when the Director of Technical Services goes to Amazon to find a book because she can't find it in our catalog!