Metadata Discussion Group: Summary of MDG session, 4-22-08

The article for discussion this month was:

Borbinha, José. (2004). "Authority control in the world of metadata." Cataloging & Classification Quarterly 38(3/4): 105-116.

The article provoked a lively discussion that centered largely on the future and functions of authority control. We began by asking what the tie-in to authority control, as implied by the title, really was. The creator concept is very strong in the article, and authors are something we traditionally control in libraries, although archives treat creators differently. The explicit connection to authority control comes at the very end of the article: it’s not so much about everyone using the same rules as about knowing which rules are being used. Control is not as important as interoperability. Is this a good conclusion? It’s a practical one.

The discussion in this article is most useful to practitioners, in that it helps us think about why we do authority control in the first place. Some concern was expressed about the very general statements being made in what was perceived as overly technical language.

At this point, there was a bit of confusion in the room, as two participants realized they'd read the wrong article in preparation for the session. This article:

Vitali, Stefano. (2004). "Authority Control of Creators and the Second Edition of ISAAR(CPF), International Standard Archival Authority Record for Corporate Bodies, Persons, and Families." Cataloging & Classification Quarterly 38(3/4): 185-199.

...is an interesting one, discussing in depth the motivations and methods for authority control in archives. It's well worth a read.

That being settled, we returned to the Borbinha article. The effectiveness of crossing institutional borders was questioned: we don’t do this well, but Amazon seems to. Perhaps we’re still too focused on our methodology rather than the goal.

The moderator asked if the conceptual vs. context, etc., perspective was useful. The group was uncertain on this issue, and the gulf between theory and practice emerged as a discussion topic. Practitioners mostly know the records they see in the cataloging interface, and in this mode the distinction between, say, structure standards and content standards can be confusing. AACR, MARC, and the data entry system are all taught together – the distinction between them is not generally made in training or daily life. Practitioners tend to move through the learning curve with all of them integrated in their minds. So it’s hard to imagine making an AACR record in Dublin Core; overall, it's very hard to talk about one without the other. Most never see the under-the-hood record coding at all. But in some cases it is useful to keep these distinctions in mind. What do we gain from thinking of them differently? It's likely not going to be effective to teach the conceptual first and then the practical.

From a public service perspective, the functions shown in Figure 3 are a black box that searches go into and results come out of. How useful are these distinctions to that community?

Discussion then turned to the Fellini example in the paper: how would a system bring these together without authority control? Can we live with a system that isn’t perfect? Can we trust a secret and proprietary algorithm? What about the model of a human-generated Wikipedia page with a disambiguation step? Is it better to do the matching up of names ahead of time, or at search time?

Can Google connect Mark Twain and Samuel Clemens, as well as just catching a misspelling of Mark? Our authority records handle forms of names found in items; Google handles common misspellings. Are these things different? Authority control serves lots of purposes: disambiguating different people with the same name, collocating the same person under multiple names, etc. But there’s room for both approaches. Search systems could have an authority list running underneath. Google works differently, pulling data from the Web rather than from an authority file. Ranganathan’s principle: “every book its reader” – can we say every search term its hit? As for OCLC WorldCat, can we guess they’re employing both authority-based and Google-style methodologies? Maybe not: it doesn’t seem like they’re stemming, and their searches seem to be very literal.

The dual approach seems promising, as authority files have different purposes than the Google-style work. Maybe we could pull in data from 670 references (or from the more structured data proposed by FRAD and showing up in RDA drafts)? Ask the user: “do you want the chemist or the social scientist?”
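
To make the "authority list running underneath" idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any real catalog or search engine works: the records, the function names, and the one-word notes standing in for data drawn from 670 references are all invented for the example, and the two John Smiths are hypothetical people.

```python
def normalize(name):
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(name.lower().split())

# Invented sample records: (authorized heading, variant forms, short note).
# The notes stand in for the kind of data a 670 reference might supply.
RECORDS = [
    ("Twain, Mark, 1835-1910",
     ["Mark Twain", "Samuel Clemens", "Samuel Langhorne Clemens"],
     "author"),
    ("Smith, John (hypothetical person 1)", ["John Smith"], "chemist"),
    ("Smith, John (hypothetical person 2)", ["John Smith"], "social scientist"),
]

def build_index(records):
    """Map every known form of a name to the heading(s) it may refer to."""
    index = {}
    for heading, variants, note in records:
        for form in [heading] + variants:
            index.setdefault(normalize(form), []).append((heading, note))
    return index

def lookup(index, query):
    """Return the (heading, note) pairs that the query could refer to."""
    return index.get(normalize(query), [])

def describe(matches):
    """Turn a lookup result into the next step for the search interface."""
    if not matches:
        return "no authority match; fall back to keyword search"
    if len(matches) == 1:
        # Collocation: every variant form leads to the same heading.
        return "search under: " + matches[0][0]
    # Disambiguation: one name, several people, so ask the user.
    return "do you want the " + " or the ".join(note for _, note in matches) + "?"
```

With this index, a search on "Samuel Clemens" and a search on "Mark Twain" both resolve to the same heading, while "John Smith" produces the question to the user. The matching is done ahead of time, when the index is built; the keyword fallback is where a Google-style approach would take over.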

Heterogeneity is part of our life, as the article mentions. We simply have to deal with it. We should find small models that can deal with it, and build on those. What about heterogeneity of thesauri? In some cases it’s clear which thesaurus to use; in others it's not. Using different vocabularies is a barrier to interoperability – how do we overcome this? This is the tension, which has come up in our discussions before, between doing one thing well and using the same standards for everything but doing each thing less well. Yet Google and Amazon aren’t worrying about this. Google connects Twain and Clemens because somebody made the connection on a web page.

This is one of the drivers behind the "OPAC sucks" movement – for the audience that just wants something, not everything. There's a mismatch between this goal and the one OPACs are designed around. But maybe users actually want something good (not everything good). Our systems don’t put the most useful stuff at the top. WorldCat tries to do this by ranking by holdings. We’re doing something wrong when the Director of Technical Services goes to Amazon to find a book because she can’t find it in our catalog!

2 comments:

Unknown said...: I would like to suggest an additional issue that muddies the waters of searching, particularly title searching. Title indexes (what the computer searches when it looks for a title) are a huge culprit when one tries to find something in a library catalog and cannot. One cannot use authority control for titles. Our title indexes for IUCAT, for example, include variant titles, subtitles, titles in contents notes, series titles, and on and on. That means what a user thinks is a simple search finds too much stuff and there is no good way to put it in an order that is useful. On the IUCAT search screen, we tried to add a field for author to be searched with title words, but no one uses it, despite the fact that it works very well. Since most users search with one key word (in Google, in catalogs, in databases), we need to figure out how to provide context to the search results to allow the user to find things more quickly. As Jenn says, most people just want some good things, not everything. WorldCat.org puts materials in order with the most heavily owned titles appearing first. And it allows the user to limit the search results using facets. The new SIRSI Enterprise system uses a FAST search that looks at 3 letter groupings. For acid rain, for example, it would search aci, cid, dra, etc. Then it uses context to put them together in manageable groups. I believe that authority systems are great for collocating materials by an author or by a subject, but in a large database, there has to be something more that makes the results usable for the average searcher. That means we need to look at ways to provide context in addition, something that authority control does not do well.; May 4, 2008 at 10:13 PM
Mechael said...: One additional problem we face with our current OPAC is the fact that we cannot do reindexing as frequently as we would like. Because of the large (and frequent) MARC record loads we do for e-resources, these titles always appear at the top of the indexes. This is very annoying when you are really trying to find a print book! While I am actually quite good at searching for serials (smaller universe, I know), trying to do a known item search for a monographic title is more difficult. So, when I can't find something in our OPAC, I do indeed search in Amazon to find the ISBN (usually my first choice) or OCLC WorldCat to see if we own the item and then link into our catalog. Am curious about the "field for author to be searched with title words" that Mary mentions has been added but no one uses. I'll have to try that out next time. (I'll also have to try to read the right article for our next discussion group session!); May 5, 2008 at 6:30 AM
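
For readers curious about the three-letter groupings mentioned in the first comment, here is a small Python sketch of the general idea. It is not the vendor's algorithm, which we have not seen: the function names are made up, and dropping the space between words is an assumption, made only because the "dra" in the comment's example spans the gap between "acid" and "rain".

```python
def trigrams(text):
    """Split text into overlapping three-letter groupings.

    Assumption: spaces and punctuation are dropped first, so that
    groupings can span word boundaries.
    """
    letters = "".join(ch for ch in text.lower() if ch.isalnum())
    return [letters[i:i + 3] for i in range(len(letters) - 2)]

def similarity(a, b):
    """Share of distinct groupings the two strings have in common (0.0 to 1.0)."""
    ta, tb = set(trigrams(a)), set(trigrams(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(trigrams("acid rain"))
# ['aci', 'cid', 'idr', 'dra', 'rai', 'ain']
```

Comparing groupings rather than whole words is what lets a search of this kind tolerate small differences: by this measure "acid rain" is much closer to "acid rains" than to "acid jazz", even though none of the three strings match exactly.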

Sunday, May 4, 2008
