Metadata Discussion Group: Summary of MDG Session, 5-28-09

Article discussed: Smith-Yoshimura, Karen. 2009. "Networking Names." Report produced by OCLC Research.

Terminology issues discussed this month included "cooperative identities hub" (is this OCLC’s term? Yes) and API.

The meat of the session began with a discussion of the statement in the report that a preferred form of a name depends on context. Is this a switch for the library community? National authority files tend not to do it this way, but merging international files raises these same issues – they might transliterate Cyrillic differently for example. The VIAF project is having to deal with this issue. The group believes Canada, where this issue likely is raised a great deal due to its bilingual nature, just picks one form and goes with it.

Context here could mean many things: 1) show a different form in different circumstances, 2) include contextual info about the person in the record, etc. For #2, work is going on right now (or is at least in the planning stages) to minimize how much language-specific stuff goes into a record. One could then code each field by which language is used in the citation. Library practice includes vernacular forms in 400 fields now – these could then be primary in another language's catalog. But the coding doesn’t yet distinguish which form is really a cross-reference, and which would be preferred in some other language. So this might not be as useful as it would seem at first.

To achieve the first interpretation of flexible context, a 400 field in an authority record can no longer mean “don’t use this one, use the other one”. The authority file currently serves two purposes: catalogers use it to justify headings, and systems use it to automatically map cross-references. Displaying a name form based on context is definitely a new use case for these authority files.
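As a rough illustration of the first interpretation, the sketch below picks a display form by catalog language. It is not MARC and not anything proposed in the report: the `NameForm` and `IdentityRecord` classes, the `is_cross_reference_only` flag, and the sample name forms are all hypothetical, chosen only to show why a variant needs to be coded as either "preferred in some language" or "cross-reference only" before context-based display can work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameForm:
    """One form of a name (hypothetical structure, not a MARC field)."""
    text: str
    language: str  # assumed per-form language coding
    # True means "search on this, but never display it as preferred"
    is_cross_reference_only: bool = False


@dataclass
class IdentityRecord:
    """All known forms of one person's name, with a fallback language."""
    forms: List[NameForm] = field(default_factory=list)
    default_language: str = "en"

    def preferred_form(self, language: str) -> str:
        """Return the form to display in a catalog of the given language."""
        candidates = [f for f in self.forms if not f.is_cross_reference_only]
        # First choice: a displayable form in the requested language.
        for form in candidates:
            if form.language == language:
                return form.text
        # Otherwise fall back to the record's default language.
        for form in candidates:
            if form.language == self.default_language:
                return form.text
        raise LookupError("no displayable form available")


# Illustrative data only; these are not taken from any authority file.
record = IdentityRecord(forms=[
    NameForm("Tolstoy, Leo", "en"),
    NameForm("Tolstoi, Lev", "en", is_cross_reference_only=True),
    NameForm("Толстой, Лев Николаевич", "ru"),
    NameForm("Tolstoï, Léon", "fr"),
])

print(record.preferred_form("ru"))
print(record.preferred_form("de"))  # no German form, so falls back to English
```

Without something like the cross-reference flag, the English variant "Tolstoi, Lev" would be indistinguishable from a form that is genuinely preferred in some other catalog, which is the coding gap the group identified.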

Participants wondered: if we add more information to our authority files to make them useful for other purposes, how do we ensure they still fulfil their primary purpose? Should our authority records become biographies? The group reached basic consensus that adding new stuff to these records won’t substantively take away from the current disambiguation function.

The group then turned to privacy issues raised by the expanding functions of authority files. One individual noted that in the past, researchers went to reference books to find information on people. Has this information moved into the public sphere? An author's birth date is often on the book jacket. Notable people are in Wikipedia. The campus phone book is not private information. In the archival community, context is everything. Overall, the group felt we didn't need to worry too much about the privacy issue – for the most part functionality trumps the privacy issues. We still need to be careful, but it looks like we're looking at an evolution of what we think of as privacy. We no longer think of privacy as meaning "public, but not easy to get to." Privacy and access control by obscurity is no longer a viable practice. One solution would be to keep some data private for some period of time. Some authors don’t want to give a birth date or middle initial for privacy reasons, and might respond better to a situation where this is stored but not openly public. Participants noted that this additional data is generally not needed for justification of headings or cross-references. But with expanded functionality, we'll need expanded data.

One participant wondered almost rhetorically if the authority file should be a list of your works, or also a list of your DUIs? In the archival community especially, the latter helps understand the person. How far should we go?

The Institutional Repository use case was the next topic of discussion. When getting faculty publications, it would help to expand the scope of the authority file at the national level. But at the local level, many are already struggling with these issues. Do A&I firms do name collocation? Participants don’t think so.

Participants then wondered about the implications of opening up services to contributions from non-catalogers. Some felt we needed to just do it. Others thought opening to humans was a good idea but buggy machine processes could cause havoc. Even OCLC has a great deal of trouble with batch processing (duplicate detection, etc.), and they’re better at this than anyone else in our sphere. For human edits, the same issues apply as with Wikipedia, but our systems don't get as many eyes. What is the right model for vetting and access control? Who is an authorized user?

Participants believed we need to keep the identities hub separate from the main name authority file for a while to work out issues before expanding the scope of the authority file significantly. The proposed discussion model in the report (p. 7) will help with the vandalism issue. The proposal flips the authority file model on its head, with lots of people adding data rather than just a few highly trained individuals. A participant wondered if the NAF in the end becomes the identities hub. Maybe the NAF feeds the identities hub instead.

Discussion then moved to the possibility of expanding this model to other things beyond names. Geographic places might benefit, but probably not subjects - this process contradicts the very idea of a controlled vocabulary. One participant noted a hub model could be used to document current linguistic practice with regard to subjects.

The session concluded with participants noting that authority control is the highest value activity catalogers do. The data that’s created by this process is the most useful of our data beyond libraries. We need to coordinate work and not duplicate effort.

Saturday, August 8, 2009
