Article Read: Schaffner, Jennifer. (May 2009). "The Metadata is the Interface: Better Description for Better Discovery of Archives and Special Collections." Report produced by OCLC Research. Published online at: http://www.oclc.org/programs/publications/reports/2009-06.pdf.
An online, user-editable resource list accompanying this report can be found at https://oclcresearch.webjunction.org/mdidresourcelist.
While questions regarding terminology in Metadata Discussion Group sessions often focus on technological terms, this month they focused on terms from the archives sphere that are not commonly used in libraries.
The group began the primary discussion by considering the third sentence in the report's Introduction, "These days we are writing finding aids and cataloging collections largely to be discovered by search engines." Participants wondered whether this statement was accurate and, if so, what it meant for our descriptive practices. The first reaction expressed was "So what?" OCLC records are exposed to Google through WorldCat.org - does this mean we're already starting to recognize the importance of search engine exposure? Another participant wondered whether the statement was true for all classes of users - we certainly have many different types, and presumably the studies cited in the report refer to different groups as well. Different types of users need different types of discovery tools. Regardless, there is a recognition that recent activities reflect a big paradigm shift for special collections: they’re no longer “elite” materials that only serious researchers with letters of recommendation can see. In wondering whether our descriptive practices need to change to reflect this new user base and new discovery environments, participants noted that there are ongoing efforts to pull more out of library- and archives-generated metadata, including structured vocabularies such as LCSH.
User-supplied metadata could certainly be part of this solution. At SAA last month, there was a session on Web 2.0 in which one presenting repository touted the importance of user-supplied metadata for some of its materials. The repository reported that user contributions needed some level of vetting but were useful overall. It was noted that just scanning is not enough, though: not all resources are textual, and those that are may be handwritten or in languages other than English, both of which can pose challenges to automated transcription (OCR).
The group then wondered what other factors could be used in relevancy ranking algorithms, which libraries are notoriously suspicious of. Participants were intrigued by the idea on p. 8 of the report that higher levels in a multi-level description be weighted more heavily. It was noted that perhaps the most common factors for relevance ranking are those that libraries don't traditionally collect: the number of times an item is cited, checked out, or clicked on in a search result set. Relationships between texts in print are not as robust as those on the Web, and this might be evidenced by the fact that Google Book Search ranking doesn't seem to be as effective as the Google Search Engine ranking. Personal names, place names, and events might be weighted more heavily, as this report suggests those things are of primary interest to users. We could also leverage our existing controlled vocabularies in three ways: by weighting terms that appear in them more heavily than terms that do not, by "exploding" queries against full-text corpora to include synonyms, and, in systems whose items are cataloged with controlled vocabularies, by mapping search terms to the preferred terms in those vocabularies. Participants debated the degree to which the system should suggest alternatives vs. making changes to queries and telling the user about it after it's done.
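To make the ranking ideas above concrete, here is a minimal sketch in Python. It is illustrative only and is not drawn from the report or from any existing system: the level weights, the controlled-vocabulary boost, the variant-to-preferred-term table, and all names (Component, normalize_query, score, search) are hypothetical assumptions. It weights higher levels of a multi-level description more heavily, boosts matches on controlled subject terms, and rewrites a query to the vocabulary's preferred term while telling the user afterward.

```python
from dataclasses import dataclass, field

# Hypothetical weights: higher levels of a multi-level description count more.
LEVEL_WEIGHTS = {"collection": 3.0, "series": 2.0, "file": 1.5, "item": 1.0}

# Hypothetical boost for a match on a controlled-vocabulary subject term.
CONTROLLED_BOOST = 2.0

# Hypothetical table mapping variant search terms to a preferred heading.
PREFERRED = {
    "ww1": "world war, 1914-1918",
    "great war": "world war, 1914-1918",
}


@dataclass
class Component:
    """One node of a multi-level description (collection, series, item...)."""
    level: str
    text: str                      # free-text description
    subjects: list = field(default_factory=list)  # controlled terms


def normalize_query(query):
    """Map a query to the preferred term; return (term, note for the user)."""
    q = query.strip().lower()
    preferred = PREFERRED.get(q)
    if preferred is None:
        return q, None
    return preferred, f'Searched for "{preferred}" instead of "{query}".'


def score(component, term):
    """Score one component: level weight, boosted for controlled-term hits."""
    weight = LEVEL_WEIGHTS.get(component.level, 1.0)
    total = 0.0
    if term in component.text.lower():
        total += weight
    if term in [s.lower() for s in component.subjects]:
        total += weight * CONTROLLED_BOOST
    return total


def search(components, query):
    """Return matching components, best first, plus any query-change note."""
    term, note = normalize_query(query)
    scored = [(score(c, term), c) for c in components]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for s, c in scored if s > 0], note
```

In this sketch the system changes the query and reports it after the fact; returning the note without rewriting the term would model the "suggest alternatives" approach that participants also discussed.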
The session closed with a discussion of the comprehensiveness issue mentioned in the report. If users don't trust resources that they believe to be incomplete, what do we do? The quickest answer is "Never admit it!" No resource is ever truly comprehensive. Libraries certainly have put a positive spin on retroconversion projects, calling them "done" when large pockets of material are still unaddressed.