Monday, March 28, 2011

Summary of MDG Session, 03-10-11

Articles discussed:
Moderator: Dot Porter, Associate Director for Digital Library Content & Services, Digital Library Program

March's discussion began with an observation: both articles discuss pulling metadata from webpages, whereas libraries are typically in the business of pushing metadata. Ardö's article confirms the kinds of webpage metadata problems Doctorow identified nine years earlier. A non-cataloger seemed surprised that the quality of web metadata even needed to be studied at all: of course it's horrible! Search engines have had to develop ever smarter search algorithms and "did you mean __" services to combat poor or misleading metadata, and most search engine algorithms ignore embedded webpage metadata completely because it is so unreliable.
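
To make the problem concrete, here is a minimal sketch (not drawn from either article, and assuming nothing about the authors' methods) of how one might pull <meta> tags from a webpage with Python's standard library. Running it against a handful of real pages quickly shows how often keyword and description fields are missing, empty, or stuffed with irrelevant terms; the URL in the example is purely illustrative.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class MetaTagExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML page."""
    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or attrs.get("property")
        if name:
            self.metadata[name.lower()] = attrs.get("content", "")

def fetch_metadata(url):
    """Download a page and return whatever embedded metadata it declares."""
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = MetaTagExtractor()
    parser.feed(html)
    return parser.metadata

# Hypothetical example: fields like 'keywords' and 'description' are
# frequently absent, boilerplate, or stuffed for SEO.
# print(fetch_metadata("https://example.com/"))
```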

The discussion then turned to crosswalking. A participant noted how library folks' definition of the term metadata has evolved from the "data about data" formulation (who wasn't tired of hearing it parroted ad nauseam?) into something much richer. The change seems to have coincided with the cataloging world's increasing familiarity with XML and XML technologies; a better understanding of XML changed the cataloging world's perception of how library metadata might become more web-ready and accessible.
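
As a rough illustration of what crosswalking looks like in practice, here is a hedged sketch: the Dublin Core record is made up, and the mapping below is an invented, drastically simplified subset (real crosswalks, such as the Library of Congress's Dublin Core to MARC mapping, are far more nuanced).

```python
import xml.etree.ElementTree as ET

# A toy Dublin Core record (hypothetical data, simplified markup).
DC_XML = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Essays on Web Metadata</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:date>2011</dc:date>
</record>
"""

# Invented, simplified DC-to-MARC field mapping for illustration only.
DC_TO_MARC = {
    "title":   "245$a",
    "creator": "100$a",
    "date":    "260$c",
}

def crosswalk(dc_xml):
    """Map Dublin Core elements onto MARC-style tags."""
    ns = {"dc": "http://purl.org/dc/elements/1.1/"}
    root = ET.fromstring(dc_xml)
    marc = {}
    for element, tag in DC_TO_MARC.items():
        node = root.find(f"dc:{element}", ns)
        if node is not None and node.text:
            marc[tag] = node.text.strip()
    return marc

print(crosswalk(DC_XML))
# {'245$a': 'Essays on Web Metadata', '100$a': 'Doe, Jane', '260$c': '2011'}
```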

One participant wondered: if library catalogs had more Google-like search capabilities, would discovery be improved? Most participants thought so, although one worried that language would be a barrier to search. The web was once a largely English-speaking space, but that is increasingly not the case, and search engines have a hard time determining a page's language, especially when several languages appear on the same page. Another participant warned that websites often contain misleading metadata in an attempt to game their search rankings (search engine optimization, or SEO).

Does buying into the Google model and integrating our resources more fully with the web mean that libraries will need to accept that search engines impose value judgments on the content of the web? Some argued that the library catalog doesn't make value judgments in the way that Google Scholar does when it tells you who else cited the article you're reading, or the way Amazon does when it tells you what other customers bought along with the product you're purchasing. As a participant summed up: it is increasingly a world ruled by the "good-enough" principle. Searchers may not find the right thing, but they find something, and that's good enough.

A participant wondered how on-demand acquisition impacts collections and metadata. Another participant explained that on-demand materials are often accompanied by records of very poor quality, and OCLC exacerbates the problem by merging records and retaining data (whether good or bad) in one big, messy record. A definite downside of on-demand purchasing was illustrated by a pilot project IU Libraries embarked on with NetLibrary: an ebook was purchased automatically once three patrons had clicked the link to view it. Within six months, the $100,000 budget for on-demand ebook acquisition was gone. The participant admitted that some of the ebooks bought this way would not have been selected by a collection manager; patrons likely clicked the link to browse and may have spent all of 30 seconds with the resource before clicking away to something else. As one participant asked: how many of those purchases could have been avoided if the metadata on the title splash page had included a table of contents?

Participants also discussed web users' expectation that all information should be free. Good metadata isn't free, nor is the hosting of resources. This led to the question: how do young information seekers prefer to read? Do they read research papers and novels online? Do they seek out concise articles and skim for the bullet points? Are our assumptions about the information habits of undergraduate students correct?

The discussion moved on to a topic prompted by the statement that the internet has filters users are often unaware of. Participants wondered about the polarizing effect this might have on users' perception of information. Search engines learn what you search for and display related ads; how might broad understanding of a topic be skewed if search engines put similar blinders on search results? One must go out of one's way to seek out an opposing viewpoint, because one isn't likely to encounter it while idly browsing the web.

To bring the discussion back on topic as it drew to a close, the moderator asked: what are the minimum metadata requirements for a resource? A participant cited the CONSER Standard Record for Serials as an example of an effort to establish minimum metadata requirements. That standard was founded not on the traditional cataloging principle of "describe" but rather on the FRBR principle of "identify": what metadata is needed for a user to find and identify a resource? Implementing the CONSER Standard Record increased production in the IUL serials cataloging unit. It was conceded that minimum metadata requirements may differ depending upon the collection, the material, the owning institution, and so on. One thing that was apparent, whatever the standard, was the need for a universal name authority file.
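
As a back-of-the-envelope illustration (not drawn from the CONSER documentation; the required fields below are a hypothetical subset chosen only for the example), a minimum-metadata policy oriented around "identify" can be expressed as a simple validation step applied before records are loaded:

```python
# Hypothetical "identify"-oriented minimum: enough metadata for a user
# to find and distinguish the resource, not a full descriptive record.
REQUIRED_FIELDS = {"title", "identifier", "date", "resource_type"}

def missing_fields(record):
    """Return the required fields a record lacks or leaves empty."""
    return {f for f in REQUIRED_FIELDS if not record.get(f)}

# Made-up incoming record used only to exercise the check.
incoming = {
    "title": "Journal of Example Studies",
    "identifier": "ISSN 1234-5678",
    "date": "2011-",
    "resource_type": "serial",
}

problems = missing_fields(incoming)
if problems:
    print("Record rejected; missing:", sorted(problems))
else:
    print("Record meets the minimum requirements.")
```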
