Thursday, November 5, 2009

Summary of MDG Session, 10-15-09

Article read:


This month's Metadata Discussion Group began with a discussion of the tone of the blog post and article, and of the rhetoric in the community at large around the Google Book Search project. Participants expressed support for the idea that discussion needs to be reasoned and civil - neither Google nor libraries are all wrong or all right. It is more important to fix identified problems than to point fingers. One participant noted that the difference in tone between the blog post and the slightly later Chronicle article was telling. Nunberg's interest is clearly in the scholars, but this is more obvious in the Chronicle article than in the blog post. The Chronicle article immediately sets up a "this service is bad" tone by listing Elsevier as the first possible future owner! The Chronicle version doesn't even entertain the possibility of Google keeping the service as its own.


Participants quickly noted that the findings of these articles underscore what we already know: there's a lot of bad cataloging out there! A pre-1900 book might have 20 records in OCLC. The articles suggest the full OCLC or LC catalogs would help this service, and participants noted that GBS does in fact have the full OCLC database. But are those really better than what GBS is actually using? An "authoritative" source of metadata is a library-centric view. There is no perfect catalog, and there isn't a "better catalog" Google could acquire that would easily solve the problems identified here. IU itself contributes to the problem: we send unlinked NOTIS item records with our shipments to Google. One participant noted that the "results of this are catastrophic," but we can't feasibly do much better. We're flagging these items to handle on return, but that doesn't help Google.


Thinking about how to solve these problems led to a theme common in Metadata Discussion Group sessions - what if we were to open up metadata editing to users? Wikipedia isn't consistent, surely - would that approach work here? A participant noted that OCLC itself is a cooperative venture and there are many inconsistencies there. Institutions futz with records locally and don't send the changes back to OCLC. CONSER had a history of record edit wars, and catalogers decided they just have to grit their teeth and deal with it.


Regarding the date-accuracy problems that received a great deal of space in these articles, a participant noted that expectations for these features in GBS are exceedingly unrealistic. As one blog commenter posited, a user can't assume all search results are relevant - one has to evaluate search results oneself, from any information resource.


The discussion then turned to GBS' utility as a source for language usage. A participant noted that the traditional way to learn about when, say, "felicity" changed to "happiness," is to check the OED. But how does the person who wrote the OED entry know? Was it a manual process before, and could this change with GBS? Scholars haven’t LOST anything with the advent of GBS - it's just an additional tool for them.

Participants then noted that scholars aren't the only, or perhaps even the primary, audience for GBS. But should they be? A great deal of the content (though not all - some comes from publishers too) comes from academic libraries, which have built their collections primarily in support of scholarly activities. Shouldn't library partnerships come with some responsibility on Google's part to pay attention to scholarly needs? For IU, the CIC, and other academic libraries, HathiTrust is attempting to fulfill this role, but is that enough?


The next question the group considered was "Is GBS the 'last library'?" The proposed GBS settlement might stifle competition. However, libraries themselves haven't shown that they can really compete in this area. Enhanced cooperation seems to be the only way we might play a realistic role. Participants wondered whether the monopoly that seems to be emerging is the result of Google pushing others out or of a lack of interest by potential competitors. Libraries have long wanted to enter this area, but first the technology wasn't there, and then we didn't have the resources. GBS operates at an entirely different scale than libraries can realistically achieve. We're struggling at IU with how to deal with only 6,000 Google rejects.


Discussion then turned to some of the statistics presented by the Google engineer in a comment on the blog post, including the claim of "millions" of problems and a BISAC accuracy rate of 90%. Participants guessed that fewer than 10% of the subject headings in our own catalogs are howlers, but there certainly are lots of them in there. The redundancy in a MARC record provides additional text that could be used to avoid this kind of obvious error. We wondered whether Google is using any of this redundant information effectively.
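The redundancy point can be illustrated with a toy sketch - nothing here reflects anything Google has described; the function name and the idea of comparing word overlap are purely illustrative. Before accepting an automatically assigned subject, one could check whether any of its words also appear in the record's other descriptive fields (title, subject headings), and flag it as a possible howler when there is no overlap at all:

```python
def looks_suspect(assigned_subject: str, marc_text_fields: list[str]) -> bool:
    """Flag an assigned subject as a possible 'howler' when none of its
    words appear anywhere in the record's other descriptive fields."""
    subject_words = {w.lower() for w in assigned_subject.split()}
    record_words = set()
    for field in marc_text_fields:
        record_words.update(w.lower().strip(".,;:") for w in field.split())
    # No shared vocabulary at all is a cheap signal worth a second look.
    return not (subject_words & record_words)
```

So a gardening manual classified under "Religion" would be flagged, while a "Gardening" label would pass. A real check would of course need stemming, stopword handling, and tolerance for legitimate mismatches, but even this crude overlap test exploits text the MARC record already carries.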


The topic then turned back to whether Google Book Search should spend more effort meeting scholarly needs. What should it do differently to support this kind of user better? First, probably not rely on a single classification scheme. It needn't stop using BISAC, but it could also use alternatives - that's what Google is about, more information! Google is definitely getting LCSH from MARC records, despite LCSH's limitations. The LCC class number could be used to derive a "primary" subject, and it potentially supplies words that appear nowhere else in the record. Participants noted that as the GBS database grows, each individual subject heading will start returning larger and larger result sets.
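Deriving a broad subject from the LCC class number is mechanical, since the leading letters of the call number identify the class. A minimal sketch, using a tiny hypothetical mapping (the real LCC outline has hundreds of classes and subclasses):

```python
# Illustrative subset of the LCC outline; a real table would be far larger.
LCC_BROAD_SUBJECTS = {
    "B": "Philosophy, Psychology, Religion",
    "P": "Language and Literature",
    "Q": "Science",
    "QA": "Mathematics",
    "Z": "Bibliography, Library Science",
}

def primary_subject(call_number: str) -> str:
    """Return a broad subject for an LCC call number, preferring the
    longest matching class prefix (e.g. 'QA76.9' matches 'QA' before 'Q')."""
    letters = ""
    for ch in call_number:
        if ch.isalpha():
            letters += ch.upper()
        else:
            break
    for end in range(len(letters), 0, -1):
        if letters[:end] in LCC_BROAD_SUBJECTS:
            return LCC_BROAD_SUBJECTS[letters[:end]]
    return "Unknown"
```

For example, `primary_subject("QA76.9")` yields "Mathematics" while `primary_subject("PS3545")` falls back to the one-letter class "P", "Language and Literature". This is exactly the kind of cheap, record-internal signal that could supplement BISAC rather than replace it.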


The session closed with some musings on how the Google and library communities might better learn from one another. The notion of constructive conversation rather than disdain was raised again. Then participants noted that the GBS engineer commenting on the blog post invited comment. Individuals can take advantage of this invitation, and IU as a GBS partner can provide information and start conversations at yet another level.