Saturday, August 8, 2009

Summary of MDG Session, 4-23-09

Article discussed: Nellhaus, Tobin. "XML, TEI, and Digital Libraries in the Humanities." portal: Libraries and the Academy, Vol. 1, No. 3 (2001), pp. 257-277.

The conversation began by addressing terminology issues, as is typical of our Metadata Discussion Group sessions. "Stylesheet" and "boilerplate" were among the unfamiliar terms. One participant noted “expanded linking” is like the “paperless society” - a state much discussed but never remotely achieved.

Discussion of the points in the article began with a participant noting that the TEI is a set of guidelines rather than an official "standard." Is this OK? Can we feel safe using it? The group believed that if there is no “standard” for something then adopting guidelines is OK, on the idea that something is better than nothing. Is there a standard that would compete with TEI? DocBook is the only real other option, and it hasn't been well adopted in the cultural heritage community. Participants wondered if the library community should push TEI towards standardization.

An interesting question then arose: do the TEI's roots in the humanities make it less useful for other types of material? The problems with drama described in this article would extend to other formats too. What about music? How much should TEI expand into this and other areas?

Discussion at this point moved to how to implement TEI locally. Participants noted that local guidelines are necessary, and should be influenced by other projects. Having a standard or common best practices is powerful, but that still leaves lots of room for local interpretation. Local practice is a potential barrier to interoperability - for example, a display stylesheet won’t work any more if you start using tags that aren’t in the stylesheet. Local implementations have to plan ahead of time for how the TEI will be used. In the library community, we create different levels of cataloging; encoding could follow the same model. Participants noted that we should do user studies to guide our local implementations.
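The stylesheet concern can be made concrete with a small check. The following is a minimal sketch in Python, not something discussed in the session: it assumes a hypothetical list of element names that a local display stylesheet knows how to render (`HANDLED_TAGS`) and reports any elements in a document that fall outside that list. The sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the element names a local display stylesheet has rules for.
HANDLED_TAGS = {"div", "sp", "speaker", "l"}

# Made-up sample in the TEI namespace; <stage> is not in HANDLED_TAGS.
SAMPLE_DOC = """<div xmlns="http://www.tei-c.org/ns/1.0" type="scene">
  <stage>Enter Romeo.</stage>
  <sp>
    <speaker>Romeo</speaker>
    <l>He jests at scars that never felt a wound.</l>
  </sp>
</div>"""


def unhandled_tags(xml_text, handled):
    """Return the sorted element names used in xml_text but absent from handled."""
    root = ET.fromstring(xml_text)
    found = set()
    for element in root.iter():
        # ElementTree reports namespaced tags as "{namespace}name"; keep the name.
        found.add(element.tag.rsplit("}", 1)[-1])
    return sorted(found - set(handled))


if __name__ == "__main__":
    print(unhandled_tags(SAMPLE_DOC, HANDLED_TAGS))
```

Running a check like this before adding new tags to local practice would show which elements the existing display would silently ignore or mishandle.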

The group performed an interesting thought experiment examining the many different ways TEI could be implemented, considering Romeo & Juliet. Begin with a version originally in print. Then someone typed it into Project Gutenberg so it was on the Web. Then someone figured out they needed scene markers, so someone had to go back and encode for that. New uses mean we need new encoding. How do we balance adding more value to core stuff against doing new stuff? A participant noted that this is not a new problem - metadata has always been dynamic. The TEI tags for very detailed work are there, which makes it very tempting to do more encoding than a project specifies. Take the case of IU presidents' speeches. Do these need TEI markup or is full-text searching enough? It would be fascinating and fun to pull together all sorts of materials for a president: primary, secondary, sound recordings of his speeches. But where is this type of treatment in our overall priorities?

A participant asked the group to step back and consider what TEI can do that full-text searching can’t. Some answers posed by the group were collocation and disambiguation of names, date searching for letters, and pulling out and displaying just the stage directions from a dramatic text.
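The stage directions case is easy to illustrate. The following is a minimal sketch in Python, assuming a play encoded with the TEI `<stage>`, `<sp>`, `<speaker>` and `<l>` elements in the TEI namespace. The sample markup and the function name `stage_directions` are invented for illustration and do not come from any project discussed in the session.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Made-up sample: the word "window" appears in both a stage direction
# and a spoken line, so plain full-text search cannot tell them apart.
SAMPLE_SCENE = """<div xmlns="http://www.tei-c.org/ns/1.0" type="scene">
  <stage>Enter Romeo.</stage>
  <sp>
    <speaker>Romeo</speaker>
    <l>He jests at scars that never felt a wound.</l>
  </sp>
  <stage>Juliet appears above at a window.</stage>
  <sp>
    <speaker>Romeo</speaker>
    <l>But soft, what light through yonder window breaks?</l>
  </sp>
</div>"""


def stage_directions(tei_xml):
    """Return the text of every <stage> element, in document order."""
    root = ET.fromstring(tei_xml)
    directions = []
    for stage in root.iterfind(".//tei:stage", TEI_NS):
        # Join any nested text and collapse runs of whitespace.
        directions.append(" ".join("".join(stage.itertext()).split()))
    return directions


if __name__ == "__main__":
    for direction in stage_directions(SAMPLE_SCENE):
        print(direction)
```

A full-text search for "window" would return both the stage direction and Romeo's line; the markup is what lets a display or search interface separate the two.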

We then returned to the notion of drama. It's hard to deal with plays both as literature and as performance. Is this like us treating sheet music bibliographically vs. archivally? Here the cover could be a work of art, music notation marked up in MusicXML, and text marked up in TEI. Nobody knew of an implementation that deals with this multiplicity of perspectives well. Something text-based has trouble dealing with time, for example. Participants noted that TEI is starting to deal with this issue now, but it's certainly a difficult problem.

A participant wondered what would happen if we were to just put images (or dirty OCR for typewritten originals, certainly not all of our stuff) up and mark it up later? This would be the “more product less process” approach currently in favor in the archival world. It would also be in keeping with current efforts to focus on unique materials and special collections rather than mass-produced and widely held materials.

Participants wondered if Google Book Search and HathiTrust do TEI markup. Nobody in the room knew for certain, but we didn’t think so.

The session concluded with a final thought, echoing many earlier conversations by this group: could crowdsourcing (user contributed efforts) be used as a means to help get the markup done?
