Thursday, March 27, 2008

Summary of MDG session, 3-18-08

The article for discussion this month was:

Yakel, Elizabeth, Seth Shaw, and Polly Reynolds. "Creating the Next Generation Archival Finding Aids." D-Lib Magazine 13, no. 5/6 (May/June 2007). Available from http://www.dlib.org/dlib/may07/yakel/05yakel.html.

Early on, the discussion focused on the predictability (or lack thereof) of EAD files. EAD as a markup language is designed to be flexible, so that it can encode many different types of finding aids. This means that any two EAD-encoded finding aids may not look very much alike. Using a common controlled vocabulary across finding aids was envisioned as one way to tackle this fundamental unpredictability. The group felt that broad subject headings work well for sharing, despite the article's claim that they were not adequate. Within the local environment, however, the more specific headings the article calls for make sense.
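To make the point about unpredictability concrete, here is a minimal sketch of how a shared controlled vocabulary can bridge two finding aids that are structured differently. Both EAD fragments are invented and heavily simplified for illustration (no namespaces, made-up titles), and the helper name `subject_terms` is hypothetical; the `controlaccess` and `subject` elements and the `source` attribute are standard EAD, but nothing here reflects how the project in the article was actually encoded.

```python
import xml.etree.ElementTree as ET

# Finding aid A (invented): subject headings sit at the collection level.
EAD_A = """<ead><archdesc level="collection">
  <did><unittitle>Diary of a soldier</unittitle></did>
  <controlaccess>
    <subject source="lcsh">World War, 1914-1918</subject>
    <geogname source="lcsh">Russia</geogname>
  </controlaccess>
</archdesc></ead>"""

# Finding aid B (invented): subject headings are buried inside a series component.
EAD_B = """<ead><archdesc level="collection">
  <did><unittitle>Family papers</unittitle></did>
  <dsc><c01 level="series">
    <did><unittitle>Correspondence</unittitle></did>
    <controlaccess>
      <subject source="lcsh">World War, 1914-1918</subject>
      <subject source="local">Letters home</subject>
    </controlaccess>
  </c01></dsc>
</archdesc></ead>"""


def subject_terms(ead_xml, source="lcsh"):
    """Collect <subject> terms from one vocabulary, wherever they are nested."""
    root = ET.fromstring(ead_xml)
    return {
        (el.text or "").strip()
        for el in root.iter("subject")  # iter() walks the whole tree
        if el.get("source") == source and (el.text or "").strip()
    }


# Headings drawn from the same vocabulary line up despite the different layouts.
shared = subject_terms(EAD_A) & subject_terms(EAD_B)
print(sorted(shared))
```

The broad shared heading is what makes the two files comparable, while the locally sourced heading in the second file only surfaces when that vocabulary is requested, for example with `subject_terms(EAD_B, source="local")`.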

A large part of the group's discussion of this article worked through how better access could be provided to these materials with some reasonable level of expediency. While detailed analysis, such as noting that a proverb appears within a story within a volume in the IU Folklore collection, could be beneficial, it's unlikely we can afford to be this detailed. Participants reported that it's often difficult to resist the urge to provide this detailed analysis, even though there is pressure to process collections quickly. One has to stop and ask: how meaningful will the description be if I don’t go into more detail? General practice is to pull out only the "important" data, such as some names rather than all of them. One participant noted that a recent article in the American Archivist (Fall/Winter 2007, Vol. 70, No. 2), "Archives of the People, by the People, for the People," by Max J. Evans, discusses how one might get more mileage out of an EAD-encoded finding aid. [Note from after the meeting: this same volume has another article on the Polar Bear Expedition project which might address some of the issues the group wished had been discussed in the article we read for this month. "Interaction in Virtual Archives: The Polar Bear Expedition Digital Collections Next Generation Finding Aid," by Magia Ghetu Krause and Elizabeth Yakel.]

Some members of the group expressed interest in studying how keyword indexing of full text could help add description for archival collections (although the group recognized that automatically generating transcriptions of scanned handwritten documents is not currently very feasible). The CLiMB project at Columbia was noted as an example of how this technology might work. The possibility of capturing transcriptions from users was also discussed.
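As a rough illustration of the keyword-indexing idea, the sketch below pulls the most frequent non-trivial words out of a transcription as candidate index terms. It assumes a transcription already exists (for example, one supplied by a user); the sample text, the stopword list, and the function name `candidate_keywords` are all invented for this example, and this is plain word counting, not a description of how CLiMB works.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption; a real one would be longer).
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "we", "was", "were",
    "for", "on", "at", "with", "from", "our", "it", "is",
}


def candidate_keywords(text, top_n=3):
    """Return the top_n most frequent words that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


# Invented sample text standing in for a transcribed diary page.
transcription = (
    "The train left for the camp at dawn. The camp was cold, "
    "and the train was late. Snow covered the camp."
)

print(candidate_keywords(transcription, top_n=2))
```

Terms produced this way would still need review by an archivist, or mapping onto a controlled vocabulary, before being added to a finding aid.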

Participants noted the potential utility of user-supplied information, as these users often have a vested interest in and knowledge of the materials.

The group wondered why the project staff was hesitant to include information from the database of data on soldiers, including birth dates, death dates, etc. The prevailing thought in the room was that if the catalog can include this sort of information, it should; such information is not fundamentally out of scope for the "catalog."

Participants noted several features of the Polar Bear Expedition site that they believed had been implemented well, including providing coherence to a collection brought together by theme rather than by format, effective browsing (although it was noted the browse might be used more because the search feature was not very full-featured!), and the fact that the entire collection had been digitized rather than just highlights. Some drawbacks were mentioned as well, most notably the current lack of a critical mass of user comments and the absence of clear information on what brings these various collections together.

A "wish list" for more information on this project emerged, including specifics on the metadata implementation (e.g., what controlled vocabularies were used) and the degree to which site features were developed in response to use cases and user studies. For example, the "visitor awareness" feature appears to be a way of getting users to talk to each other. The article didn't describe how this feature was determined to be a priority: was it implemented in response to a defined need or just because it was interesting? Participants also wanted more information on balancing this sort of functionality with user privacy issues, while recognizing that this sort of project can open users’ minds as to what is possible, let us get feedback from them, and let us ask them what they want while they’re using it.

The challenges described in this article were disheartening to some participants, who felt that this project represents a best possible case, with all the material already digitized. The fact that there were still so many problems is a bit scary, as the mantra we've been hearing is that having materials online was supposed to make this sort of thing much easier. Or are we just making these systems too complex? Flickr seems to work, and it operates at a much simpler level. To what degree does the system need to reflect the complexity of the collections and the items within them?