Wednesday, January 14, 2009

Summary of MDG Session, 12-18-08

Article discussed: Kurth, Marty, David Ruddy, and Nathan Rupp. (2004). "Repurposing MARC metadata: using digital project experience to develop a metadata management design." Library Hi Tech 22(2): 153-165.

The discussion group appreciated that the work described in this article was grounded in theoretical work on metadata management, but felt that the explanation of that theory, including the concept of the enterprise, was not extensive enough to fully understand the connection. It was clear, however, that to do management, you have to do mapping and transformation; management allows you to rethink and retool. Our group was interested to know what has happened since this article was written. Has the design been put into production? What has changed? It appears there is a follow-up article to this one that would be interesting to read.

The article claims that MARC mapping work is representative of the metadata management task as a whole. Choosing metadata standards based on specific project needs is sound practice, and the projects described here demonstrate how to do that. It's easy to imagine a project where you can start with MARC, but what do you do when no MARC already exists? At IU we have experience in many library departments with projects that reuse existing MARC metadata.

The group identified three possible cases for metadata management for a digital project: have existing MARC, have existing non-MARC, have no existing structured metadata. Are the strategies outlined in this article useful in all three of these cases? We didn't come to a strong conclusion on this issue.

An interesting discussion grew up around how to deal with legacy (pre-AACR2) MARC records. Institutional memory is likely the best bet, as documentation comparing older practices with current ones is sparse. Political boundaries change, and recorded places of publication may no longer be correct. Some legacy data is easier to deal with, however: an institution could use an authority vendor to update name headings with death dates. Certain data elements should be updated over time, while others shouldn't. The group noted that most metadata work is based on bibliographic records and doesn't do enough with authority records. Making the full authority structure available to metadata creation staff is sorely needed.

A substantial amount of discussion time was spent on the topic of collection-specific mappings. The benefit, of course, is that these get it done the way you want it. The drawbacks are potentially reduced shareability and interoperability. One has to keep the whole scope of the project in mind to make good decisions and focus on what's really important. One also has to keep the user in mind, which is difficult to do: we tend to think "the user needs this information" when we should think "how will the user use this system?" One participant noted that we worry too much about the specialized discovery case to the detriment of the generalized one. How much tweaking of metadata mapping is actually of use? The community seems to swing back and forth over time between the generalized and specialized approaches.

The discussion then turned more theoretical, with thoughts on the changing roles of libraries – specifically, to what degree should we be the intermediary? If the user is on his or her own, should this change the way we provide access to information? We do see a great deal of evidence that libraries have moved to a model where users interact directly with information with no active intermediation from us; the system provides the intermediation that staff once did. We expect better technologies to automatically enrich our records in the future to help with this. Participants felt it was more important to get something out than to get it perfect. We need to make a better effort to integrate authority control into non-MARC environments. Automated methods will rely on authority records a great deal, so it follows that we should spend less time on bibliographic records and more on authority work. The MARC world is certainly moving in this direction, with professional catalogers doing more high-value activity and leaving the lower-value tasks to machines or lower-level staff. Mapping activities, as seen in this article, are an example of the higher-value work.

This article describes the most common transformation as MARC to simple DC. To make sure information gets into the right DC fields, one needs to understand DC. Those doing the mapping must ask: what is the essential information to go into DC? What really identifies rather than just describes? The role of the cataloger would be to oversee the transformation process, to make sure it works correctly, on both the content end and the technical end.
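To make the shape of such a transformation concrete, here is a minimal sketch of a MARC-to-simple-DC crosswalk in Python. The toy record structure, the particular field choices, and the function names are all illustrative assumptions, not the mapping the article prescribes.

    # A minimal sketch of a MARC-to-simple-DC crosswalk. The record below is
    # a toy structure (dicts keyed by MARC tag and subfield code), not output
    # from a real MARC parser, and the field choices are illustrative only.

    marc_record = {
        "100": [{"a": "Kurth, Marty."}],
        "245": [{"a": "Repurposing MARC metadata :",
                 "b": "using digital project experience to develop a metadata management design."}],
        "260": [{"b": "Emerald,", "c": "2004."}],
        "650": [{"a": "Metadata."}, {"a": "MARC formats."}],
    }

    # Illustrative crosswalk: (MARC tag, subfield code) -> simple DC element.
    CROSSWALK = {
        ("245", "a"): "title",
        ("100", "a"): "creator",
        ("260", "b"): "publisher",
        ("260", "c"): "date",
        ("650", "a"): "subject",
    }

    def marc_to_dc(record):
        """Return a simple DC dict mapping element names to lists of values."""
        dc = {}
        for (tag, code), element in CROSSWALK.items():
            for field in record.get(tag, []):
                value = field.get(code)
                if value:
                    dc.setdefault(element, []).append(value.rstrip(" :,/."))
        return dc

    print(marc_to_dc(marc_record))

Even in a sketch this small, the questions the group raised surface immediately: which fields identify rather than merely describe, and who checks the output when the source data is inconsistent.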

What should the relationship between metadata staff and technical staff be? Metadata staff understand both the source and the target data, and they will still have to correct things in the output in the end. It certainly helps if the technical staff understand the data as well; similarly, metadata staff need to have technical skills. For metadata staff, understanding non-standard source data can be a big challenge; the Bradley films are an example of these challenges here at IU. Each set of materials will have a different balance of effort spent on it, based on perceived importance and use. Mapping often unearths mistakes in the original metadata. We must get the best bang for our buck by spending more time on the information that's really important for users and leaving the rest alone. Effective projects will also need the involvement of collection development staff.

Summary of MDG Session, 11-19-08

Article read: Cundiff, Morgan V. (2004). "An Introduction to the Metadata Encoding and Transmission Standard (METS)." Library Hi Tech 22(1): 52-64.

The session began with a question raised: is allowing arbitrary descriptive and administrative metadata formats inside METS documents a good idea? The obvious advantage is that it makes METS very versatile. But this could also limit its scope – does that make METS only for digitized versions of physical things, excluding born digital material? The group as a whole didn't believe this was an inherent limitation. The ability to add authorized extension schema over time seems to be a good thing, and necessary for the external schema allowance to work.
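As a concrete illustration of that flexibility, the sketch below (using Python's standard-library ElementTree; the descriptive content is invented) builds a METS descriptive metadata section that wraps simple DC. Swapping the MDTYPE value and the wrapped elements is how other schemas such as MODS or MARCXML would be embedded.

    # A minimal sketch of a METS dmdSec wrapping simple DC, built with the
    # standard library. The namespaces are the published METS and DC
    # namespaces; the record content itself is invented for illustration.
    import xml.etree.ElementTree as ET

    METS = "http://www.loc.gov/METS/"
    DC = "http://purl.org/dc/elements/1.1/"
    ET.register_namespace("mets", METS)
    ET.register_namespace("dc", DC)

    mets = ET.Element(f"{{{METS}}}mets")

    # Descriptive metadata section: mdWrap's MDTYPE declares the embedded schema.
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="dmd1")
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="DC")
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC}}}title").text = "Hypothetical digitized score"
    ET.SubElement(xml_data, f"{{{DC}}}creator").text = "Example Composer"

    print(ET.tostring(mets, encoding="unicode"))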

The flexibility of METS allows it to be used beyond its textual origins – for scores, sound recordings, images, etc. It could potentially be useful beyond libraries, especially to archives and museums. To balance this flexibility, is knowing that some sort of structured metadata is being presented enough to ensure a reasonable level of interoperability?

The discussion then turned to the TYPE attribute on <div>, a topic much discussed in the METS community. How does a METS implementer know what values to use? An organization will presumably develop its own practice, but practices won't be the same across institutions. A clever name for this was suggested: "plantation" metadata – each place can develop its own.
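A small sketch of the problem, with made-up TYPE vocabularies for two hypothetical institutions, both producing structurally valid METS:

    # Two institutions describing the same kind of object with locally chosen
    # TYPE values on <mets:div>. Both structMaps are valid METS; only the
    # vocabularies differ, which is exactly the interoperability concern.
    import xml.etree.ElementTree as ET

    METS = "http://www.loc.gov/METS/"
    ET.register_namespace("mets", METS)

    def struct_map(type_values):
        """Build a single-branch structMap using one institution's TYPE terms."""
        smap = ET.Element(f"{{{METS}}}structMap")
        parent = smap
        for label in type_values:
            parent = ET.SubElement(parent, f"{{{METS}}}div", TYPE=label)
        return smap

    # Institution A's local practice vs. institution B's.
    print(ET.tostring(struct_map(["score", "movement", "page"]), encoding="unicode"))
    print(ET.tostring(struct_map(["item", "section", "leaf"]), encoding="unicode"))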

Are there lessons from library cataloging that could help with this problem? Institutions dealing with the same types of material could join together and harmonize practices. METS Profiles provide the means for documenting this, but they don't really encourage collaboration. Perhaps the expectation is that the metadata marketplace will converge, and those going their own way will lose out on some significant benefits and come to see collaboration as being in their best interest.

This line of thought led to the question: how did OCLC, LC, and the library community get standardized in the first place? Probably because individuals wrote up their own rules and then shared them; eventually these rules became shared practice. Maybe the same shift will happen when sharing really becomes a priority. Diverse practices will converge when people really want them to.

A question was then raised about when METS should be used instead of MARC. When is MARC not enough? A participant made the analogy that this was like comparing a plantation to a video arcade: the two serve different purposes, and METS can include descriptive metadata in any format, including MARC. If you want to support a certain type of searching (for example, a user wants to find a recording by a certain group), saying METS is better than MARC doesn't make sense. The descriptive metadata schema used within METS is what will make the difference in that case, not the use of METS itself. An implementer will still need good descriptive information.

Participants then noted that we had been talking about systems, but we need to talk more about people. Conversations between communities with different practices will help improve interoperability. Can we standardize access points? To do this we would need to develop vocabularies collaboratively between communities, and talk more so that we understand each other’s point of view.

One participant made an extremely astute observation that the structure of METS makes it seem that it wasn't designed to be used directly by people. While metadata specialists often need to look at METS, and plan for what METS produced by an institution should look like, the commenter is correct that for the most part, METS is intended for machine consumption. A developer present noted that we could write an application that does a lot of what METS does without actually storing it in XML/METS – but the benefit of METS is abstracting out one more layer. Coming full circle to the flexibility issue from earlier in the discussion, it was noted that it is difficult to make standard METS tools (including parsers and generators) due to the almost infinite practices that must be accommodated. This led to the thought that perhaps METS could go much farther in being machine-friendly than it already is. That's a scary thought to metadata specialists who work with it!