Saturday, August 8, 2009

Summary of MDG Session, 3-28-09

Article discussed: Allinson, Julie, Pete Johnston and Andy Powell. (January 2007) "A Dublin Core Application Profile for Scholarly Works." Ariadne 50.

As usual, the discussion group session began with time to talk about unfamiliar terminology or acronyms/terms from the article that were not fully explained. This month, JISC, UKOLN, and OAI-PMH were covered in this question period.

We then moved to a discussion of whether the list of functional requirements described by this article is really the right one. Topics covered included:
  • Should preservation be on the list? Governments and libraries generally have this as a requirement. Do researchers? The group believed that overall, yes, researchers need preservation but don’t understand it as such - they just want to find something they've seen again later.
  • Multiple versions. Preprint, edited version, and publisher PDF are all available and need to be managed. Maybe we don’t need to keep them all, and only need to tell users which one they’re looking at. The NISO author version standard out there (in draft maybe?) is setting up a common terminology for us to use. It's important to archive these things. One possible solution: keep the final version, and track earlier versions more as personal papers.
  • What about earlier work products like data sets and Excel spreadsheets? How much in-process work can we save, and how much do we want to save? Data sets could be used by many different publications. We would need to make sure users can get to the final writeup easily without getting bogged down in the preliminary stuff. Managing these earlier work products would shift the focus from the writeup to the researcher. Both are important; maybe we deal with them in separate systems. One member brought up Darwin’s On the Origin of Species: the text is online, and you can see earlier drafts and how the work evolved over time. The work process has long been an interesting area of research that we could promote more. However, it raises issues of rights management and author control. Should we allow the researcher to be in control of deciding what to deposit? We'd have to have them choose while they’re alive.
  • Unpublished works have different copyright durations. Are things in the IR published or not? Is a dissertation published or unpublished? Does placing it in a library “publish” it? AACR2 thinks dissertations are unpublished. But does copyright law?
  • What about peer reviewed status? Is the peer reviewed/non-peer reviewed vocabulary we tend to use now good enough? Are there things in the middle? Early versions of a paper won’t have gone through the peer review process, and we need to track that one version is peer reviewed and another is not. Individual journal titles are peer reviewed or not, so we can guess a paper's status based on that. But in general we would probably want to get this information from the author - there are columns, etc. in peer reviewed journals that aren’t actually peer reviewed.
  • Participants noted the requirement to facilitate search and browse - not many of our IR systems now do this all that well!
  • A participant asked whether we should be providing access to these types of material by journal/conference. Doing so duplicates work that others do, but for the preservation function this information is important.
  • Our functional requirements discussion wrapped up with participants noting that “cataloging” in these repositories doesn’t look like cataloging in our OPACs. Is this difference going to bite us later? This article describes data an author could never create. The authors obviously have decided cataloger-created data is worth the time and effort. It would be interesting to hear the rationale behind this decision.
We then turned to discussing the minimum data requirements described in the article.
  • Some of the minimum requirements seem very high end and difficult to know.
  • Participants wondered which of the listed attributes don’t apply to large numbers of scholarly works. The following were identified: has translation, grant number, references.
  • The group then wondered if the article's authors had to be so flexible with minimum metadata requirements in order to allow researchers to deposit their own material. Why wouldn’t researchers want to do this? Time and effort seem to be big barriers. Even figuring out what version to deposit takes more time than most researchers care to spend.
  • Participants wondered how effective OA mandates are. In discussion, it was noted that they don’t make it any easier to deposit, and researchers might think it's still not worth their time. A participant quoted data from one scientific conference that requires anyone publishing with it to provide all of their data. 50% provided the full data. 20% uploaded an empty file just to meet the “upload something” requirement!
  • Conclusion: better systems are a key to actually collecting and saving this stuff.
The discussion moved to pondering how author involvement in the archiving process is a fundamentally new requirement. We never asked researchers to deposit papers in the University Archives before. How do we decide what’s worth keeping? Should we really preserve all of this stuff? How do we get people to the right stuff? Is this a selection/appraisal issue or a metadata issue? Our final conclusions were that the model described in this article helps with creating more functional systems but doesn’t help with making the system easier to use. Minimum requirements for deposit might just be a first step, but to achieve our greater goals that data would likely need to be enhanced later.
