<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3563564247448844216</id><updated>2011-10-03T07:59:14.989-04:00</updated><title type='text'>Metadata Discussion Group</title><subtitle type='html'>The Metadata Discussion Group at Indiana University meets monthly to discuss an article relating to a current metadata topic. The group has no formal membership--discussions are open to anyone from the Libraries, the School of Library and Information Science, or elsewhere on campus who is interested in the topic under discussion.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>22</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-3543581819428792419</id><published>2011-08-17T17:08:00.002-04:00</published><updated>2011-08-17T17:14:15.224-04:00</updated><title 
type='text'>We've moved!</title><content type='html'>The IUB Metadata Discussion Group blog has been migrated to the IUB Libraries' blog service.&lt;br /&gt;&lt;br /&gt;You may now find us at: &lt;a href="https://blogs.libraries.iub.edu/metadata/"&gt;https://blogs.libraries.iub.edu/metadata/&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Be sure to subscribe to the new RSS feed: &lt;a href="https://blogs.libraries.iub.edu/metadata/feed/"&gt;https://blogs.libraries.iub.edu/metadata/feed/&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The archive will remain here for the time being but all future posts will appear at the link above.&lt;br /&gt;&lt;br /&gt;Thanks!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-3543581819428792419?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/3543581819428792419/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=3543581819428792419' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/3543581819428792419'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/3543581819428792419'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2011/08/weve-moved.html' title='We&apos;ve moved!'/><author><name>Jennifer Liss</name><uri>http://www.blogger.com/profile/15070900984853969794</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-1430885475433608336</id><published>2011-03-28T09:42:00.000-04:00</published><updated>2011-03-28T09:43:02.303-04:00</updated><title type='text'>Summary of MDG Session, 03-10-11</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Articles discussed&lt;/span&gt;:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Doctorow, Cory. (2001) “Metacrap: putting the torch to seven straw-men of the meta-utopia.” Available online at &lt;a href="http://www.well.com/%7Edoctorow/metacrap.htm"&gt;http://www.well.com/~doctorow/metacrap.htm&lt;/a&gt; &lt;/li&gt;&lt;br /&gt;&lt;li&gt;Ardö, Anders. (2010) “Can We Trust Web Page Metadata?” &lt;em&gt;Journal of Library Metadata&lt;/em&gt;, 10: 1, 58-74. Available online at &lt;a href="http://dx.doi.org/10.1080/19386380903547008"&gt;http://dx.doi.org/10.1080/19386380903547008&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Moderator&lt;/span&gt;: Dot Porter, Associate Director for Digital Library Content &amp;amp; Services, Digital Library Program&lt;br /&gt;&lt;br /&gt;March's discussion began with an observation: both articles discuss &lt;span style="font-style: italic;"&gt;pulling&lt;/span&gt; metadata from webpages. Libraries are typically in the business of &lt;span style="font-style: italic;"&gt;pushing&lt;/span&gt; metadata. Ardö's article confirms the kinds of webpage metadata problems Doctorow identified nine years earlier. A non-cataloger seemed surprised that the quality of web metadata even needed to be studied at all&amp;#151;of course it's horrible! Search engines have had the task of developing and enhancing smarter search algorithms and "did you mean __" services in order to combat poor or misleading metadata. 
Most search engine algorithms ignore metadata completely because web page metadata is unreliable.&lt;br /&gt;&lt;br /&gt;The discussion diverted to the subject of crosswalking. A participant mentioned the evolution of the way library folks define the term metadata&amp;#151;who wasn't tired of hearing the "data about data" definition parroted ad nauseam?&amp;#151;into something much more. The change seems to have coincided with the cataloging world's increasing familiarity with XML and XML technologies. A better understanding of XML changed the cataloging world's perception of how library metadata might become more web-ready and accessible.&lt;br /&gt;&lt;br /&gt;One participant wondered: if library catalogs had more Google-like search capabilities, would discovery be improved? Most participants thought so, although one participant worried that language would be a barrier to search. The web was once a largely English-speaking entity but increasingly, that isn't the case. Search engines have a hard time determining language, especially if there are multiple languages represented on the page. Another participant warned that websites often contain misleading metadata in an attempt to drive up search engine optimization (SEO).&lt;br /&gt;&lt;br /&gt;Does buying into the Google model and integrating our resources more with the web mean that libraries will need to accept that search engines impose value judgments on content of the web? Some argued that the library catalog doesn't make value judgments in the same way that Google Scholar can tell you who else cited the article you're reading or in the way that Amazon can tell you what other customers bought in addition to the product you're currently purchasing. As a participant summed up: it is increasingly a world ruled by the "good-enough" principle. 
Searchers may not find the right thing but they find &lt;span style="font-style: italic;"&gt;something&lt;/span&gt; and that's good enough.&lt;br /&gt;&lt;br /&gt;A participant wondered how on-demand acquisition impacts collections and metadata. Another participant explained that on-demand materials are often accompanied by records of very poor quality. OCLC exacerbates the problem by merging records and retaining data (whether good or bad) in one big, messy record. A definite downside of on-demand purchasing was illustrated in a pilot project IU Libraries embarked on with NetLibrary. Ebooks were bought by the library after three patrons clicked on the link to view the ebook. Within six months, the $100,000 budget for on-demand ebook acquisition was gone. The participant admitted that some of the ebooks that were bought would not have been selected by a collection manager. It is likely that patrons clicked on the link to browse through the book and may have spent all of 30 seconds using the resource before clicking away to something else. As one participant pointed out, how many on-demand purchases could have been avoided if the accompanying metadata on the title splash page had included a table of contents?&lt;br /&gt;&lt;br /&gt;The fact that users of the web expect all information to be free was also discussed. Good metadata isn't free, nor is the hosting of resources. This led to the question: how do young information seekers prefer to read? Do they read research papers and novels online? Do they seek out concise articles and skim for the bullet points? Are our assumptions about the information habits of undergraduate students correct?&lt;br /&gt;&lt;br /&gt;The discussion moved on to a topic started by the statement: the internet has filters that users are often unaware of. Participants wondered about the polarizing effect this might have on users' perception of information. Search engines learn what you search for and display related ads. 
How might this skew broad understanding of a topic if search engines are putting blinders on search results in a similar way? One must go out of one's way to seek out an opposing view point because one isn't likely to see it while idly browsing the web.&lt;br /&gt;&lt;br /&gt;In an attempt to end the discussion back on topic, the moderator asked, what are the minimum metadata requirements for a resource? A participant cited the &lt;a href="http://www.loc.gov/catdir/cpso/conser.html"&gt;CONSER Standard Record for Serials&lt;/a&gt; as an example of an effort to establish minimum metadata requirements. This standard was founded not on the traditional cataloging principle of "describe" but rather on the FRBR principle of "identify." What metadata is needed for a user to find and identify a resource? Implementing the CONSER Standard Record increased production in the IUL serials cataloging unit. It was conceded that minimum metadata requirements may differ depending upon the collection, material, owning institution, etc. 
One thing that was apparent, whatever the standard, is the need for a universal name authority file.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-1430885475433608336?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/1430885475433608336/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=1430885475433608336' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/1430885475433608336'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/1430885475433608336'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2011/03/summary-of-mdg-session-03-10-11.html' title='Summary of MDG Session, 03-10-11'/><author><name>Jennifer Liss</name><uri>http://www.blogger.com/profile/15070900984853969794</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-727551780203311399</id><published>2011-03-08T12:04:00.005-05:00</published><updated>2011-03-08T13:21:42.472-05:00</updated><title type='text'>Summary of MDG Session, 02-03-11</title><content type='html'>&lt;p&gt;&lt;span style="font-weight: bold;"&gt;Article discussed:&lt;/span&gt; Ascher, James P. (Fall 2009) "Progressing toward Bibliography; or, Organic Growth in the Bibliographic Record." &lt;span style="font-style: italic;"&gt;RBM: a Journal of Rare Books, Manuscripts, and Cultural Heritage&lt;/span&gt; vol. 10, no. 2. 
Available online at &lt;a href="http://progressivebibliography.org/wp-content/uploads/2010/06/95.pdf"&gt;http://progressivebibliography.org/wp-content/uploads/2010/06/95.pdf&lt;/a&gt;.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Moderators:&lt;/span&gt; Lori Dekydtspotter, Rare Books and Special Collections Cataloger, Lilly Library and Whitney Buccicone, Literature Cataloger, Lilly Library&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The discussion began with a consideration of traditional cataloging models and how they measure up to assumptions made in the article. Ascher asserts that the cataloging of an item occurs once at full cataloging standards, making the cataloging process very time-intensive on the front end. However, in a shared cataloging utility such as OCLC, even full-level PCC records are often further revised by other member institutions. Many types of items are cataloged at a minimal level, perhaps because there is a mounting backlog, perhaps due to the nature of the collection. A cataloger with a government documents background pointed out the importance of controlling corporate body names and other headings in OCLC records—headings that are often left uncontrolled by PCC creators, thus requiring enhancement.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Mention of OCLC turned the conversation to a lamentation of the limits of cataloging tools. Rich copy-specific information that is required of special collections cataloging is handled poorly in OCLC, which is perhaps the price special libraries must pay for coming into the fold. Participants discussed problems that often arise when machines do the work of record cleanup and enhancement. OCLC's policy is to accept records from everyone and then merge records as needed—this makes a mess of item-level description. Clearly, we can't rely solely upon machines for metadata enrichment. 
It is unclear as to how applying the FRBR model to record creation might aid (or hinder) item-level metadata creation in a shared cataloging environment such as OCLC.&lt;/p&gt;&lt;p&gt;Not all of the discussion of OCLC as a tool for progressive bibliography was negative. It was observed that the infrastructure to support progressive bibliography seems to be in place; however, catalogers are not in the habit of using OCLC as a progressive cataloging tool. As one participant observed, technical services units routinely leave books sitting on frontlog shelves for six months, allowing time for full-level records to appear in OCLC. Another participant theorized that this “must-catalog-at-full-level” attitude in technical services departments, from acquisitions to cataloging to processing, may be linked to the fact that technical services still envisions records as paper files. We aren't printing cards anymore—a workflow in which cataloging would have to be full and complete—so why do we feel compelled to treat cataloging as a touch-it-once operation?&lt;/p&gt;&lt;p&gt;This isn't to say that progressive cataloging simply doesn't happen in technical services departments. As one participant pointed out, format often drives the need for progressive cataloging. For instance, serials catalogers constantly touch and retouch serial records due to the transitory nature of continuing resources. The cataloging of government documents sometimes requires retouching fully cataloged records without having the items in hand. Other times, there are collection management concerns that trigger the need to retouch records, for example, moving collections to an auxiliary off-site storage facility. Adding contents notes to the records of older multi-volume reference works makes requesting specific volumes possible. 
It seems that specific issues relating to material type and collection management lend themselves to progressive cataloging workflows.&lt;/p&gt;&lt;p&gt;The discussion turned to the topic of using non-catalogers—students, researchers and subject specialists—to create and enhance bibliographic records. Catalogers pointed out that, to a degree, this happens already. Examples included: cataloging units hiring graduate students with language expertise to create and edit records; the Library of Congress's crowd-sourcing of photograph tagging in the &lt;a href="http://www.flickr.com/commons/"&gt;Flickr Commons&lt;/a&gt;; and LibraryThing’s crowd-sourcing of name authority work (among many other things) in the &lt;a href="http://www.librarything.com/commonknowledge/"&gt;Common Knowledge&lt;/a&gt; system.&lt;/p&gt;&lt;p&gt;Living in an increasingly linked world, it is perhaps inevitable that participants were interested in how bibliographic records might be enhanced to include URLs linking to relevant information such as inventories, relevant blog posts about the item, etc. While the group didn't seem opposed to the idea, there were a couple caveats. In our records, we must be very clear about what those links are pointing to and how the links relate to the resource being described. And what happens when links break? Are reporting tools in place to notify catalogers when a PURL is broken?&lt;/p&gt;&lt;p&gt;We can't be sure how researchers will use our metadata in the future. From this idea, an interesting discussion arose regarding expected usage versus unanticipated usage of collections. Progressive bibliography ensures that bibliographic records remain relevant to current research trends. Researchers themselves are changing. A participant described an evolution in the type of researchers who are accessing special collections: patrons are less and less elite, serious researchers who are adept at navigating often complex discovery portals. 
Being able to serve both experienced and inexperienced researchers is paramount to exposing collections in a world in which collections are increasingly visible to the web.&lt;/p&gt;&lt;p&gt;Another idea that was explored was one of "on demand" progressive cataloging. A serials cataloger described how PCC/CONSER provides a list of available journals that have not yet been cataloged as a way of bolstering cooperative cataloging efforts. Another participant described two distinct queues for archives processing: one queue needing immediate, full-level processing and description, and another minimally processed queue awaiting full description and access. Into the former queue falls “on demand” record enrichment, driven by events, exhibitions, etc., for example, editing records relating to a well-known director's work shortly before that director appears at an event on campus. Following up on this idea, a rare books cataloger noted that "on demand" progressive bibliography often occurs at the request of donors who may want additional access points included in records for family members.&lt;/p&gt;&lt;p&gt;The meeting closed with a discussion of how catalogers of both general and special collections might implement a progressive bibliography/cataloging workflow. Everyone agreed that there was a need for more qualitative studies on the cost effectiveness of current MARC metadata production. One participant observed that non-MARC metadata project managers seem to be more amenable to using students and researchers to enhance metadata, leaving the initial processing and organization of materials to archivists and catalogers. Cited examples included the &lt;a href="http://www.dlib.indiana.edu/collections/vwwp/"&gt;Victorian Women Writers Project&lt;/a&gt;, the &lt;a href="http://www.dlib.indiana.edu/collections/newton/"&gt;Chymistry of Sir Isaac Newton&lt;/a&gt;, and much of the archival description and EAD creation in a number of archival units at the IUB campus. 
Are poorly designed tools the chief barrier to non-catalogers creating MARC metadata? Another participant responded that it's not just that the MARC metadata community needs better tools—cataloging needs simpler rules. Cataloging content standards such as AACR2, LCSH, and indeed, MARC itself are incredibly difficult to master. Is RDA an improvement over past content standards from a progressive bibliography perspective? Maybe. RDA's insistence on exact transcription of title page elements may provide someone who is enhancing the record at a later date with important clues about the author or publication that would otherwise be lost with AACR2 transcription rules. This RDA rule makes it easier to enhance a record or do NACO work at a later time without having the item in hand.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-727551780203311399?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/727551780203311399/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=727551780203311399' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/727551780203311399'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/727551780203311399'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2011/03/summary-of-mdg-session-02-03-11.html' title='Summary of MDG Session, 02-03-11'/><author><name>Jennifer Liss</name><uri>http://www.blogger.com/profile/15070900984853969794</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-4124407646380325830</id><published>2009-11-05T14:34:00.000-05:00</published><updated>2009-11-05T19:14:15.597-05:00</updated><title type='text'>Summary of MDG Session, 10-15-09</title><content type='html'>&lt;!--StartFragment--&gt; &lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;Articles read:&lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt; &lt;ul style="margin-top: 0in; font-family: georgia;" type="disc"&gt;&lt;li class="MsoNormal" style="margin-top: 0.1pt; margin-bottom: 0.1pt;"&gt;Nunberg, Geoff. "Google Books: A Metadata Train Wreck," Language Log blog, August 29, 2009. &lt;a href="http://languagelog.ldc.upenn.edu/nll/?p=1701"&gt;&lt;span style="color:blue;"&gt;http://languagelog.ldc.upenn.edu/nll/?p=1701&lt;/span&gt;&lt;/a&gt;. Be sure to read through some of the comments, specifically the second comment left by Jon Orwant (Google engineer on the GBS team) on September 1, 2009 @ 1:51 am.&lt;o:p&gt;&lt;/o:p&gt;&lt;/li&gt;&lt;li class="MsoNormal" style="margin-top: 0.1pt; margin-bottom: 0.1pt;"&gt;Nunberg, Geoff. "Google's Book Search: A Disaster for Scholars," &lt;i style=""&gt;The Chronicle of Higher Education&lt;/i&gt;, August 31, 2009. 
&lt;a href="http://chronicle.com/article/Googles-Book-Search-A/48245/"&gt;&lt;span style="color:blue;"&gt;http://chronicle.com/article/Googles-Book-Search-A/48245/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;  &lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;This month's Metadata Discussion Group began with a discussion of the tone of the blog post and article, and the tone of the rhetoric in the community at large around the Google Book Search project. Participants expressed support for the idea that discussion needs to be reasoned and civil - neither Google nor libraries are all wrong or all right. It is more important to fix identified problems than to point fingers. One participant noted that the difference in tone between the blog post and the slightly later Chronicle article was telling. Nunberg’s concern is clearly for scholars, but this is more obvious in the Chronicle article than in the blog post. The Chronicle article immediately sets up a "this service is bad" tone by listing Elsevier as the first possible future owner! The Chronicle version doesn’t even give Google a chance of keeping this service as its own. &lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;o:p&gt;&lt;br /&gt;&lt;/o:p&gt;&lt;/p&gt;    &lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;o:p&gt;Participants quickly noted the findings of these articles underscore what we already know, that &lt;/o:p&gt;there’s a lot of bad cataloging out there! A pre-1900 book might have 20 records in OCLC. 
&lt;o:p&gt;The articles suggest the full OCLC or LC catalogs would help this service, and participants noted GBS does in fact have the full OCLC database.&lt;/o:p&gt; But are those really better than what GBS is actually using? An “authoritative” source of metadata is a library-centric view. There is no perfect catalog. There isn’t a “better catalog” for Google to get that would easily solve the problems found here. IU itself contributes to the problem: we send unlinked NOTIS item records with our shipments to Google. One participant noted that the “results of this are catastrophic” but we can’t feasibly do much better. We’re flagging these items to handle on return, but that doesn’t help Google.&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;Thinking about how to solve these problems led to a theme common in the Metadata Discussion Group sessions - what if we were to open up metadata editing to users? Wikipedia surely isn’t consistent - would that approach work here? A participant noted that OCLC itself is a cooperative venture and there are many inconsistencies there. Institutions futz with records locally and don’t send them back to OCLC. CONSER had a history of record edit wars and catalogers decided they just have to grit their teeth and deal with it. &lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;br /&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;o:p&gt;Regarding &lt;/o:p&gt;date accuracy, which received a great deal of space in these articles, a participant noted that expectations for these features in GBS are exceedingly unrealistic. As one blog commenter posited, users can’t assume all search results are relevant - they have to evaluate the results from any information resource themselves. 
&lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;br /&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;o:p&gt;The discussion then turned to GBS' utility as a source for language usage. A participant noted that the traditional way to learn about when, say, "felicity" changed to "happiness," is to &lt;/o:p&gt;check the OED. But how does the person who wrote the OED entry know? Was it a manual process before, and could this change with GBS? &lt;o:p&gt;&lt;/o:p&gt;Scholars haven’t LOST anything with the advent of GBS - it's just an additional tool for them.&lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;Participants then noted that scholars aren't the only or perhaps even the primary audience for GBS. But should they be? A great deal of the content (though not all - some comes from publishers too) comes from academic libraries that have built their collections primarily in support of scholarly activities. Shouldn't library partnerships come with some sort of responsibility on Google's part to pay attention to scholarly needs? For IU and the CIC and other academic libraries, HathiTrust is attempting to fulfill this role, but is that enough?&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;The next question the group considered was &lt;o:p&gt;&lt;/o:p&gt;"Is GBS the 'last library'?" The proposed GBS settlement might stifle competition. However, libraries themselves haven’t shown we can really compete in this area. 
Enhanced cooperation seems to be the only way we might play a realistic role. Participants wondered whether the monopoly that seems to be emerging is the result of Google pushing others out or a lack of interest by potential competitors. Libraries have been wanting to enter into this area but the technology wasn’t there, then we didn’t have the resources. GBS is at an entirely different scale than libraries can realistically achieve. We’re struggling at IU with how to deal with &lt;span style="font-weight: bold;"&gt;only&lt;/span&gt; 6000 Google rejects.&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;Discussion then turned to some of the statistics presented by the Google engineer in a comment on the blog post, including the claim of “millions” of problems and BISAC accuracy rate of 90%. Participants guessed we have less than 10% howlers for subject headings in our catalogs, but there certainly are lots of them in there. Lots of redundancy in the MARC record gives more text that could be used to avoid this kind of obvious error. We wondered if Google is using any of this redundant information effectively.&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;The topic then turned back to whether Google Book Search should spend more effort meeting scholarly needs. What should they do differently to support this kind of user better? First, probably not use just a single classification scheme. Don't necessarily stop using BISAC, but they could also use alternatives - that's what Google is about, more information! They're definitely getting LCSH from MARC records, despite LCSH's limitations. 
The LCC class number could be used to devise a "primary" subject, and potentially words that aren't elsewhere in the record. Participants noted that as the GBS database grows each individual subject heading will start getting larger and larger result sets.&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;The session closed with some musings on how the Google and library communities might better learn from one another. The notion of constructive conversation rather than disdain was raised again. Then participants noted that the GBS engineer commenting on the blog post invited comment. Individuals can take advantage of this invitation, and IU as a GBS partner can provide information and start conversations at yet another level.&lt;/p&gt;&lt;span style="font-family:Times;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;  &lt;p class="MsoNormal"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/p&gt;  &lt;!--EndFragment--&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-4124407646380325830?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/4124407646380325830/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=4124407646380325830' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/4124407646380325830'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/4124407646380325830'/><link rel='alternate' type='text/html' 
href='http://metadatadiscuss.blogspot.com/2009/11/summary-of-mdg-session-10-15-09.html' title='Summary of MDG Session, 10-15-09'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-2185826504695829701</id><published>2009-09-24T17:53:00.000-04:00</published><updated>2009-09-24T20:00:29.690-04:00</updated><title type='text'>Summary of MDG Session, 9-17-09</title><content type='html'>&lt;p class="MsoNormal"  style="margin: 0.1pt 0in; font-family: georgia;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;Article Read: Schaffner, Jennifer. (May 2009). "The Metadata &lt;i style=""&gt;is&lt;/i&gt; the Interface: Better Description for Better Discovery of Archives and Special Collections." Report produced by OCLC Research. 
Published online at: &lt;a href="http://www.oclc.org/programs/publications/reports/2009-06.pdf"&gt;http://www.oclc.org/programs/publications/reports/2009-06.pdf&lt;/a&gt;.&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"  style="margin: 0.1pt 0in; font-family: georgia;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin: 0.1pt 0in; font-family: georgia;"&gt;&lt;span style="font-size:100%;"&gt;An online, user-editable resource list accompanying this report can be found at &lt;a href="https://oclcresearch.webjunction.org/mdidresourcelist"&gt;https://oclcresearch.webjunction.org/mdidresourcelist&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"  style="margin: 0.1pt 0in; font-family: georgia;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"  style="margin: 0.1pt 0in; font-family: georgia;font-family:georgia;"&gt;&lt;span style=";font-size:100%;" &gt;While questions regarding terminology in Metadata Discussion Group sessions often focus on technological terms, this month they focused on terms from the Archives sphere not commonly used in libraries. 
&lt;/span&gt;&lt;span style=";font-size:100%;" &gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;RLIN AMC was explained as an RLIN database with archives format MARC records (before format integration), ISAD as the archival parallel to ISBD, and fonds as sets of materials organically created by an individual, family, or organization during the course of its regular work.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;  &lt;p style="font-family: georgia;font-family:georgia;" class="MsoNormal" &gt;&lt;span style="font-size:100%;"&gt;The group began the primary discussion by considering the third sentence in the report's Introduction, "These days we are writing finding aids and cataloging collections largely to be discovered by search engines." Participants wondered if this statement was accurate, and if so what it meant for our descriptive practices. The first reaction expressed was "So what?" OCLC records are exposed to Google through WorldCat.org - does this mean we're already starting to recognize the importance of search engine exposure? Another participant wondered if this statement were true for all classes of users - we certainly have many different types, and presumably the studies cited in the report refer to different groups as well. Different types of users need different types of discovery tools. Regardless, there is a recognition that recent activities reflect a big paradigm shift for special collections – they’re no longer “elite” and only for serious researchers with letters of recommendation in order to see them. 
In wondering if our descriptive practices need to change to reflect this new user base and new discovery environments, participants noted that there are efforts ongoing to pull more out of library and archives-generated metadata, including structured vocabularies such as LCSH.&lt;br /&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p style="font-family: georgia;font-family:georgia;" class="MsoNormal" &gt;&lt;span style="font-size:100%;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt;The discussion then turned to the report's presentation of users' interest in the "aboutness" of resources. How do we go about supporting this? If we digitize everything will that help? For textual records, relevancy ranking could definitely have an impact.  B&lt;/span&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt;ut we can’t have it both ways – getting some level of description out quickly and describing things robustly seem to be antithetical. Can we do this in two phases – first get it out there, then let the scholar figure out what it’s about? Do archivists and catalogers have the background knowledge to do the “aboutness” cataloging?&lt;br /&gt;&lt;br /&gt;User-supplied metadata could certainly be part of this solution. At SAA last month, there was a session on Web 2.0 where one repository that presented touted the importance of user-supplied metadata for some of their materials. The repository reported that the user contributions needed some level of vetting but overall they were useful. It was noted that just scanning is not enough, though – not all resources are textual, those that are can be handwritten, and in languages other than English, both of which can pose challenges to automated transcription (OCR).&lt;br /&gt;&lt;br /&gt;The group then wondered what other factors could be used in relevancy ranking algorithms, which libraries are notoriously suspicious of. 
Participants were intrigued by the idea on p. 8 of the report that higher levels in a multi-level description be weighted more heavily. It was noted that perhaps the most common factors for relevance ranking are those that libraries don't traditionally collect - number of times cited, checked out, clicked on in a search result set. Relationships between texts in print are not as robust as those on the Web, and this might be evidenced by the fact that Google Book Search ranking doesn't seem to be as effective as the Google Search Engine ranking. Personal&lt;/span&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt; names, place names, and events might be weighted more heavily, as this report suggests those things are of primary interest to users. We could also leverage our existing controlled vocabularies by weighting terms in them more heavily than terms that are not, "explode" queries in full-text corpora to also include synonyms, and change search terms in systems with items cataloged with controlled vocabularies to match the terms in those vocabularies. Participants debated the degree to which the system should suggest alternatives vs. making changes to queries and telling the user about it after it's done.&lt;br /&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;  &lt;p style="font-family: georgia;font-family:georgia;" class="MsoNormal" &gt;&lt;span style="font-size:100%;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt;The discussion then turned to a frequent topic in "future of libraries" conversation today - getting our resources out to where the users are. Scholars in general make reasonable use of spe&lt;/span&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt;cialized portals, but not everyone knows how to do that. Can we “train” our users to go to IU resources if they’re in the IU community? 
Many present think this approach is nearly hopeless.&lt;/span&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt; Could we guide users to appropriate online resources based on their status? Some participants noted that personalization efforts haven't been all that effective. We can’t box people into specific disciplines – research is increasingly interdisciplinary. Even if they don’t log in we can capture their search strings, though, and potentially use this data. We could c&lt;/span&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt;ount how many times something was looked at, use this in relevance ranking. This system isn't perfect of course – what's on the first page of results naturally gets clicked more. A click, or a checkout, isn't necessarily a positive review, though - could we capture negative reviews? We certainly would benefit from knowing more about how our resources are used. How extensive/serious is each use? Were things actually read? Could we put up a pop up survey on our web site? Users can write reviews in WorldCat Local, should we do this too, or point people to those reviews? Participants noted there is still a role for the librarian/archivist mediator, helping users to understand what tools are available, then letting them use these tools on their own. When we don’t have “aboutness” in our data, users can miss things, and the much maligned “omniscient archivist” can fill in the gaps. &lt;/span&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt;&lt;br /&gt;&lt;br /&gt;The session closed with a discussion of the comprehensiveness issue mentioned in the report. If users don't trust our resources if they believe them to be incomplete, what do we do? The quickest answer is "&lt;/span&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt;Never admit it!" No resource is ever truly comprehensive. 
Libraries certainly have put a positive spin on retroconversion projects, calling them "done" when large pockets of material are still unaddressed.&lt;/span&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;  &lt;p style="font-family: georgia;font-family:georgia;" class="MsoNormal" &gt;&lt;span style="font-size:100%;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;&lt;span style="font-family: georgia;font-size:100%;" &gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;  &lt;p style="font-family: georgia;font-family:georgia;" class="MsoNormal" &gt;&lt;span style="font-size:100%;"&gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;!--EndFragment--&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-2185826504695829701?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/2185826504695829701/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=2185826504695829701' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/2185826504695829701'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/2185826504695829701'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2009/09/summary-of-mdg-session-9-17-09.html' title='Summary of MDG Session, 9-17-09'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-7469031261594589598</id><published>2009-08-08T16:28:00.000-04:00</published><updated>2009-08-08T16:52:29.414-04:00</updated><title type='text'>Summary of MDG Session, 5-28-09</title><content type='html'>Article discussed: Smith-Yoshimura, Karen. 2009. "&lt;a href="http://www.oclc.org/programs/publications/reports/2009-05.pdf"&gt;Networking Names&lt;/a&gt;." Report produced by OCLC Research.&lt;br /&gt;&lt;br /&gt;Terminology issues discussed this month included "cooperative identities hub" (is this OCLC’s term? Yes) and API.&lt;br /&gt;&lt;br /&gt;The meat of the session began with a discussion of the statement in the report that a preferred form of a name depends on context. Is this a switch for the library community? National authority files tend not to do it this way, but merging international files raises these same issues – they might transliterate Cyrillic differently for example. The VIAF project is having to deal with this issue. The group believes Canada, where this issue likely is raised a great deal due to its bilingual nature, just picks one form and goes with it.&lt;br /&gt;&lt;br /&gt;Context here could mean many things: 1) show different form in different circumstances, 2) include contextual info about the person in the record, etc. For #2, work is going on right now (or is at least in the planning stages) to minimize how much language specific stuff goes into a record. One could then code each field by which language is used in the citation. Library practice includes vernacular forms in 400 fields now – these could then be primary in another language catalog. But the coding doesn’t yet distinguish which is really a cross-reference, and which will be preferred in some other language. 
So this might not be as useful as it would seem at first.&lt;br /&gt;&lt;br /&gt;To achieve the first interpretation of flexible context, a 400 field in an authority record can no longer mean “don’t use this one, use the other one”. The authority file currently serves two purposes: letting catalogers justify headings and letting systems automatically map cross-references. Displaying a name form based on context is definitely a new use case for these authority files.&lt;br /&gt;&lt;br /&gt;Participants wondered: if we add more information to our authority files to make them useful for other purposes, how do we ensure they still fulfil their primary purpose? Should our authority records become biographies? The group reached basic consensus that adding new stuff to these records won’t substantively take away from the current disambiguation function.&lt;br /&gt;&lt;br /&gt;The group then turned to privacy issues raised by the expanding functions of authority files. One individual noted that in the past, researchers went to reference books to find information on people. Has this information moved into the public sphere? An author's birth date is often on the book jacket. Notable people are in Wikipedia. The campus phone book is not private information. In the archival community, context is everything. Overall, the group felt we didn't need to worry &lt;span style="font-style: italic;"&gt;too&lt;/span&gt; much about the privacy issue – for the most part functionality trumps the privacy issues. We still need to be careful but it looks like we're looking at an evolution of what we think of as privacy. We no longer think privacy = public but not easy to get to. Privacy and access control by obscurity is no longer a viable practice. One solution would be to keep some data private for some period of time. Some authors don’t want to give birth date, middle initial for privacy reasons, and might respond better to a situation where this is stored but not openly public. 
Participants noted that this additional data is generally not needed for justification of headings or cross-references. But with expanded functionality, we'll need expanded data.&lt;br /&gt;&lt;br /&gt;One participant wondered almost rhetorically if the authority file should be a list of your works, or also a list of your DUIs? In the archival community especially, the latter helps understand the person. How far should we go?&lt;br /&gt;&lt;br /&gt;The Institutional Repository use case was the next topic of discussion. When getting faculty publications, it would help to expand the scope of the authority file at the national level. But at local level, many are already struggling with these issues. Do A&amp;amp;I firms do name collocation? Participants don’t think so.&lt;br /&gt;&lt;br /&gt;Participants then wondered about the implication of opening up services to contributions from non-catalogers. Some felt we needed to just do it. Others thought opening to humans was a good idea but buggy machine processes could cause havoc. Even OCLC has a great deal of trouble with batch processing (duplicate detection, etc.), and they’re better at this than anyone else in our sphere. For human edits, the same issues apply as with Wikipedia, but our systems don't get as many eyes. What is the right model for vetting and access control? Who is an authorized user?&lt;br /&gt;&lt;br /&gt;Participants believed we need to keep the identities hub separate from the main name authority file for a while to work out issues before expanding the scope of the authority file significantly. The proposed discussion model in the report (p. 7) will help with the vandalism issue. The proposal flips the authority file model on its head, with lots of people adding data rather than just a few highly trained individuals. A participant wondered if the NAF in the end becomes the identities hub. 
Maybe the NAF feeds the identities hub instead.&lt;br /&gt;&lt;br /&gt;Discussion then moved to the possibility of expanding this model to other things beyond names. Geographic places might benefit, but probably not subjects - this process contradicts the very idea of a controlled vocabulary. One participant noted a hub model could be used to document current linguistic practice with regards to subjects.&lt;br /&gt;&lt;br /&gt;The session concluded with participants noting that authority control is the highest value activity catalogers do. The data that’s created by this process is the most useful of our data beyond libraries. We need to coordinate work and not duplicate effort.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-7469031261594589598?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/7469031261594589598/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=7469031261594589598' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/7469031261594589598'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/7469031261594589598'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2009/08/summary-of-mdg-session-5-28-09.html' title='Summary of MDG Session, 5-28-09'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-3614860030821488582</id><published>2009-08-08T15:38:00.000-04:00</published><updated>2009-08-08T16:25:10.981-04:00</updated><title type='text'>Summary of MDG Session, 4-23-09</title><content type='html'>Article discussed: Nellhaus, Tobin. "XML, TEI, and Digital Libraries in the Humanities." &lt;span style="font-style: italic;"&gt;portal: Libraries and the Academy&lt;/span&gt;, Vol. 1, No. 3 (2001), pp. 257-277.&lt;br /&gt;&lt;br /&gt;The conversation began by addressing terminology issues, as is typical of our Metadata Discussion Group sessions. "Stylesheet" and "boilerplate" were among the unfamiliar terms. One participant noted “expanded linking” is like the “paperless society” - a state much discussed but never remotely achieved.&lt;br /&gt;&lt;br /&gt;Discussion of the points in the article began with a participant noting that the TEI is a set of guidelines rather than an official "standard." Is this OK? Can we feel safe using it? The group believed that if there is no “standard” for something then adopting guidelines is OK, with the idea that something is better than nothing. Is there a standard that would compete with TEI? &lt;a href="http://www.oasis-open.org/docbook/"&gt;DocBook&lt;/a&gt; is the only real other option, and it hasn't been well adopted in the cultural heritage community. Participants wondered if the library community should push TEI towards standardization.&lt;br /&gt;&lt;br /&gt;An interesting question then arose wondering if the TEI's roots in the humanities made it less useful for other types of material. The problems with drama described in this article would extend to other formats too. What about music? How much should TEI expand into this and other areas?&lt;br /&gt;&lt;br /&gt;Discussion at this point moved to how to implement TEI locally. 
Participants noted that local guidelines are necessary, and should be influenced by other projects. Having a standard or common best practices is powerful but that still leaves lots of room for local interpretation. Local practice is a potential barrier to interoperability - for example, a display stylesheet won’t work any more if you start using tags that aren’t in the stylesheet. Local implementations have to plan ahead of time for how the TEI will be used. In the library community, we create different levels of cataloging – encoding could follow the same model. Participants noted that we should do user studies to guide our local implementations.&lt;br /&gt;&lt;br /&gt;The group performed an interesting thought experiment examining the many different ways TEI could be implemented, considering &lt;span style="font-style: italic;"&gt;Romeo &amp;amp; Juliet&lt;/span&gt;. Begin with a version originally in print. Then someone typed it into Project Gutenberg so it was on the Web. Then someone figured out they needed scene markers so someone had to go back and encode for that. New uses mean we need new encoding. How do we balance adding more value to core stuff rather than doing new stuff? A participant noted that this is not a new problem - metadata has always been dynamic. The TEI tags for very detailed work are there, which makes it very tempting to do more encoding than a project specifies. Take the case of IU presidents' speeches. Do these need TEI markup or is full-text searching enough? It would be fascinating and fun to pull together all sorts of materials – primary, secondary, sound recordings of his speeches. But where is this type of treatment in our overall priorities?&lt;br /&gt;&lt;br /&gt;A participant suggested stepping back to ask what TEI can do that full-text searching can’t. 
Some answers posed by the group were collocation and disambiguation of names, date searching for letters, pulling out and displaying just stage directions from a dramatic text.&lt;br /&gt;&lt;br /&gt;We then returned to the notion of drama. It's hard to deal with plays both as literature and as performance. Is this like us treating sheet music bibliographically vs. archivally? Here the cover could be a work of art, music notation marked up in MusicXML, and text marked up in TEI. Nobody knew of an implementation that deals with this multiplicity of perspectives well. Something text-based has trouble dealing with time, for example. Participants noted that TEI is starting to deal with this issue now, but it's certainly a difficult problem.&lt;br /&gt;&lt;br /&gt;A participant wondered what would happen if we were to just put images (or dirty OCR for typewritten originals, certainly not all of our stuff) up and mark it up later? This would be the “more product less process” approach currently in favor in the archival world. It would also be in keeping with current efforts to focus on unique materials and special collections rather than mass-produced and widely held materials.&lt;br /&gt;&lt;br /&gt;Participants wondered if Google Book Search and HathiTrust do TEI markup. 
Nobody in the room knew for absolute certain, but we didn’t think so.&lt;br /&gt;&lt;br /&gt;The session concluded with a final thought, echoing many earlier conversations by this group: could crowdsourcing (user contributed efforts) be used as a means to help get the markup done?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-3614860030821488582?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/3614860030821488582/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=3614860030821488582' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/3614860030821488582'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/3614860030821488582'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2009/08/summary-of-mdg-session-4-23-09.html' title='Summary of MDG Session, 4-23-09'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-6034593936217167239</id><published>2009-08-08T11:50:00.000-04:00</published><updated>2009-08-08T16:26:54.365-04:00</updated><title type='text'>Summary of MDG Session, 3-28-09</title><content type='html'>Article discussed: Allinson, Julie, Pete Johnston and Andy Powell. 
(January 2007) "&lt;a href="http://www.ariadne.ac.uk/issue50/allinson-et-al/"&gt;A Dublin Core Application Profile for Scholarly Works&lt;/a&gt;." Ariadne 50.&lt;br /&gt;&lt;br /&gt;As usual, the discussion group session began with time to talk about unfamiliar terminology or acronyms/terms from the article that were not fully explained. This month, JISC, UKOLN, and OAI-PMH were covered in this question period.&lt;br /&gt;&lt;br /&gt;We then moved to a discussion of whether the list of functional requirements described by this article is really the right one. Topics covered included:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Should preservation be on the list? Governments and libraries generally have this as a requirement. Do researchers? The group believed that overall, yes, researchers need preservation but don’t understand it as such - they just want to find something they've seen again later.&lt;/li&gt;&lt;li&gt;Multiple versions. Preprint, edited version, publisher PDF are all available and need to be managed. But maybe we don’t need to keep them all, but just tell users which they’re looking at. The NISO author version standard out there (in draft maybe?) is setting up a common terminology for us to use. It's important to archive these things. One possible solution: keep final version, and track earlier versions more as personal papers.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;What about earlier work products like data sets and Excel spreadsheets? How much in-process work can/do we want to save? Data sets could be used by many different publications. We would need to make sure users can get to the final writeup easily without getting bogged down in the preliminary stuff. Managing these earlier work products would shift the focus from the writeup to the researcher. Both are important, maybe we deal with them in separate systems. 
One member brought up &lt;span style="font-style: italic;"&gt;Darwin’s Origin of Species&lt;/span&gt; – the text is online and you can see earlier drafts, how the work evolved over time. The work process has long been an interesting area of research that we could promote more. However, it raises issues of rights management, author control. Should we allow the researcher to be in control of deciding what to deposit? We'd have to have them choose while they’re alive.&lt;/li&gt;&lt;li&gt;Unpublished works have different copyright durations. Are things in the IR published or not? Is a dissertation published or unpublished? Does placing it in a library “publish” it? AACR2 thinks dissertations are unpublished. But does copyright law?&lt;/li&gt;&lt;li&gt;What about peer reviewed status? Is the peer reviewed/non peer reviewed vocabulary we tend to use now good enough? Are there things in the middle? Early versions of a paper won’t have gone through the peer review process, and we need to track that one version is peer reviewed and another is not. Individual journal titles are peer reviewed or not so we can guess a paper's status based on that. But in general we would probably want to get this information from the author - there are columns, etc., in peer reviewed journals that aren’t actually peer-reviewed. &lt;/li&gt;&lt;li&gt;Participants noted the requirement to facilitate search and browse - not many of our IR systems now do this all that well!&lt;/li&gt;&lt;li&gt;A participant asked if we should be providing access to these types of material by journal/conference? It’s duplicating work that others do. But for the preservation function this information is important.&lt;/li&gt;&lt;li&gt;Our functional requirements discussion wrapped up with participants noting that “cataloging” in these repositories doesn’t look like cataloging in our OPACs. Is this difference going to bite us later? This article describes data an author could never create. 
The authors obviously have decided cataloger-created data is worth the time and effort. It would be interesting to hear the rationale behind this decision.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;We then turned to discussing the minimum data requirements described in the article.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Some of the minimum requirements seem very high end and difficult to know.&lt;/li&gt;&lt;li&gt;Participants wondered which of the listed attributes don’t apply to large numbers of scholarly works. The following were identified: has translation, grant number, references.&lt;/li&gt;&lt;li&gt;The group then wondered if the authors had to be so flexible with minimum metadata requirements to allow authors to deposit their own material. Why wouldn’t authors want to do this? Time and effort seem to be big barriers. Even figuring out what version to deposit takes more time than most researchers care to spend.&lt;/li&gt;&lt;li&gt;Participants wondered how effective OA mandates are. In discussion, it was noted that they don’t make it any easier to deposit, and researchers might think it's still not worth their time. A participant quoted data from one scientific conference that said if you publish with them you have to provide all your data. 50% provided the full data. 20% uploaded an empty file just to meet the “upload something” requirement!&lt;/li&gt;&lt;li&gt;Conclusion: better systems are a key to actually collecting and saving this stuff.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The discussion moved to pondering how author involvement in the archiving process is a fundamentally new requirement. We never asked researchers to deposit papers in the University Archives before. How do we decide what’s worth keeping? Should we really preserve all of this stuff? How do we get people to the right stuff? Is this a selection/appraisal issue or a metadata issue? 
Our final conclusions were that the model described in this article helps with creating more functional systems but doesn’t help with making the system easier to use. Minimum requirements for deposit might just be a first step, but to achieve our greater goals that data would likely need to be enhanced later.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-6034593936217167239?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/6034593936217167239/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=6034593936217167239' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/6034593936217167239'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/6034593936217167239'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2009/08/summary-of-mdg-session-2-28-09.html' title='Summary of MDG Session, 3-28-09'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-6795915849192699275</id><published>2009-03-14T14:05:00.000-04:00</published><updated>2009-03-14T15:29:55.539-04:00</updated><title type='text'>Summary of MDG Session, 2-12-09</title><content type='html'>Article discussed: Alexander, Arden and Tracy Meehleib (2001). "The Thesaurus for Graphic Materials: Its History, Use, and Future." 
&lt;span style="font-style: italic;"&gt;Cataloging &amp;amp; Classification Quarterly&lt;/span&gt; 31(3/4): 189-212.&lt;br /&gt;&lt;br /&gt;February's Metadata Discussion Group session was a lively one. The topic of subject vocabularies beyond LCSH sparked a great deal of interest. The session began with discussion of why a separate subject vocabulary for graphic materials was needed, especially in the Library of Congress. Some participants had even cataloged pictures or posters with LCSH, not knowing that other options existed. Participants realized the need for subject terms that were not in LCSH for describing photographic materials, but recognized the potential to add these terms to LCSH rather than starting a new subject vocabulary. The primary reason for needing a separation of subject vocabularies identified during this discussion was a difference in the level of specificity needed for cataloging visual material as opposed to textual material.&lt;br /&gt;&lt;br /&gt;Participants then noted that LCSH and TGM I are structured differently; LCSH is a subject heading list while TGM I is a true thesaurus. While this is an important distinction to understand, the group was uncertain as to the specific implications for practice. Both are standardized vocabularies and are applied in a similar fashion. In the last 15 years LCSH has become more thesaurus-like in standardizing cross-reference structure and describing narrower, broader, and related terms instead of see and see also references.&lt;br /&gt;&lt;br /&gt;Overall the discussion group thought that the existence of TGM has struck a reasonable balance between one big general vocabulary and lots of little specific ones. While TGM is specifically focused on graphic images, that is a big space and TGM can be applied in many ways. For a big image collection, a graphic materials-specific vocabulary is a great deal more useful than LCSH would be.  
The group expected image cataloging (and TGM use) to continue to grow as libraries focus more and more on special collections.&lt;br /&gt;&lt;br /&gt;From here, the discussion moved on briefly to comparing the top-down design approach of TGM II with the bottom-up (literary warrant) approach of TGM I. A significant issue with the bottom-up approach was identified - that it is difficult and time-consuming to maintain a robust reference structure for a vocabulary that is constantly growing.&lt;br /&gt;&lt;br /&gt;The topic of whether or not to subdivide TGM was a main focus of this month's discussion. A participant noted that the precoordinated approach has its origins in the printed card catalog, where it was necessary. Now that we are in online systems, this approach can be rethought. The subdivided approach takes more time to apply (this was consensus but nobody knew of data to cite) and it's not possible to be as specific with geographic locations in subdivisions as it is with postcoordinated geographic headings. Postcoordinated approaches allow the user to decide which feature is of primary interest, rather than having one selected ahead of time. Subdivisions also introduce redundancy as the same subdivisions are often applied to many main headings. But are there cases when they would be different? Would a TGM I heading ever have a different time period subdivision than a TGM II heading on the same record? Perhaps in the case of a contemporary poster of a historic event? It would be more difficult to make this distinction in a postcoordinated approach. A potential benefit of a precoordinated (subdivided) approach is the creation of a browse index. This is achieved in a different way via faceted browsing with the postcoordinated approach. The group felt strongly that the most important goal was to produce a product that is easy and understandable for our users. 
More user studies are needed to learn more about this issue.&lt;br /&gt;&lt;br /&gt;The group then wondered what the literature on precoordinated vs. postcoordinated vocabularies looks like. Is there anything recent? Thomas Mann wrote recently on this topic, but no one was aware offhand of other recent work other than Lois Chan describing FAST.&lt;br /&gt;&lt;br /&gt;At this point, the discussion turned to the type of training that would be necessary for someone to effectively apply the TGM. For TGM II  (genre), the individual would need some level of  background with formats of graphic materials. But for topic, participants thought that the same training to perform subject analysis on textual works would apply to graphical works. For some image materials, it is necessary to become familiar with important buildings and people likely to be in the collection, for example buildings on the IU campus and IU presidents for photographs in the University Archives.&lt;br /&gt;&lt;br /&gt;The discussion wrapped up with thoughts on the lack of information inherent in the resource that helps with the cataloging process for graphic materials as opposed to textual materials. Generally images come with &lt;span style="font-style: italic;"&gt;something &lt;/span&gt;that helps identify the content and its origin. Given at least a small amount of information, a cataloger would apply the same type of research techniques, including those applied for authority work, that are already in place in many cataloging units. 
Image description could be portrayed as an extension of existing work rather than a departure.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-6795915849192699275?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/6795915849192699275/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=6795915849192699275' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/6795915849192699275'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/6795915849192699275'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2009/03/summary-of-mdg-session-2-12-09.html' title='Summary of MDG Session, 2-12-09'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-8895332614144780276</id><published>2009-02-02T20:28:00.000-05:00</published><updated>2009-02-02T22:08:01.330-05:00</updated><title type='text'>Summary of MDG Session, 1-29-09</title><content type='html'>Article read: Chris Freeland, Martin Kalfatovic, Jay Paige, and Marc Crozier. (December 2008). "Geocoding LCSH in the Biodiversity Heritage Library." The Code4Lib Journal 5. 
&lt;a href="http://journal.code4lib.org/articles/52"&gt;http://journal.code4lib.org/articles/52&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As with many MDG sessions, this one began with a discussion of unfamiliar terminology in the article read. This article contained a few technical terms that members were interested in hearing more about, likely due to the fact that the audience for the Code4Lib Journal is primarily programmers rather than catalogers or metadata specialists. One participant wondered where the term "folksonomy" came from. Nobody had an answer, although some thought it had been around a decade or so, and Clay Shirky's "&lt;a href="http://www.shirky.com/writings/ontology_overrated.html"&gt;Ontology is Overrated&lt;/a&gt;" was mentioned. (The Wikipedia &lt;a href="http://en.wikipedia.org/wiki/Folksonomy"&gt;article&lt;/a&gt; on folksonomy credits the term to Thomas Vander Wal in the early 2000's.)&lt;br /&gt;&lt;br /&gt;The substance of this month's discussion began by addressing the question: Would you catalog differently if you knew the data were to be used in this way? Participants noted that the burden is on the cataloger to verify and provide information that isn't immediately obvious from the resource itself. The limits of MARC/AACR2 practice (missing geographic headings in some cases) described in the article are very real – if the terms aren't there you can’t build this type of service. If you know the data is going to be used in this way then you make more of an effort to provide it. Participants repeated an often-heard comment about MARC cataloging - that populating the fixed fields takes a great deal of effort, but few systems use them. This discourages catalogers from populating them, which discourages system designers from using them... 
The current environment doesn't make it easy to justify doing the work to create the structured data that's really needed to provide advanced services.&lt;br /&gt;&lt;br /&gt;The conversation then turned to where geographic data to support a service like the one described in this article would be in a MARC record, if those records were created with this use in mind. One important point to note is that the level of specificity is different between the coded geographic values (043, country of publication in fixed fields) and what is present in LCSH subdivisions. The former are generally continent/country/state level, while the latter can be much more specific. Discussion of these fields led participants to note that these fields represent different things - the place something is published is of interest in different circumstances than the place something is &lt;span style="font-style: italic;"&gt;about&lt;/span&gt;. This represents one area (of many) where system designers need to have an in-depth understanding of the data. Building a resource with more consistent geographic data (say, always at the state level) would alleviate some of the challenges described in this article, but leave out users who are interested in more granular information than an implementation like this could provide.&lt;br /&gt;&lt;br /&gt;Some participants advocated that to promote services like the one described in this article, one should use a vocabulary that's designed specifically for geographic access only for this purpose, such as the Getty &lt;a href="http://www.getty.edu/research/conducting_research/vocabularies/tgn/"&gt;Thesaurus of Geographic Names&lt;/a&gt; or &lt;a href="http://www1.nga.mil/ProductsServices/GeographicNames/Pages/default.aspx/html/"&gt;GeoNet&lt;/a&gt;. One advantage of these types of vocabularies is that they are based on "official" data of some sort (for example, the US government, the UN), whereas LCSH is based on literary warrant. 
LCSH therefore might not match up well with current and detailed places such as those one would ideally want for a resource map interface. Similarly, AACR2 treats some objects with geographic features (for example, buildings) as corporate bodies, which are subject to different rules for cross-references and the like.&lt;br /&gt;&lt;br /&gt;Participants noted that there have been significant successes in geographic and user-friendly access in the MARC/AACR2/LCSH stack, however. MARC records for newspapers have a 752 field with semi-structured data listing country-state-county-city. The terms used in this field come from the authority file. The 752 field represents an early example of a field existing in response to user discovery needs. Would it be possible for us to generate this data automatically for other types of resources?&lt;br /&gt;&lt;br /&gt;The conversation at this point moved to user behavior in general. A participant noted that at the recent PCC meeting at ALA Midwinter, Karen Calhoun gave a presentation describing OCLC's recent research on user behavior. Their conclusion is that delivery is becoming more important than discovery. Does this mean libraries should start changing their priorities?&lt;br /&gt;&lt;br /&gt;Different types of discovery were then briefly discussed, noting that one wants different things at different times. The subject search serves a different purpose than the keyword search. Especially for scholars, the former is useful for introductory and overview work. When delving deeper, looking for the obscure reference that will serve as a key piece of original research, the latter will be more useful. The "20% rule" for subject cataloging is one reason for this. Are tag clouds of subject headings therefore useful? Participants thought they were for some types of discovery. Other possibilities would be clouds of Table of Contents data and full text. 
All would have different uses, and for some the cloud presentation might be more effective than others.&lt;br /&gt;&lt;br /&gt;A significant proportion of the discussion in the second half of the session revolved around ways to integrate different types of geographic access. The first topic on this theme was one of granularity - how specific should the geographic heading be? Why shouldn't we provide access to a famous neighborhood in a big city? Using the structure of a robust geographic vocabulary as part of a discovery system would help with this issue.&lt;br /&gt;&lt;br /&gt;The changing of place names and agreed-upon boundaries over time was raised next. A place with a single name might have different boundaries over time. Political change is ongoing, and one place does not simply turn into another; maps are constantly reorganizing. Curated vocabularies such as LCSH and TGN take time to respond to these changes. Is it necessary to update older records when place names change? Participants settled on the standard answer: it depends. For resources such as biological specimens, current place names are likely to be more useful, to assist the researcher with understanding the relationships between them over time. For works about specific places, the place as it existed during the time described is more important.&lt;br /&gt;&lt;br /&gt;The next issue raised in the session was that geographic places don't exist in a strict hierarchy. National parks, rivers, and lakes, for example, aren’t within single states. LCSH headings exist for these, and for rivers can be separate for each state the river crosses. Participants were not certain if cross-references existed between the river name and the headings for all states it crossed, which would create a machine-readable link between the two.&lt;br /&gt;&lt;br /&gt;It was at this point that GIS technology as a solution came up. 
By defining everything as a polygon rather than a label with some classification of type ("state," "river," "park"), geometry can be used to retrieve places relevant to a specific point. Effectively connecting all of these overlapping but not exclusive things in traditional library authority files would be a challenge. Many other geographic-type units could be used for retrieval, including zip codes, area codes, and congressional districts. These change over time as well, further complicating the situation.&lt;br /&gt;&lt;br /&gt;The final issue raised in connection with geographic access was the notion of places being referred to with different names in different languages. Libraries are increasingly adding cross-references from multiple scripts and languages into authority files. This is a good thing, certainly. The lack of a 1:1 mapping from historic places makes this difficult. Even for the residents of a place, the dominant language changes over time, and with it the "official" name.&lt;br /&gt;&lt;br /&gt;The Virtual International Authority File is attempting to address this issue by linking together names for the same places from multiple national authority files. It's a bit unclear what the status of this project is, though. 
LC and OCLC consistently report progress but no clear indication of when it's going to become a production system.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-8895332614144780276?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/8895332614144780276/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=8895332614144780276' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/8895332614144780276'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/8895332614144780276'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2009/02/summary-of-mdg-session-1-29-09.html' title='Summary of MDG Session, 1-29-09'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-5149889004898119134</id><published>2009-01-14T22:10:00.000-05:00</published><updated>2009-01-15T10:58:57.783-05:00</updated><title type='text'>Summary of MDG Session, 12-18-08</title><content type='html'>Article discussed: Kurth, Marty, David Ruddy, and Nathan Rupp. (2004). "Repurposing MARC metadata: using digital project experience to develop a metadata management design." 
&lt;span style="font-style: italic;"&gt;Library Hi Tech&lt;/span&gt; 22(2): 153-165.&lt;br /&gt;&lt;br /&gt;The discussion group felt that while it was desirable that the work described in this article was based on theoretical work on metadata management, the explanation of the metadata management theory, including the concept of enterprise, was not extensive enough to fully understand the connection. It was clear, however, that to do management, you have to do mapping and transformation. Management allows you to rethink and retool. Our group was interested to know what has happened since this article was written. Have they put this into production? What has changed? It appears there is a follow-up article to this one that would be interesting to read.&lt;br /&gt;&lt;br /&gt;The article claims that MARC mapping work is representative of the metadata management task as a whole. Choosing metadata standards based on specific project needs is good, and the projects described here demonstrate how to do that. It's easy to imagine a project where you can start with MARC. But what do you do when no MARC already exists? At IU we have experience in many library departments with projects that re-use existing MARC metadata.&lt;br /&gt;&lt;br /&gt;The group identified three possible cases for metadata management for a digital project: have existing MARC, have existing non-MARC, have no existing structured metadata. Are the strategies outlined in this article useful in all three of these cases? We didn't come to a strong conclusion on this issue.&lt;br /&gt;&lt;br /&gt;An interesting discussion grew up around the topic of how to deal with legacy (pre-AACR2) MARC records. Institutional memory is likely the best bet, as documentation comparing older practices and current ones is sparse. Political boundaries change and places of publication become no longer correct. Some legacy data is easier to deal with, however. 
An institution could use an authority vendor to update name headings with death dates. Certain data elements should be updated over time, but others shouldn’t. The group noted that most metadata work is bibliographic record based and doesn’t do enough with authority records. Making the full authority structure available to the metadata creation staff is sorely needed.&lt;br /&gt;&lt;br /&gt;A substantial amount of discussion time was spent on the topic of collection-specific mappings. The benefits, of course, are that these get it done the way you want it. The drawbacks are potentially reduced shareability and interoperability. One has to keep the whole scope of the project in mind to make good decisions and worry about what’s really important. We have to keep the user in mind. This is difficult to do, though. We think “the user needs this information” but we should think “how can the user use this system?” One participant noted that we worry too much about the specialized discovery case to the detriment of the generalized one. How much tweaking of metadata mapping is of use? The community seems to swing back and forth over time between the generalized and specialized approaches.&lt;br /&gt;&lt;br /&gt;The discussion then turned more theoretical, with thoughts on the changing roles of libraries – specifically, to what degree should we be the intermediary? If the user is on his or her own, should this change the way we provide access to information? We do see a great deal of evidence that libraries have moved to a model where users interact directly with information with no active intermediation from us. The system provides the intermediation that staff once did. We expect better technologies to automatically enrich our records in the future to help with this. For us, participants felt it was more important to get &lt;span style="font-style: italic;"&gt;something&lt;/span&gt; out than to get it perfect. 
We need to make a better effort to integrate authority control into non-MARC environments. Automated methods will rely on the authority records a great deal. It therefore follows that we should spend less time on bibliographic records and more on authority work. The MARC world is certainly moving in this direction, with professional catalogers doing more high-value activity, leaving the lower-value tasks to machines or lower-level staff. Mapping activities are an example of the higher-value activity, as seen in this article.&lt;br /&gt;&lt;br /&gt;This article describes the most common transformation as MARC to simple DC. To make sure information gets into the right DC fields, one needs to understand DC. Those doing the mapping must ask - what is the essential information to go in DC? What really identifies rather than just describes? The role of the cataloger would be to oversee the transformation process, to make sure it works correctly. This would need to happen both on the content end and the technical end.&lt;br /&gt;&lt;br /&gt;What should the relationship of metadata staff to technical staff be? Metadata staff understand both the source and the target data. They would still have to correct things in the output in the end. It certainly helps if the technical staff understand the data as well. Similarly, metadata staff need to have technical skills. For metadata staff, understanding non-standard source data can be a big challenge. The Bradley films are an example of these challenges here at IU. Each set of materials will have a different balance of effort spent on it, based on perceived importance and use. Mapping often unearths mistakes in the original metadata. We must get the best bang for our buck by spending more time on the information that’s really important for the users, and leave the rest alone. 
Effective projects will also need the involvement of collection development staff.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-5149889004898119134?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/5149889004898119134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=5149889004898119134' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/5149889004898119134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/5149889004898119134'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2009/01/summary-of-mdg-session-12-18-08.html' title='Summary of MDG Session, 12-18-08'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-7090841926061921472</id><published>2009-01-14T19:18:00.000-05:00</published><updated>2009-01-14T22:10:30.733-05:00</updated><title type='text'>Summary of MDG Session, 11-19-08</title><content type='html'>Article read: Cundiff, Morgan V. (2004). "An Introduction to the Metadata Encoding and Transmission Standard (METS)." &lt;span style="font-style: italic;"&gt;Library Hi Tech&lt;/span&gt; 22(1): 52-64.&lt;br /&gt;&lt;br /&gt;The session began with a question raised: is allowing arbitrary descriptive and administrative metadata formats inside METS documents a good idea? 
The obvious advantage is that it makes METS very versatile. But this could also limit its scope – does that make METS only for digitized versions of physical things, excluding born digital material? The group as a whole didn't believe this was an inherent limitation. The ability to add authorized extension schema over time seems to be a good thing, and necessary for the external schema allowance to work.&lt;br /&gt;&lt;br /&gt;The flexibility of METS allows it to be used beyond its textual origins – to scores, sound recordings, images, etc. It could potentially be useful beyond libraries, especially to archives and museums. To balance this flexibility, is knowing &lt;span style="font-style: italic;"&gt;some&lt;/span&gt; sort of structured metadata is being presented enough to ensure a reasonable level of interoperability?&lt;br /&gt;&lt;br /&gt;The discussion then turned to the TYPE attribute on &amp;lt;div&amp;gt;, a topic much discussed in the METS community. How does a METS implementer know what values to use? An organization will presumably develop its own practice but the practices won’t be the same across institutions. A clever name for this was suggested: “plantation” metadata – each place can develop their own.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;Are there lessons from library cataloging that could help with this problem? Institutions dealing with the same types of material could join together and harmonize practices. METS Profiles provide the means for documenting this, but they don’t really encourage collaboration. Perhaps the expectation is that the metadata marketplace will converge, and those going their own way will lose out on some significant benefits, and see it in their best interest to collaborate.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;This line of thought led to the question - How did OCLC/LC/the library community get standardized in the first place? Probably because individuals would write up their own rules, then share them. 
Eventually these rules became shared practice. Maybe this same shift will happen when sharing really becomes a priority. Diverse practices will converge when people really want them to.&lt;br /&gt;&lt;br /&gt;A question was then raised about when METS should be used instead of MARC. When is MARC not enough? A participant made the analogy that this was like comparing a plantation to a video arcade. The two are for different purposes, and METS can include descriptive metadata in any format, including MARC. If you want to allow a certain type of searching, for example, a user wants to search for a recording by a certain group, saying METS is better than MARC doesn't make sense. The descriptive metadata schema used within METS is what is going to make the difference in this case, not the use of METS itself. An implementer will still need good descriptive information.&lt;br /&gt;&lt;br /&gt;Participants then noted that we had been talking about systems, but we need to talk more about people. Conversations between communities with different practices will help improve interoperability. Can we standardize access points? To do this we would need to develop vocabularies collaboratively between communities, and talk more so that we understand each other’s point of view.&lt;br /&gt;&lt;br /&gt;One participant made an extremely astute observation that the structure of METS makes it seem that it wasn't designed to be used directly by people. While metadata specialists often need to look at METS, and plan for what METS produced by an institution should look like, the commenter is correct that for the most part, METS is intended for machine consumption. A developer present noted that we could write an application that does a lot of what METS does without actually storing it in XML/METS – but the benefit of METS is abstracting out one more layer. 
Coming full circle to the flexibility issue from earlier in the discussion, it was noted that it is difficult to make standard METS tools (including parsers and generators) due to the almost infinite practices that must be accommodated. This led to the thought that perhaps METS could go much farther in being machine-friendly than it already is. That's a scary thought to metadata specialists who work with it!&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-7090841926061921472?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/7090841926061921472/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=7090841926061921472' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/7090841926061921472'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/7090841926061921472'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2009/01/summary-of-mdg-session-11-19-08.html' title='Summary of MDG Session, 11-19-08'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-389820759720267967</id><published>2008-11-05T18:02:00.000-05:00</published><updated>2008-11-05T18:03:15.199-05:00</updated><title type='text'>Summary of MDG Session, 10-16-08</title><content type='html'>Article 
read: Eklund, Janice. (2007) "Herding Cats: CCO, XML, and the VRA Core." VRA Bulletin 34, no. 1: 45-68.&lt;br /&gt;&lt;br /&gt;The Discussion Group began by picking up a theme from the first meeting of the semester: effective use of terminology in writing about metadata. This article did a good job using new terms consistently and frequently, although terms from VRA Core 3 were occasionally applied to a discussion of Core 4. The discussion of consistency then expanded to consistency in metadata itself. Consistency is very useful when one is combining metadata from multiple sources, and content standards like CCO can go a long way towards promoting this consistency.&lt;br /&gt;&lt;br /&gt;The mention of CCO sparked a lively conversation about the way the word “standards” is tossed about in metadata circles. Is CCO a standard or not? CCO and VRA Core are not in total agreement, so what does it mean if both are standards we should follow? One can track why the difference exists—CCO has a broader scope, including museums, than VRA Core. CCO is a standard in the way AACR2 is a standard, but not in the way MARC is a standard. AACR2 is learned by practice, and less by reading the book. CCO is still evolving, taking time to learn and implement. It’s more a guide to best practice than AACR2 is. CCO is principle-based like RDA is supposed to be, because it needs to be applicable to many communities.&lt;br /&gt;&lt;br /&gt;The next topic of discussion was whether or not VRA Core is really “core.” Its greater coverage for works of art than Dublin Core certainly speaks to it being a domain-specific “core.” The group was less sure if it represented an “exhaustive core.” Tracking VRA Core’s history could be instructive in this analysis – the evolution from Core 2 to Core 3 to Core 4 shows some stabilization, so this could be evidence that they’ve achieved an agreed-upon core. 
The only really new thing in Core 4 is the collection root element (in addition to work and image).&lt;br /&gt;&lt;br /&gt;The linking capability of VRA Core was singled out as an especially effective part of the format, encouraging the use of identifiers to track relationships within text strings. There is not the infrastructure for collaborative development and sharing of authority records in the visual resources community that there is in the library community, so the process of record linking is more manual now in the VR environment than in the library/MARC community. But there is significant progress being made. The community needs to build good systems, and cooperate between institutions. They also need to expand the notion of authority control, to allow for more variety in name references, for example.&lt;br /&gt;&lt;br /&gt;Efforts such as CONA (Cultural Objects Name Authority, forthcoming from the Getty) and the &lt;a href="http://www.sah.org/index.php?src=gendocs&amp;amp;ref=AVRN&amp;amp;category=Main"&gt;Society of Architectural Historians Architectural Visual Resources Network&lt;/a&gt; are helping to build the needed infrastructure. More cooperation overall is needed – the VR community and library community are both starting to realize that each of us having our own copies of records isn’t sustainable. Formats like VRA Core can promote fuller record sharing.&lt;br /&gt;&lt;br /&gt;Using separate fields for display and indexing was another feature of VRA Core of interest to the discussion group. It was noted that this practice allowed a great deal of flexibility but also required twice as much work. To decide when this is necessary, one must consider how the information will be used: for search or display? In future systems in addition to current ones? How easy will it be to upgrade systems? 
It’s more important to include both for data elements that represent key features of the work or medium, for example, cultural context.&lt;br /&gt;&lt;br /&gt;The discussion group noted that cultural objects cataloging could be a model for library catalogers looking to re-examine which aspects of their work require the attention of cataloging professionals. Cultural objects cataloging places a greater emphasis on analysis than transcription, which is necessary because cultural objects in general don’t explain themselves. Interestingly enough, some visual resources units are “outsourcing” subject indexing to traditional catalogers. Many catalogers on both sides don’t feel competent to do subjective indexing – is something “about death”? It’s much easier to record form/style, what something is rather than what it is about.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-389820759720267967?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/389820759720267967/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=389820759720267967' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/389820759720267967'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/389820759720267967'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2008/11/summary-of-mdg-session-10-16-08.html' title='Summary of MDG Session, 10-16-08'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-2376314176050806497</id><published>2008-10-07T14:33:00.000-04:00</published><updated>2008-10-08T09:12:24.489-04:00</updated><title type='text'>Summary of MDG session, 9-30-08</title><content type='html'>Article discussed: Greenberg, Jane. (2005). "Understanding Metadata and Metadata Schemes." Cataloging &amp;amp; Classification Quarterly 40, no. 3/4: 17-36.&lt;br /&gt;&lt;br /&gt;The discussion began with a general question: Does the MODAL framework appear to be a useful way of evaluating metadata schemas? The group in general thought it was, although it expressed concern that some of the language in the article was very academic, which sometimes made it difficult for practicing librarians to follow the argument.&lt;br /&gt;&lt;br /&gt;Participants appreciated the fact that some metadata schemas such as TEI (p. 28 of the article) have as a stated principle the conversion of resources to newer communication formats. This principle is of great benefit, and would be useful for other metadata schemas as well. Data formats will not stay static - our metadata must adapt its format over time to accommodate new ways of communicating.&lt;br /&gt;&lt;br /&gt;Some participants noticed a contrast between the design of metadata schemas based on experience and observation and library cataloging rules that are more formalized and change less frequently. This observation led to the question of whether cataloging rules &lt;span style="font-style: italic;"&gt;should&lt;/span&gt; be more fluid. When the rules do change, the changes are based on experience. From an implementation point of view, it is difficult both for libraries and our users if the rules are constantly changing. Our legacy data is a very real consideration here. 
So how can we be flexible and adaptable while remaining consistent and keeping up with the legacy data?&lt;br /&gt;&lt;br /&gt;The MODAL framework spoke to participants as an analysis tool - helping evaluate the fitness of a given schema for a given purpose. This gets us away from saying a metadata format is "bad" - rather it lets us say that records using the Dublin Core Metadata Element Set are not well suited to handling FRBRized data, for example.&lt;br /&gt;&lt;br /&gt;The article's methodology of bringing in Cutter's objectives as an example of underlying objectives and principles sat well with the discussion group. One participant noted that not many current studies do this. These assumptions can help us focus our efforts. Follow-up work could compare Cutter's objectives to different metadata formats.&lt;br /&gt;&lt;br /&gt;Terminology issues were a hot topic of discussion at the session. Participants thought some kind of collaboratively-developed metadata glossary would be a good idea. They felt it was important for librarians interested in metadata issues to learn new vocabularies. We need to read more, ingest as much as possible, make connections to what we already do. “Cardinality” was an example of a term which was unfamiliar - it brings in the repeatable vs. not repeatable notion that is familiar, but also covers required/not required. Domains do have specialized vocabularies – they serve as “rites of passage” into various professions. Metadata schemes all have context that assumes a specific knowledge base – this article recognizes that. It would be nice if articles had glossaries, though.&lt;br /&gt;&lt;br /&gt;Even with discussion, the group did not reach a clear consensus on the definitions of some terms. 
The term “granularity” was defined in the group as "refinement," "the amount you want to analyze down to,” “extent of the description,” "specificity," and "granular means you can slice in different ways."&lt;br /&gt;&lt;br /&gt;Participants appreciated the empirical focus of the article, saying that metadata schema design should be observation/experiment based. It's certainly a good thing to have metadata be practical – actually useful. To help decide what metadata schema to use, try out a couple of schemas and see how they work, rather than thinking more abstractly. But one also needs to consider community as a factor. The MODAL framework is “multi-focal” – focusing first on one aspect, then moving to another. This helps implementers think, for example, about both the community and the data itself.&lt;br /&gt;&lt;br /&gt;Participants noted two schools of thought for metadata design, which differ in orientation: a problem looking for a solution, as contrasted with a solution looking for a problem. Is there still room for cataloger judgment? Absolutely. Perhaps cataloger's judgment is needed more in the application of a content standard than of a structure standard.&lt;br /&gt;&lt;br /&gt;This distinction led participants to speculate whether the line between the two is blurring (although all recognized it has always been somewhat blurry). RDA especially seems to be trying to do both simultaneously. One participant noted that libraries seem to be moving to blur the two, while other communities are moving to separate them more.&lt;br /&gt;&lt;br /&gt;Is terminology the only barrier to learning more about metadata? Some individuals learn better with theory and others with practice. All need a little of both. It really just takes time – remember what it was like to learn cataloging? Getting out of one's comfort zone is difficult. It’s also difficult to be adventurous, when there is less precedent to follow. 
It's hard to learn many standards – one doesn’t always know which one to use. When you have to learn lots of things, you learn each of them less well. We also have new objectives, including reaching new people and operating in additional systems. It would be helpful to identify models of other institutions where a technical services unit has made significant progress in these areas.&lt;br /&gt;&lt;br /&gt;The group found Table 1, which outlines some typologies of metadata schemas, to be interesting. The lines between them seem arbitrary at worst and murky at best. Over time the thinking in this area has gone from 7 categories to 4 – does this mean our community is looking for simplicity? Does this mean this environment is settling down? Maybe, but initiatives such as the DCMI Abstract Model seem to be going the other direction.&lt;br /&gt;&lt;br /&gt;The discussion moved relatively seamlessly from topic to topic, and featured a number of insightful comments, often from new participants. Both nitty-gritty and "big picture" issues were raised. 
Thanks to all who participated for an enlightening discussion.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-2376314176050806497?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/2376314176050806497/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=2376314176050806497' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/2376314176050806497'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/2376314176050806497'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2008/10/summary-of-mdg-session-9-30-08.html' title='Summary of MDG session, 9-30-08'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-165020997090124930</id><published>2008-06-05T09:44:00.000-04:00</published><updated>2008-06-05T09:45:08.383-04:00</updated><title type='text'>Summary of MDG session, 5-27-08</title><content type='html'>Article discussed: Hagedorn, Kat, Suzanne Chapman, and David Newman. (July/August 2007) "Enhancing search and browse using automated clustering of subject metadata." D-Lib Magazine 13, no. 7/8. 
http://www.dlib.org/dlib/july07/hagedorn/07hagedorn.html&lt;br /&gt;&lt;br /&gt;The session began with a brief explanation of the methodology employed by this experiment and the OAI-PMH protocol, as it may not have been clear to those who don’t deal with this sort of technology on a regular basis. After this introduction, discussion moved to wondering why the Michigan “high-level browse list” was chosen for grouping clusters, rather than a more standard list. The group realized the value of a short, extremely general list for this purpose, and noted our own Libraries use a similar locally-developed list. Most standard library controlled vocabularies and classification schemes have far too many top terms to be effective for this sort of use. It was noted that choosing cluster labels, if not the high-level grouping, from a library standard controlled vocabulary would promote interoperability of this enhanced data.&lt;br /&gt;&lt;br /&gt;The question of quality control then arose: the article described one person performing a quality check on the cluster labels – this must have been an enormous task! The article mentioned mis-assigned categories that would have been found with a more formal quality review process. Have they thought about how they would fix things on the fly – features like “click here to tell us this is wrong”? Did the experiment designers talk to catalogers or faculty as part of the cluster labeling process? Who were the colleagues they asked to do the labeling?&lt;br /&gt;&lt;br /&gt;Is their proposal to not label the clusters at all, but to just connect to the high-level browse categories, a good one? The group posited that the high-level browse used the campus structure of majors, rather than the organizational structure of the university. (This is the way the IU Libraries web site is structured). 
In this case, the subcategories are more meaningful than the main categories, so at least this level would likely be needed.&lt;br /&gt;&lt;br /&gt;The discussion group noted evidence of campus priorities in the high-level browse list, for example that the arts and humanities seemed to be under-represented and lumped together while the sciences received more specific attention. Did this make a difference in the clustering too? As noted in the article, titles in the humanities can be less straightforward than in other disciplines, making greater use of metaphors. What do the science records have that humanities records don’t? Abstracts, probably – anything else? Perhaps it’s just that the titles were more specific. Do science subject headings contain more information? Description in humanities collections might be more varied than the language in sciences? Many possibilities were presented but the group wasn’t sure which would really affect the clustering methodology.&lt;br /&gt;&lt;br /&gt;The group then wondered if the humanities/sciences differences noted in this article would show up in a single institution, or whether they arose in OAIster only because different data providers tend to focus on one or the other, so that the difference is really between data providers rather than between disciplines. The group noted (as a gross generalization) that humanities tend to be more interested in time period, people, and places, whereas the sciences are more interested in topic.&lt;br /&gt;&lt;br /&gt;Would the clustering strategy work locally and not just on aggregations? The suggestion in the article that results might improve if run just on one discipline at a time suggests it might. In this case, clusters would likely be more specific. Perhaps an individual institution could employ this method on full text, and leave running it on metadata records alone to the aggregators. 
It would be interesting to find out if there’s a difference in effectiveness of this methodology on metadata records for different formats, for example, image vs. text.&lt;br /&gt;&lt;br /&gt;The group noted the clustering technique would only be as good as the records from the original site. What if context were missing? (the “on the horse” problem) Garbage in, garbage out, as they say. We understood why the experiment only used English-language records, but it would be interesting to extend this.&lt;br /&gt;&lt;br /&gt;The clustering experiment was run using only the data from the title, subject, and description fields. Should they use more? Why not creator? This is useful information. Was it because clusters would then form around creators, which could be collocated using existing creator information? The stopword list was interesting to the group. It made sense why terms such as library and copyright were on it, but there are resources about these things, so we don’t want to artificially exclude them. What if the stopword list were not applied to the title field?&lt;br /&gt;&lt;br /&gt;The discussion group wondered how these techniques relate to those operating in the commercial world. Amazon uses “statistically improbable phrases” which seems to be the opposite of this technique – identifying terminology that’s different rather than the same between resources. What about studies comparing these automatic methods to user tagging? No participants knew of such a study in the library literature, but it was noted there might be information on this topic in the information retrieval literature. It would be interesting to compare data from this process to the tags from users generated as part of the LC Flickr project.&lt;br /&gt;&lt;br /&gt;The article described the overall approach as attempting to create simple interfaces to complex resources. Is this really our goal? We definitely want to collocate like resources. 
The interface in the screenshots didn’t seem “Google-style” simple. The group noted that in the library field many believe simple interfaces can only yield simple answers and that people searching with simple techniques are generally just looking for something rather than pursuing a comprehensive research goal. This article doesn’t have in its scope a discussion as to whether this is true. One big problem is that the article never defines its user base, and different user bases employ different search techniques.&lt;br /&gt;&lt;br /&gt;The discussion group believed that browseability, as promoted by the clustering technique, is a key idea. With a good browse, the interface can provide more ways to get at resources, and then they are more findable. Hierarchical information can be a good way to get users to resources. With the experiment described in this article, the hierarchy is discipline/genre. Would retrieval improve if we pulled in other data from the record to do faceted browsing? Would this work better for humanities rather than science? Do we need to treat the disciplines differently?&lt;br /&gt;&lt;br /&gt;Discussion group participants noted that “this isn’t moonwalking,” meaning that this technique looks promising. It needs some tweaking, but the technique hasn’t promised the moon – it’s not purporting to be a be-all, end-all solution. It’s just something we can do, as one part of the many other techniques we use. Can a simple, Google-style interface eventually work for intensive research needs on this data? Or should it? Should the search just lead them to a seminal article and then they citation chase from there? These are interesting questions.&lt;br /&gt;&lt;br /&gt;The group then wondered if the proposal to recluster only every few years was a good one. They would certainly need to do it when getting big new chunks of data that are dissimilar to what’s already in the repository. 
A possible method would be to randomly test once per month to see if clusters are working out well.&lt;br /&gt;&lt;br /&gt;The session ended with some more philosophical questions. Why should services like OAIster exist at all if Google can pick these resources up? Is this type of service beneficial for resources that will never get to the top of a Google search in their native environments? What would happen if one were to apply these techniques to a repository with a more resource-based rather than subject-based collection development policy?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-165020997090124930?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/165020997090124930/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=165020997090124930' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/165020997090124930'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/165020997090124930'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2008/06/summary-of-mdg-session-5-27-08.html' title='Summary of MDG session, 5-27-08'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-478356632240935611</id><published>2008-05-04T12:49:00.000-04:00</published><updated>2008-05-04T13:27:54.903-04:00</updated><title type='text'>Summary of MDG session, 4-22-08</title><content type='html'>The article for discussion this month was:&lt;br /&gt;&lt;br /&gt;Borbinha, José. (2004). "Authority control in the world of metadata." &lt;span style="font-style: italic;"&gt;Cataloging &amp;amp; Classification Quarterly&lt;/span&gt; 38(3/4): 105-116.&lt;br /&gt;&lt;br /&gt;The article provoked a lively discussion that centered largely around the future and functions of authority control. It began by wondering what the tie-in to authority control in the article, as implied by the title, really was. The creator concept is very strong in the article, and authors are something we traditionally control in libraries, although archives treat creators differently. The explicit connection of the article to authority control is at the very end: it’s not so much about all using the same rules, but that we know what rules are being used. Control is not as important as interoperability. Is this a good conclusion? It’s a practical one.&lt;br /&gt;&lt;br /&gt;The discussion in this article is most useful to practitioners, in that it helps us think about why we do authority control in the first place. Some concern was expressed about the very general statements being made in what was perceived as overly technical language.&lt;br /&gt;&lt;br /&gt;At this point, there was a bit of confusion in the room, as two participants realized they'd read the wrong article in preparation for the session. This article:&lt;br /&gt;&lt;br /&gt;Vitali, Stefano. "Authority Control of Creators and the Second Edition of ISAAR(CPF), International Standard Archival Authority Record for Corporate Bodies, Persons, and Families." 
&lt;span style="font-style: italic;"&gt;Cataloging &amp;amp; Classification Quarterly&lt;/span&gt; 38(3/4):185-199&lt;br /&gt;&lt;br /&gt;...is an interesting one, discussing in depth the motivations and methods for authority control in archives. It's well worth a read.&lt;br /&gt;&lt;br /&gt;That being settled, we returned to the Borbinha article. The effectiveness of crossing institutional borders was questioned – we don’t do this well but Amazon seems to. Perhaps we’re still too focused on our methodology, rather than the goal.&lt;br /&gt;&lt;br /&gt;The moderator asked if the conceptual vs. context, etc., perspective was useful. The group was uncertain on this issue, and the gulf between theory and practice emerged as a discussion topic. Practitioners mostly know the records one sees in the cataloging interface, and in this mode, the distinction between, say, structure standards and content standards, can be confusing. AACR/MARC/data entry system are all taught together – the distinction between them is not generally made in training or daily life. Practitioners tend to move through the learning curve with both integrated into one’s mind. So it’s hard to think that we can make an AACR record in Dublin Core - overall it's very hard to talk about one without the other. Most never see the under-the-hood record coding at all. But in some cases it is useful to keep these distinctions in mind. What do we gain from thinking of them differently? It's likely not going to be effective to teach the conceptual first and then the practical.&lt;br /&gt;&lt;br /&gt;From a public service perspective, these functions as shown in Figure 3 are a black box that searches go into and come out of. How useful are these distinctions to that community?&lt;br /&gt;&lt;br /&gt;Discussion then turned to the Fellini example in the paper – how would a system bring these together without authority control? Can we live with a system that isn’t perfect? 
Can we trust a secret and proprietary algorithm? What about the model of a human-generated Wikipedia page with a disambiguation step? Is it better to do the matching up of names ahead of time, or at search time?&lt;br /&gt;&lt;br /&gt;Can Google connect Mark Twain and Samuel Clemens, as well as just misspelling Mark? Our authority records handle forms of names found in items, Google handles common misspellings. Are these things different? Authority control serves lots of purposes: disambiguate same name, collocate same person with multiple names, etc. But there’s room for both approaches. Search systems could have an authority list running underneath. Google works differently, pulling data from the Web rather than from an authority file. Ranganathan’s principle: “every book its reader” – can we say every search term its hit? OCLC WorldCat – we can guess they’re employing both the authority-based search and Google-based methodologies? Maybe not, doesn’t seem like they’re stemming. Their searches seem to be very literal.&lt;br /&gt;&lt;br /&gt;The dual approach seems promising, as authority files have different purposes than the Google-style work. Maybe we could pull in data from 670 references (or from more structured data proposed by FRAD and showing up in RDA drafts)? Ask the user: “do you want the chemist or social scientist”?&lt;br /&gt;&lt;br /&gt;Heterogeneity is part of our life, as the article mentions. We simply have to deal with it. We should find small models that can deal with it, and build on those. What about heterogeneity of thesauri? In some cases it’s clear what thesaurus to use, in others it's not. When you use different vocabularies, it’s a barrier to interoperability – how do we overcome this? This is the tension, which has come up in our discussions before, between doing one thing well and using the same standards for everything but doing each less well. Yet Google and Amazon aren’t worrying about this. 
Google connects Twain and Clemens because somebody made the connection on the web page.&lt;br /&gt;&lt;br /&gt;This is one of the drivers behind the "OPAC sucks" movement – for the audience that just wants something, not everything. There's a mismatch between this goal and the one OPACs are designed around. But maybe users actually want something good (not everything good). Our systems don’t put the most useful stuff at the top. WorldCat tries to do this by ranking by holdings. We’re doing something wrong when the Director of Technical Services goes to Amazon to find a book because she can’t find it in our catalog!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-478356632240935611?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/478356632240935611/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=478356632240935611' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/478356632240935611'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/478356632240935611'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2008/05/summary-of-mdg-session-4-22-08.html' title='Summary of MDG session, 4-22-08'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-6879061267091895963</id><published>2008-03-27T09:51:00.000-04:00</published><updated>2008-03-27T15:48:18.902-04:00</updated><title type='text'>Summary of MDG session, 3-18-08</title><content type='html'>The article for discussion this month was:&lt;br /&gt;&lt;br /&gt;Yakel, Elizabeth, Seth Shaw, and Polly Reynolds. "Creating the Next Generation Archival Finding Aids." D-Lib Magazine 13, no. 5/6 (May/June 2007). Available from  http://www.dlib.org/dlib/may07/yakel/05yakel.html.&lt;br /&gt;&lt;br /&gt;Early on, the discussion focused around the predictability (or lack thereof) of EAD files. EAD as a markup language is designed to be flexible, for the encoding of many different types of finding aids. This means that any two EAD-encoded finding aids may not look very much alike. The potential of using a common controlled vocabulary across finding aids was envisioned as one way to tackle this fundamental unpredictability. The group expressed the idea that for sharing, broad subject headings are good, despite the claim of the article that these weren't adequate. However, within the local environment, the specific ones this article says were needed make sense.&lt;br /&gt;&lt;br /&gt;A large part of the group's discussion of this article worked through how better access could be provided to these materials with some reasonable level of expediency. While detailed analysis such as noting that a proverb appears within a story within a volume in the IU Folklore collection could be beneficial, it's unlikely we can afford to be this detailed. Respondents reported that it's often difficult to resist the urge to provide this detailed analysis, even though there is pressure to process collections quickly. One has to ask, how meaningful will the description be if I don’t go into more detail? One has to stop and think. 
General practice is to only pull out the "important" data, such as only some names rather than all of them. One participant noted that a recent article in the &lt;span style="font-style: italic;"&gt;American Archivist&lt;/span&gt; (Fall/Winter 2007, Vol. 70, No. 2), "Archives of the People, by the People, for the People," by  Max J. Evans discusses how one might get more mileage out of an EAD-encoded finding aid. [Note from after the meeting: this same volume has another article on the Polar Bear Expedition project which might address some of the issues the group was wishing were discussed in the article we read for this week. "Interaction in Virtual Archives: The Polar Bear Expedition Digital Collections Next Generation Finding Aid," by Magia Ghetu Krause and Elizabeth Yakel.]&lt;br /&gt;&lt;br /&gt;Some members of the group expressed interest in studying how keyword indexing of full text could be used to help add description for archival collections (although the group realized automatically generating transcriptions of scanned handwritten documents is currently not very feasible). The &lt;a href="http://www.columbia.edu/cu/libraries/inside/projects/climb/"&gt;CLiMB&lt;/a&gt; project at Columbia was noted as an example of how this technology might work. The possibility of capturing transcriptions from users was discussed.&lt;br /&gt;&lt;br /&gt;Participants noted the potential utility of user-supplied information, as these users often have a vested interest in and knowledge of the materials.&lt;br /&gt;&lt;br /&gt;The group wondered why the project staff was hesitant to include information from the database with data on soldiers, including birth dates, death dates, etc. 
The prevailing thought in the room was that if the catalog &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; include this sort of information, it should - that this sort of information was not fundamentally out of scope of the "catalog."&lt;br /&gt;&lt;br /&gt;Participants noted several features of the Polar Bear Expedition site that they believed had been implemented well, including providing coherence to a collection brought together by theme rather than by format, effective browsing (although it was noted the browse might be used more because the search feature was not very full-featured!), and the fact that the entire collection had been digitized rather than just highlights. Some drawbacks were mentioned as well, most notably the current lack of a critical mass of user comments and of clear information on what it is that brings these various collections together.&lt;br /&gt;&lt;br /&gt;A "wish list" for more information on this project emerged, including specifics on the metadata implementation (e.g., what controlled vocabularies were used), and to what degree site features were developed in response to use cases and user studies. For example, the "visitor awareness" feature appears to be a way of getting users to talk to each other. The article didn't describe how this feature was determined to be a priority - was it implemented in response to a defined need or just because it was interesting? Participants also wanted more information on balancing this sort of functionality with user privacy issues, while recognizing that this sort of project can open users’ minds as to what is possible, allow us to get feedback from them, and let us ask them what they want while they’re using it.&lt;br /&gt;&lt;br /&gt;The challenges described in this article were disheartening to some participants, who felt that this project represents a best possible case, with all the material already digitized. 
The fact that there were still so many problems is a bit scary, as the mantra we've been hearing is that online materials were supposed to make this sort of thing much easier. Or are we just making these systems too complex? Flickr seems to work, and it operates at a much simpler level. To what degree does the system need to reflect the complexity of the collections and the items within them?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-6879061267091895963?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/6879061267091895963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=6879061267091895963' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/6879061267091895963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/6879061267091895963'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2008/03/summary-of-mdg-session-3-18-08.html' title='Summary of MDG session, 3-18-08'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-3284825789742496981</id><published>2008-02-29T10:20:00.000-05:00</published><updated>2008-02-29T13:35:36.040-05:00</updated><title type='text'>Summary of MDG session, 2-26-08</title><content type='html'>The February 2008 meeting of the Metadata Discussion Group 
drew about 50 attendees. Thank you to everyone who continues to make this group a success.&lt;br /&gt;&lt;br /&gt;The article for discussion this month was:&lt;br /&gt;&lt;br /&gt;Chapman, John. "The Roles of the Metadata Librarian in a Research Library." Library Resources &amp;amp; Technical Services v. 51 no. 4 (October 2007).&lt;br /&gt;&lt;br /&gt;The discussion began by examining which job responsibilities presented in the article represented entirely new tasks, which were slight evolutions from current practice, and which seemed pretty much the same as duties of some current technical services positions. For the most part, the group felt that the four areas described (collaboration, research, education, development) were generally part of technical services responsibilities currently. Collaboration, especially with collection managers (Archive-It is an example of this at IU), was an area participants felt was already a strong part of technical services jobs, although expanding the scope to working directly with faculty might be necessary in the future.&lt;br /&gt;&lt;br /&gt;The area of development was thought to be the most different from "traditional" technical services positions, requiring a stronger need to think about the final form of access for materials being described. Technical services staff needed to deal with this in the early days of automation, but because access hasn’t changed much since then, there needs to be more thinking in this area. Staff will increasingly need to deal with different levels and types of metadata – some web presentable, some more internally-focused. They will need to work closely with technology-intensive positions (although they do this already with MARC data). 
Designing new platforms and interfaces is what’s new.&lt;br /&gt;&lt;br /&gt;The decision in this article to only look at positions within technical services may have been a practical one, but it does potentially introduce homogeneity into an inherently heterogeneous environment. Dealing with this heterogeneity is a key role of metadata staff. The MARC/AACR2 stack is well-tested; some of the newer ones aren’t. Metadata librarians will have to determine for each new set of materials which sets of standards to use. A major weak point now is that our mainstream cataloging system can only handle the one set of standards, so our users have to go to multiple places to access content. This heterogeneity makes interoperability difficult. How do we allow things to be different when they need to be but not make them different just to be different? It’s fun to make up new things, but we have to be sustainable.&lt;br /&gt;&lt;br /&gt;A participant noted that a colleague from another university had observed a trend that in places where there is significant funding for digital library work, metadata tends to be a separate operation, outside of technical services, and in places where there’s no funding, technical services is often asked to take non-MARC metadata work on themselves. Library organizational models are so fluid that it’s no surprise there are so many different models out there for metadata librarians and digital library work.&lt;br /&gt;&lt;br /&gt;Asking technical services to do non-MARC metadata is a huge investment – it's asking already busy people to do more things. We also think we need higher-level salary lines for this planning work. But technical services budgets are being cut. How do we deal with this? Don’t think of it as dumping more work on folks. Think of it as adapting to the world as it changes. It’s an exciting opportunity. 
Think outside the box – metadata work in acquisitions perhaps.&lt;br /&gt;&lt;br /&gt;Regardless of the reporting structure, the group felt a strong need to move to mainstream processes. We know enough about how to deal with many types of material, even with non-MARC metadata, to operationalize it.&lt;br /&gt;&lt;br /&gt;A participant posed an interesting hypothetical situation: suppose tomorrow we all came to work and all jobs with “cataloging” in the title had changed to “metadata”. What would we need to make that happen? The first reaction to this proposal was that MARC is metadata so this could be true now. To expand into other types of metadata, staff would need training. More contact would be required with subject specialists to learn about needs these staff would not currently be aware of based on current standards. Staff would need a lot of support during the transition.&lt;br /&gt;&lt;br /&gt;One function of metadata is to organize information; another function is to make connections between things. This means subject specialties will be more important in the future.&lt;br /&gt;&lt;br /&gt;People think of cataloging online resources when you say metadata. But many definitions of metadata are broader, so the word is almost useless now in many cases. A view raised in discussion was that metadata facilitates online discovery, whether the material is online or not. It's the long-held idea of metadata as a document surrogate.&lt;br /&gt;&lt;br /&gt;Although descriptive metadata is what's primarily being discussed, acquisitions departments may need to deal with other types of metadata, especially rights metadata. Flexible staff that can take on digital library type activities when other duties lull are likely to be needed. 
Libraries will need to continue to prepare individuals for new work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-3284825789742496981?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/3284825789742496981/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=3284825789742496981' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/3284825789742496981'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/3284825789742496981'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2008/02/summary-of-mdg-session-2-26-08.html' title='Summary of MDG session, 2-26-08'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-453025372209463232</id><published>2008-01-30T14:46:00.000-05:00</published><updated>2008-01-30T15:38:30.293-05:00</updated><title type='text'>Summary of MDG session, 1-29-08</title><content type='html'>The second meeting of the Metadata Discussion Group was a lively session, with about 50 people in attendance. We're glad to have a new exit installed in the staff lounge so that we no longer have to count heads to keep under room capacity!&lt;br /&gt;&lt;br /&gt;The article for discussion in the January session was:&lt;br /&gt;&lt;br /&gt;Elings, Mary W. and Günter Waibel. 
"Metadata for All: Descriptive Standards and Metadata Sharing across Libraries, Archives and Museums." &lt;em&gt;First Monday&lt;/em&gt; 12, no. 3 (March 2007). &lt;span class="nobr"&gt;&lt;a href="http://www.firstmonday.org/issues/issue12_3/elings/index.html" rel="nofollow"&gt;http://www.firstmonday.org/issues/issue12_3/elings/index.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The discussion began with the question of to what degree the article successfully described the different perspectives of the library, archives, and museum communities. Some in the group thought it was a good introduction, but didn't make clear possible differences in broad vs. narrow scope or unique vs. non-unique materials. One participant noted that these institutions all have collection management needs in common.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Overall, the group felt that choosing standards based on material type rather than institution type made a great deal of sense. Visual materials were mentioned especially as needing a different approach than textual materials, and one participant postulated that this new way of thinking has been influenced heavily by the visual resources community, where content and structure standards developed side by side and influenced each other. Many questions about how to put this approach into practice were raised, however. Are we breaking user expectations by using different models for different types of materials? Some of our current categorizations of materials are hard to separate out by type - government documents, for example, are a mixture of archival-style things, visual materials, and texts, and a complete collection of the photographic work of one person would benefit from the context provided by archival description but would also benefit from in-depth item-level indexing.&lt;br /&gt;&lt;br /&gt;The group discussed for a time why things developed the way they have. One participant noted that the article (politely!) 
casts blame and says we insist on using the wrong tools because we’re used to them. There’s a reason we use the tools we have: we face financial and administrative pressures to produce more. We care about interoperability, but we can’t afford it. Implementing a major shift in how we approach things has enormous financial implications. There are no good models for institutions making this shift on a large scale, and our technological tools haven't caught up to this new way of thinking either. We need to "expand our personal toolkits."&lt;br /&gt;&lt;br /&gt;The group spent a significant amount of time discussing issues of efficiency and streamlining the descriptive process. The Greene/Meissner report cited by the article was mentioned, as was RLG Programs' recent report: &lt;a href="http://www.oclc.org/programs/publications/reports/2007-02.pdf" target="_blank"&gt;Shifting Gears: Gearing Up to Get Into the Flow&lt;/a&gt;. The disconnect between collection- or file-level description and item-level digitization was noted, and seemed problematic. Questions of efficiency led us to discuss user-contributed metadata, especially as seen in the recent &lt;a href="http://www.flickr.com/commons"&gt;LC experiment with Flickr&lt;/a&gt;. Participants felt this approach could shift some burden from us to our users, and increase exposure for our collections. But we still need to process collections, and do a great deal of work with them. We need to experiment – things will keep changing. Questions about the need for oversight of user-driven data were raised, with overall acknowledgment of the problem but no concrete solutions.&lt;br /&gt;&lt;br /&gt;Participants raised some specific questions with regards to the article:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A version of the chart in the conclusion adding visual resources as another category was distributed at a conference last summer. 
Is this simply another category or does it change the argument somewhat?&lt;/li&gt;&lt;li&gt;Could the chart in the conclusion include MARC as a data structure for each community?&lt;/li&gt;&lt;li&gt;Where does Dublin Core fit in all of this?&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The session ended on a somewhat philosophical note, with participants commenting that the shift in thinking proposed by this article reflects changes in society in general – more collaboration is happening, and themes are emerging between groups. The social definition of knowledge is changing. The group closed by noting interesting content on the web and remembering that the "interesting stuff" is why we do all of this in the first place.&lt;br /&gt;&lt;br /&gt;Our next session will be Tuesday, February 26. The article for discussion for that session will be distributed by February 12. Please send ideas for topics for future sessions to the MDG listserv, and feel free to use the &lt;a href="mailto:metadata-discuss-l@indiana.edu"&gt;listserv&lt;/a&gt; for discussion between in-person sessions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-453025372209463232?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/453025372209463232/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=453025372209463232' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/453025372209463232'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/453025372209463232'/><link rel='alternate' type='text/html' 
href='http://metadatadiscuss.blogspot.com/2008/01/summary-of-mdg-session-1-29-08.html' title='Summary of MDG session, 1-29-08'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-5332366810281074481</id><published>2007-11-27T13:03:00.000-05:00</published><updated>2007-12-03T15:12:17.942-05:00</updated><title type='text'>Summary of MDG session, 11-27-07</title><content type='html'>The first meeting of the IUB Metadata Discussion Group seems to have been an unqualified success. Although we held the session in a room with the largest seating capacity in the Wells Library, we still had to turn some people away. My apologies go out to those who wanted to attend but were unable to because of space. By our next session, a second door should be installed in the room which will raise its legal maximum capacity. Thank you to all who attended, and who tried to.&lt;br /&gt;&lt;br /&gt;The article for discussion this month was:&lt;br /&gt;&lt;br /&gt;Gilliland, Anne J. "Setting the Stage." In Introduction to Metadata: Pathways to Digital Information, ed. Murtha Baca. Online edition, version 2.1. Available from http://www.getty.edu/research/conducting_research/standards/intrometadata/setting.pdf&lt;br /&gt;&lt;br /&gt;The group used the general principles outlined in the article to discuss the role of metadata in libraries and their technical services departments. Participants appreciated the breadth and high-level focus of the article, but expressed an interest in balancing this approach with more practical approaches in future meetings. 
The difficulty of describing the concept of "metadata" in any succinct way was noted by participants.&lt;br /&gt;&lt;br /&gt;Two features of the article were brought out in discussion: the thought of metadata as something that grows and changes over time, and the fact that "lay" metadata is important in addition to "expert" metadata.&lt;br /&gt;&lt;br /&gt;Regarding the continued accrual of metadata over the lifecycle of an object, the group discussed the potential effects of this need on copy cataloging, noted that WorldCat Local could play a part in this, and postulated that one of the roles of a technical services department could be adding to metadata records over time.&lt;br /&gt;&lt;br /&gt;The concepts of "Lay" vs. "Expert" metadata, not surprisingly, generated a good deal of discussion. No participant voiced the sometimes-heard opinion that metadata from lay sources such as users, publishers, etc. (including user reviews, sales data, tagging of images, etc.) had no place in the library environment, although several individuals cautioned that the metadata we maintain must support effective retrieval and that more uncontrolled metadata could threaten that goal. One participant voiced an opinion that one role of libraries is to supplement lay metadata with expert metadata, to help ensure authority, a sentiment that seemed to have general agreement.&lt;br /&gt;&lt;br /&gt;From this point, the discussions turned to the role of systems in providing services based on metadata. Participants felt that our systems needed to handle both factual, structured data like ISBNs and more fluid, organic, unstructured data like that our users can provide. It was noted that to provide high-quality services on these different types of metadata, our systems need to have *more* structure on the back end, rather than less. 
While the discussion didn't delve very far into specific metadata formats, there was a general sense that the data being recorded was more important than the format in which it was stored. One participant summarized this view as "I don’t need to have MARC, I need to have the specificity of MARC." The need for different approaches for different types of materials was raised, which led to a request for future MDG sessions to study in more depth these different approaches, the standards that emerge from them, and the communities behind them, and to discuss whether there is more these communities can do to work together. Participants also expressed interest in system design issues, allowing complex linking of records but still allowing them to make sense out of context.&lt;br /&gt;&lt;br /&gt;Throughout the discussion, possible roles for technical services staff in the metadata environment emerged. Most ideas centered around creating and maintaining descriptive metadata, although a need was expressed for all involved in metadata creation to know about all types of metadata being created and how it is used. Possible roles for technical services staff included:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Recording relationships between information objects that are not possible to generate automatically. 
(For one initiative designed to help automatic recognition of relationships between objects, see http://www.openarchives.org/ore/)&lt;/li&gt;&lt;li&gt;Authority control, to allow more powerful discovery mechanisms&lt;/li&gt;&lt;li&gt;"Expert" metadata to supplement that from other sources&lt;/li&gt;&lt;li&gt;Describing hidden collections with no or inadequate existing descriptive metadata&lt;/li&gt;&lt;li&gt;Describing Web sites intended for archiving&lt;/li&gt;&lt;li&gt;Describing objects deposited into IU ScholarWorks&lt;/li&gt;&lt;li&gt;Targeted projects to enhance older metadata&lt;/li&gt;&lt;li&gt;Providing value-added content&lt;/li&gt;&lt;li&gt;Managing groups of records&lt;/li&gt;&lt;li&gt;Providing acquisition information to fund managers&lt;/li&gt;&lt;/ul&gt;We had a lively discussion, with many points of view raised. Our next session will be Tuesday, January 29. The article for discussion for that session will be distributed by January 15. Please send ideas for topics for future sessions to the MDG listserv, and feel free to use the &lt;a href="mailto:metadata-discuss-l@indiana.edu"&gt;listserv&lt;/a&gt; for discussion between in-person sessions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-5332366810281074481?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/5332366810281074481/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=5332366810281074481' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/5332366810281074481'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/5332366810281074481'/><link 
rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2007/11/summary-of-mdg-session-11-27-07.html' title='Summary of MDG session, 11-27-07'/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-4279353038878905588</id><published>2007-11-27T09:02:00.000-05:00</published><updated>2007-11-27T09:16:29.972-05:00</updated><title type='text'></title><content type='html'>The first meeting of the newly formed Metadata Discussion Group is coming up soon! Two weeks prior to each Metadata Discussion Group meeting, we will distribute an article on a metadata-related issue that all who plan to attend are encouraged to read. At the discussion group session, we will engage in informal conversation and analysis about the points raised in the article. Details regarding the November meeting and the article we will be discussing are below. We hope to see you there!&lt;br /&gt;&lt;br /&gt;Date: Nov. 27, 2007&lt;br /&gt;Time: 10-11 AM&lt;br /&gt;Place: Wells Library Staff Lounge, 3rd floor East tower&lt;br /&gt;Topic: What is metadata and why is it important to me?&lt;br /&gt;Article to discuss: Gilliland, Anne J. "Setting the Stage." In Introduction to Metadata: Pathways to Digital Information, ed. Murtha Baca. Online edition, version 2.1. 
Available from &lt;a href="http://www.getty.edu/research/conducting_research/standards/intrometadata/setting.pdf" &gt;http://www.getty.edu/research/conducting_research/standards/intrometadata/setting.pdf&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-4279353038878905588?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/4279353038878905588/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=4279353038878905588' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/4279353038878905588'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/4279353038878905588'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2007/11/first-meeting-of-newly-formed-metadata.html' title=''/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3563564247448844216.post-8029279135517800720</id><published>2007-11-27T09:00:00.000-05:00</published><updated>2007-11-27T09:02:30.720-05:00</updated><title type='text'></title><content type='html'>The IUB Libraries is starting a new "Metadata Discussion Group," which will meet monthly to discuss an article relating to a current metadata topic. 
The group has no formal "membership" – discussions are open to anyone from the Libraries, SLIS, or elsewhere on campus who is interested in the topic under discussion. Come join us for lively collaborative conversation about diverse topics in metadata, both broad and detailed. A citation for the article to be discussed will be distributed two weeks before each Metadata Discussion Group meeting, to give participants time to read articles prior to the group discussion.&lt;br /&gt;&lt;br /&gt;Metadata Discussion Group sessions are scheduled for the last Tuesday of the month, from 10-11 AM, in the Wells Library Staff Lounge, 3rd floor East tower. The first meeting will be November 27, 2007. The topic for this meeting will be "What is metadata and why is it important to me?" A citation to a specific article covering this topic will be distributed soon. At the November meeting we will also discuss possible future topics for Metadata Discussion Group sessions.&lt;br /&gt;&lt;br /&gt;Announcements for Metadata Discussion Group meetings will be widely distributed; however, we have also created a dedicated email list for the group. This list will provide a forum for more detailed information about the group, and communication between those interested in its activities. To join the email list, send the following in the body of a message (not the subject) to listserv@indiana.edu:&lt;br /&gt;&lt;br /&gt;subscribe metadata-discuss-l&lt;br /&gt;&lt;br /&gt;The Metadata Discussion Group is sponsored by the IUB Libraries Technical Services Advisory Council (TSAC). 
Comments, questions, or ideas for discussion topics or articles can be sent to Jenn Riley, Metadata Librarian, at jenlrile@indiana.edu or the chair of the TSAC, Lynda Clendenning, at lfclende@indiana.edu.&lt;br /&gt;&lt;br /&gt;More information on the Metadata Discussion Group can be found at &lt;&lt;a href="http://www.dlib.indiana.edu/services/metadata/activities/mdg.shtml"&gt;http://www.dlib.indiana.edu/services/metadata/activities/mdg.shtml&lt;/a&gt;&gt;.&lt;br /&gt;&lt;br /&gt;Jenn Riley, Metadata Librarian&lt;br /&gt;Lynda Clendenning, Chair, Technical Services Advisory Council&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3563564247448844216-8029279135517800720?l=metadatadiscuss.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadatadiscuss.blogspot.com/feeds/8029279135517800720/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3563564247448844216&amp;postID=8029279135517800720' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/8029279135517800720'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3563564247448844216/posts/default/8029279135517800720'/><link rel='alternate' type='text/html' href='http://metadatadiscuss.blogspot.com/2007/11/iub-libraries-is-starting-new-metadata.html' title=''/><author><name>Jenn Riley</name><uri>http://www.blogger.com/profile/04435425141213679870</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
