Library Cataloging

Identifiers and Subject Access

A while back I posted a criticism of David Weinberger's piece in the Boston Globe. He was kind enough to respond. Since many readers might miss the comments, I'm reposting his response and my reply here.

Here's what I was trying to say in a highly compressed article.

Of course, subject headings let us classify objects in more than one way. But the number of subject headings under which an object can fall is limited by the physical constraints of card catalogs and books. Further, the physical world requires us to shelve books in one spot and not another. (Multiple copies can be shelved in multiple spots, but that gets messy fast.) So if we want a collection through which users can roam, we have to decide on the primary subject area within which the book will be physically shelved, and then on a limited number of other subheadings under which it can be classified (with some number of see-alsos). The limit (ten for the LoC, for example) is based not on the number of subject headings that might be relevant but on the awkwardness of physical material.

Digitizing the content as well as the metadata not only removes the limitation but also allows for richer ways of identifying books one might want to read. Subject, author, and title are obvious ways to find books, but many other relationships are useful for locating books we know we want to read, or don't yet know we want to read. See Amazon for a commercially inspired (and plain old inspired) example.

But to enable these richer ways of finding books, we need identifiers. IMO (and it's an uncertain opinion), semantics-free globally unique IDs are the best choice. The minimal semantics and prevalence of ISBNs make them a good candidate, although they have some obvious problems (e.g., they only started in the 1960s). In any case, there's no reason to stick with a single set of GUIDs, because computers are good at coordinating multiple sets of related data. So bring on the multiple ID schemes! (I hope Google Print publishes whatever IDs it's using internally.)
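To make "semantics-free IDs plus multiple coordinated schemes" concrete, here is a minimal sketch, assuming a local catalog that mints its own opaque internal ID (a random UUID, so the ID says nothing about the book) and links any number of external identifiers to it. The class name `IdentifierCrosswalk`, the scheme labels, and the identifier values are all hypothetical placeholders, not any real system's API.

```python
import uuid
from collections import defaultdict


class IdentifierCrosswalk:
    """Ties several external identifier schemes to one opaque internal ID."""

    def __init__(self):
        # (scheme, value) -> internal ID, so a lookup can start from any scheme
        self._index = {}
        # internal ID -> {scheme: set of values}, to list every known identifier
        self._records = defaultdict(lambda: defaultdict(set))

    def mint(self) -> str:
        # uuid4 is random, so the ID carries no title, date, or publisher meaning
        return uuid.uuid4().hex

    def link(self, internal_id: str, scheme: str, value: str) -> None:
        self._index[(scheme, value)] = internal_id
        self._records[internal_id][scheme].add(value)

    def resolve(self, scheme: str, value: str):
        """Return the internal ID for an external identifier, or None."""
        return self._index.get((scheme, value))

    def identifiers(self, internal_id: str) -> dict:
        """Return every external identifier linked to an internal ID."""
        entry = self._records.get(internal_id, {})
        return {scheme: sorted(values) for scheme, values in entry.items()}


# Hypothetical usage: one book, three schemes (values are placeholders).
crosswalk = IdentifierCrosswalk()
book = crosswalk.mint()
crosswalk.link(book, "isbn", "9780000000002")
crosswalk.link(book, "local-bib", "b1234567")
crosswalk.link(book, "google-print", "example-internal-id")
print(crosswalk.resolve("local-bib", "b1234567") == book)
```

Adding a new scheme later is just another `link` call; nothing about the internal ID has to change, which is the point of keeping it free of semantics.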

That's what my piece in the Globe intended to say. If it led readers to a different understanding, then I wrote it badly.

Libraries provide many more access points than authors, titles and subjects. Format, genre, geographic codes, publisher numbers, time codes, keywords, and dates of publication or content all spring readily to mind. The bibliographic record in a library catalog is a very rich source of metadata. How easy it is to access that richness is another story. Collocation by many different facets is possible with the current metadata. Users can roam through the search results as easily as through digital collections.
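Collocation by facets can be sketched in a few lines, assuming the facet values (format, language, genre) have already been pulled out of the bibliographic records. Real MARC records carry these in fixed fields and various tags, and parsing them is not shown. The records and field names below are hypothetical.

```python
from collections import Counter

# Hypothetical, already-extracted records; "eng"/"ger" follow MARC language codes.
records = [
    {"title": "Hamlet", "format": "book", "language": "eng", "genre": "drama"},
    {"title": "Hamlet", "format": "film", "language": "eng", "genre": "drama"},
    {"title": "Hamlet", "format": "audiobook", "language": "eng", "genre": "drama"},
    {"title": "Hamlet", "format": "book", "language": "ger", "genre": "drama"},
    {"title": "A Study of Hamlet", "format": "book", "language": "eng", "genre": "criticism"},
]


def facet_counts(recs, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in recs if facet in r)


def narrow(recs, **selected):
    """Keep only records that match every selected facet value."""
    return [r for r in recs if all(r.get(k) == v for k, v in selected.items())]


# A user roams: see the format breakdown, then narrow to English-language books.
print(facet_counts(records, "format"))
print([r["title"] for r in narrow(records, format="book", language="eng")])
```

Each facet is just another key on the record, so adding geographic codes or dates of content would not change the two functions.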

Due to concerns about patron privacy, we have not implemented recommendation systems. I believe we could build one and still protect individuals' personal data, and I expect we will move in that direction in the next few years.

Identifiers are a problem. There will, as you suggest, have to be many; there already are. Many records in a library catalog contain an ISBN, an EAN, and a UPC, and many other standard identifiers can be included in a bibliographic record.
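Some of these identifiers overlap. An ISBN-13 with the 978 prefix is the same 13 digits as the book's EAN-13, and it can be derived from an ISBN-10 by dropping the old check digit, prefixing 978, and recomputing the EAN check digit. (A UPC, by contrast, cannot be derived this way.) A minimal sketch using the standard check-digit rules; the function names are my own:

```python
def _normalize(isbn: str) -> str:
    # Strip the hyphens and spaces that commonly appear in printed ISBNs
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_is_valid(isbn10: str) -> bool:
    """ISBN-10: weights 10 down to 1; valid if the weighted sum is divisible by 11."""
    s = _normalize(isbn10)
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def ean13_check_digit(first12: str) -> int:
    """EAN-13: alternate weights 1 and 3; check digit brings the sum to a multiple of 10."""
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return (10 - total % 10) % 10


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a valid ISBN-10 to the equivalent ISBN-13 (also the book's EAN-13)."""
    if not isbn10_is_valid(isbn10):
        raise ValueError(f"not a valid ISBN-10: {isbn10}")
    core = "978" + _normalize(isbn10)[:9]
    return core + str(ean13_check_digit(core))


print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
```

Storing both forms in a record is cheap, and normalizing to one of them at search time means a patron can type either.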

A greater problem is what the identifiers identify. If I'm looking for Hamlet, do I want a particular format or edition? Would a book on CD, a large-print edition, or a film do, or do I require the Everyman's edition with a particular introduction? ISBNs are acceptable for identifying a particular manifestation, but searching for an expression, or for all manifestations of a work, is a problem. OCLC's xISBN service collects all the other ISBNs for a work and allows searching by all of them. That helps somewhat, but it is not a good long-term solution. Librarians are working on an identifier for works. Parts of a work will also need identifiers; perhaps standard citations would work. OpenURL is a possible solution, since it uses citation data. The Functional Requirements for Bibliographic Records (FRBR) will be useful in pulling together all the different manifestations of a work and differentiating among them.
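The distinction between a work, its expressions, and their manifestations is easier to see as data. Below is a minimal sketch, loosely following the FRBR work/expression/manifestation levels, of how a catalog could answer "give me every ISBN for this work" (the idea behind xISBN) or "only this expression". It does not use or imitate OCLC's xISBN interface; the class names, the simplified hierarchy, and the placeholder ISBNs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Manifestation:
    isbn: Optional[str]  # an older book or a film may have no ISBN at all
    carrier: str         # e.g. "print", "large print", "audio CD"


@dataclass
class Expression:
    label: str  # e.g. "English text", "German translation"
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: list = field(default_factory=list)


def build_isbn_index(works):
    """Map each known ISBN back to the expression and work it embodies."""
    index = {}
    for work in works:
        for expr in work.expressions:
            for m in expr.manifestations:
                if m.isbn:
                    index[m.isbn] = (work, expr)
    return index


def related_isbns(index, isbn, same_expression=False):
    """All ISBNs for the same work (or only the same expression), sorted."""
    hit = index.get(isbn)
    if hit is None:
        return []
    work, expr = hit
    expressions = [expr] if same_expression else work.expressions
    return sorted(m.isbn for e in expressions for m in e.manifestations if m.isbn)


# Hypothetical data with placeholder ISBNs.
hamlet = Work("Hamlet", [
    Expression("English text", [Manifestation("isbn-a", "print"),
                                Manifestation("isbn-b", "large print")]),
    Expression("English audio reading", [Manifestation("isbn-c", "audio CD")]),
    Expression("German translation", [Manifestation("isbn-d", "print")]),
])
index = build_isbn_index([hamlet])
print(related_isbns(index, "isbn-b"))                        # ['isbn-a', 'isbn-b', 'isbn-c', 'isbn-d']
print(related_isbns(index, "isbn-b", same_expression=True))  # ['isbn-a', 'isbn-b']
```

The ISBN stays a manifestation-level identifier here; the grouping comes from the hierarchy, which is why a separate work identifier would be more robust than chasing ISBN lists.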

Folksonomies, trackbacks, and readers' comments will all enrich access to library materials (physical or digital) in the not-too-distant future. RSS allows libraries to distribute new-item lists and other information. This is already being done and will become more widespread.
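A new-items feed is one of the simpler things a catalog can publish. Here is a minimal sketch that builds an RSS 2.0 document with Python's standard library, assuming the list of recently cataloged items is already available; the library name, URLs, and item fields are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def new_items_feed(library_name, catalog_url, items):
    """Return an RSS 2.0 document (as a string) listing newly added items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel
    ET.SubElement(channel, "title").text = f"{library_name}: new items"
    ET.SubElement(channel, "link").text = catalog_url
    ET.SubElement(channel, "description").text = "Recently cataloged titles"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        # Use the record URL as the guid so readers don't show an item twice
        ET.SubElement(node, "guid").text = item["link"]
        # pubDate uses the RFC 822-style date format RSS readers expect
        ET.SubElement(node, "pubDate").text = format_datetime(item["added"])
    return ET.tostring(rss, encoding="unicode")


feed = new_items_feed(
    "Example Library",
    "https://catalog.example.org/",
    [{"title": "Hamlet", "link": "https://catalog.example.org/record/1",
      "added": datetime(2005, 10, 1, tzinfo=timezone.utc)}],
)
print(feed)
```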

Identifiers
Subjects

- xOCLCnum
A new service from OCLC. I'd like to announce and invite you to try xOCLCnum, the latest in the xIdentifier family of Web services from OCLC. Just as xISBN allows you to find all related editions of a book by entering its ISBN, xOCLCnum does the same...

- FRBR
The latest issue of D-Lib Magazine has the paper "Hierarchical Catalog Records: Implementing a FRBR Catalog" by David Mimno, Gregory Crane and Alison Jones. Much work has gone into finding ways to infer FRBR relationships between existing catalog records...

- Persistent Digital Object Identifiers
A tool for generating unique, persistent digital object names and other identifiers has been released as open source. The tool, called "noid" (nice opaque identifier), can be used as a major piece of an overall identifier strategy no matter which...

- Serials
Antelman, Kristin (2004). Identifying the Serial Work as a Bibliographic Entity. Library Resources & Technical Services 48(4): 238-255. A solid theoretical foundation has been built over the years exploring the bibliographic work and in developing cataloging...

- Persistence on the Web
On the DC-General mailing list there is a discussion going on about persistent identifiers for Web resources. I've always thought the OCLC PURL resolver was an elegant solution and wondered why it was not more widely used. There also exists CNRI's...
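The noid and PURL items above describe two halves of one strategy: mint an opaque identifier that never changes, then resolve it through a table that can be updated when the object moves. Here is a minimal sketch of both, assuming a check character modeled on noid's position-weighted algorithm (NCDA) over its "betanumeric" alphabet, and a PURL-style resolver that answers with an HTTP redirect status. The prefix and URLs are hypothetical, and none of this is the noid or PURL software's actual interface; check noid's own documentation before relying on the check-character details.

```python
import secrets

# "Betanumeric" alphabet: ten digits plus nineteen consonants (no vowels,
# so randomly minted strings are unlikely to spell words).
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"


def ncda_check_char(identifier: str) -> str:
    """Position-weighted check character, modeled on noid's NCDA.

    Each character's index in XDIGITS (0 for characters outside it, such
    as '/') is multiplied by its 1-based position; the sum mod 29 picks
    the check character.
    """
    total = 0
    for position, char in enumerate(identifier, start=1):
        total += position * max(XDIGITS.find(char), 0)
    return XDIGITS[total % len(XDIGITS)]


def mint(prefix: str, length: int = 8) -> str:
    """Mint a random opaque identifier under a hypothetical prefix.

    A real minter records everything it issues to guarantee uniqueness;
    this sketch relies on randomness alone.
    """
    body = "".join(secrets.choice(XDIGITS) for _ in range(length))
    candidate = prefix + body
    return candidate + ncda_check_char(candidate)


def is_well_formed(identifier: str) -> bool:
    """True if the final character matches the computed check character."""
    return len(identifier) > 1 and ncda_check_char(identifier[:-1]) == identifier[-1]


class PersistentResolver:
    """PURL-style indirection: the identifier stays fixed, the target moves."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier: str, url: str) -> None:
        if not is_well_formed(identifier):
            raise ValueError(f"bad check character: {identifier}")
        self._targets[identifier] = url

    def update(self, identifier: str, url: str) -> None:
        if identifier not in self._targets:
            raise KeyError(identifier)
        self._targets[identifier] = url

    def resolve(self, identifier: str):
        """Return (HTTP status, Location) as a redirecting resolver would."""
        url = self._targets.get(identifier)
        return (302, url) if url is not None else (404, None)


pid = mint("99999/fk4")
resolver = PersistentResolver()
resolver.register(pid, "https://old-server.example.org/report.pdf")
resolver.update(pid, "https://new-server.example.org/report.pdf")
print(pid, resolver.resolve(pid))
```

For example, `ncda_check_char("13030/tf5p30086")` returns `"k"`. Because the check character depends on position as well as value, a typo that swaps two adjacent characters usually produces an identifier that `is_well_formed` rejects before it ever reaches the resolver table.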
