Using Semantic Web Technologies to Collaboratively Collect and Share User-Generated Content in Order to Enrich the Presentation of Bibliographic Records: Development of a Prototype Based on RDF, D2RQ, Jena, SPARQL and WorldCat's FRBRization Web Service
Ragnhild Holgersen, Michael Preminger, David Massey
In this article we present a prototype of a semantic web-based framework for collecting and sharing user-generated content (reviews, ratings, tags, etc.) across different libraries in order to enrich the presentation of bibliographic records. The user-generated data is remodeled into RDF using established linked data ontologies. This is done semi-automatically with the Jena and D2RQ toolkits. For the remodeling, a SPARQL CONSTRUCT statement is tailored to each data source.
In the data source used in our prototype, user-generated content is linked to the relevant books via their ISBN. By remodeling the data according to the FRBR model, and expanding the RDF graph with data returned by WorldCat's FRBRization web service, we are able to greatly increase the number of entry points to each book. We make the social content available through a RESTful web service with ISBN as a parameter. The web service returns, in RDF/XML format, a graph of all user-generated data registered to any edition of the book in question. Libraries using our framework would thus be able to present relevant social content in association with bibliographic records, even if they hold a different version of a book than the one that was originally accessed by users. Finally, we connect our RDF graph to the linked open data cloud through the use of Talis' openlibrary.org SPARQL endpoint.
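The edition-expansion idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the ISBNs, the work identifiers, the review texts, and the function names `editions_of` and `content_for_any_edition` are all hypothetical, and a plain dictionary stands in for the edition groupings that a FRBRization service would supply.

```python
# Hypothetical mapping from ISBN to a work identifier. In the prototype this
# grouping would come from a FRBRization web service; here it is hard-coded.
ISBN_TO_WORK = {
    "9780000000001": "work-1",
    "9780000000002": "work-1",
    "9780000000003": "work-2",
}

# Hypothetical user-generated content, keyed by the ISBN it was registered to.
REVIEWS_BY_ISBN = {
    "9780000000001": ["Great read."],
    "9780000000002": ["Loved the new foreword."],
    "9780000000003": ["Not for me."],
}


def editions_of(isbn):
    """Return every known ISBN that belongs to the same work as `isbn`."""
    work = ISBN_TO_WORK.get(isbn)
    if work is None:
        # Unknown ISBN: fall back to the ISBN itself as its only edition.
        return [isbn]
    return sorted(i for i, w in ISBN_TO_WORK.items() if w == work)


def content_for_any_edition(isbn):
    """Collect (edition ISBN, review text) pairs across all editions of a work."""
    result = []
    for edition in editions_of(isbn):
        for text in REVIEWS_BY_ISBN.get(edition, []):
            result.append((edition, text))
    return result
```

Under these assumptions, a library that holds only the edition with ISBN `9780000000002` would still receive the review registered to `9780000000001`, because both ISBNs map to the same work.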
GLIMIR: Manifestation and Content Clustering within WorldCat
Janifer Gatenby, Richard O. Greene, W. Michael Oskins, Gail Thornburg
The GLIMIR project at OCLC clusters and assigns an identifier to WorldCat records representing the same manifestation. These include parallel records in different languages (e.g., a record with English descriptive notes and subject headings and one for the same book with French equivalents). It also clusters records that probably represent the same manifestation, but which could not be safely merged by OCLC's Duplicate Detection and Resolution (DDR) program for various reasons. As the project progressed, it became clear that it would also be useful to create content-based clusters for groups of manifestations that are generally equivalent from the end user perspective (e.g., the original print text with its microform, ebook and reprint versions, but not new editions). Lessons from the GLIMIR project have improved OCLC's duplicate detection program through the introduction of new matching techniques. GLIMIR has also had unexpected benefits for OCLC's FRBR algorithm by providing new methods for identifying outliers thus enabling more records to be included in the correct work cluster.
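The difference between manifestation clusters and content clusters can be shown with a toy example. This is only a sketch under simplifying assumptions and does not reflect OCLC's actual matching rules: the record fields (`title`, `edition`, `format`, `catalog_lang`), the sample records, and the `cluster` helper are hypothetical, and real matching uses far richer evidence than exact key comparison.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def cluster(records, key_fields):
    """Group record ids by the normalized values of the chosen fields."""
    clusters = defaultdict(list)
    for rec in records:
        key = tuple(normalize(str(rec[f])) for f in key_fields)
        clusters[key].append(rec["id"])
    return sorted(clusters.values())


# Hypothetical records: r1 and r2 are parallel records cataloged in different
# languages, r3 is an ebook version, r4 is a new edition.
RECORDS = [
    {"id": "r1", "title": "Moby Dick", "edition": "1st", "format": "print", "catalog_lang": "eng"},
    {"id": "r2", "title": "Moby  Dick", "edition": "1st", "format": "print", "catalog_lang": "fre"},
    {"id": "r3", "title": "Moby Dick", "edition": "1st", "format": "ebook", "catalog_lang": "eng"},
    {"id": "r4", "title": "Moby Dick", "edition": "2nd", "format": "print", "catalog_lang": "eng"},
]

# Manifestation clusters ignore the language of cataloging but keep format.
manifestation_clusters = cluster(RECORDS, ["title", "edition", "format"])

# Content clusters also ignore format, but still separate new editions.
content_clusters = cluster(RECORDS, ["title", "edition"])
```

In this toy data the parallel records r1 and r2 share a manifestation cluster, the ebook r3 joins them only at the content level, and the new edition r4 stays separate in both groupings.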