OAI Capture by Search Engines
Library Cataloging

The paper "Search Engine Coverage of the OAI-PMH Corpus" by Frank McCown, Xiaoming Liu, Michael L. Nelson, and Mohammad Zubair has been submitted to IEEE Internet Computing.
The major search engines are competing to index as much of the Web as possible. Having indexed much of the surface Web, search engines are now using a variety of approaches to index the deep Web. At the same time, institutional repositories and digital libraries are adopting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their holdings, some of which are indexed by search engines and some of which are not. To determine how much of the current OAI-PMH corpus search engines index, we harvested nearly 10M records from 776 OAI-PMH repositories. From these records we extracted 3.3M unique resource identifiers and then conducted searches on samples from this collection. Of this OAI-PMH corpus, Yahoo indexed 65%, followed by Google (44%) and MSN (7%). Twenty-one percent of the resources were not indexed by any of the three search engines.
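The harvesting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say how resource identifiers were extracted, so this assumes they come from `dc:identifier` elements that hold http(s) URLs in an `oai_dc` ListRecords response. The function names and the `SAMPLE` response are hypothetical; only the XML namespaces and the `resumptionToken` element come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_resource_urls(list_records_xml):
    """Return the set of unique http(s) URLs found in dc:identifier elements.

    Assumption: a resource identifier is any dc:identifier whose text is a URL.
    """
    root = ET.fromstring(list_records_xml)
    urls = set()
    for ident in root.iter("{%s}identifier" % DC_NS):
        text = (ident.text or "").strip()
        if text.startswith(("http://", "https://")):
            urls.add(text)
    return urls


def resumption_token(list_records_xml):
    """Return the resumptionToken for the next page, or None on the last page."""
    root = ET.fromstring(list_records_xml)
    token = root.find(".//{%s}resumptionToken" % OAI_NS)
    if token is not None and token.text and token.text.strip():
        return token.text.strip()
    return None


# Hypothetical two-record response; a real harvester would fetch each page
# with verb=ListRecords&metadataPrefix=oai_dc and follow the resumption token.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record><metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
              xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier>http://example.org/paper/1</dc:identifier>
    <dc:identifier>doi:10.0000/example</dc:identifier>
   </oai_dc:dc>
  </metadata></record>
  <record><metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
              xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier>http://example.org/paper/2</dc:identifier>
    <dc:identifier>http://example.org/paper/1</dc:identifier>
   </oai_dc:dc>
  </metadata></record>
  <resumptionToken>token-2</resumptionToken>
 </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(sorted(extract_resource_urls(SAMPLE)))
    print(resumption_token(SAMPLE))
```

Collecting the URLs in a set handles the deduplication implied by "unique resource identifiers"; sampling from that set and querying each search engine would be separate steps not shown here.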

- Database Of Databases
The Internet Search Environment Number (ISEN) intends to catalog catalogs and other databases. You know how the ISBN is assigned to books. Over 1 million books are assigned ISBNs each year. What ISEN plans to do is emulate that system for databases. We...

- Blogging The Catalog
Library 2.0 in the Real World by Jenny Levine points to some very interesting work. The prototype is built on the WordPress open source blogging platform, which gives it some very interesting features. For example, every record in the catalog gets its...

- DP9
Not new, but something I've not mentioned before is DP9. DP9 is a gateway service that enables indexing of an OAI data provider by an Internet search engine. DP9 does this by providing a persistent URL for repository records, and converting this to...

- Metadata For Web Pages
Matthew Eberle at Library Techlog asked if I knew of any search engines that used Dublin Core metadata. The answer is yes and no. The regular search engines we all use cannot make use of it, or make only very limited use of it. Search Engine Watch has details on how...

- Cataloging & Search Engines
Here is an interesting article comparing search engines and OPACs. Before the comparison, there is much discussion of the purpose of the catalog and how it is achieved. "On the Theory of Library Catalogs and Search Engines" by B. Eversberg....


