Library Cataloging

Using Wikipedia

Two new reports from HP Labs show interesting uses of Wikipedia in information management.

Boosting Inductive Transfer for Text Classification using Wikipedia by Somnath Banerjee. HPL-2008-42

Inductive transfer is applying knowledge learned on one set of tasks to improve the performance of learning a new task. Inductive transfer is being applied in improving the generalization performance on a classification task using the models learned on some related tasks. In this paper, we show a method of making inductive transfer for text classification more effective using Wikipedia. We map the text documents of the different tasks to a feature space created using Wikipedia, thereby providing some background knowledge of the contents of the documents. It has been observed here that when the classifiers are built using the features generated from Wikipedia they become more effective in transferring knowledge. An evaluation on the daily classification task on the Reuters RCV1 corpus shows that our method can significantly improve the performance of inductive transfer. Our method was also able to successfully overcome a major obstacle observed in a recent work on a similar setting.

Publication Info: Published and presented at ICMLA 2007, the Sixth International Conference on Machine Learning and Applications (ICMLA'07), 13-15 Dec. 2007, Cincinnati, Ohio, USA
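To make the idea of a Wikipedia feature space concrete, here is a toy sketch in Python. It is not the report's implementation: the `TERM_TO_CONCEPTS` table, the function names, and the simple overlap classifier are all assumptions made up for illustration. It shows only the general point that a model trained on one task can label documents from another task that share no words with it, provided both map to the same Wikipedia-style concepts.

```python
from collections import Counter

# Hypothetical stand-in for a Wikipedia-derived index (term -> article titles).
# These entries are invented; a real system would build them from Wikipedia.
TERM_TO_CONCEPTS = {
    "stocks": ["Stock market"],
    "shares": ["Stock market"],
    "equities": ["Stock market"],
    "striker": ["Association football"],
    "goalkeeper": ["Association football"],
    "midfielder": ["Association football"],
}


def concept_features(text):
    """Map a document to counts of Wikipedia-style concepts."""
    feats = Counter()
    for token in text.lower().split():
        for concept in TERM_TO_CONCEPTS.get(token, []):
            feats[concept] += 1
    return feats


def train_centroids(labeled_docs):
    """Sum the concept counts per label (a deliberately simple model)."""
    centroids = {}
    for text, label in labeled_docs:
        centroids.setdefault(label, Counter()).update(concept_features(text))
    return centroids


def classify(text, centroids):
    """Pick the label whose concept counts overlap most with the document."""
    feats = concept_features(text)

    def score(label):
        return sum(feats[c] * centroids[label][c] for c in feats)

    return max(centroids, key=score)


# Task A: the documents the model is trained on.
task_a = [("stocks fell sharply", "finance"), ("striker scores twice", "sport")]
model = train_centroids(task_a)

# Task B: new documents that share no words with task A.
print(classify("equities rallied today", model))
print(classify("goalkeeper saves penalty", model))
```

With plain word features, the task B documents would have nothing in common with the training documents. In the concept space they land on the same features, which is the kind of background knowledge the abstract refers to.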

Clustering Short Texts using Wikipedia by Somnath Banerjee, Krishnan Ramanathan, and Ajay Gupta. HPL-2008-41

Subscribers to popular news or blog feeds (RSS/Atom) often face the problem of information overload, as these feed sources usually deliver a large number of items periodically. One solution to this problem could be clustering similar items in the feed reader to make the information more manageable for a user. Clustering items at the feed reader end is a challenging task, as usually only a small part of the actual article is received through the feed. In this paper, we propose a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. Empirical results indicate that this enriched representation of text items can substantially improve the clustering accuracy when compared to the conventional bag of words representation.

Publication Info: Published and presented at SIGIR 2007, the 30th Annual International ACM SIGIR Conference, 23-27 July 2007, Amsterdam, Netherlands
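A rough sketch of what enriching a bag of words might look like, again in Python and again under loud assumptions: the `WIKI_TITLES` lookup is invented, and the abstract does not say how the authors actually retrieve Wikipedia features. The sketch only compares cosine similarity between two short feed items before and after enrichment, since similarity is what a clustering step would work from.

```python
import math
from collections import Counter

# Hypothetical lookup from a term to related Wikipedia article titles.
# The entries are invented for illustration only.
WIKI_TITLES = {
    "iphone": ["Apple Inc.", "Smartphone"],
    "macbook": ["Apple Inc.", "Laptop"],
    "election": ["Election"],
}


def bag_of_words(text):
    """Conventional representation: lowercase word counts."""
    return Counter(text.lower().split())


def enrich(text):
    """Bag of words plus one extra feature per matching Wikipedia title."""
    vec = bag_of_words(text)
    for token in list(vec):  # copy the keys, since vec is modified below
        for title in WIKI_TITLES.get(token, []):
            vec["wiki:" + title] += vec[token]
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


item_1 = "new iphone released"
item_2 = "macbook sales rise"

plain = cosine(bag_of_words(item_1), bag_of_words(item_2))
enriched = cosine(enrich(item_1), enrich(item_2))
print(plain, enriched)
```

The two items have no words in common, so their plain similarity is zero and a clustering algorithm has nothing to go on. After enrichment they share the `wiki:Apple Inc.` feature, so the similarity becomes positive.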

- Organizing Tags
Tag Clustering with Self Organizing Maps by Marco Luca Sbodio and Edwin Simpson is a recent HP Labs Technical Report. Today, user-generated tags are a common way of navigating and organizing collections of resources. However, their value is limited by...

- Document Summarization Using Wikipedia
Document Summarization using Wikipedia by Krishnan Ramanathan, Yogesh Sankarasubramaniam, Nidhi Mathur, and Ajay Gupta is a recent HP Technical Report. It seems the small screens used by mobile devices are creating a demand for document summarization. Although...

- Algorithms For Clustering Tags
Clustering Tags in Enterprise and Web Folksonomies by Edwin Simpson will be published and presented at the International Conference on Weblogs & Social Media, Seattle, March 31st, 2008 (HPL-2008-18). Tags lack organizational structure limiting their...

- Clustering Tags
Edwin Simpson has published HP technical report HPL-2007-190, Clustering Tags in Enterprise and Web Folksonomies. Recently there has been massive growth in the use of tags as a simple, flexible way to categorize resources. Tags are often used collaboratively...

- Web Logs
Web logs, what are they good for? Steven M. Cohen recently addressed the issue of why we write them, but why do we read them? In what instances do they work? Here are my views. First, they are a one-to-many or few-to-many format. Topics that require give...
