DOI: 10.1145/1378889.1378891

Enhancing digital libraries using missing content analysis

Published: 16 June 2008

Abstract

This work shows how the content of a digital library can be enhanced to better satisfy its users' needs. Missing content is identified by finding missing content topics in the system's query log or in a pre-defined taxonomy of required knowledge. The collection is then enhanced with new relevant knowledge, extracted from external sources that satisfy those missing content topics. Our experiments measure the precision of the system before and after content enhancement. The results demonstrate a significant improvement in system effectiveness as a result of content enhancement, and the superiority of the missing content enhancement policy over several other possible policies.
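
As a rough illustration of the workflow described above, the following Python sketch flags logged queries that the collection covers poorly and computes precision at k, so that effectiveness can be compared before and after enhancement. It is not the authors' method: the term-overlap `coverage` score, the 0.5 threshold, and all function names are assumptions made for this example, standing in for whatever missing-content detector the system actually uses.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def coverage(query, documents):
    """Best fraction of query terms found in any single document.

    Hypothetical stand-in for a real missing-content or
    query-difficulty estimator.
    """
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    return max(
        (len(terms & set(tokenize(doc))) / len(terms) for doc in documents),
        default=0.0,
    )


def missing_content_queries(query_log, documents, threshold=0.5):
    """Return distinct logged queries whose coverage is below the
    threshold, most frequent first (ties broken alphabetically)."""
    counts = Counter(query_log)
    flagged = [q for q in counts if coverage(q, documents) < threshold]
    return sorted(flagged, key=lambda q: (-counts[q], q))


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k
```

Under these assumptions, the flagged queries would serve as the missing content topics to look up in external sources, and `precision_at_k` would be computed for the same queries before and after the new documents are added to the collection.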

    Published In

    JCDL '08: Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
    June 2008
    490 pages
    ISBN:9781595939982
    DOI:10.1145/1378889
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. content analysis
    2. crawling policy
    3. query difficulty

    Qualifiers

    • Research-article

    Conference

    JCDL08
    JCDL08: Joint Conference on Digital Libraries
    June 16 - 20, 2008
    Pittsburgh, PA, USA

    Acceptance Rates

    JCDL '08 Paper Acceptance Rate 33 of 117 submissions, 28%;
    Overall Acceptance Rate 415 of 1,482 submissions, 28%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 9
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 09 Feb 2025

    Citations

    Cited By

    • (2024) Query Performance Prediction: Techniques and Applications in Modern Information Retrieval. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, pages 291-294. DOI: 10.1145/3673791.3698438. Online publication date: 8-Dec-2024
    • (2024) Query Performance Prediction: From Fundamentals to Advanced Techniques. Advances in Information Retrieval, pages 381-388. DOI: 10.1007/978-3-031-56069-9_51. Online publication date: 23-Mar-2024
    • (2022) Estimating the Query Difficulty for Information Retrieval. Online publication date: 10-Mar-2022
    • (2021) A Comparative Studies of Automatic Query Formulation in Full-Text Database Search of Chinese Digital Humanities. Diversity, Divergence, Dialogue, pages 457-468. DOI: 10.1007/978-3-030-71292-1_35. Online publication date: 17-Mar-2021
    • (2019) An Approach for Focused Crawler to Harvest Digital Academic Documents in Online Digital Libraries. International Journal of Information Retrieval Research, 9(3), pages 23-47. DOI: 10.4018/IJIRR.2019070103. Online publication date: Jul-2019
    • (2014) Who and what links to the Internet Archive. International Journal on Digital Libraries, 14(3-4), pages 101-115. DOI: 10.1007/s00799-014-0111-5. Online publication date: 1-Aug-2014
    • (2013) The Effect of Social and Physical Detachment on Information Need. ACM Transactions on Information Systems, 31(1), pages 1-19. DOI: 10.1145/2414782.2414786. Online publication date: 1-Jan-2013
    • (2013) Who and What Links to the Internet Archive. Research and Advanced Technology for Digital Libraries, pages 346-357. DOI: 10.1007/978-3-642-40501-3_35. Online publication date: 2013
    • (2012) Investigating the Performance of Cosine Value and Jensen-Shannon Divergence in the kNN Algorithm. Advanced Materials Research, 532-533, pages 1455-1459. DOI: 10.4028/www.scientific.net/AMR.532-533.1455. Online publication date: Jun-2012
    • (2012) Query performance prediction for IR. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pages 1196-1197. DOI: 10.1145/2348283.2348540. Online publication date: 12-Aug-2012
