DOI: 10.1145/1526709.1526891
poster

User-centric content freshness metrics for search engines

Published: 20 April 2009

Abstract

To return relevant search results, a search engine must keep its local repository synchronized with the Web, but it is usually impossible to attain perfect freshness. Hence, it is vital for a production search engine to continually monitor and improve repository freshness. Most previous freshness metrics, formulated in the context of developing better synchronization policies, focused on the web crawler while ignoring other parts of a search engine. However, the freshness of documents in a web crawler does not necessarily translate directly into the freshness of search results as seen by users. We propose metrics for measuring freshness from a user's perspective, which take into account the latency between when documents are crawled and when they are viewed by users, as well as the variation in user click and view frequency among different documents. We also describe a practical implementation of these metrics that was used in a production search engine.
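The contrast the abstract draws between crawler-side and user-side freshness can be illustrated with a minimal sketch. This is not the metric defined in the poster: the `DocStats` record, its fields, and both functions are hypothetical names chosen for illustration, and "age" is assumed here to mean the time elapsed between a crawl and a later observation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocStats:
    # Hypothetical record: time (in seconds) at which the crawler fetched this copy.
    crawl_time: float
    # Hypothetical record: times at which users viewed this document in results.
    view_times: List[float] = field(default_factory=list)


def crawler_mean_age(docs: List[DocStats], now: float) -> float:
    """Crawler-side view: every document counts equally, measured at time `now`."""
    if not docs:
        return 0.0
    return sum(max(0.0, now - d.crawl_time) for d in docs) / len(docs)


def view_weighted_mean_age(docs: List[DocStats]) -> float:
    """User-side view: every user view counts equally, so frequently viewed
    documents dominate, and age is measured at view time, not crawl time."""
    total_age = 0.0
    total_views = 0
    for d in docs:
        for t in d.view_times:
            total_age += max(0.0, t - d.crawl_time)
            total_views += 1
    return total_age / total_views if total_views else 0.0


if __name__ == "__main__":
    docs = [
        DocStats(crawl_time=0.0, view_times=[10.0, 20.0]),  # viewed twice
        DocStats(crawl_time=5.0, view_times=[15.0]),        # viewed once
    ]
    print(crawler_mean_age(docs, now=20.0))
    print(view_weighted_mean_age(docs))
```

Under these assumptions, a document that is never viewed contributes nothing to the user-side figure, while it still counts fully in the crawler-side average.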

References

[1] J. Cho and H. Garcia-Molina. Synchronizing a database to improve freshness. SIGMOD Rec., 29(2):117–128, 2000.
[2] J. Han, N. Cercone, and X. Hu. A weighted freshness metric for maintaining search engine local repository. In Proc. of Int. Conf. on Web Intelligence (WI), pp. 677–680. IEEE, 2004.
[3] Q. Tan, P. Mitra, and C. L. Giles. Designing clustering-based web crawling policies for search engine crawlers. In Proc. of Conf. on Info. and Knowledge Management (CIKM), pp. 535–544. ACM, 2007.

    Published In

    WWW '09: Proceedings of the 18th international conference on World wide web
    April 2009
    1280 pages
    ISBN:9781605584874
    DOI:10.1145/1526709

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. crawling
    2. document age
    3. freshness
    4. latency
    5. metrics
    6. monitoring
    7. search engine

    Qualifiers

    • Poster

    Conference

    WWW '09

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
