skip to main content
10.1145/1321440.1321547acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Effective top-k computation in retrieving structured documents with term-proximity support

Published: 06 November 2007 Publication History

Abstract

Modern web search engines are expected to return top-k results efficiently given a query. Although many dynamic index pruning strategies have been proposed for efficient top-k computation, most of them are prone to ignore some especially important factors in ranking functions, e.g. term proximity (the distance relationship between query terms in a document). The inclusion of term proximity breaks the monotonicity of ranking functions and therefore leads to additional challenges for efficient query processing. This paper studies the performance of some existing top-k computation approaches using term-proximity-enabled ranking functions. Our investigation demonstrates that, when term proximity is incorporated into ranking functions, most existing index structures and top-k strategies become quite inefficient. According to our analysis and experimental results, we propose two index structures and their corresponding index pruning strategies: Structured and Hybrid, which performs much better on the new settings. Moreover, the efficiency of index building and maintenance would not be affected too much with the two approaches.

References

[1]
V. N. Anh and A. Moffat. Pruned Query Evaluation Using Pre-Computed Impacts. Proc. 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 372--379, Seattle, WA, August 2006. ACM Press, Seattle.
[2]
V. N. Anh and A. Moffat. Pruning Strategies for Mixed-Mode Querying. In Proc. Of 2006 CIKM Int. Conf. on Information and Knowledge Management (CIKM 2006).
[3]
V. N. Anh and A. Moffat. Simplified similarity scoring using term ranks. In Proceedings of the 28th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (SIGIR 2005). Salvador, Brazil, August 15 - 19, 2005.
[4]
S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual (Web) Search Engine. Computer Networks and ISDN Systems, 30(1-7):107--117, 1998.
[5]
C. Buckley and A. Lewit. Optimization of inverted vector searches. In Proc. 8th Annual SIGIR conf, pages 97--110, 1985.
[6]
M. Burrows. Method for Parsing, Indexing and Searching World-Wide-Web Pages. US Patent 5,864,863, 1999.
[7]
S. Büttcher and C. Clarke. A document-centric approach to static index pruning in text retrieval systems. In Proceedings of the 15th ACM international Conference on information and Knowledge Management CIKM '06. ACM Press, New York, NY, 182--189
[8]
D. Carmel, D. Cohen, R. Fagin, E. Farchi, M. Herscovici, Y. Maarek, and A. Soffer. Static index pruning for information retrieval systems. In Proceedings of the 24th ACM SIGIR Conference on Research and Development in Information Retrieval, pages 43--50, 2001.
[9]
R. Fagin, A. Lotem, and M. Naor. Optimal Aggregation Algorithms for Middleware. In Proc. of the Twentieth ACM Symposium on Principles of Database Systems, 2001.
[10]
R. Fagin. Combining Fuzzy Information from Multiple Systems. In Proc. of the Fifteenth ACM Symposium on Principles of Database Systems, 1996.
[11]
R. Fagin. Combining Fuzzy Information: an Overview. SIGMOD Record, 31(2): 109--118, June 2002.
[12]
Google: http://www.google.com.
[13]
D. Harman and G. Candela. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. Journal of the American Society for Information Science, 41(8): 581--589, August 1990.
[14]
R. Kaushik, R. Krishnamurthy, J. Naughton and R. Ramakrishnan. On the Integration of Structure Indexes and Inverted Lists. SIGMOD 2004.
[15]
X. Long and T. Suel. Optimized Query Execution in Large Search Engines with Global Page Ordering. In Proc. of the 29th Int. Conf. on Very Large Data Bases, September 2003.
[16]
A. Marian et al. Adaptive processing of Top-k queries in XML. In ICDE 2005.
[17]
Microsoft Live Search: http://www.live.com.
[18]
A. Moffat and J. Zobel. Self-Indexing Inverted Files for Fast Text Retrieval. TOIS 14(4), 1996.
[19]
E. de Moura et al. Improving Web Search Efficiency via a Locality Based Static Pruning Method. In Proceedings of the 14th international conference on World Wide Web, pages 235--244, 2005.
[20]
M. Persin, J. Zobel, and R. Sacks-Davis. Filtered Document Retrieval with Frequency-Sorted Indexes. Journal of the American Scociety for Information Science, 47(10): 749--764, May 1996.
[21]
Y. Rasolofo and J. Savoy. Term Proximity Scoring for Keyword-based Retrieval Systems. In Proceedings of the 25th European Conference on IR Research (ECIR 2003) pages 207--218, April 2003.
[22]
S. E. Robertson, S. Walker, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Proceedings of TREC-3, November 1994.
[23]
R. Song, J. Wen, S. Shi, G. Xin, Ti. Liu, et al. Microsoft Research Asia at Web Track and Terabyte Track of TREC 2004. In 2004 Text REtrieval Conference.
[24]
M. Theobald, G. Weikum, and R. Schenkel. Top-k Query Evaluation with Probabilistic Guarantees. In Proceedings of the 30th VLDB conference, Toronto, Canada, 2004.
[25]
TREC home page: http://trec.nist.gov.
[26]
H. Turtle and J. Flood. Query evaluation: Strategies and optimizations. Information Processing and Management, 31(6):831--850, 1995.
[27]
Yahoo Search: http://search.yahoo.com.
[28]
J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys. Vol. 38, No 2, Jul. 2006.

Cited By

View all
  • (2017)Top-k Term-Proximity in Succinct SpaceAlgorithmica10.1007/s00453-016-0167-278:2(379-393)Online publication date: 1-Jun-2017
  • (2016)Fast First-Phase Candidate Generation for Cascading RankersProceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval10.1145/2911451.2911515(295-304)Online publication date: 7-Jul-2016
  • (2014)Efficient instant-fuzzy search with proximity ranking2014 IEEE 30th International Conference on Data Engineering10.1109/ICDE.2014.6816662(328-339)Online publication date: Mar-2014
  • Show More Cited By

Index Terms

  1. Effective top-k computation in retrieving structured documents with term-proximity support

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
      November 2007
      1048 pages
      ISBN:9781595938039
      DOI:10.1145/1321440
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 November 2007

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. document structure
      2. dynamic index pruning
      3. hybrid index structure
      4. term proximity
      5. top-k

      Qualifiers

      • Research-article

      Conference

      CIKM07

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Upcoming Conference

      CIKM '25

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)2
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 20 Feb 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2017)Top-k Term-Proximity in Succinct SpaceAlgorithmica10.1007/s00453-016-0167-278:2(379-393)Online publication date: 1-Jun-2017
      • (2016)Fast First-Phase Candidate Generation for Cascading RankersProceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval10.1145/2911451.2911515(295-304)Online publication date: 7-Jul-2016
      • (2014)Efficient instant-fuzzy search with proximity ranking2014 IEEE 30th International Conference on Data Engineering10.1109/ICDE.2014.6816662(328-339)Online publication date: Mar-2014
      • (2014)Top-$$k$$ Term-Proximity in Succinct SpaceAlgorithms and Computation10.1007/978-3-319-13075-0_14(169-180)Online publication date: 8-Nov-2014
      • (2013)Two-Stage learning to rank for information retrievalProceedings of the 35th European conference on Advances in Information Retrieval10.1007/978-3-642-36973-5_36(423-434)Online publication date: 24-Mar-2013
      • (2012)Optimized top-k processing with global page scores on block-max indexesProceedings of the fifth ACM international conference on Web search and data mining10.1145/2124295.2124346(423-432)Online publication date: 8-Feb-2012
      • (2012)High-performance processing of text queries with tunable pruned term and term pair indexesACM Transactions on Information Systems10.1145/2094072.209407730:1(1-32)Online publication date: 6-Mar-2012
      • (2012)Efficient processing of top-k queries: selective NRA algorithmsJournal of Intelligent Information Systems10.1007/s10844-012-0208-539:3(687-710)Online publication date: 27-May-2012
      • (2011)Upper-bound approximations for dynamic pruningACM Transactions on Information Systems10.1145/2037661.203766229:4(1-28)Online publication date: 8-Dec-2011
      • (2010)Efficient term proximity search with term-pair indexesProceedings of the 19th ACM international conference on Information and knowledge management10.1145/1871437.1871593(1229-1238)Online publication date: 26-Oct-2010
      • Show More Cited By

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media