DOI: 10.1145/2537734.2537738
research-article

Managing short postings lists

Published: 05 December 2013

Abstract

Previous work has examined space-saving and throughput-increasing techniques for long postings lists in an inverted-file search engine. In this contribution we show that highly sporadic terms (terms that occur in only 1 or 2 documents) make up a high proportion of the unique terms in the collection, and that these terms do appear in queries. We compare the previously known space-saving method of storing their short postings lists in the vocabulary against storing them in the postings file. We quantify the saving at about 6.5%, with no loss in precision, and suggest adopting this technique.
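The idea of keeping short postings lists in the vocabulary can be illustrated with a small sketch. The layout below is an assumption made for illustration, not the paper's format: each vocabulary entry holds a document frequency and an 8-byte payload that is either an (offset, length) pointer into a separate postings file or, for terms occurring in at most two documents, the document identifiers themselves. All names (INLINE_MAX, VocabEntry, build_index, lookup, sporadic_fraction) are hypothetical. Real engines typically compress postings and store more than bare docids, so the byte counts here do not reproduce the reported saving.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout, for illustration only (not taken from the paper).
INLINE_MAX = 2                   # terms in <= 2 documents are kept in the vocabulary
PAYLOAD = struct.Struct("<II")   # two unsigned 32-bit little-endian integers (8 bytes)


@dataclass
class VocabEntry:
    term: str
    df: int          # number of documents containing the term
    payload: bytes   # 8 bytes: (offset, length) or up to two inline docids


def build_index(postings):
    """Place short postings lists inline; write the rest to a postings 'file'."""
    vocab = {}
    postings_file = bytearray()
    for term in sorted(postings):
        docids = postings[term]
        if len(docids) <= INLINE_MAX:
            # Pad the unused slot with 0; df tells the reader how many are real.
            padded = (docids + [0, 0])[:2]
            vocab[term] = VocabEntry(term, len(docids), PAYLOAD.pack(*padded))
        else:
            offset = len(postings_file)
            for d in docids:  # uncompressed 32-bit docids, for simplicity
                postings_file += struct.pack("<I", d)
            length = len(postings_file) - offset
            vocab[term] = VocabEntry(term, len(docids), PAYLOAD.pack(offset, length))
    return vocab, bytes(postings_file)


def lookup(vocab, postings_file, term):
    """Return the docids for a term, reading the postings file only if needed."""
    entry = vocab.get(term)
    if entry is None:
        return []
    a, b = PAYLOAD.unpack(entry.payload)
    if entry.df <= INLINE_MAX:
        return [a, b][: entry.df]  # docids were stored inline in the vocabulary
    raw = postings_file[a : a + b]  # in a real engine, a separate postings-file read
    return [d for (d,) in struct.iter_unpack("<I", raw)]


def sporadic_fraction(postings):
    """Fraction of unique terms that occur in at most INLINE_MAX documents."""
    return sum(1 for d in postings.values() if len(d) <= INLINE_MAX) / len(postings)
```

For example, with postings {"aardvark": [7], "zebra": [3, 9], "the": [1, 2, 3, 4]}, only "the" is written to the postings file (16 bytes), while the two sporadic terms are answered directly from the vocabulary without a second read.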

Cited By

  • (2023) Efficient immediate-access dynamic indexing. Information Processing & Management, 60(3): 103248. DOI: 10.1016/j.ipm.2022.103248. Online publication date: May 2023.
  • (2017) Efficient In-Memory, List-Based Text Inversion. Proceedings of the 22nd Australasian Document Computing Symposium, pages 1-8. DOI: 10.1145/3166072.3166080. Online publication date: 7 Dec 2017.
  • (2016) In Vacuo and In Situ Evaluation of SIMD Codecs. Proceedings of the 21st Australasian Document Computing Symposium, pages 1-8. DOI: 10.1145/3015022.3015023. Online publication date: 5 Dec 2016.

Information

Published In

ACM Other conferences
ADCS '13: Proceedings of the 18th Australasian Document Computing Symposium
December 2013
126 pages
ISBN:9781450325240
DOI:10.1145/2537734
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. efficiency
  2. indexing
  3. procrastination
  4. storage

Conference

ADCS '13
ADCS '13: The Australasian Document Computing Symposium
December 5 - 6, 2013
Brisbane, Queensland, Australia

Acceptance Rates

ADCS '13 Paper Acceptance Rate 12 of 23 submissions, 52%;
Overall Acceptance Rate 30 of 57 submissions, 53%

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 05 Mar 2025