skip to main content
10.1145/1321440.1321591acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
poster

Sigma encoded inverted files

Published: 06 November 2007 Publication History

Abstract

Compression of term frequency lists and very long document-id lists within an inverted file search engine are examined. Several compression schemes are compared including Elias γ and δ codes, Golomb Encoding, Variable Byte Encoding, and a class of word-based encoding schemes including Simple-9, Relative-10 and Carryover-12. It is shown that these compression methods are not well suited to compressing these kinds of lists of numbers. Of those tested, Carryover-12 is preferred because it is both effective at compression and fast at decompression.
A novel technique, Sigma Encoding prior to compression, is proposed and tested. Sigma Encoding utilizes a parameterized dictionary to reduce the number of bits necessary to store an integer. This method shows an about 0.3 bit per integer improvement over Carryover-12 while costing only about 3 extra clock cycles per integer to decompress.

References

[1]
Anh, V. N., & Moffat, A. (2002). Improved retrieval effectiveness through impact transformation. Australian Computer Science Communications, 24(2), 41--47.
[2]
Anh, V. N., & Moffat, A. (2005). Inverted index compression using word-aligned binary codes. Information Retrieval.
[3]
Anh, V. N., & Moffat, A. (2006). Structured index organizations for high-throughput text querying. In Proceedings of the 13th Int. Symp. String Processing and Information Retrieval, (pp. 304--315).
[4]
Scholer, F., Williams, H. E., Yiannis, J., & Zobel, J. (2002). Compression of inverted indexes for fast query evaluation. In Proceedings of the 25th ACM SIGIR Conference on Information Retrieval, (pp. 222--229).
[5]
Trotman, A. (2003). Compressing inverted files. Information Retrieval, 6(1), 5--19.
[6]
Williams, H. E., & Zobel, J. (1999). Compressing integers for fast file access. Computer Journal, 42(3), 193--201.
[7]
Zobel, J., & Moffat, A. (1995). Adding compression to a full-text retrieval system. Software - Practice and Experience, 25(8), 891--903.

Cited By

View all
  • (2013)Managing short postings listsProceedings of the 18th Australasian Document Computing Symposium10.1145/2537734.2537738(113-116)Online publication date: 5-Dec-2013
  • (2010)Index compression using 64-bit wordsSoftware—Practice & Experience10.5555/1712666.171266840:2(131-147)Online publication date: 1-Feb-2010

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
November 2007
1048 pages
ISBN:9781595938039
DOI:10.1145/1321440
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 November 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. compression
  2. inverted files

Qualifiers

  • Poster

Conference

CIKM07

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2013)Managing short postings listsProceedings of the 18th Australasian Document Computing Symposium10.1145/2537734.2537738(113-116)Online publication date: 5-Dec-2013
  • (2010)Index compression using 64-bit wordsSoftware—Practice & Experience10.5555/1712666.171266840:2(131-147)Online publication date: 1-Feb-2010

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media