skip to main content
10.1145/1321440.1321545acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Efficient on-line index maintenance for dynamic text collections by using dynamic balancing tree

Published: 06 November 2007 Publication History

Abstract

Previous on-line index maintenance strategies are mainly designed for document insertions without considering document deletions. In a truly dynamic search environment, however, documents may be added to and removed from the collection at any point in time. In this paper, we examine issues of on-line index maintenance with support for instantaneous document deletions and insertions. We present a DBT Merge strategy that can dynamically adjust the sequence of sub-index merge operations during index construction, and offers better query processing performance than previous methods, while providing an equivalent level of index maintenance performance when document insertions and deletions exist in parallel. Using experiments on 426 GB of web data we demonstrate the efficiency of our method in practice, showing that on-line index construction for dynamic text collections can be performed efficiently and almost as fast as for growing text collections.

References

[1]
S. Brin and L. Page. The anatomy of large-scale hypertextual web search engine. In Proc. 7th International WWW Conference, pages 107--117, 1998.
[2]
A. Broder, D. Carmel, M. Herscovichi, A. Soffer, and J. Zien. Efficient query evaluation using a two-level retrieval process. In In Twelfth International Conference on Information and Knowledge Management (CIKM 2003), pages 426--434. New Orleans, LA, USA, November 2003.
[3]
S. Büttcher, C. L. A. Clarke, and B. Lushman. Hybrid index maintenance for growing text collections. In Proceedings of the 29th ACM Conference on Research and Development on Information Retrieval (SIGIR 2006). Seattle, USA, August 2006.
[4]
S. Büttcher and C. L. A. Clarke. Indexing time vs. query time trade-offs in dynamic information retrieval systems. In Proceedings of the 14th ACM Conference on Information and Knowledge Management (CIKM 2005). Bremen, Germany, November 2005.
[5]
T. Chiueh and L. Huang. Efficient real-time index updates in text retrieval systems. Technical report, Stony Brook, August 1998.
[6]
S. Heinz and J. Zobel. Efficient single-pass index construction for text database. Jour. Of the American Society for Information Science and Technology, 54(8):731--729, 2003.
[7]
M. Kaszkiel, J. Zobel, and R. Sacks-Davis. Efficient passage ranking for document databases. acm transactions on information systems. ACM Transactions on Information Systems, 17(4):406--439,October 1999.
[8]
N. Lester, A. Moffat, and J. Zobel. Fast on-line index construction by geometric partitioning. In Proceedings of the ACM CIKM Conference on Information and Knowledge Management, pages 776--783, November 2005.
[9]
N. Lester, J. Zobel, and H. Williams. Efficient online index maintenance for text retrieval systems. Information Processing and Management, 42(4):916--933, July 2006.
[10]
W. Shieh and C. Chung. A statistics-based approach to incrementally update inverted files. In Proc. Int. Conf. on Information and Knowledge Engineering, pages 38--43. Las Vegas, Nevada, June 2003.
[11]
K. Shoens, A. Tomasic, and H. Garcia-Molina. Synthetic workload performance analysis of incremental updates. In Proc. International ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 329--338. Dublin, Ireland, July 1994.
[12]
A. Tomasic, H. Garcia-Molina, and K. Shoens. Incremental updates of inverted lists for text document retrieval. In Proceedings of the 1994 ACM SIGMOD Conference on Management of Data, pages 289--300. New York, USA, 1994.
[13]
I. Witten, A. Moffat, and T. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images, second edition. Morgan Kaufmann Publishing, San Francisco, California, 1999.
[14]
J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2), June 2006.
[15]
J. Zobel, A. Moffat, and K. Ramamohanarao. Inverted files versus signature files for text indexing. ACM Transactions on Database Systems, 23(4):453--490, 1998.

Cited By

View all
  • (2016)Efficient index updates for mixed update and query loads2016 IEEE International Conference on Big Data (Big Data)10.1109/BigData.2016.7840697(984-991)Online publication date: Dec-2016
  • (2015)Scalability Challenges in Web Search EnginesSynthesis Lectures on Information Concepts, Retrieval, and Services10.2200/S00662ED1V01Y201508ICR0457:6(1-138)Online publication date: 29-Dec-2015
  • (2014)Incremental Text Indexing for Fast Disk-Based SearchACM Transactions on the Web10.1145/25608008:3(1-31)Online publication date: 8-Jul-2014
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
November 2007
1048 pages
ISBN:9781595938039
DOI:10.1145/1321440
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 November 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. dynamic text collections
  2. garbage collection
  3. index maintenance
  4. information retrieval
  5. on-line index

Qualifiers

  • Research-article

Conference

CIKM07

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2016)Efficient index updates for mixed update and query loads2016 IEEE International Conference on Big Data (Big Data)10.1109/BigData.2016.7840697(984-991)Online publication date: Dec-2016
  • (2015)Scalability Challenges in Web Search EnginesSynthesis Lectures on Information Concepts, Retrieval, and Services10.2200/S00662ED1V01Y201508ICR0457:6(1-138)Online publication date: 29-Dec-2015
  • (2014)Incremental Text Indexing for Fast Disk-Based SearchACM Transactions on the Web10.1145/25608008:3(1-31)Online publication date: 8-Jul-2014
  • (2013)Fast candidate generation for real-time tweet search with bloom filter chainsACM Transactions on Information Systems10.1145/2493175.249317831:3(1-36)Online publication date: 5-Aug-2013
  • (2012)Cache-Oblivious dictionaries and multimaps with negligible failure probabilityProceedings of the First Mediterranean conference on Design and Analysis of Algorithms10.1007/978-3-642-34862-4_15(203-218)Online publication date: 3-Dec-2012
  • (2011)Index tuning for query-log based on-line index maintenanceProceedings of the 20th ACM international conference on Information and knowledge management10.1145/2063576.2063874(1997-2000)Online publication date: 24-Oct-2011
  • (2011)External-memory multimapsProceedings of the 22nd international conference on Algorithms and Computation10.1007/978-3-642-25591-5_40(384-394)Online publication date: 5-Dec-2011
  • (2010)Scalable online index construction with multi-core CPUsProceedings of the Twenty-First Australasian Conference on Database Technologies - Volume 10410.5555/1862242.1862249(29-36)Online publication date: 1-Jan-2010
  • (2009)Low-cost management of inverted files for online full-text searchProceedings of the 18th ACM conference on Information and knowledge management10.1145/1645953.1646012(455-464)Online publication date: 2-Nov-2009
  • (2009)On-line index maintenance using horizontal partitioningProceedings of the 18th ACM conference on Information and knowledge management10.1145/1645953.1646010(435-444)Online publication date: 2-Nov-2009

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media