|
ABSTRACT
Search engines need to evaluate queries extremely fast, a challenging task given the quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases. In this article we consider how phrase queries can be efficiently supported with low disk overheads. Our previous research has shown that phrase queries can be rapidly evaluated using nextword indexes, but these indexes are twice as large as conventional inverted files. Alternatively, special-purpose phrase indexes can be used, but it is not feasible to index all phrases. We propose combinations of nextword indexes and phrase indexes with inverted files as a solution to this problem. Our experiments show that combined use of a partial nextword, partial phrase, and conventional inverted index allows evaluation of phrase queries in a quarter the time required to evaluate such queries with an inverted file alone; the additional space overhead is only 26% of the size of the inverted file.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
Bahle, D. 2003. Efficient phrase querying. Ph.D. dissertation. School of Computer Science and Information Technology, RMIT, Melbourne, Australia.
|
| |
3
|
Bahle, D., Williams, H. E., and Zobel, J. 2001a. Compaction techniques for nextword indexes. In Proceedings of the String Processing and Information Retrieval Symposium (San Rafael, Chile). IEEE Computer Society Press, Los Alamitos, CA, 33--45.
|
| |
4
|
Dirk Bahle , Hugh E. Williams , Justin Zobel, Optimised phrase querying and browsing of large text databases, Proceedings of the 24th Australasian conference on Computer science, p.11-19, January 29-February 02, 2001, Gold Coast, Queensland, Australia
|
| |
5
|
|
| |
6
|
Clarke, C. L., Cormack, G. V., and Tudhope, E. A. 1997. Relevance ranking for one- to three-term queries. In Proceedings of the Fifth RIAO International Conference "Recherche d'Information Assistee par Ordinateur" (Montreal P.Q., Canada). 388--400.
|
 |
7
|
W. Bruce Croft , Howard R. Turtle , David D. Lewis, The use of phrases and structured queries in information retrieval, Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval, p.32-45, October 13-16, 1991, Chicago, Illinois, United States
[doi> 10.1145/122860.122864]
|
 |
8
|
Erika F. de Lima , Jan O. Pedersen, Phrase recognition and expansion for short, precision-biased queries based on a query log, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, p.145-152, August 15-19, 1999, Berkeley, California, United States
[doi> 10.1145/312624.312669]
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
 |
12
|
|
 |
13
|
|
 |
14
|
Gordon W. Paynter , Ian H. Witten , Sally Jo Cunningham , George Buchanan, Scalable browsing for large collections: a case study, Proceedings of the fifth ACM conference on Digital libraries, p.215-223, June 02-07, 2000, San Antonio, Texas, United States
[doi> 10.1145/336597.336666]
|
| |
15
|
|
 |
16
|
Paricia Correia Saraiva , Edleno Silva de Moura , Novio Ziviani , Wagner Meira , Rodrigo Fonseca , Berthier Riberio-Neto, Rank-preserving two-level caching for scalable search engines, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, p.51-58, September 2001, New Orleans, Louisiana, United States
[doi> 10.1145/383952.383959]
|
 |
17
|
|
| |
18
|
|
| |
19
|
Spink, A. and Xu, J. 2000. Selected results from a large study of Web searching: The Excite study. Informat. Res. 6, 1. Available online at: http://InformationR.net/ir/6-1/paper90.html.
|
| |
20
|
Voorhees, E. M. and Harman, D. K. 2001. Overview of TREC 2001. In The Tenth Text REtrieval Conference (TREC 2001), E. M. Voorhees and D. K. Harman, Eds. NIST Spec. pub. 500-250. National Institute of Standards and Technology, Gaithersburg, MD, 1--15.
|
| |
21
|
Williams, H. E. and Zobel, J. 1999. Compressing integers for fast file access. Comput. J. 42, 3, 193--201.
|
| |
22
|
Williams, H. E., Zobel, J., and Anderson, P. 1999. What's next? Index structures for efficient phrase querying. In Proceedings of the Australasian Database Conference (Auckland, New Zealand), M. Orlowska, Ed. Springer-Verlag, Berlin, Germany, 141--152.
|
| |
23
|
|
 |
24
|
|
| |
25
|
|
|