Article

Mining anchor text for query refinement

Authors:
Reiner Kraft, IBM Almaden Research Center, San Jose, CA
Jason Zien, IBM Almaden Research Center, San Jose, CA

WWW '04: Proceedings of the 13th international conference on World Wide Web, May 2004, Pages 666–674
https://doi.org/10.1145/988672.988763

Published: 17 May 2004

ABSTRACT

When searching large hypertext document collections, ambiguous queries often return too many results. Query refinement is an interactive process of query modification that narrows the scope of search results. We propose a new method for automatically generating refinements, or related terms, for queries by mining the anchor text of a large hypertext document collection. We show that using anchor text as a basis for query refinement produces high-quality refinement suggestions that are significantly better, in terms of perceived usefulness, than refinements derived from document content. Furthermore, our study suggests that anchor text refinements can also augment traditional query refinement algorithms based on query logs, since the two approaches typically differ in coverage and produce different refinements. Our results are based on experiments on an anchor text collection from a large corporate intranet.
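
The abstract does not spell out the mining algorithm, so the following is only a minimal sketch of the general idea, assuming a plain co-occurrence count: anchor texts that contain every query term, plus at least one additional term, are treated as candidate refinements and ranked by how often they occur. The function names `tokenize` and `suggest_refinements`, the `top_k` parameter, and the sample anchor texts are all hypothetical and are not taken from the paper.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def suggest_refinements(query, anchor_texts, top_k=5):
    """Rank anchor-text phrases that contain every query term.

    Hypothetical helper: a plain frequency count used for illustration,
    not the ranking described in the paper.
    """
    query_terms = set(tokenize(query))
    counts = Counter()
    for anchor in anchor_texts:
        tokens = tokenize(anchor)
        token_set = set(tokens)
        # Keep anchors that mention all query terms and add at least one new term.
        if query_terms and query_terms <= token_set and token_set != query_terms:
            counts[" ".join(tokens)] += 1
    return [phrase for phrase, _ in counts.most_common(top_k)]


# Made-up anchor texts, for illustration only.
anchors = [
    "Java tutorial",
    "java tutorial",
    "Java download",
    "Java coffee beans",
    "Python tutorial",
    "Java",
]
print(suggest_refinements("java", anchors))
```

For the ambiguous query "java", the sketch would surface phrases such as "java tutorial" and "java coffee beans", each of which narrows the query toward one interpretation. A realistic system would presumably also weight anchors by the importance of the linking pages and filter out generic anchors such as "click here"; those steps are omitted here.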

Index Terms

Mining anchor text for query refinement
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
WWW '04: Proceedings of the 13th international conference on World Wide Web
May 2004
754 pages
ISBN: 158113844X
DOI: 10.1145/988672
Conference Chairs: Stuart Feldman (IBM Research), Mike Uretsky (New York University)
Program Chairs: Marc Najork (Microsoft Research), Craig Wills (Worcester Polytechnic Institute)
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
anchor text
query refinement
rank
web search
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%