DOI: 10.1145/988672.988763
Article

Mining anchor text for query refinement

Published: 17 May 2004

ABSTRACT

When searching large hypertext document collections, ambiguous queries often return too many results. Query refinement is an interactive process of query modification that can be used to narrow the scope of search results. We propose a new method for automatically generating refinements, or related terms, for queries by mining the anchor text of a large hypertext document collection. We show that using anchor text as a basis for query refinement produces high-quality refinement suggestions that are significantly better in terms of perceived usefulness than refinements derived from document content. Furthermore, our study suggests that anchor text refinements can also be used to augment traditional query refinement algorithms based on query logs, since the two approaches typically differ in coverage and produce different refinements. Our results are based on experiments on an anchor text collection from a large corporate intranet.
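
The abstract does not spell out the mining procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: anchor texts that contain every query term plus at least one additional term are treated as candidate refinements and ranked by how often they occur. The function name `suggest_refinements`, the `top_k` parameter, the whitespace tokenization, and the toy anchor texts are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def suggest_refinements(query, anchor_texts, top_k=3):
    """Rank anchor texts that contain every query term plus at least one extra term.

    Illustrative assumption only: the paper's actual ranking method is not
    described in the abstract.
    """
    query_terms = set(query.lower().split())
    counts = Counter()
    for anchor in anchor_texts:
        terms = anchor.lower().split()
        # Keep anchors that mention the whole query and add something to it.
        if query_terms.issubset(terms) and len(set(terms)) > len(query_terms):
            counts[" ".join(terms)] += 1
    # The most frequent candidate anchor phrases come first.
    return [phrase for phrase, _ in counts.most_common(top_k)]


# Toy anchor texts, invented for illustration.
anchors = [
    "Java tutorial",
    "java tutorial",
    "Java download",
    "java",
    "coffee from Java",
    "Python tutorial",
]
print(suggest_refinements("java", anchors))
```

In this toy run, "java tutorial" ranks first because it appears twice, while the bare anchor "java" is skipped because it adds nothing to the query. A real system would presumably also need phrase normalization, stop-word handling, and some weighting of the linking pages, none of which is modeled here.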

    • Published in

      WWW '04: Proceedings of the 13th international conference on World Wide Web
      May 2004
      754 pages
      ISBN: 158113844X
      DOI: 10.1145/988672

      Copyright © 2004 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
