DOI: 10.1145/1008992.1009030
Article

Learning to cluster web search results

Published: 25 July 2004

Abstract

Organizing Web search results into clusters helps users browse them quickly. Traditional clustering techniques are inadequate for this task because they do not generate clusters with highly readable names. In this paper, we reformalize the clustering problem as a salient phrase ranking problem. Given a query and the ranked list of documents (typically a list of titles and snippets) returned by a Web search engine, our method first extracts and ranks salient phrases as candidate cluster names, based on a regression model learned from human-labeled training data. The documents are then assigned to the relevant salient phrases to form candidate clusters, and the final clusters are generated by merging these candidate clusters. Experimental results verify our method's feasibility and effectiveness.
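The pipeline the abstract describes (extract and rank salient phrases with a learned regression model, assign documents to phrases, merge candidate clusters) can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' implementation: the word n-gram candidates, the stopword list, the three features (share of results containing the phrase, phrase length, share of words copied from the query), the ordinary least-squares regressor standing in for the paper's regression model, the `top_k` cutoff, the 0.75 overlap rule for merging, and the training scores are all assumptions made up for the example.

```python
import re

# Hypothetical stopword list: candidate phrases may not start or end with these words.
STOPWORDS = {"a", "and", "in", "of", "the"}

def tokenize(text):
    # Lowercase a title/snippet and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def candidate_phrases(docs, max_n=3, min_df=2):
    """Map each word n-gram (n <= max_n) to the set of result indices that contain it."""
    index = {}
    for i, doc in enumerate(docs):
        toks = tokenize(doc)
        for n in range(1, max_n + 1):
            for j in range(len(toks) - n + 1):
                gram = toks[j:j + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                index.setdefault(" ".join(gram), set()).add(i)
    # Keep only phrases shared by at least min_df results.
    return {p: ids for p, ids in index.items() if len(ids) >= min_df}

def features(phrase, df, n_docs, query):
    """Hypothetical feature vector for one candidate phrase (not the paper's features)."""
    words = phrase.split()
    qwords = set(tokenize(query))
    return [
        1.0,                                           # bias term
        df / n_docs,                                   # share of results containing the phrase
        float(len(words)),                             # phrase length in words
        sum(w in qwords for w in words) / len(words),  # share of words copied from the query
    ]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def fit_least_squares(rows, labels, ridge=1e-8):
    """Linear regression: solve (X^T X + ridge * I) w = X^T y by Gaussian elimination."""
    d = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda k: abs(a[k][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, d):
            f = a[k][col] / a[col][col]
            for c in range(col, d):
                a[k][c] -= f * a[col][c]
            b[k] -= f * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, d))) / a[i][i]
    return w

def cluster_results(docs, query, weights, top_k=7, merge_threshold=0.75):
    """Rank candidate phrases, build one candidate cluster per top phrase, then merge."""
    ranked = sorted(
        candidate_phrases(docs).items(),
        key=lambda kv: (-dot(weights, features(kv[0], len(kv[1]), len(docs), query)), kv[0]),
    )[:top_k]
    clusters = []  # list of (phrases, result indices); phrases[0] serves as the cluster name
    for phrase, ids in ranked:
        for names, members in clusters:
            # Merge when most documents of the smaller cluster are already covered.
            if len(ids & members) / min(len(ids), len(members)) >= merge_threshold:
                names.append(phrase)
                members |= ids
                break
        else:
            clusters.append(([phrase], set(ids)))
    return clusters

# Made-up training scores for a hypothetical query "python" with 5 results: (phrase, df, score).
train = [("python", 5, 0.0), ("programming", 1, 0.5),
         ("python programming", 1, 0.75), ("programming language", 1, 1.0)]
weights = fit_least_squares([features(p, df, 5, "python") for p, df, _ in train],
                            [score for _, _, score in train])

docs = [
    "Jaguar Cars official site: luxury sedans",
    "Jaguar Cars dealer: new and used models",
    "Jaguar animal facts: big cat of the Americas",
    "The jaguar animal: habitat in the rainforest",
    "Mac OS X Jaguar release notes",
    "Apple Mac OS X Jaguar upgrade",
]
for names, members in cluster_results(docs, "jaguar", weights):
    print(names[0], sorted(members))
# prints:
# mac os x [4, 5]
# jaguar animal [2, 3]
# jaguar cars [0, 1]
```

With these toy scores the fitted model puts a weight of about 0.5 on phrase length, about -0.5 on the share of query words, and roughly zero on the other two features. The bare query word "jaguar" therefore ranks last, and the five overlapping Mac OS X phrases collapse into a single cluster named "mac os x". A real system would need richer features and far more labeled data than this sketch.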




Information

Published In

SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
July 2004
624 pages
ISBN:1581138814
DOI:10.1145/1008992
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. document clustering
  2. regression analysis
  3. search result organization

Qualifiers

  • Article

Conference

SIGIR04

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 35
  • Downloads (last 6 weeks): 2
Reflects downloads up to 15 Feb 2025

Cited By

  • (2024) Search Result Presentation for Non-Native Language Documents. Companion Proceedings of the 29th International Conference on Intelligent User Interfaces, pp. 89-94. DOI: 10.1145/3640544.3645224. Online publication date: 18-Mar-2024.
  • (2024) A supervised weeding method to cluster high dimensional predictors with application to job market analysis. Journal of Applied Statistics 51(16), pp. 3350-3365. DOI: 10.1080/02664763.2024.2348634. Online publication date: May-2024.
  • (2023) Relevance Judgment Convergence Degree – A Measure of Inconsistency among Assessors for Information Retrieval. Proceedings of the 30th International Conference on Information Systems Development. DOI: 10.62036/ISD.2022.38. Online publication date: 2023.
  • (2023) Relevance Judgment Convergence Degree—A Measure of Assessors Inconsistency for Information Retrieval Datasets. Advances in Information Systems Development, pp. 149-168. DOI: 10.1007/978-3-031-32418-5_9. Online publication date: 27-Jun-2023.
  • (2022) An overview of cluster-based image search result organization: background, techniques, and ongoing challenges. Knowledge and Information Systems. DOI: 10.1007/s10115-021-01650-9. Online publication date: 11-Feb-2022.
  • (2022) Clustering Image Search Results by Entity Disambiguation. Machine Learning and Knowledge Discovery in Databases, pp. 369-384. DOI: 10.1007/978-3-662-44845-8_24. Online publication date: 10-Mar-2022.
  • (2021) CoNotate: Suggesting Queries Based on Notes Promotes Knowledge Discovery. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1-14. DOI: 10.1145/3411764.3445618. Online publication date: 6-May-2021.
  • (2021) ToFM: Topic-specific Facet Mining by Facet Propagation within Clusters. 2021 IEEE International Conference on Big Knowledge (ICBK), pp. 402-409. DOI: 10.1109/ICKG52313.2021.00060. Online publication date: Dec-2021.
  • (2021) Examining the Historical Development of Techno-Scientific Biomedical Communication in Russia. 2021 Communication Strategies in Digital Society Seminar (ComSDS), pp. 108-114. DOI: 10.1109/ComSDS52473.2021.9422848. Online publication date: 14-Apr-2021.
  • (2020) A semi-hierarchical clustering method for constructing knowledge trees from stackoverflow. Journal of Information Science 48(3), pp. 393-405. DOI: 10.1177/0165551520961035. Online publication date: 21-Sep-2020.
