The opposite of smoothing: a language model approach to ranking query-specific document clusters

Author: Oren Kurland

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Pages 171 - 178

https://doi.org/10.1145/1390334.1390366

Published: 20 July 2008

Abstract

Exploiting information induced from (query-specific) clustering of top-retrieved documents has long been proposed as a means of improving precision at the very top ranks of the returned results. We present a novel language model approach to ranking query-specific clusters by the presumed percentage of relevant documents that they contain. While most previous cluster ranking approaches focus on the cluster as a whole, our model also exploits information induced from documents associated with the cluster. Our model substantially outperforms previous approaches for identifying clusters containing a high relevant-document percentage. Furthermore, using the model to produce a document ranking yields precision-at-top-ranks performance that is consistently better than that of the initial ranking upon which clustering is performed; the performance also compares favorably with that of a state-of-the-art pseudo-feedback retrieval method.

Index Terms

The opposite of smoothing: a language model approach to ranking query-specific document clusters
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
    2. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Recommendations

Ranking document clusters using markov random fields
SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
An important challenge in cluster-based document retrieval is ranking document clusters by their relevance to the query. We present a novel cluster ranking approach that utilizes Markov Random Fields (MRFs). MRFs enable the integration of various types ...

Re-ranking search results using language models of query-specific clusters
To obtain high precision at top ranks by a search performed in response to a query, researchers have proposed a cluster-based re-ranking paradigm: clustering an initial list of documents that are the most highly ranked by some initial search, and ...

A study of the integration of passage-, document-, and cluster-based information for re-ranking search results
Cluster-based and passage-based document retrieval paradigms were shown to be effective. While the former are based on utilizing query-related corpus context manifested in clusters of similar documents, the latter address the fact that a document ...

Published In

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

July 2008

934 pages

ISBN: 9781605581644

DOI: 10.1145/1390334

General Chairs:
Tat-Seng Chua (National University of Singapore)
Mun-Kew Leong (National Library Board, Singapore)

Program Chairs:
Syung Hyon Myaeng (Information and Communications University, Korea)
Douglas W. Oard (University of Maryland, College Park, USA)
Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche, Italy)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGIR '08

SIGIR '08: The 31st Annual International ACM SIGIR Conference

July 20 - 24, 2008

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

Total Citations: 29
Total Downloads: 681
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 19 Feb 2025

Cited By

Markovskiy E, Raiber F, Sabach S, Kurland O, Amigo E, Castells P, Gonzalo J, Carterette B, Culpepper J, Kazai G (2022). From Cluster Ranking to Document Ranking. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2137-2141. Online publication date: 6-Jul-2022.
https://dl.acm.org/doi/10.1145/3477495.3531819

Levi O, Guy I, Raiber F, Kurland O (2018). Selective Cluster Presentation on the Search Results Page. ACM Transactions on Information Systems 36:3, pp. 1-42. Online publication date: 28-Feb-2018.
https://dl.acm.org/doi/10.1145/3158672

Levi O, Raiber F, Kurland O, Guy I, Mukhopadhyay S, Zhai C, Bertino E, Crestani F, Mostafa J, Tang J, Si L, Zhou X, Chang Y, Li Y, Sondhi P (2016). Selective Cluster-Based Document Retrieval. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1473-1482. Online publication date: 24-Oct-2016.
https://dl.acm.org/doi/10.1145/2983323.2983737

Anand R, Kotov A (2016). Improving Difficult Queries by Leveraging Clusters in Term Graph. Information Retrieval Technology, pp. 426-432. Online publication date: 22-Jan-2016.
https://doi.org/10.1007/978-3-319-28940-3_37

Raiber F, Kurland O, Radlinski F, Shokouhi M, Allan J, Croft B, de Vries A, Zhai C (2015). Learning Asymmetric Co-Relevance. Proceedings of the 2015 International Conference on The Theory of Information Retrieval, pp. 281-290. Online publication date: 27-Sep-2015.
https://dl.acm.org/doi/10.1145/2808194.2809454

Mao J, Lu K, Mu X, Li G (2015). Mining document, concept, and term associations for effective biomedical retrieval: introducing MeSH-enhanced retrieval models. Information Retrieval Journal 18:5, pp. 413-444. Online publication date: 4-Sep-2015.
https://doi.org/10.1007/s10791-015-9264-0

Kurland O (2014). The Cluster Hypothesis in Information Retrieval. Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval - Volume 8416, pp. 823-826. Online publication date: 13-Apr-2014.
https://dl.acm.org/doi/10.1007/978-3-319-06028-6_105

Di Marco A, Navigli R (2013). Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction. Computational Linguistics 39:3, pp. 709-754. Online publication date: Sep-2013.
https://doi.org/10.1162/COLI_a_00148

Raiber F, Kurland O, Chen X, Lebanon G, Wang H, Zaki M (2012). Exploring the cluster hypothesis, and cluster-based retrieval, over the web. Proceedings of the 21st ACM international conference on Information and knowledge management, pp. 2507-2510. Online publication date: 29-Oct-2012.
https://dl.acm.org/doi/10.1145/2396761.2398678

Kurland O, Raiber F, Shtok A, Chen X, Lebanon G, Wang H, Zaki M (2012). Query-performance prediction and cluster ranking. Proceedings of the 21st ACM international conference on Information and knowledge management, pp. 2459-2462. Online publication date: 29-Oct-2012.
https://dl.acm.org/doi/10.1145/2396761.2398666
