A rank-aggregation approach to searching for optimal query-specific clusters

Authors:

Carmel Domshlak

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Pages 547 - 554

https://doi.org/10.1145/1390334.1390428

Published: 20 July 2008

Abstract

To improve precision at the very top ranks of a document list returned in response to a query, researchers have suggested exploiting information induced from clustering the documents most highly ranked by an initial search. We propose a novel model for ranking such (query-specific) clusters by the presumed percentage of relevant documents that they contain. The model (i) proposes a palette of "witness" cluster properties that purportedly correlate with this percentage, (ii) devises concrete quantitative measures for these properties, and (iii) orders the clusters by aggregating the rankings induced by these individual measures. Empirical evaluation shows that our model is consistently more effective than previously suggested methods in detecting clusters containing a high percentage of relevant documents. Furthermore, the precision-at-top-ranks performance of this model exceeds that of standard document-based retrieval and is competitive with that of a state-of-the-art document-based retrieval approach.
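Step (iii) of the abstract, ordering clusters by aggregating per-measure rankings, can be illustrated with a minimal sketch. The abstract does not say which aggregation rule the model uses, so the Borda count below is an assumption chosen for simplicity. The measure names (`query_similarity`, `cohesion`, `size_prior`), the cluster ids, and all scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rank_by_measure(clusters, measure):
    """Rank clusters best-first by one measure (higher value = better).

    Ties are broken by cluster id so the output is deterministic.
    """
    return sorted(clusters, key=lambda c: (-measure[c], c))


def borda_aggregate(rankings):
    """Aggregate best-first rankings of the same items with a Borda count.

    An item at position p (0-based) in a ranking of n items earns
    n - 1 - p points; items are returned by descending total points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical "witness" measures for three hypothetical clusters.
measures = {
    "query_similarity": {"c1": 0.9, "c2": 0.4, "c3": 0.7},
    "cohesion": {"c1": 0.5, "c2": 0.8, "c3": 0.6},
    "size_prior": {"c1": 0.6, "c2": 0.2, "c3": 0.9},
}
clusters = ["c1", "c2", "c3"]

# One ranking per measure, then a single aggregated ordering.
rankings = [rank_by_measure(clusters, m) for m in measures.values()]
aggregated = borda_aggregate(rankings)
print(aggregated)
```

The aggregation works only on positions, not on raw scores, so measures on different scales can be combined without normalization. That is one plausible reason to aggregate rankings rather than scores, though the abstract itself gives no rationale.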

Index Terms

A rank-aggregation approach to searching for optimal query-specific clusters
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
    2. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published In

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

July 2008

934 pages

ISBN:9781605581644

DOI:10.1145/1390334

General Chairs: Tat-Seng Chua (National University of Singapore), Mun-Kew Leong (National Library Board, Singapore)

Program Chairs: Syung Hyon Myaeng (Information and Communications University, Korea), Douglas W. Oard (University of Maryland, College Park, USA), Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche, Italy)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR '08

SIGIR '08: The 31st Annual International ACM SIGIR Conference

July 20 - 24, 2008

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
