DOI: 10.1145/1390334.1390428
research-article

A rank-aggregation approach to searching for optimal query-specific clusters

Published: 20 July 2008

Abstract

To improve precision at the very top ranks of a document list presented in response to a query, researchers have suggested exploiting information induced from clustering the documents that an initial search ranks highest. We propose a novel model for ranking such (query-specific) clusters by the presumed percentage of relevant documents they contain. The model is based on (i) proposing a palette of "witness" cluster properties that purportedly correlate with this percentage, (ii) devising concrete quantitative measures for these properties, and (iii) ordering the clusters by aggregating the rankings induced by these individual measures. Empirical evaluation shows that our model is consistently more effective than previously suggested methods in detecting clusters that contain a high percentage of relevant documents. Furthermore, the precision-at-top-ranks performance of this model surpasses that of standard document-based retrieval and is competitive with that of a state-of-the-art document-based retrieval approach.
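Step (iii) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes every cluster already has one numeric score per witness measure (higher meaning a larger presumed share of relevant documents), ranks the clusters separately under each measure, and fuses those rankings with the Borda count, a classical rank-aggregation rule chosen here for simplicity. The function names, the measure names (`query_similarity`, `coherence`, `query_coverage`), and the cluster and document IDs are all hypothetical. A second helper computes a cluster's actual relevant-document percentage, the quantity the ranking is meant to track, for use when relevance judgments are available.

```python
from typing import AbstractSet, Dict, List, Sequence


def borda_aggregate(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Order clusters by their Borda count over per-measure rankings.

    `scores` maps a (hypothetical) measure name to {cluster_id: score};
    a higher score is assumed to mean a higher presumed percentage of
    relevant documents. Every measure should score the same clusters.
    """
    points: Dict[str, int] = {}
    for per_cluster in scores.values():
        # Rank the clusters under this single measure, best first;
        # equal scores are broken by cluster id so the result is deterministic.
        ranked = sorted(per_cluster, key=lambda c: (-per_cluster[c], c))
        n = len(ranked)
        for position, cluster_id in enumerate(ranked):
            # Borda points: first of n earns n - 1, last earns 0.
            points[cluster_id] = points.get(cluster_id, 0) + (n - 1 - position)
    # Highest total first; ties again broken by cluster id.
    return sorted(points, key=lambda c: (-points[c], c))


def relevant_percentage(cluster: Sequence[str], relevant: AbstractSet[str]) -> float:
    """Fraction of the cluster's documents that are judged relevant."""
    if not cluster:
        return 0.0
    return sum(doc in relevant for doc in cluster) / len(cluster)


if __name__ == "__main__":
    # Toy scores for three query-specific clusters under three measures.
    scores = {
        "query_similarity": {"c1": 0.9, "c2": 0.5, "c3": 0.7},
        "coherence": {"c1": 0.4, "c2": 0.8, "c3": 0.6},
        "query_coverage": {"c1": 0.3, "c2": 0.2, "c3": 0.6},
    }
    ranking = borda_aggregate(scores)
    print(ranking)  # → ['c3', 'c1', 'c2']

    # With (toy) relevance judgments, check the top cluster's actual share.
    clusters = {"c1": ["d1", "d2"], "c2": ["d3", "d5"], "c3": ["d1", "d4", "d6"]}
    relevant = {"d1", "d4"}
    print(relevant_percentage(clusters[ranking[0]], relevant))
```

In the toy data each cluster tops exactly one measure, and `c3` wins on total points because it is never ranked last. Because Borda uses only rank positions, measures on incomparable scales need no normalization; a score-based fusion such as CombSUM would instead require normalizing each measure first.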

Published In

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN: 9781605581644
DOI: 10.1145/1390334
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cluster properties
  2. clusters
  3. language models
  4. optimal cluster
  5. query-specific clustering
  6. rank aggregation

Qualifiers

  • Research-article

Conference

SIGIR '08

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 1
Reflects downloads up to 19 Feb 2025

Cited By

  • (2022) Adaptive Re-Ranking with a Corpus Graph. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 1491-1500. DOI: 10.1145/3511808.3557231. Online publication date: 17-Oct-2022.
  • (2022) From Cluster Ranking to Document Ranking. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2137-2141. DOI: 10.1145/3477495.3531819. Online publication date: 6-Jul-2022.
  • (2021) An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval. International Journal of Information Retrieval Research, 12:1, pp. 1-14. DOI: 10.4018/IJIRR.289950. Online publication date: 19-Oct-2021.
  • (2021) Recommending Search Queries in Documents Using Inter N-Gram Similarities. Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval, pp. 211-220. DOI: 10.1145/3471158.3472252. Online publication date: 11-Jul-2021.
  • (2020) A passage-based approach to learning to rank documents. Information Retrieval Journal, 23:2, pp. 159-186. DOI: 10.1007/s10791-020-09369-x. Online publication date: 6-Mar-2020.
  • (2019) Cluster-Based Focused Retrieval. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 2305-2308. DOI: 10.1145/3357384.3358087. Online publication date: 3-Nov-2019.
  • (2018) Selective Cluster Presentation on the Search Results Page. ACM Transactions on Information Systems, 36:3, pp. 1-42. DOI: 10.1145/3158672. Online publication date: 28-Feb-2018.
  • (2016) Selective Cluster-Based Document Retrieval. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1473-1482. DOI: 10.1145/2983323.2983737. Online publication date: 24-Oct-2016.
  • (2014) Composite retrieval of heterogeneous web search. Proceedings of the 23rd international conference on World wide web, pp. 119-130. DOI: 10.1145/2566486.2567985. Online publication date: 7-Apr-2014.
  • (2014) The Cluster Hypothesis in Information Retrieval. Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval - Volume 8416, pp. 823-826. DOI: 10.1007/978-3-319-06028-6_105. Online publication date: 13-Apr-2014.