DOI: 10.1145/2911451.2911511

Risk-Sensitive Evaluation and Learning to Rank using Multiple Baselines

Published: 07 July 2016

ABSTRACT

A robust retrieval system ensures that the user experience is not damaged by the presence of poorly-performing queries. Such robustness can be measured by risk-sensitive evaluation measures, which assess the extent to which a system performs worse than a given baseline system. However, using a single, particular system as the baseline suffers from the fact that retrieval performance varies widely among IR systems across topics. A single system therefore generally fails to provide enough information about the real baseline performance for every topic under consideration, and hence fails to measure the real risk associated with a given system. Based upon the Chi-squared statistic, we propose a new measure, ZRisk, which shows more promise because it takes multiple baselines into account when measuring risk, and a derivative measure, GeoRisk, which enhances ZRisk by also taking into account the overall magnitude of effectiveness. This paper demonstrates the benefits of ZRisk and GeoRisk upon TREC data, and shows how to exploit GeoRisk for risk-sensitive learning to rank, thereby making use of multiple baselines within the learning objective function to obtain effective yet risk-averse/robust ranking systems. Experiments using 10,000 topics from the MSLR learning-to-rank dataset demonstrate the efficacy of the proposed Chi-squared-statistic-based objective function.
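The abstract describes the measures only at a high level. As a rough, hypothetical sketch of the underlying idea — treating a systems × topics effectiveness matrix (e.g. NDCG values) like a contingency table, scoring each system by its standardized chi-squared residuals with extra weight on downside deviations, and folding in overall effectiveness via a normal CDF — one might write something like the following. The function names, the `alpha` downside weight, and the exact combination formula are illustrative assumptions, not the paper's verified definitions.

```python
import math

def zrisk(scores, i, alpha=1.0):
    """Sketch of a chi-squared-residual risk score for system i over a
    systems x topics effectiveness matrix `scores` (values assumed > 0).
    Expected cell values use the usual contingency-table formula."""
    n_sys, n_top = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores)
    row_total = sum(scores[i])
    col_totals = [sum(scores[s][j] for s in range(n_sys)) for j in range(n_top)]
    z = 0.0
    for j in range(n_top):
        e = row_total * col_totals[j] / grand      # expected effectiveness
        r = (scores[i][j] - e) / math.sqrt(e)      # standardized residual
        z += r if r >= 0 else (1 + alpha) * r      # penalize downside harder
    return z

def georisk(scores, i, alpha=1.0):
    """Sketch: combine mean effectiveness with the risk score, mapped
    through the standard normal CDF, in a geometric-mean-like fashion."""
    n_top = len(scores[0])
    mean_eff = sum(scores[i]) / n_top
    phi = 0.5 * (1.0 + math.erf(zrisk(scores, i, alpha) / n_top / math.sqrt(2)))
    return math.sqrt(mean_eff * phi)
```

Under this sketch, a system with stable per-topic effectiveness incurs less risk penalty than one with the same mean but large per-topic swings, which is the behaviour a multi-baseline risk measure is meant to capture.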


Published in

SIGIR '16: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN: 9781450340694
DOI: 10.1145/2911451

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Qualifiers

        • research-article

        Acceptance Rates

SIGIR '16 Paper Acceptance Rate: 62 of 341 submissions (18%). Overall Acceptance Rate: 792 of 3,983 submissions (20%).
