Research Article
DOI: 10.1145/3121050.3121072

Evaluation Measures for Relevance and Credibility in Ranked Lists

Published: 01 October 2017

ABSTRACT

Recent discussions on alternative facts, fake news, and post-truth politics have motivated research on creating technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Whereas technology is in place for filtering information according to relevance and/or credibility, no single measure currently exists for evaluating the accuracy or precision (and more generally effectiveness) of both the relevance and the credibility of retrieved results. One obvious way of doing so is to measure relevance and credibility effectiveness separately, and then consolidate the two measures into one. There are at least two problems with such an approach: (I) it is not certain that the same criteria are applied to the evaluation of both relevance and credibility (and applying different criteria introduces bias to the evaluation); (II) many more and richer measures exist for assessing relevance effectiveness than for assessing credibility effectiveness (hence risking further bias). Motivated by the above, we present two novel types of evaluation measures that are designed to measure the effectiveness of both relevance and credibility in ranked lists of retrieval results. Experimental evaluation on a small human-annotated dataset (that we make freely available to the research community) shows that our measures are expressive and intuitive in their interpretation.
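
To make concrete the "obvious" two-step approach that the abstract argues against, the sketch below computes nDCG separately over relevance grades and over credibility grades for the same ranked list, then consolidates the two scores with a convex combination. This illustrates only that baseline, not the measures proposed in the paper; the choice of nDCG, the equal weighting, and the example labels are all assumptions of the sketch.

```python
import math

def dcg(grades):
    """Discounted cumulative gain of graded labels, top-ranked document first."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades))

def ndcg(grades):
    """DCG normalised by the ideal (descending-sorted) ranking of the same labels."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

def consolidated(relevance, credibility, alpha=0.5):
    """Naive consolidation: a convex combination of two separately computed
    nDCG scores. alpha (the relevance weight) is an assumption of this
    sketch, not a parameter taken from the paper."""
    return alpha * ndcg(relevance) + (1.0 - alpha) * ndcg(credibility)

# Hypothetical example: one ranking of five documents, with graded
# relevance (0-3) and graded credibility (0-3) labels per document.
relevance   = [3, 2, 3, 0, 1]
credibility = [1, 3, 0, 2, 2]
print(f"consolidated score: {consolidated(relevance, credibility):.3f}")
```

The sketch also makes the abstract's two objections visible: nothing forces the relevance and credibility grades to be assigned under comparable criteria (problem I), and nDCG is one of many mature relevance measures with no equally established counterpart for credibility (problem II).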

Published in

ICTIR '17: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval
October 2017, 348 pages
ISBN: 9781450344906
DOI: 10.1145/3121050
Copyright © 2017 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICTIR '17 paper acceptance rate: 27 of 54 submissions, 50%. Overall acceptance rate: 209 of 482 submissions, 43%.
