ABSTRACT
Recent discussions of alternative facts, fake news, and post-truth politics have motivated research on technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Whereas technology exists for filtering information according to relevance and/or credibility, no single measure currently exists for evaluating the accuracy or precision (and, more generally, the effectiveness) of both the relevance and the credibility of retrieved results. One obvious approach is to measure relevance effectiveness and credibility effectiveness separately, and then consolidate the two measures into one. There are at least two problems with such an approach: (I) it is not certain that the same criteria are applied to the evaluation of both relevance and credibility (and applying different criteria introduces bias into the evaluation); (II) many more, and richer, measures exist for assessing relevance effectiveness than for assessing credibility effectiveness (again risking bias). Motivated by the above, we present two novel types of evaluation measures designed to assess the effectiveness of both relevance and credibility in ranked lists of retrieval results. Experimental evaluation on a small human-annotated dataset (which we make freely available to the research community) shows that our measures are expressive and intuitive in their interpretation.
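To make the "obvious" consolidation approach concrete, the following is a minimal sketch, not the measures proposed in the paper: nDCG is computed separately over graded relevance labels and graded credibility labels for the same ranked list, and the two scores are combined with a weighted harmonic mean. The labels, the choice of nDCG, and the harmonic-mean combination are all illustrative assumptions.

```python
import math

def dcg(grades):
    """Discounted cumulative gain for a ranked list of graded labels."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    """Normalised DCG: DCG divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

def consolidated(relevance, credibility, beta=1.0):
    """F-style weighted harmonic mean of relevance nDCG and credibility nDCG.

    beta > 1 weights credibility more heavily, beta < 1 weights relevance.
    """
    r, c = ndcg(relevance), ndcg(credibility)
    if r + c == 0:
        return 0.0
    return (1 + beta ** 2) * r * c / (beta ** 2 * r + c)

# Hypothetical graded labels (0-3) for one ranked list of five results:
# a result can be highly relevant yet of low credibility, and vice versa.
rel = [3, 2, 0, 1, 2]
cred = [1, 3, 2, 0, 1]
print(round(consolidated(rel, cred), 3))
```

Note how this construction inherits exactly the two problems listed above: the relevance and credibility labels need not come from the same annotation criteria, and the single nDCG formula is simply reused for credibility for lack of dedicated credibility-effectiveness measures.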