ABSTRACT
Recent discussions of alternative facts, fake news, and post-truth politics have motivated research on technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Whereas technology exists for filtering information according to relevance and/or credibility, no single measure currently exists for evaluating the accuracy or precision (and, more generally, the effectiveness) of both the relevance and the credibility of retrieved results. One obvious approach is to measure relevance effectiveness and credibility effectiveness separately, and then consolidate the two measures into one. There are at least two problems with such an approach: (I) it is not certain that the same criteria are applied to the evaluation of both relevance and credibility (and applying different criteria introduces bias into the evaluation); (II) many more, and richer, measures exist for assessing relevance effectiveness than for assessing credibility effectiveness (again risking bias). Motivated by the above, we present two novel types of evaluation measures designed to assess the effectiveness of both relevance and credibility in ranked lists of retrieval results. Experimental evaluation on a small human-annotated dataset (which we make freely available to the research community) shows that our measures are expressive and intuitive in their interpretation.
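To make the "obvious" consolidation approach concrete, the following is a minimal sketch, not the measures proposed in the paper: nDCG is computed separately over graded relevance labels and graded credibility labels for the same ranked list, and the two scores are combined with a weighted harmonic mean. The labels, the choice of nDCG, and the harmonic-mean combination are all illustrative assumptions.

```python
import math

def dcg(grades):
    """Discounted cumulative gain for a ranked list of graded labels."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    """Normalised DCG: DCG divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

def consolidated(relevance, credibility, beta=1.0):
    """F-style weighted harmonic mean of relevance nDCG and credibility nDCG.

    beta > 1 weights credibility more heavily, beta < 1 weights relevance.
    """
    r, c = ndcg(relevance), ndcg(credibility)
    if r + c == 0:
        return 0.0
    return (1 + beta ** 2) * r * c / (beta ** 2 * r + c)

# Hypothetical graded labels (0-3) for one ranked list of five results:
# a result can be highly relevant yet of low credibility, and vice versa.
rel = [3, 2, 0, 1, 2]
cred = [1, 3, 2, 0, 1]
print(round(consolidated(rel, cred), 3))
```

Note how this construction inherits exactly the two problems listed above: the relevance and credibility labels need not come from the same annotation criteria, and the single nDCG formula is simply reused for credibility for lack of dedicated credibility-effectiveness measures.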