DOI: 10.1145/3184558.3188723

Relevant Document Discovery for Fact-Checking Articles

Published: 23 April 2018

Abstract

With the support of major search platforms such as Google and Bing, fact-checking articles, which can be identified by their adoption of the schema.org ClaimReview structured markup, have gained widespread recognition for their role in the fight against digital misinformation. A claim-relevant document is an online document that addresses, and potentially expresses a stance towards, some claim. The claim-relevance discovery problem, then, is to find claim-relevant documents. Depending on the verdict of the fact check, claim-relevance discovery can help identify online misinformation. In this paper, we provide an initial approach to the claim-relevance discovery problem that leverages various information retrieval and machine learning techniques. The system consists of three phases. First, we retrieve candidate documents based on various features of the fact-checking article. Second, we apply a relevance classifier to filter out documents that do not address the claim. Third, we apply a language-feature-based classifier to distinguish documents with different stances towards the claim. We experimentally demonstrate that our solution achieves solid results on a large-scale dataset and outperforms state-of-the-art baselines. Finally, we present a rich set of case studies that illustrate the many remaining challenges and show that this problem is far from solved.
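
The three phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes learned classifiers, whereas this toy version substitutes term overlap for candidate retrieval and relevance scoring, and a small cue-word list for stance. The names `Document`, `retrieve_candidates`, `relevance_score`, `classify_stance`, and `discover`, the thresholds, the stance labels, the cue words, and the example.org URLs are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical word lists; a real system would learn such signals from data.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "that", "and", "for", "on"}
REFUTE_CUES = {"false", "hoax", "debunked", "myth", "untrue"}


@dataclass(frozen=True)
class Document:
    url: str
    text: str


def terms(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def retrieve_candidates(claim, corpus, min_shared=2):
    """Phase 1 (toy): keep documents sharing at least min_shared terms with the claim."""
    claim_terms = terms(claim)
    return [d for d in corpus if len(claim_terms & terms(d.text)) >= min_shared]


def relevance_score(claim, doc):
    """Phase 2 (toy): fraction of claim terms that also occur in the document."""
    claim_terms = terms(claim)
    if not claim_terms:
        return 0.0
    return len(claim_terms & terms(doc.text)) / len(claim_terms)


def classify_stance(doc):
    """Phase 3 (toy): cue-word lookup standing in for a language-feature classifier."""
    return "contradicting" if terms(doc.text) & REFUTE_CUES else "supporting"


def discover(claim, corpus, relevance_threshold=0.6):
    """Run the three phases in order and return (url, stance) pairs."""
    results = []
    for doc in retrieve_candidates(claim, corpus):
        if relevance_score(claim, doc) >= relevance_threshold:
            results.append((doc.url, classify_stance(doc)))
    return results


# Made-up claim and documents, used only to exercise the sketch.
CLAIM = "Eating chocolate cures the common cold"
CORPUS = [
    Document("https://example.org/a", "New study: eating chocolate cures the common cold, doctors say."),
    Document("https://example.org/b", "It is a myth that chocolate cures the common cold."),
    Document("https://example.org/c", "Chocolate sales rose during the cold winter."),
    Document("https://example.org/d", "Local team wins the championship."),
]

if __name__ == "__main__":
    for url, stance in discover(CLAIM, CORPUS):
        print(url, stance)
```

In this toy run, document `d` is dropped at retrieval, document `c` passes retrieval but falls below the relevance threshold, and documents `a` and `b` survive with opposite stance labels. Keeping relevance and stance as separate phases mirrors the abstract's distinction between a document that addresses a claim and the stance it takes towards it.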

Published In

WWW '18: Companion Proceedings of The Web Conference 2018
April 2018
2023 pages
ISBN:9781450356404

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. claim-relevance discovery
  2. digital misinformation
  3. fact checking

Qualifiers

  • Research-article

Conference

WWW '18
Sponsor:
  • IW3C2
WWW '18: The Web Conference 2018
April 23 - 27, 2018
Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
