ABSTRACT
We propose a model-based metric for estimating the factual accuracy of generated text that complements typical scoring schemes such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We introduce and release a new large-scale dataset, built from Wikipedia and Wikidata, for training relation classifiers and end-to-end fact extraction models. We show that the end-to-end models can extract complete sets of facts from datasets containing full pages of text. We then analyse multiple models that estimate factual accuracy on a Wikipedia text summarization task and, through a human evaluation study, show their efficacy compared to ROUGE and other model-free variants.
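The abstract does not define how extracted facts are scored, so what follows is only a minimal sketch under stated assumptions, not the authors' method. It assumes facts are (subject, relation, object) triples and that accuracy is the share of generated facts matching the reference, counted only over facts whose (subject, relation) pair also appears in the reference. The triples are hand-written stand-ins for the output of a fact extraction model. The function names are hypothetical, and `rouge1_recall` is a simplified whitespace-tokenized ROUGE-1 recall rather than the official ROUGE package. The example shows why such a metric can complement ROUGE: a single wrong birth year barely changes unigram overlap but halves the fact-level score.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical fact representation: (subject, relation, object).
Fact = Tuple[str, str, str]


def factual_accuracy(generated: Iterable[Fact], reference: Iterable[Fact]) -> float:
    """Share of checkable generated facts that also appear in the reference.

    Assumption: a generated fact is checkable only if the reference contains
    some fact with the same (subject, relation); other facts are skipped.
    """
    ref_set = set(reference)
    ref_keys = {(subj, rel) for subj, rel, _ in ref_set}
    checkable = [fact for fact in set(generated) if (fact[0], fact[1]) in ref_keys]
    if not checkable:
        return 0.0  # assumed convention when nothing can be verified
    correct = sum(1 for fact in checkable if fact in ref_set)
    return correct / len(checkable)


def rouge1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall: clipped unigram overlap over reference length."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / max(sum(ref_counts.values()), 1)


reference_text = "Marie Curie was born in 1867 in Warsaw"
generated_text = "Marie Curie was born in 1876 in Warsaw"

# Hand-written triples standing in for a fact extraction model's output.
reference_facts = {
    ("Marie Curie", "date of birth", "1867"),
    ("Marie Curie", "place of birth", "Warsaw"),
}
generated_facts = {
    ("Marie Curie", "date of birth", "1876"),
    ("Marie Curie", "place of birth", "Warsaw"),
}

print(rouge1_recall(generated_text, reference_text))       # 0.875
print(factual_accuracy(generated_facts, reference_facts))  # 0.5
```

Ignoring facts that the reference can neither confirm nor refute is one design choice among several. Counting them as errors would instead penalize any content absent from the reference.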