DOI: 10.1145/3178876.3186166
research-article

Towards Automatic Numerical Cross-Checking: Extracting Formulas from Text

Published: 10 April 2018

ABSTRACT

Verbal descriptions of the numerical relationships among objective measures are widespread in documents published on the Web, especially in the financial domain. However, because of the large volume of documents and the limited time available for manual cross-checking, these claims may be inconsistent with the original structured data of the related indicators, even after official publication. Such errors can seriously affect investors' assessment of a company and may cause them to undervalue the firm, even when the mistakes are unintentional rather than deliberate. This creates an opportunity for automated Numerical Cross-Checking (NCC) systems. This paper introduces the key component of such a system, the formula extractor, which extracts formulas from verbal descriptions of numerical claims. Specifically, we formulate the task as a DAG-structure prediction problem and propose an iterative relation extraction model to address it. The model applies a bi-directional LSTM followed by a DAG-structured LSTM to extract formulas iteratively, layer by layer. The model is trained on a human-labeled dataset of tens of thousands of sentences. The evaluation shows that the model is effective at formula extraction: at the relation level, it achieves 97.78% precision and 98.33% recall; at the sentence level, predictions are perfect for 92.02% of sentences. Overall, the NCC project has received wide recognition in the Chinese financial community.
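
To make the idea of a formula as a DAG concrete, the following Python sketch shows one way an extracted formula could be represented and then cross-checked against structured data. It is an illustration only, not the paper's model or data format: the node classes (`Indicator`, `Constant`, `Op`), the helper names, the indicator keys, and the 1% tolerance are all assumptions introduced here. The learned extractor (bi-directional LSTM plus DAG-structured LSTM) is not reproduced; the sketch starts from a formula that has already been extracted.

```python
import math
import operator
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


# Hypothetical node types; the abstract does not specify the actual label set.
@dataclass(frozen=True)
class Indicator:
    name: str  # key into the structured data, e.g. "net_profit_2017"


@dataclass(frozen=True)
class Constant:
    value: float  # a number stated in the sentence, e.g. 0.25 for "25%"


@dataclass(frozen=True)
class Op:
    symbol: str  # one of "+", "-", "*", "/"
    args: Tuple["Node", "Node"]


Node = Union[Indicator, Constant, Op]

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def evaluate(node: Node, data: Dict[str, float],
             memo: Optional[Dict[int, float]] = None) -> float:
    """Evaluate a formula DAG bottom-up, computing each shared node once."""
    if memo is None:
        memo = {}
    key = id(node)
    if key in memo:
        return memo[key]
    if isinstance(node, Indicator):
        result = data[node.name]
    elif isinstance(node, Constant):
        result = node.value
    else:
        left, right = (evaluate(arg, data, memo) for arg in node.args)
        result = OPS[node.symbol](left, right)
    memo[key] = result
    return result


def is_consistent(lhs: Node, rhs: Node, data: Dict[str, float],
                  rel_tol: float = 1e-2) -> bool:
    """Cross-check a claim lhs = rhs against the data (tolerance is assumed)."""
    return math.isclose(evaluate(lhs, data), evaluate(rhs, data),
                        rel_tol=rel_tol)


# Claim: "Net profit grew 25% over the previous year."
# Formula: (net_profit_2017 - net_profit_2016) / net_profit_2016 = 0.25
previous = Indicator("net_profit_2016")  # used twice below: two parents
current = Indicator("net_profit_2017")
growth = Op("/", (Op("-", (current, previous)), previous))

data = {"net_profit_2016": 100.0, "net_profit_2017": 125.0}  # made-up figures
print(is_consistent(growth, Constant(0.25), data))  # claim matches the data
print(is_consistent(growth, Constant(0.30), data))  # claim contradicts the data
```

The `previous` node feeds both the subtraction and the division, which is why the structure is a DAG rather than a tree: a node may have more than one parent. The memo table in `evaluate` ensures such shared nodes are evaluated only once.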

Published in

WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398

Copyright © 2018 ACM

Publisher

International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland

Acceptance Rates

WWW '18 paper acceptance rate: 170 of 1,155 submissions, 15%
Overall acceptance rate: 1,899 of 8,196 submissions, 23%
