ABSTRACT
Verbal descriptions of the numerical relationships among objective measures are common in documents published on the Web, especially in finance. However, because of the large volume of documents and the limited time available for manual cross-checking, these claims may be inconsistent with the original structured data for the related indicators, even after official publication. Such errors can seriously affect investors' assessment of a company and may cause them to undervalue the firm, even when the mistakes are unintentional rather than deliberate. This creates an opportunity for automated Numerical Cross-Checking (NCC) systems. This paper introduces the key component of such a system, the formula extractor, which extracts formulas from verbal descriptions of numerical claims. Specifically, we formulate this task as a DAG-structure prediction problem and propose an iterative relation extraction model to address it. Our model applies a bi-directional LSTM followed by a DAG-structured LSTM to extract formulas iteratively, layer by layer. The model is trained on a human-labeled dataset of tens of thousands of sentences. The evaluation shows that the model is effective at formula extraction: at the relation level, it achieves 97.78% precision and 98.33% recall; at the sentence level, the predictions for 92.02% of sentences are perfect. Overall, the NCC project has received wide recognition in the Chinese financial community.
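To make the two evaluation levels concrete, the following minimal sketch shows one plausible way to score extracted formulas. It assumes that each sentence's formula is stored as a set of relation tuples (operator, arguments, result node), so that intermediate result nodes link the relations into a DAG, and that a predicted relation counts as correct only when it matches a gold relation exactly. The tuple layout, the function names, and the toy data are illustrative assumptions, not the paper's actual data format or scoring script.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: (operator, argument node ids, result node id).
# Result ids such as "t1" can be reused as arguments, which forms a DAG.
Relation = Tuple[str, Tuple[str, ...], str]


def relation_scores(gold: List[Set[Relation]],
                    pred: List[Set[Relation]]) -> Tuple[float, float]:
    """Relation-level precision and recall, pooled over all sentences."""
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


def sentence_accuracy(gold: List[Set[Relation]],
                      pred: List[Set[Relation]]) -> float:
    """Share of sentences whose predicted relation set equals the gold set."""
    exact = sum(1 for g, p in zip(gold, pred) if g == p)
    return exact / len(gold) if gold else 0.0


# Toy data (invented for illustration): two sentences, one predicted perfectly.
gold = [
    {("sub", ("120", "100"), "t1"),
     ("div", ("t1", "100"), "t2"),
     ("eq", ("t2", "20%"), "t3")},
    {("add", ("a", "b"), "t1"),
     ("eq", ("t1", "c"), "t2")},
]
pred = [
    set(gold[0]),
    {("add", ("a", "b"), "t1"),
     ("eq", ("t1", "d"), "t2")},  # wrong right-hand side
]

print(relation_scores(gold, pred))
print(sentence_accuracy(gold, pred))
```

Under these assumptions a single wrong relation lowers the relation-level scores only slightly but makes the whole sentence count as imperfect, which is consistent with the sentence-level figure in the abstract being lower than the relation-level precision and recall.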
Index Terms
- Towards Automatic Numerical Cross-Checking: Extracting Formulas from Text