ABSTRACT
In this paper, we consider the problem of open information extraction (OIE): extracting entity- and relation-level intermediate structures from open-domain sentences. We focus on four types of valuable intermediate structures (Relation, Attribute, Description, and Concept) and propose a unified knowledge expression form, SAOKE, to express them. We publicly release a data set that contains 48,248 sentences and the corresponding facts in the SAOKE format, labeled by crowdsourcing. To our knowledge, this is the largest publicly available human-labeled data set for open information extraction tasks. Using this labeled SAOKE data set, we train an end-to-end neural model, called Logician, that follows the sequence-to-sequence paradigm to transform sentences into facts. Unlike existing algorithms, which generally extract each fact in isolation without regard to other possible facts, Logician performs a global optimization over all facts involved in a sentence, in which facts not only compete with each other to attract the attention of words but also cooperate to share words. An experimental study on various types of open-domain relation extraction tasks shows that Logician consistently outperforms other state-of-the-art algorithms. The experiments verify the soundness of the SAOKE format, the value of the SAOKE data set, the effectiveness of the proposed Logician model, and the feasibility of applying the end-to-end learning paradigm to supervised data sets for the challenging tasks of open information extraction.
Index Terms
- Logician: A Unified End-to-End Neural Approach for Open-Domain Information Extraction