ABSTRACT
Modern information extraction pipelines are typically constructed by (1) loading textual data from a database into a special-purpose application, (2) applying a myriad of text-analytics functions to the text, which produce a structured relational table, and (3) storing this table in a database. This approach can lead to laborious development processes, complex and tangled programs, and inefficient control flows. Toward remedying these deficiencies, we embark on an effort to lay the foundations of a new generation of text-centric database management systems. Concretely, we extend the relational model by incorporating into it the theory of document spanners, which provides the means and methods for the model to engage in Information Extraction (IE) tasks. This extended model, called Spannerlog, provides a novel declarative method for defining and manipulating textual data, which makes it possible to automate the typical work method described above. In addition to formally defining Spannerlog and illustrating its usefulness for IE tasks, we also report on initial results concerning its expressive power.
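To make the notion of a document spanner more concrete, the following is a minimal sketch in Python rather than Spannerlog syntax, which the abstract does not show. It treats a regular expression with named capture groups as a function from a document to a relation whose attributes are the group names and whose values are spans. The names `Span` and `regex_spanner` are hypothetical, spans are written as 0-based half-open offsets (an assumption of this sketch), and only non-overlapping matches are returned, which is a simplification of the formal spanner semantics.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """A half-open interval [start, end) of 0-based character offsets."""
    start: int
    end: int


def regex_spanner(pattern: str, document: str) -> list[dict[str, Span]]:
    """Map a document to a relation of spans (hypothetical helper).

    Each match contributes one tuple, represented as a dict that maps
    every named capture group of the pattern to the span it matched.
    Only non-overlapping matches are produced, a simplification.
    """
    compiled = re.compile(pattern)
    rows = []
    for match in compiled.finditer(document):
        rows.append(
            {name: Span(*match.span(name)) for name in compiled.groupindex}
        )
    return rows


doc = "Contact alice@example.org or bob@example.com."
rows = regex_spanner(r"(?P<user>\w+)@(?P<domain>[\w.]+\w)", doc)

# Spans point back into the document, so the text is recovered by slicing.
for row in rows:
    user, domain = row["user"], row["domain"]
    print(doc[user.start:user.end], doc[domain.start:domain.end])
```

Because the resulting rows are ordinary tuples over the attributes `user` and `domain`, they could in principle be joined or filtered with relational operators, which is the kind of integration the abstract describes.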
Index Terms
- Incorporating information extraction in the relational database model