DOI: 10.1145/2932194.2932200
research-article

Incorporating information extraction in the relational database model

Published: 26 June 2016

ABSTRACT

Modern information extraction pipelines are typically constructed by (1) loading textual data from a database into a special-purpose application, (2) applying a myriad of text-analytics functions to the text, which produce a structured relational table, and (3) storing this table in a database. This approach can lead to laborious development processes, complex and tangled programs, and inefficient control flows. To address these deficiencies, we embark on an effort to lay the foundations of a new generation of text-centric database management systems. Concretely, we extend the relational model by incorporating into it the theory of document spanners, which provides the means and methods for the model to engage in Information Extraction (IE) tasks. This extended model, called Spannerlog, provides a novel declarative method for defining and manipulating textual data, which makes it possible to automate the typical workflow described above. In addition to formally defining Spannerlog and illustrating its usefulness for IE tasks, we also report on initial results concerning its expressive power.
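
For orientation, the three-step pipeline that the abstract criticizes can be sketched in a few lines of Python. This is only an illustration of the conventional, imperative workflow and is not Spannerlog syntax or the authors' system. The table names (`docs`, `contacts`), the regular expression, and the sample text are all hypothetical. Spans are recorded as Python 0-based, half-open `(start, end)` offsets, which may differ from the span convention used in the document-spanner literature.

```python
import re
import sqlite3

# Hypothetical extractor: a capitalized name followed by an address in angle brackets.
PATTERN = re.compile(r"(?P<name>[A-Z][a-z]+) <(?P<email>[\w.]+@[\w.]+)>")


def extract_spans(doc_id, text):
    """Yield one row per match: the document id and the (start, end) span of each capture group."""
    for m in PATTERN.finditer(text):
        yield (doc_id, m.start("name"), m.end("name"), m.start("email"), m.end("email"))


def run_pipeline(conn):
    # (1) Load textual data from the database into the application.
    docs = conn.execute("SELECT id, body FROM docs").fetchall()
    # (2) Apply a text-analytics function, producing a relation of spans.
    rows = [row for doc_id, body in docs for row in extract_spans(doc_id, body)]
    # (3) Store the resulting table back in the database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts ("
        "doc_id INTEGER, name_start INTEGER, name_end INTEGER, "
        "email_start INTEGER, email_end INTEGER)"
    )
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return rows


def demo():
    # In-memory database with one hypothetical document.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute(
        "INSERT INTO docs VALUES (1, ?)",
        ("Ask Alice <alice@example.org> or Bob <bob@example.org>.",),
    )
    return run_pipeline(conn)
```

Calling `demo()` returns one row per match, each holding the document id and the offsets of the two captured substrings. The round trip between the database and application code in steps (1) and (3) is the kind of control flow that, per the abstract, a declarative model is meant to automate.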

Published in

WebDB '16: Proceedings of the 19th International Workshop on Web and Databases
June 2016
59 pages
ISBN: 9781450343107
DOI: 10.1145/2932194

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

WebDB '16 paper acceptance rate: 9 of 29 submissions, 31%. Overall acceptance rate: 30 of 100 submissions, 30%.
