skip to main content
article

Incremental validation of XML documents

Published:12 December 2004Publication History
Skip Abstract Section

Abstract

We investigate the incremental validation of XML documents with respect to DTDs, specialized DTDs, and XML Schemas, under updates consisting of element tag renamings, insertions, and deletions. DTDs are modeled as extended context-free grammars. "Specialized DTDs" allow the decoupling of element types from element tags. XML Schemas are abstracted as specialized DTDs with limitations on the type assignment. For DTDs and XML Schemas, we exhibit an O(m log n) incremental validation algorithm using an auxiliary structure of size O(n), where n is the size of the document and m the number of updates. The algorithm does not handle the incremental validation of XML Schema wrt renaming of internal nodes, which is handled by the specialized DTDs incremental validation algorithm. For specialized DTDs, we provide an O(m log2 n) incremental algorithm, again using an auxiliary structure of size O(n). This is a significant improvement over brute-force re-validation from scratch.We exhibit a restricted class of DTDs called local that arise commonly in practice and for which incremental validation can be done in practically constant time by maintaining only a list of counters. We present implementations of both general incremental validation and local validation on an XML database built on top of a relational database.Our experimentation includes a study of the applicability of local validation in practice, results on the calibration of parameters of the auxiliary data structure, and results on the performance comparison between the general incremental validation technique, the local validation technique, and brute-force validation from scratch.

References

  1. Balmin, A. and Papakonstantinou, Y. 2004. Storing and querying XML data using denormalized relational databases. VLDB J. (Online First section). Available online at http://springerlink.metapress.com/openurl.asp?genre=article&id=doi:10.1007/s00778-003-0113-1. Also available online at http://www.db.ucsd.edu/people/andrey. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Barbosa, D., Mendelzon, A., Libkin, L., Mignet, L., and Arenas, M. 2004. Efficient incremental validation of xml documents. In Proceedings of the 20th International Conference on Data Engineering (ICDE'04). IEEE Computer Society Press, Los Alamitos, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Beeri, C. and Milo, T. 1999. Schemas for integration and translation of structured and semi-structured data. In Proceedings of the 7th International Conference on Database Theory. Lecture Notes in Computer Science, vol. 1540. Springer, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bruggemann-Klein, A., Murata, M., and Wood, D. 2001. Regular tree and regular hedge languages over non-ranked alphabets. Tech. rep. HKUST-TCSC-2001-05. Hong Kong University of Science and Technology Computer Science Center, Hong Kong. Available online at http://www.cs.ust.hk/tcsc/RR/2001-05.ps.gz.Google ScholarGoogle Scholar
  5. Bruggemann-Klein, A. and Wood, D. 1998. One-unambiguous regular languages. Inform. Computat. 142, 2, 182--206. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Choi, B. 2002. What are real DTDs like? In Proceedings of the Fifth International Workshop on the Web and Databases, WebDB. Informal proceedings, 43--48.Google ScholarGoogle Scholar
  7. Cluet, S., Delobel, C., Simeon, J., and Smaga, K. 1998. Your mediators need data conversion! In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. ACM Press New York, NY, 177--188. Google ScholarGoogle Scholar
  8. Cormen, T., Leiserson, C., and Rivest, R. 1990. Introduction to Algorithms. MIT Press, Cambridge, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Dong, G. and Su, J. 1995. Space-bounded foies. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (May 22--25, San Jose, CA). ACM Press, New York, NY, 139--150. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Garcia-Molina, H., Ullman, J., and Widom, J. 2001. Database Systems: The Complete Book. Prentice Hall, Upper Saddle River, NJ. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Ghezzi, C. and Mandrioli, D. 1980. Augmenting parsers to support incrementality. J. Assoc. Comput. Mach. 27, 3, 564--579. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Hesse, B. and Immerman, N. 2002. Complete problems for dynamic complexity classes. In Proceedings of the 17th IEEE Symposium on Logic in Computer Science (LICS 2002). IEEE Computer Society Press, Los Alamitos, CA. Google ScholarGoogle ScholarCross RefCross Ref
  13. Ipedo. Ipedo XML Database: Technical overview. Available online at http://www.ipedo.com/downloads/products_ixd_technical_overview.pdf.Google ScholarGoogle Scholar
  14. Jalili, F. and Gallier, J. 1982. Building friendly parsers. In Proceedings of the Ninth Annual ACM Symposium on Principles of Programming Languages. ACM Press, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Larcheveque, J. 1995. Optimal incremental parsing. ACM Trans. Programm. Lang. Syst. 17, 1, 1--15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Levine, C. 1997. Standard benchmarks for database systems. Presented at Sigmod 97. Available online at http://www.tpc.org/information/sessions/sigmod/indexc.htm.Google ScholarGoogle Scholar
  17. Li, W. 1995. A simple and efficient incremental LL(1) parsing. In Proceedings of the 22nd Seminar on Current Trends in Theory and Practice of Informatics(SOFSEM '95). Lecture Notes in Computer Science, vol. 1012. Springer, New York, NY. Google ScholarGoogle Scholar
  18. Linden, G. 1993. Incremental updates in structured documents. Licentiate thesis, Report C-1993-19. Department of Computer Science, University of Helsinki, Helsinki, Finland.Google ScholarGoogle Scholar
  19. Lohrey, M. 2001. On the parallel complexity of tree automata. In Proceedings of the 12th International Conference on Rewriting Techniques and Applications (RTA 2001). Lecture Notes in Computer Science, vol. 2051. Springer, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Miltersen, P., Subramanian, S., Vitter, J., and Tamassia, R. 1994. Complexity models for incremental computation. Theor. Comput. Sci. 130, 1, 203--236. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Murching, A., Prasant, Y., and Srikant, Y. 1990. Incremental recursive descent parsing. Comput. Lang. 15, 4, 193--204. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Neven, F. 2002. Automata, logic and XML. In Computer Science Logic, 16th International Workshop (CSL 2002). Lecture Notes in Computer Science, vol. 2471. Springer, New York, NY. Available online at http://alpha.luc.ac.be/lucg5503/publs.html. Google ScholarGoogle Scholar
  23. Papakonstantinou, Y. and Vianu, V. 2000. DTD inference for views of XML data. In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2000, Dallas, TX). ACM Press, New York, NY, 35--46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Patnaik, S. and Immerman, N. 1997. Dyn-FO: A parallel, dynamic complexity class. J. Comput. Syst. Sci. 55, 2, 199--209. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Petrone, L. 1995. Reusing batch parsers as incremental parsers. In Foundations of Software Technology and Theoretical Computer Science. Lecture Notes in Computer Science, vol. 1026. Springer, New York, NY. Google ScholarGoogle Scholar
  26. RELAX NG. Available online at http://www.relaxng.org.Google ScholarGoogle Scholar
  27. Segoufin, L. 2002. Personal communication.Google ScholarGoogle Scholar
  28. Shanmugasundaram, J., Tufte, K., Zhang, C., He, G., DeWitt, D. J., and Naughton, J. F. 1999. Relational databases for querying XML documents: Limitations and opportunities. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB'99), Edinburgh, Scotland, UK. Morgan Kaufmann, San Francisco, CA, 302--314. Google ScholarGoogle Scholar
  29. Sur, G., Hammer, J., and Simeon, J. 2004. UpdateX---an XQuery-based language for processing updates in XML. In International Workshop on Programming Language Technologies for XML (PLAN-X 2004). Informal proceedings.Google ScholarGoogle Scholar
  30. Tatarinov, I., Ives, Z., Halevy, A., and Weld, D. 2001. Updating XML. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, New York, NY. Google ScholarGoogle Scholar
  31. TPC-C. Benchmark. Available online at http://www.tpc.org/tpcc/.Google ScholarGoogle Scholar
  32. Vollmer, H. 1999. Introduction to Circuit Complexity. Springer Verlag, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. W3C. 1998. The extensible markup language (XML). W3C Recomendation. Available online at http://www.w3c.org/XML.Google ScholarGoogle Scholar
  34. W3C. 2001. XML schema definition. W3C Recomendation. Available online at http://www. w3c.org/XML/Schema.Google ScholarGoogle Scholar
  35. Wagner, T. and Graham, S. 1998. Efficient and flexible incremental parsing. ACM Trans. Programm. Lang. Syst. 20, 2, 980--1013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Well. WellLogML DTD. Available online at http://www.posc.org/ebiz/WellLogML/.Google ScholarGoogle Scholar
  37. XML Edt. XML editor products. Available online at http://www.perfectxml.com/soft.asp?cat=6.Google ScholarGoogle Scholar
  38. XMLmind. XMLmind XML Editor. Available online at http://www.xmlmind.com/xmleditor/.Google ScholarGoogle Scholar
  39. XMLspy document editor. Available online at http://www.xmlspy.com/products_doc.html.Google ScholarGoogle Scholar

Index Terms

  1. Incremental validation of XML documents

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in

                Full Access

                • Published in

                  cover image ACM Transactions on Database Systems
                  ACM Transactions on Database Systems  Volume 29, Issue 4
                  December 2004
                  250 pages
                  ISSN:0362-5915
                  EISSN:1557-4644
                  DOI:10.1145/1042046
                  Issue’s Table of Contents

                  Copyright © 2004 ACM

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 12 December 2004
                  Published in tods Volume 29, Issue 4

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • article

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader