Abstract
We investigate the incremental validation of XML documents with respect to DTDs, specialized DTDs, and XML Schemas, under updates consisting of element tag renamings, insertions, and deletions. DTDs are modeled as extended context-free grammars. "Specialized DTDs" allow the decoupling of element types from element tags. XML Schemas are abstracted as specialized DTDs with limitations on the type assignment. For DTDs and XML Schemas, we exhibit an O(m log n) incremental validation algorithm using an auxiliary structure of size O(n), where n is the size of the document and m the number of updates. The algorithm does not handle the incremental validation of XML Schema wrt renaming of internal nodes, which is handled by the specialized DTDs incremental validation algorithm. For specialized DTDs, we provide an O(m log2 n) incremental algorithm, again using an auxiliary structure of size O(n). This is a significant improvement over brute-force re-validation from scratch.We exhibit a restricted class of DTDs called local that arise commonly in practice and for which incremental validation can be done in practically constant time by maintaining only a list of counters. We present implementations of both general incremental validation and local validation on an XML database built on top of a relational database.Our experimentation includes a study of the applicability of local validation in practice, results on the calibration of parameters of the auxiliary data structure, and results on the performance comparison between the general incremental validation technique, the local validation technique, and brute-force validation from scratch.
- Balmin, A. and Papakonstantinou, Y. 2004. Storing and querying XML data using denormalized relational databases. VLDB J. (Online First section). Available online at http://springerlink.metapress.com/openurl.asp?genre=article&id=doi:10.1007/s00778-003-0113-1. Also available online at http://www.db.ucsd.edu/people/andrey. Google ScholarDigital Library
- Barbosa, D., Mendelzon, A., Libkin, L., Mignet, L., and Arenas, M. 2004. Efficient incremental validation of xml documents. In Proceedings of the 20th International Conference on Data Engineering (ICDE'04). IEEE Computer Society Press, Los Alamitos, CA. Google ScholarDigital Library
- Beeri, C. and Milo, T. 1999. Schemas for integration and translation of structured and semi-structured data. In Proceedings of the 7th International Conference on Database Theory. Lecture Notes in Computer Science, vol. 1540. Springer, New York, NY. Google ScholarDigital Library
- Bruggemann-Klein, A., Murata, M., and Wood, D. 2001. Regular tree and regular hedge languages over non-ranked alphabets. Tech. rep. HKUST-TCSC-2001-05. Hong Kong University of Science and Technology Computer Science Center, Hong Kong. Available online at http://www.cs.ust.hk/tcsc/RR/2001-05.ps.gz.Google Scholar
- Bruggemann-Klein, A. and Wood, D. 1998. One-unambiguous regular languages. Inform. Computat. 142, 2, 182--206. Google ScholarDigital Library
- Choi, B. 2002. What are real DTDs like? In Proceedings of the Fifth International Workshop on the Web and Databases, WebDB. Informal proceedings, 43--48.Google Scholar
- Cluet, S., Delobel, C., Simeon, J., and Smaga, K. 1998. Your mediators need data conversion! In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. ACM Press New York, NY, 177--188. Google Scholar
- Cormen, T., Leiserson, C., and Rivest, R. 1990. Introduction to Algorithms. MIT Press, Cambridge, MA. Google ScholarDigital Library
- Dong, G. and Su, J. 1995. Space-bounded foies. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (May 22--25, San Jose, CA). ACM Press, New York, NY, 139--150. Google ScholarDigital Library
- Garcia-Molina, H., Ullman, J., and Widom, J. 2001. Database Systems: The Complete Book. Prentice Hall, Upper Saddle River, NJ. Google ScholarDigital Library
- Ghezzi, C. and Mandrioli, D. 1980. Augmenting parsers to support incrementality. J. Assoc. Comput. Mach. 27, 3, 564--579. Google ScholarDigital Library
- Hesse, B. and Immerman, N. 2002. Complete problems for dynamic complexity classes. In Proceedings of the 17th IEEE Symposium on Logic in Computer Science (LICS 2002). IEEE Computer Society Press, Los Alamitos, CA. Google ScholarCross Ref
- Ipedo. Ipedo XML Database: Technical overview. Available online at http://www.ipedo.com/downloads/products_ixd_technical_overview.pdf.Google Scholar
- Jalili, F. and Gallier, J. 1982. Building friendly parsers. In Proceedings of the Ninth Annual ACM Symposium on Principles of Programming Languages. ACM Press, New York, NY. Google ScholarDigital Library
- Larcheveque, J. 1995. Optimal incremental parsing. ACM Trans. Programm. Lang. Syst. 17, 1, 1--15. Google ScholarDigital Library
- Levine, C. 1997. Standard benchmarks for database systems. Presented at Sigmod 97. Available online at http://www.tpc.org/information/sessions/sigmod/indexc.htm.Google Scholar
- Li, W. 1995. A simple and efficient incremental LL(1) parsing. In Proceedings of the 22nd Seminar on Current Trends in Theory and Practice of Informatics(SOFSEM '95). Lecture Notes in Computer Science, vol. 1012. Springer, New York, NY. Google Scholar
- Linden, G. 1993. Incremental updates in structured documents. Licentiate thesis, Report C-1993-19. Department of Computer Science, University of Helsinki, Helsinki, Finland.Google Scholar
- Lohrey, M. 2001. On the parallel complexity of tree automata. In Proceedings of the 12th International Conference on Rewriting Techniques and Applications (RTA 2001). Lecture Notes in Computer Science, vol. 2051. Springer, New York, NY. Google ScholarDigital Library
- Miltersen, P., Subramanian, S., Vitter, J., and Tamassia, R. 1994. Complexity models for incremental computation. Theor. Comput. Sci. 130, 1, 203--236. Google ScholarDigital Library
- Murching, A., Prasant, Y., and Srikant, Y. 1990. Incremental recursive descent parsing. Comput. Lang. 15, 4, 193--204. Google ScholarDigital Library
- Neven, F. 2002. Automata, logic and XML. In Computer Science Logic, 16th International Workshop (CSL 2002). Lecture Notes in Computer Science, vol. 2471. Springer, New York, NY. Available online at http://alpha.luc.ac.be/lucg5503/publs.html. Google Scholar
- Papakonstantinou, Y. and Vianu, V. 2000. DTD inference for views of XML data. In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2000, Dallas, TX). ACM Press, New York, NY, 35--46. Google ScholarDigital Library
- Patnaik, S. and Immerman, N. 1997. Dyn-FO: A parallel, dynamic complexity class. J. Comput. Syst. Sci. 55, 2, 199--209. Google ScholarDigital Library
- Petrone, L. 1995. Reusing batch parsers as incremental parsers. In Foundations of Software Technology and Theoretical Computer Science. Lecture Notes in Computer Science, vol. 1026. Springer, New York, NY. Google Scholar
- RELAX NG. Available online at http://www.relaxng.org.Google Scholar
- Segoufin, L. 2002. Personal communication.Google Scholar
- Shanmugasundaram, J., Tufte, K., Zhang, C., He, G., DeWitt, D. J., and Naughton, J. F. 1999. Relational databases for querying XML documents: Limitations and opportunities. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB'99), Edinburgh, Scotland, UK. Morgan Kaufmann, San Francisco, CA, 302--314. Google Scholar
- Sur, G., Hammer, J., and Simeon, J. 2004. UpdateX---an XQuery-based language for processing updates in XML. In International Workshop on Programming Language Technologies for XML (PLAN-X 2004). Informal proceedings.Google Scholar
- Tatarinov, I., Ives, Z., Halevy, A., and Weld, D. 2001. Updating XML. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, New York, NY. Google Scholar
- TPC-C. Benchmark. Available online at http://www.tpc.org/tpcc/.Google Scholar
- Vollmer, H. 1999. Introduction to Circuit Complexity. Springer Verlag, New York, NY. Google ScholarDigital Library
- W3C. 1998. The extensible markup language (XML). W3C Recomendation. Available online at http://www.w3c.org/XML.Google Scholar
- W3C. 2001. XML schema definition. W3C Recomendation. Available online at http://www. w3c.org/XML/Schema.Google Scholar
- Wagner, T. and Graham, S. 1998. Efficient and flexible incremental parsing. ACM Trans. Programm. Lang. Syst. 20, 2, 980--1013. Google ScholarDigital Library
- Well. WellLogML DTD. Available online at http://www.posc.org/ebiz/WellLogML/.Google Scholar
- XML Edt. XML editor products. Available online at http://www.perfectxml.com/soft.asp?cat=6.Google Scholar
- XMLmind. XMLmind XML Editor. Available online at http://www.xmlmind.com/xmleditor/.Google Scholar
- XMLspy document editor. Available online at http://www.xmlspy.com/products_doc.html.Google Scholar
Index Terms
- Incremental validation of XML documents
Recommendations
Efficient Revalidation of XML Documents
We study the problem of schema revalidation where XML data known to conform to one schema must be validated with respect to another schema. Such revalidation algorithms have applications in schema evolution, query processing, XML-based programming ...
Mapping of bibliographical standards into XML
The most popular bibliographical standards, which prescribe the exchange of bibliographical data in machine readable form, are MARC (Machine Readable Cataloguing) and UNIMARC (Universal Machine Readable Cataloguing). This paper presents two schemas, ...
Semantic validation for XML updates
ICUIMC '08: Proceedings of the 2nd international conference on Ubiquitous information management and communicationFor data centric XML data, updates bring the problem of validation that the updated data must conform to both structural and semantic constraints according to XML schemas. Most existing XML validation works today are concentrated on structural ...
Comments