ABSTRACT
XML schema validation can be performed in constant memory in the streaming model if and only if the schema admits only trees of bounded depth - an acceptable assumption from the practical view-point. In this paper we refine this analysis by taking into account that data can be streamed block-by-block, rather then letter-by-letter, which provides opportunities to speed up the computation by parallelizing the processing of each block.
For this purpose we introduce the model of streaming circuits, which process words of arbitrary length in blocks of fixed size, passing constant amount of information between blocks.
This model allows us to transfer fundamental results about the circuit complexity of regular languages to the setting of streaming schema validation, which leads to effective constructions of streaming circuits of depth logarithmic in the block size, or even constant under certain assumptions on the input schema.
For nested-relational DTDs, a practically motivated class of bounded-depth XML schemas, we provide an efficient construction yielding constant-depth streaming circuits with particularly good parameters.
- Serge Abiteboul, Luc Segoufin, and Victor Vianu. Representing and querying XML with incomplete information. ACM Trans. Database Syst., 31(1):208--254, 2006. Google ScholarDigital Library
- Marcelo Arenas and Leonid Libkin. XML data exchange: Consistency and query answering. J. ACM, 55(2), 2008. Google ScholarDigital Library
- David A. Mix Barrington, Kevin Compton, Howard Straubing, and Denis Thérien. Regular languages in NC1. J. Computer and System Sciences, 44(3):478--499, 1992. Google ScholarDigital Library
- G. J. Bex, F. Neven, and J. Van den Bussche. DTDs versus XML Schema: a practical study. In WEBDB 2004, pages 79--84, 2004. Google ScholarDigital Library
- David Black-Schaffer. Block-Parallel Programming for Real-Time Applications on Multi-Core Processors. PhD thesis, Humboldt-Universitat zu Berlin, 2008. Available at http://cva.stanford.edu/people/davidbbs/black-schaffer_thesis_final.pdf. Google ScholarDigital Library
- David Black-Schaffer and William J. Dally. Block-parallel programming for real-time embedded applications. In ICPP 2010, pages 297--306. IEEE Computer Society, 2010. Google ScholarDigital Library
- Mikołaj Bojanczyk, Howard Straubing, and Igor Walukiewicz. Wreath products of forest algebras, with applications to tree logics. Logical Methods in Computer Science, 8(3), 2012.Google Scholar
- Vince Bárány, Christof Löding, and Olivier Serre. Regularity problems for visibly pushdown languages. In Bruno Durand and Wolfgang Thomas, editors, STACS 2006, volume 3884 of Lecture Notes in Computer Science, pages 420--431. Springer Berlin Heidelberg, 2006. Google ScholarDigital Library
- Ashok K. Chandra, Steven Fortune, and Richard J. Lipton. Lower bounds for constant depth circuits for prefix problems. In Josep Dıaz, editor, ICALP 1983, volume 154 of Lecture Notes in Computer Science, pages 109--117. Springer, 1983. Google ScholarDigital Library
- Sang Cho and Dung T. Huynh. Finite-automaton aperiodicity is PSPACE-complete. Theoretical Computer Science, 88(1):99 -- 116, 1991. Google ScholarDigital Library
- Luc Dartois and Charles Paperman. Alternation hierarchies of first order logic with regular predicates. In Adrian Kosowski and Igor Walukiewicz, editors, FCT 2015, volume 9210 of Lecture Notes in Computer Science, pages 160--172. Springer, 2015.Google Scholar
- Tathagata Das, Yuan Zhong, Ion Stoica, and Scott Shenker. Adaptive stream processing using dynamic batch sizing. In Ed Lazowska, Doug Terry, Remzi H. Arpaci-Dusseau, and Johannes Gehrke, editors, SoCC 2014, pages 16:1--16:13. ACM, 2014. Google ScholarDigital Library
- Patrick Dymond. Input-driven languages are in log n depth. Inf. Process. Lett., 26(5):247--250, January 1988. Google ScholarDigital Library
- Merrick Furst, James B. Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy. Theory of Computing Systems, 17:13--27, 1984.Google Scholar
- Karsten Henckell. Pointlike sets: the finest aperiodic cover of a finite semigroup. Journal of Pure and Applied Algebra, 55(1):85 -- 126, 1988.Google ScholarCross Ref
- Neil Immerman. Languages that capture complexity classes. SIAM J. Computing, 16(4):760--778, 1987. Google ScholarDigital Library
- Rashid Khogali, Olivia Das, and Kaamran Raahemifar. Mobile parallel computing algorithms for single-buffered, speed-scalable processors. In TrustCom 2013 / ISPA-13 / IUCC-2013, pages 1832--1839. IEEE Computer Society, 2013. Google ScholarDigital Library
- Eryk Kopczynski. Invisible pushdown languages. In LICS 2016 (to appear), 2016. Available at http://arxiv.org/abs/1511.00289\,.Google ScholarDigital Library
- Michal Koucký, Pavel Pudlák, and Denis Thérien. Bounded-depth circuits: separating wires from gates. In Harold N. Gabow and Ronald Fagin, editors, STOC 2005, pages 257--265. ACM, 2005. Google ScholarDigital Library
- Michal Koucký. Circuit complexity of regular languages. Theory of Computing Systems, 45(4):865--879, 2009. Google ScholarDigital Library
- Michal Koucký, Clemens Lautemann, Sebastian Poloczek, and Denis Thérien. Circuit lower bounds via Ehrenfeucht-Fraissé games. In CCC 2006, pages 190--201, 2006. Google ScholarDigital Library
- Viraj Kumar, P. Madhusudan, and Mahesh Viswanathan. Visibly pushdown automata for streaming XML. In WWW 2007, pages 1053--1062. ACM, 2007. Google ScholarDigital Library
- Dongwon Lee, Murali Mani, Frank Chiu, and Wesley W. Chu. Net & cot: translating relational schemas to XML schemas using semantic constraints. In Proceedings of the 2002 ACM CIKM International Conference on Information and Knowledge Management, McLean, VA, USA, November 4--9, 2002, pages 282--291. ACM, 2002. Google ScholarDigital Library
- Kyong-Ha Lee, Yoon-Joon Lee, Hyunsik Choi, Yon Dohn Chung, and Bongki Moon. Parallel data processing with MapReduce: a survey. SIGMOD Record, 40(4):11--20, 2011. Google ScholarDigital Library
- Robert McNaughton and Seymour Papert. Counter-Free Automata. The MIT Press, Cambridge, Mass., 1971.Google Scholar
- Charles Paperman. Circuits booléens, prédicats modulaires et langages réguliers. PhD dissertation, Université Paris Diderot, 2014. In French.Google Scholar
- Thomas Place and Marc Zeitoun. Separating regular languages with first-order logic. In Thomas A. Henzinger and Dale Miller, editors, CSL-LICS 2014, pages 75:1--75:10. ACM, 2014. Google ScholarDigital Library
- Thomas Place and Marc Zeitoun. Separation and the successor relation. In Ernst W. Mayr and Nicolas Ollinger, editors, STACS 2015, volume 30 of Leibniz International Proceedings in Informatics (LIPIcs), pages 662--675, Dagstuhl, Germany, 2015. Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik.Google Scholar
- Marcel-Paul Schützenberger. On finite monoids having only trivial subgroups. Information and Control, 8(2):190--194, 1965.Google ScholarCross Ref
- Luc Segoufin and Cristina Sirangelo. Constant-memory validation of streaming XML documents against DTDs. In Thomas Schwentick and Dan Suciu, editors, ICDT 2007, volume 4353 of Lecture Notes in Computer Science, pages 299--313. Springer, 2007. Google ScholarDigital Library
- Luc Segoufin and Victor Vianu. Validating streaming XML documents. In Lucian Popa, Serge Abiteboul, and Phokion G. Kolaitis, editors, PODS 2002, pages 53--64. ACM, 2002. Google ScholarDigital Library
- Howard Straubing. Finite Automata, Formal Logic, and Circuit Complexity. Birkh\"auser, Boston, 1994. Google ScholarDigital Library
Index Terms
- Schema Validation via Streaming Circuits
Recommendations
Reformulating XPath queries and XSLT queries on XSLT views
Applications using XML for data representation very often use different XML formats and thus require the transformation of XML data. The common approach transforms entire XML documents from one format into another, e.g. by using an XSLT stylesheet. ...
XML-based XML schema access
WWW '07: Proceedings of the 16th international conference on World Wide WebXML Schema's abstract data model consists of components, which are the structures that eventually define a schema as a whole. XML Schema's XML syntax, on the other hand, is not a direct representation of the schema components, and it proves to be ...
Constraint Preserving Transformation from Relational Schema to XML Schema
XML has become the standard for publishing and exchanging data on the Web. However, most business data is managed and will remain to be managed by relational database management systems. As such, there is an increasing need to efficiently and accurately ...
Comments