skip to main content
10.1145/1247480.1247512acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Efficient algorithms for evaluating xpath over streams

Published: 11 June 2007 Publication History

Abstract

In this paper we address the problem of evaluating XPath queries over streaming XML data. We consider a practical XPath fragment called Univariate XPath, which includes the commonly used '/' and '//' axes and allows *-node tests and arbitrarily nested predicates. It is well known that this XPath fragment can be efficiently evaluated in O(|D||Q|) time in the non-streaming environment, where |D| is the document size and |Q| is the query size. However, this is not necessarily true in the streaming environment, since streaming algorithms have to satisfy stricter requirement than non-streaming algorithms, in that all data must be read sequentially in one pass. Therefore, it is not surprising that state-of-the-art stream-querying algorithms have higher time complexity than O(|D||Q|).
In this paper we revisit the XPath stream-querying problem, and show that Univariate XPath can be efficiently evaluated in O|D||Q|) time in the streaming environment. Specifically, we propose two O(|D||Q|)-time stream-querying algorithms, LQ and EQ, which are based on the lazy strategy and on the eager strategy, respectively. To the best of our knowledge, LQ and EQ are the first XPath stream-querying algorithms that achieve O(|D||Q|) time performance. Further, our algorithms achieve O(|D||Q|) time performance without trading off space performance. Instead, they have better buffering-space performance than state-of-the-art stream-querying algorithms. In particular, EQ achieves optimal buffering-space performance. Our experimental results show that our algorithms have not only good theoretical complexity but also considerable practical performance advantages over existing algorithms.

References

[1]
S. Al--Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu. Structural joins: a primitive for efficient XML query pattern matching. ICDE Conference, 2002.
[2]
Apache. Apache Xerces2-Java XML Parser. http://xerces.apache.org/xerces2-j/.
[3]
Z. Bar-Yossef, M. Fontoura, and V. Josifovski. On the memory requirements of XPath evaluation over XML streams. Proceedings of PODS, 2004.
[4]
Z. Bar-Yossef, M. Fontoura, and V. Josifovski. Buffering in query evaluation over XML streams. Proceedings of PODS, 2005.
[5]
C. Barton, P. Charles, D. Goyal, M. Raghavachari, M. Fontoura, and V. Josifovski. Streaming XPath processing with forward and backward axes. ICDE Conference, 2003.
[6]
S. Boag, D. Chamberlin, M. F. Ferandez, D. Florescu, J. Robie, and J. Simeon. XQuery 1.0: An XML Query Language. W3C, http://www.w3.org/TR/xquery/, 2003.
[7]
D. Brownell and D. Megginson. SAX: Simple API for XML. SAX Project Organization, http://www.saxproject.org/.
[8]
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. SIGMOD, 2002.
[9]
C. Y. Chan, P. Felber, M. N. Garofalakis, and R. Rastogi. Efficient filtering of XML documents with XPat expressions. ICDE Conference, 2002.
[10]
Y. Chen, S. B. Davidson, and Y. Zheng. An efficient XPath query processor for XML streams. ICDE, 2006.
[11]
B. Choi. What are real DTDs like? WebDB Workshop, 2002.
[12]
J. Clark. XML Transformations (XSLT) Version 1.0. W3C, http://www.w3.org/TR/xslt/, 1999.
[13]
J. Clark and S. DeRose. XML Path Language (XPath) Version 1.0. W3C, http://www.w3.org/TR/xpath/, 1999.
[14]
Y. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. M. Fischer. Path sharing and predicate evaluation for high-performance XML filtering. ACM Transactions on Database Systems (TODS), 28:467--516, 2003.
[15]
A. L. Diaz and D. Lovell. IBM's XML generator. http://www.alphaworks.ibm.com/tech/xmlgenerator.
[16]
M. Fernandez and et al. Galax: an implementation of XQuery. http://www.galaxquery.org/.
[17]
D. Florescu, C. Hillery, D. Kossmann, P. Lucas, F. Riccardi, T. Westmann, M. J. Carey, A. Sundararajan,and G. Agrawal. The BEA/XQRL streaming XQuery processor. VLDB Conference, 2003.
[18]
G. Gottlob, C. Koch, and R. Pichler. Efficient algorithms for processing XPath queries. VLDB Conference, 2002.
[19]
G. Gottlob, C. Koch, and R. Pichler. The complexity of XPath query evaluation. Proceedings of PODS, 2003.
[20]
T. J. Green, A. Gupta, G. Miklau, M. Onizuka, and D. Suciu. Processing XML streams with deterministic automata and stream indexes. ACM Transactions on Database Systems (TODS), 29:752--788, 2004.
[21]
A. K. Gupta and D. Suciu. Stream processing of XPath queries with predicates. SIGMOD Conference, 2003.
[22]
V. Josifovski, M. Fontoura, and A. Barta. Querying XML streams. VLDB Journal, 14(2):197--210, 2005.
[23]
M. Kay. Saxon: the XSLT and XQuery processor. http://saxon.sourceforge.net/.
[24]
C. Koch, S. Scherzinger, N. Schweikardt, and B. Stegmaier. Schema-based scheduling of event processors and buffer minimization for queries on structured data streams. VLDB Conference, 2004.
[25]
B. Ludascher, P. Mukhopadhyay, and Y. Papakonstantinou. A transducer-based XML query processor. VLDB, 2002.
[26]
F. Peng and S. S. Chawathe. XSQ: A streaming XPath engine. http://www.cs.umd.edu/projects/xsq/.
[27]
F. Peng and S. S. Chawathe. XSQ: A streaming XPath engine. ACM Transactions on Database Systems (TODS), 30:577--623, 2005.
[28]
A. Schmidt and et al. XMark: an XML benchmark project. http://monetdb.cwi.nl/xml/.
[29]
L. Segoufin. Typing and querying XML documents: some complexity bounds. Proceedings of PODS, 2003.
[30]
D. Suciu. XML data repository. http://www.cs.washington.edu/research/xmldatasets/.
[31]
W3C. Section 1.2.2, XML query use cases. http://www.w3.org/TR/xquery-use-cases/, 2006.

Cited By

View all
  • (2024)Streaming enumeration on nested documentsACM Transactions on Database Systems10.1145/3701557Online publication date: 25-Oct-2024
  • (2017)Multi-query processing of XML data streams on multicoreThe Journal of Supercomputing10.1007/s11227-016-1919-073:6(2339-2368)Online publication date: 1-Jun-2017
  • (2014)A study on parallelizing XML path filtering using acceleratorsACM Transactions on Embedded Computing Systems10.1145/256004013:4(1-28)Online publication date: 10-Mar-2014
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007
1210 pages
ISBN:9781595936868
DOI:10.1145/1247480
  • General Chairs:
  • Lizhu Zhou,
  • Tok Wang Ling,
  • Program Chair:
  • Beng Chin Ooi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. XML
  2. XPath
  3. query processing
  4. streams

Qualifiers

  • Article

Conference

SIGMOD/PODS07
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)0
Reflects downloads up to 18 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Streaming enumeration on nested documentsACM Transactions on Database Systems10.1145/3701557Online publication date: 25-Oct-2024
  • (2017)Multi-query processing of XML data streams on multicoreThe Journal of Supercomputing10.1007/s11227-016-1919-073:6(2339-2368)Online publication date: 1-Jun-2017
  • (2014)A study on parallelizing XML path filtering using acceleratorsACM Transactions on Embedded Computing Systems10.1145/256004013:4(1-28)Online publication date: 10-Mar-2014
  • (2014)Forward XPath stream processing: End-to-end confidentiality and scalability2014 10th International Conference on Innovations in Information Technology (IIT)10.1109/INNOVATIONS.2014.6987556(24-29)Online publication date: Nov-2014
  • (2013)Path stream group level encoding: Efficient wireless XML streaming2013 International Conference on Recent Trends in Information Technology (ICRTIT)10.1109/ICRTIT.2013.6844267(582-589)Online publication date: Jul-2013
  • (2013)Syntax Analyzer & Selectivity Estimation Technique Applied on Wikipedia XML Data SetProceedings of the 2013 Sixth International Conference on Developments in eSystems Engineering10.1109/DeSE.2013.10(3-8)Online publication date: 16-Dec-2013
  • (2013)A survey on XML streaming evaluation techniquesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-012-0281-y22:2(177-202)Online publication date: 1-Apr-2013
  • (2013)A Research Survey on Large XML Data: Streaming, Selectivity Estimation and ParallelismInter-cooperative Collective Intelligence: Techniques and Applications10.1007/978-3-642-35016-0_7(167-202)Online publication date: 14-Aug-2013
  • (2012)Processing and Evaluating Partial Tree Pattern Queries on XML DataIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2011.13724:12(2244-2259)Online publication date: 1-Dec-2012
  • (2012)A stream-based selectivity estimation technique for forward XPath2012 International Conference on Innovations in Information Technology (IIT)10.1109/INNOVATIONS.2012.6207734(209-214)Online publication date: Mar-2012
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media