skip to main content
10.1145/1055558.1055584acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

On the memory requirements of XPath evaluation over XML streams

Published: 14 June 2004 Publication History

Abstract

The important challenge of evaluating XPath queries over XML streams has sparked much interest in the past two years, A number of algorithms have been proposed, supporting wider fragments of the query language, and exhibiting better performance and memory utilization. Nevertheless, all the algorithms known to date use a prohibitively large amount of memory for certain types of queries. A natural question then is whether this memory bottleneck is inherent or just an artifact of the proposed algorithms.In this paper we initiate the first systematic and theoretical study of lower bounds on the amount of memory required to evaluate XPath queries over XML streams. We present a general lower bound technique, which given a query, specifies the minimum amount of memory that any algorithm evaluating the query on a stream would need to incur. The lower bounds are stated in terms of new graph-theoretic properties of queries. The proof is based on tools from communication complexity.We then exploit insights learned from the lower bounds to obtain a new algorithm for XPath evaluation on streams. The algorithm uses space close to the optimum. Our algorithm deviates from the standard paradigm of using automata or transducers, thereby avoiding the need to store large transition tables.

References

[1]
M. Altinel and M. J. Franklin. Efficient filtering of XML documents for selective dissemination of information. In Proc. 26th VLDB, pages 53--64, 2000.
[2]
A. Arasu, B. Babcock, S. Babu, J. McAlister, and J. Widom. Characterizing memory requirements for queries over continuous data streams. In Proc. 21st PODS, pages 221--232, 2002.
[3]
I. Avila-Campillo, T. J. Green, A. Gupta, M. Onizuka, D. Raven, and D. Suciu. XMLTK: An XML toolkit for scalable XML stream processing. In Proc. 1st Workshop on Programming Languages for XML (PLAN-X), 2002.
[4]
S. Boag, D. Chamberlin, M. F. Fernández, D. Florescu, J. Robie, and J. Siméosn. XQuery 1.0: An XML Query Language. W3C, http://www.w3.org/TR/xquery, 2003.
[5]
T. Bray, J. Paoli, and C. M. Sperbeg-McQueen. Extensible Markup Language (XML) 1.0. W3C, http://www.w3.org/TR/1998/REC-xml-19980210, 1998.
[6]
C. Y. Chan, P. Felber, M. N. Garofalakis, and R. Rastogi. Efficient filtering of XML documents with XPath expressions. In Proc. 18th ICDE, pages 235--244, 2002.
[7]
B. Choi, M. Mahoui, and D. Wood. On the optimality of holistic algorithms for twig queries. In Proc. 14th DEXA, pages 28--37, 2003.
[8]
B. Choi, M. Mahoui, and D. Wood. The optimality of holistic algorithms for XPath. http://www.cis.upenn.edu/~kkchoi/xpath.pdf, 2003.
[9]
J. Clark. XSL Transformations (XSLT) Version 1.0. W3C, http://www.w3.org/TR/xslt, 1999.
[10]
J. Clark and S. DeRose. XML Path Language (XPath), Version 1.0. W3C, http://www.w3.org/TR/xpath, 1999.
[11]
Y. Diao, P. M. Fischer, M. J. Franklin, and R. To. YFilter: Efficient and scalable filtering of XML documents. In Proc. 18th ICDE, pages 341--342, 2002.
[12]
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences, 66(4):614--656, 2003.
[13]
G. Gottlob, C. Koch, and R. Pichler. The complexity of XPath query evaluation. In Proc. 22nd PODS, pages 179--190, 2003.
[14]
T. Green, G. Miklau, M. Onizuka, and D. Suciu. Processing XML streams with deterministic automata. In Processing of the 9th ICDT, pages 173--189, 2003.
[15]
A. K. Gupta and D. Suciu. Stream processing of XPath queries with predicates. In Proc. 22nd SIGMOD, pages 419--430, 2003.
[16]
Z. Ives, A. Levy, and D. Weld. Efficient evaluation of regular path expressions on streaming XML data. Technical report, University of Washington, 2000.
[17]
V. Josifovski, M. Fontoura, and A. Barta. Querying XML streams. The VLDB Journal, 2004. to appear.
[18]
E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.
[19]
D. Olteanu, T. Kiesling, and F. Bry. An evaluation of regular path expressions with qualifiers against XML streams. In Proc. 19th ICDE, pages 702--704, 2003.
[20]
F. Peng and S. S. Chawathe. XPath queries on streaming data. In Proc. 22nd SIGMOD, pages 431--442, 2003.
[21]
L. Segoufin. Typing and querying XNL documents: Some complexity bounds. In Proc. 22nd PODS, pages 167--178, 2003.
[22]
M. Y. Vardi. The complexity of relational query languages. In Proc. 14th STOC, pages 137--146, 1982.
[23]
A. C.-C. Yao. Some complexity questions related to distributive computing. In Proc. 11th STOC, pages 209--213, 1979.

Cited By

View all
  • (2013)JetXSLTProceedings of the Twenty-Fourth Australasian Database Conference - Volume 13710.5555/2525416.2525424(71-80)Online publication date: 29-Jan-2013
  • (2013)Querying Streaming XML Big Data with Multiple Filters on CloudProceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering10.1109/CSE.2013.163(1121-1127)Online publication date: 3-Dec-2013
  • (2013)A survey on XML streaming evaluation techniquesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-012-0281-y22:2(177-202)Online publication date: 1-Apr-2013
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2004
350 pages
ISBN:158113858X
DOI:10.1145/1055558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2004

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS04

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 30 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2013)JetXSLTProceedings of the Twenty-Fourth Australasian Database Conference - Volume 13710.5555/2525416.2525424(71-80)Online publication date: 29-Jan-2013
  • (2013)Querying Streaming XML Big Data with Multiple Filters on CloudProceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering10.1109/CSE.2013.163(1121-1127)Online publication date: 3-Dec-2013
  • (2013)A survey on XML streaming evaluation techniquesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-012-0281-y22:2(177-202)Online publication date: 1-Apr-2013
  • (2013)A Research Survey on Large XML Data: Streaming, Selectivity Estimation and ParallelismInter-cooperative Collective Intelligence: Techniques and Applications10.1007/978-3-642-35016-0_7(167-202)Online publication date: 14-Aug-2013
  • (2012)Processing and Evaluating Partial Tree Pattern Queries on XML DataIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2011.13724:12(2244-2259)Online publication date: 1-Dec-2012
  • (2010)Evaluation Techniques for Generalized Path Pattern Queries on XML DataWorld Wide Web10.1007/s11280-010-0092-213:4(441-474)Online publication date: 1-Dec-2010
  • (2010)Efficient evaluation of generalized tree-pattern queries on XML streamsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-010-0184-819:5(661-686)Online publication date: 1-Oct-2010
  • (2009)Fast XML document filtering by sequencing twig patternsACM Transactions on Internet Technology10.1145/1592446.15924479:4(1-51)Online publication date: 14-Oct-2009
  • (2009)Runtime monitoring of web service choreographies using streaming XMLProceedings of the 2009 ACM symposium on Applied Computing10.1145/1529282.1529752(2118-2125)Online publication date: 8-Mar-2009
  • (2009)Lower bounds for processing data with few random accesses to external memoryJournal of the ACM10.1145/1516512.151651456:3(1-58)Online publication date: 19-May-2009
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media