skip to main content
10.1145/1379272.1379281acmotherconferencesArticle/Chapter ViewAbstractPublication PagessspsConference Proceedingsconference-collections
research-article

Automaton in or out: run-time plan optimization for XML stream processing

Published: 29 March 2008 Publication History

Abstract

Many systems such as Tukwila and YFilter combine automaton and algebra techniques to process queries over tokenized XML streams. Typically in this architecture, an automaton is first used to locate all query patterns in the input stream and compose the matched tokens into XML element nodes. These XML nodes are then passed to the tuple-based algebraic operators for further filtering or restructuring. This common processing style is however not always optimal. At times it is more efficient to retrieve only a subset of the patterns in the automaton while retrieving the rest of the patterns on the XML element nodes. In this paper, we use a cost-based solution to explore this novel optimization opportunity. We design three plan optimization algorithms, namely, MinExhaust, GreedyBasic and FastPrune. We also study how to migrate from a currently running plan to a new plan in a safe and efficient manner. Our experimentations have shown that the GreedyBasic or FastPrune algorithm can quickly find a plan that is close to optimal in most scenarios. Also we illustrate that the overhead in our approach for run-time statistics collection and plan migration are very lightweight.

References

[1]
A. Halverson, J. Burger and L. Galanis et al. Mixed Mode XML Query Processing. In Proceedings of VLDB, pages 225--236, 2003.
[2]
A. Schmidt, F. Waas and M. L. Kersten et al. XMark: A Benchmark for XML Data Management. In Proceedings of VLDB, pages 974--985, 2002.
[3]
R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In ACM SIGMOD, pages 261--272, June 2000.
[4]
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. In SIGMOD, pages 310--321, 2002.
[5]
C. Koch, S. Scherzinger and N. Scheweikardt et al. FluxQuery: An Optimizing XQuery Processor for Streaming XML Data. In VLDB, pages 228--239, 2004.
[6]
Y. Chen, S. B. Davidson, and Y. Zheng. An Efficient XPath Query Processor for XML Streams. In ICDE, page 79, 2006.
[7]
D. Barbosa, A. Mendelzon, and J. Keenleyside et al. ToXgene: a Template-Based Data Generator for XML. In Proceedings of WebDB, pages 49--54, 2002.
[8]
D. Florescu, C. Hillery and D. Kossmann et al. The BEA streaming XQuery processor. In VLDB Journal 13(3), pages 294--315, 2004.
[9]
Y. Diao and M. Franklin. Query Processing for High-Volume XML Message Brokering. In VLDB, pages 261--272, 2003.
[10]
G. Russell, M. Neumuller and R. Connor. Stream-based XML Processing with Tuple Filtering. In WebDB, pages 55--63, 2003.
[11]
T. J. Green, G. Miklau, M. Onizuka, and D. Suciu. Processing XML Streams with Deterministic Automata. In ICDT, pages 173--189, 2003.
[12]
A. Gupta and D. Suciu. Stream Processing of XPath Queries with Predicates. In Proceedings of SIGMOD, pages 419--430, 2003.
[13]
H. Jiang, H. Lu and W. Wang. Holistic twig joins on indexed XML documents. In VLDB, pages 273--284, 2003.
[14]
H. Su, E. A. Rundensteiner and M. Mani. Automaton In or Out: Run-time Plan Optimization for XML Stream Processing. In Technical Report, Worcester Polytechnic Institute, http://davis.wpi.edu/dsrg/raindrop/publication.html, 2005.
[15]
H. Su, E. A. Rundensteiner and M. Mani. Semantic Query Optimization for XQuery over XML Streams. In VLDB, pages 277--288, 2005.
[16]
H. Su, E. A. Rundensteiner and Murali Mani. Automaton Meets Algebra: A Hybrid Paradigm for XML Stream Processings. DKE Journal, 2006.
[17]
H. Su, J. Jian and E. A. Rundensteiner. Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams. In CIKM, pages 279--286, Nov 2003.
[18]
J. M. Hellerstein and M. Stonebraker. Predicate migration: Optimizing queries with expensive predicates. In SIGMOD, pages 267--276. ACM Press, 1993.
[19]
Z. Ives, A. Halevy, and D. Weld. An XML Query Engine for Network-Bound Data. VLDB Journal, 11 (4): 380--402, 2002.
[20]
B. Ludascher, P. Mukhopadhyay, and Y. Papakonstantinou. A Transducer-Based XML Query Processor. In Proceedings of VLDB, pages 227--238, 2002.
[21]
J. McHugh and J. Widom. Query Optimization for XML. In Proceedings of VLDB, pages 315--326, 1999.
[22]
F. Peng and S. Chawathe. XPath Queries on Streaming Data. In Proceedings of SIGMOD, pages 431--442, 2003.
[23]
V. Raman, A. Deshpande, and J. M. Hellerstein. Using state modules for adaptive query processing. In ICDE, page 353, March 2003.
[24]
University of Washington. Xml data repository, 2002.
[25]
Y. Wu, J. M. Patel and H. V. Jagadish. Structural Join Order Selection for XML Query Optimization. In ICDE, pages 443--454, 2003.
[26]
Z. Ives and A. Halevy and D. Weld. Adapting to source properties in processing data integration queries. In SIGMOD Conference, pages 395--406, 2004.

Cited By

View all
  • (2015)A Streaming Real-Time Web Observatory Architecture for Monitoring the Health of Social MachinesProceedings of the 24th International Conference on World Wide Web10.1145/2740908.2743977(1149-1154)Online publication date: 18-May-2015
  • (2011)Efficient Event Stream Processing: Handling Ambiguous Events and Patterns with NegationDatabase Systems for Adanced Applications10.1007/978-3-642-20244-5_40(415-426)Online publication date: 2011
  • (2008)Utility-driven load shedding for xml stream processingProceedings of the 17th international conference on World Wide Web10.1145/1367497.1367613(855-864)Online publication date: 21-Apr-2008

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
SSPS '08: Proceedings of the 2nd international workshop on Scalable stream processing system
March 2008
99 pages
ISBN:9781595939630
DOI:10.1145/1379272
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 March 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. XML
  2. optimization
  3. query rewriting
  4. stream processing

Qualifiers

  • Research-article

Conference

EDBT '08

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2015)A Streaming Real-Time Web Observatory Architecture for Monitoring the Health of Social MachinesProceedings of the 24th International Conference on World Wide Web10.1145/2740908.2743977(1149-1154)Online publication date: 18-May-2015
  • (2011)Efficient Event Stream Processing: Handling Ambiguous Events and Patterns with NegationDatabase Systems for Adanced Applications10.1007/978-3-642-20244-5_40(415-426)Online publication date: 2011
  • (2008)Utility-driven load shedding for xml stream processingProceedings of the 17th international conference on World Wide Web10.1145/1367497.1367613(855-864)Online publication date: 21-Apr-2008

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media