skip to main content
article

XSQ: A streaming XPath engine

Published: 01 June 2005 Publication History

Abstract

We have implemented and released the XSQ system for evaluating XPath queries on streaming XML data. XSQ supports XPath features such as multiple predicates, closures, and aggregation, which pose interesting challenges for streaming evaluation. Our implementation is based on using a hierarchical arrangement of augmented finite state automata. A design goal of XSQ is buffering data for the least amount of time possible. We present a detailed experimental study that characterizes the performance of XSQ and related systems, and that illustrates the performance implications of XPath features such as closures.

References

[1]
Abiteboul, S., Quass, D., McHugh, J., Widom, J., and Wiener, J. 1996. The Lorel query language for semistructured data. J. Dig. Lib. 1, 1 (Nov.), 68--88.
[2]
Altinel, M. and Franklin, M. J. 2000. Efficient filtering of XML documents for selective dissemination of information. In Proceedings of the International Conference on Very Large Data Bases (VLDB) (Cairo, Egypt), 53--64.
[3]
Avila-Campillo, I., Raven, D., Green, T., Gupta, A., Kadiyska, Y., Onizuka, M., and Suciu, D. 2002. An XML toolkit for light-weight XML stream processing. http://www.cs.washington.edu/homes/suciu/XMLTK/.
[4]
Babcock, B., Babu, S., Datar, M., Motwani, R., and Widom, J. 2002. Models and issues in data stream systems. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS) (Madison,Wisc.), ACM, New York, 1--16.
[5]
Barbosa, D., Mendelzon, A., Keenleyside, J., and Lyons, K. 2002. ToXgene: A template-based data generator for XML. In Proceedings of the 5th International Workshop on the Web and Databases (Madison, Wisc.), 49--54.
[6]
Barton, C., Charles, P., Fontoura, M., Goyal, D., Josifovski, V., and Raghavachari, M. 2002. An algorithm for streaming XPath processing with forward and backward axes. In Proceedings of the PLAN-X Workshop on Programming Language Technologies for XML (Pittsburgh, Pa).
[7]
Barton, C. M., Charles, P. G., Goyal, D., Raghavachari, M., Josifovski, V., and Fontoura, M. F. 2003. Streaming XPath processing with forward and backward axes. In Proceedings of the International Conference on Data Engineering (ICDE) (Bangalore, India). 455--466.
[8]
Becker, O. 2002. Joost is ollie's original streaming transformer. http://joost.sourceforge.net/.
[9]
Becker, O., Cimprich, P., and Nentwich, C. 2002. Streaming transformations for XML. http://www.gingerall.cz/stx.
[10]
Benedikt, M., Fan, W., and Kuper, G. 2003. Structural properties of XPath fragments. In Proceedings of the International Conference on Database Theory (ICDT) (Siena, Italy).
[11]
Boag, S., Chamberlin, D., Fernández, M. F., Florescu, D., Robie, J., and Siméon, J. 2003. XQuery 1.0: An XML query language 1.0. W3C Working Draft, W3C, http://www.w3.org/TR/xquery/. August.
[12]
Borne, K. D. 2002. ADC dataset, GSFC/NASA XML project. http://xml.gsfc.nasa.gov/archive/.
[13]
Bray, T., Paoli, J., Sperberg-McQueen, C., and Maler, E. 2000. Extensible markup language (XML) 1.0 (2nd Edition). World Wide Web Consortium Recommendation. |http://www.w3.org/TR/REC-xml|.
[14]
Buneman, P., Davidson, S., Hillebrand, G., and Suciu, D. 1996. A query language and optimization techniques for unstructured data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD) (Montréal, Québec, Ont. Canada). ACM, New York, 505--516.
[15]
Chan, C. Y., Felber, P., Garofalakis, M. N., and Rastogi, R. 2002. Efficient filtering of XML documents with XPath expressions. In Proceedings of the International Conference on Data Engineering (ICDE) (San Jose, Calif.). 235--244.
[16]
Chandra, A. K., Kozen, D. C., and Stockmeyer, L. J. 1981. Alternation. J. ACM 28, 1, 114--133.
[17]
Chen, Z., Jagadish, H. V., Korn, F., Koudas, N., Muthukrishnan, S., Ng, R. T., and Srivastava, D. 2001. Counting twig matches in a tree. In Proceedings of the International Conference on Data Engineering (ICDE) (Heidelberg, Germany). 595--604.
[18]
Clark, J. and DeRose, S. 1999. XML path language (XPath) version 1.0. W3C Recommedation, W3C, http://www.w3.org/TR/xpath. Nov.
[19]
Deutsch, A., Fernández, M. F., Florescu, D., Levy, A., and Suciu, D. 1998. XML-QL: A query language for XML. http://www.w3.org/xml/.
[20]
Diao, Y., Fischer, P., and Franklin, M. J. 2002. YFilter: Efficient and scalable filtering of XML documents. In Proceedings of the International Conference on Data Engineering (ICDE) (San Jose, Calif.). 341--344.
[21]
Fernández, M. F., Florescu, D., Kang, J., Levy, A. Y., and Suciu, D. 1997. STRUDEL: A web-site management system. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD) (Tucson, Az.), J. Peckham, Ed. 549--552.
[22]
Fernández, M. F. and Siméon, J. 2002. Galax. http://db.bell-labs.com/galax/.
[23]
Gottlob, G., Koch, C., and Pichler, R. 2002. Efficient algorithms for processing XPath queries. In Proceedings of the International Conference on Very Large Data Bases (VLDB) (Hong Kong, China).
[24]
Gottlob, G., Koch, C., and Pichler, R. 2003. The complexity of XPath query evaluation. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS) (San Diego, Calif.). 179--190.
[25]
Green, T. J., Miklau, G., Onizuka, M., and Suciu, D. 2003. Processing XML streams with deterministic automata. In Proceedings of the International Conference on Database Theory (ICDT) (Siena, Italy). 173--189.
[26]
Gupta, A. K. and Suciu, D. 2003. Streaming processing of XPath queries with predicates. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD) (San Diego, Calif.). ACM, New York, 419--430.
[27]
Hoffmann, C. M. and O'Donnell, M. J. 1982. Pattern matching in trees. J. ACM 29, 1, 68--95.
[28]
Hors, A. L., Hgaret, P. L., Wood, L., Nicol, G., Robie, J., Champion, M., and Byrne, S. 2000. Document object model level 2 core specification. W3C Recommendation, W3C, http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113. November.
[29]
Katz, H. 2002. XQEngine. http://www.fatdog.com.
[30]
Kay, M. H. 2002. SAXON: An XSLT processor. http://saxon.sourceforge.net/.
[31]
Kay, M. 2003. XSL transformations (XSLT) version 2.0. W3C Working Draft, W3C, http://www.w3.org/TR/xslt20/. November.
[32]
Kilpel, P. 1992. Tree matching problems with applications to structured text databases. Ph.D. dissertation. Dept. of Computer Science, University of Helsink.
[33]
Lakshmanan, L. V. and Sailaja, P. 2002. On efficient matching of streaming XML documents and queries. In Proceedings of the 8th International Conference on Extending Database Technology (Prague, Czech Republic). 142--160.
[34]
Ley, M. 2003. Computer science bibliography. http://dblp.uni-trier.de/xml/.
[35]
Ludascher, B., Mukhopadhayn, P., and Papakonstantinou, Y. 2002. A transducer-based XML query processor. In Proceedings of the International Conference on Very Large Data Bases (VLDB) (Hong Kong, China). 227--238.
[36]
Miklau, G. and Suciu, D. 2002. Containment and equivalence for an XPath fragment. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS) (Madison,Wisc.). ACM, New York, 65--76.
[37]
Olteanu, D., Kiesling, T., and Bry, F. 2002. An evaluation of regular path expressions with qualifiers against XML streams. Tech. Rep. PMS-FB-2002-12, Institute for Computer Science, Ludwig-Maximilians University, Munich, Germany, May.
[38]
Peng, F. and Chawathe, S. S. 2003. XPath queries on streaming data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD) (San Diego, Calif.). ACM, New York, 431--442.
[39]
Peng, F. and Chawathe, S. S. 2004. XPaSS: A multi-query streaming XPath engine. Tech. Rep. CS-TR-4565 (UMIACS-TR-2004-10), Department of Computer Science, University of Maryland. May.
[40]
Sax Project Organization. 2001. SAX: Simple API for XML. http://www.saxproject.org/.
[41]
Segoufin, L. and Vianu, V. 2002. Validating streaming XML documents. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS) (Madison, Wisc.). ACM, New York, 53--64.
[42]
Tucker, P. A., Maier, D., and Sheard, T. 2003. Applying punctuation schemes to queries over continuous data streams. Bull. Tech. Comm. Data Eng. 26, 1 (Mar.), 33--40.
[43]
Wu, C. H., Huang, H., Arminski, L., Catro-Alvear, J., Chen, Y., Hu, Z. Z., Ledley, R. S., Lewis, K. C., Mewes, H. W., Orcutt, B. C., Suzek, B. E., Tsugita, A., Vinayaka, C. R., Yeh, L. S., Zhang, J., and Barker, W. C. 2002. The protein information resource: An integrated public resource of functional annotation of protein. Nuc. Acids Res. 30, 35--37.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 30, Issue 2
June 2005
328 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1071610
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2005
Published in TODS Volume 30, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. XPath
  2. streaming processing

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)1
Reflects downloads up to 07 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media