DOI: 10.1145/1242572.1242719
Article

A high-performance interpretive approach to schema-directed parsing

Published: 08 May 2007

Abstract

XML delivers key advantages in interoperability due to its flexibility, expressiveness, and platform-neutrality. As XML has become a performance-critical aspect of the next generation of business computing infrastructure, however, it has become increasingly clear that XML parsing often carries a heavy performance penalty, and that current, widely used parsing technologies are unable to meet the performance demands of an XML-based computing infrastructure. Several efforts have been made to address this performance gap through the use of grammar-based parser generation. While the performance of generated parsers has been significantly improved, adoption of the technology has been hindered by the complexity of compiling and deploying the generated parsers. Through careful analysis of the operations required for parsing and validation, we have devised a set of specialized byte codes, designed for the task of XML parsing and validation. These byte codes are designed to deliver the benefits of fine-grained composition of parsing and validation that make existing compiled parsers fast, while being coarse-grained enough to minimize interpreter overhead. This technique of using an interpretive, validating parser balances the need for performance against the requirements of simple tooling and robust, scalable infrastructure. Our approach is demonstrated with a specialized schema compiler, used to generate byte codes which in turn drive an interpretive parser. With almost as little tooling and deployment complexity as a traditional interpretive parser, the byte code-driven parser usually demonstrates performance within 20% of the fastest fully compiled solutions.
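
The general idea described in the abstract, a schema compiler that emits byte codes which then drive an interpretive validating parser, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the opcode names (`OPEN`, `TEXT_INT`, `TEXT_STR`, `CLOSE`), the functions `compile_schema`, `tokenize`, and `parse`, and the schema format are hypothetical and are not the paper's instruction set or API. The sketch handles only a root element containing a fixed sequence of typed leaf elements, with no attributes, namespaces, or entities.

```python
import re

# Hypothetical opcodes for illustration; the abstract does not list the real ones.
OPEN, TEXT_INT, TEXT_STR, CLOSE = "OPEN", "TEXT_INT", "TEXT_STR", "CLOSE"

def compile_schema(root, children):
    """Compile a toy schema (root element with a fixed sequence of
    typed leaf children) into a flat list of (opcode, argument) pairs."""
    code = [(OPEN, root)]
    for name, typ in children:
        code.append((OPEN, name))
        code.append((TEXT_INT if typ == "int" else TEXT_STR, None))
        code.append((CLOSE, name))
    code.append((CLOSE, root))
    return code

# Matches a plain start tag, a plain end tag, or a run of character data.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")

def tokenize(xml):
    """Yield (kind, value) tokens; whitespace-only text is skipped."""
    pos = 0
    while pos < len(xml):
        m = TOKEN.match(xml, pos)
        if m is None:
            raise ValueError(f"unsupported markup at offset {pos}")
        pos = m.end()
        close, tag, text = m.groups()
        if tag:
            yield ("close" if close else "open", tag)
        elif text.strip():
            yield ("text", text.strip())

def parse(code, xml):
    """Interpret the byte codes against the input, validating structure
    and simple types in the same pass, and return the typed leaf values."""
    tokens = tokenize(xml)
    values = []
    for op, arg in code:
        kind, val = next(tokens, ("eof", None))
        if op == OPEN:
            ok = (kind, val) == ("open", arg)
        elif op == CLOSE:
            ok = (kind, val) == ("close", arg)
        elif op == TEXT_INT:
            ok = kind == "text" and re.fullmatch(r"[+-]?\d+", val) is not None
            if ok:
                values.append(int(val))
        else:  # TEXT_STR: any non-empty character data
            ok = kind == "text"
            if ok:
                values.append(val)
        if not ok:
            raise ValueError(f"expected {op} {arg or ''}, got {kind} {val!r}")
    if next(tokens, None) is not None:
        raise ValueError("trailing content after root element")
    return values

code = compile_schema("order", [("id", "int"), ("item", "str")])
print(parse(code, "<order><id>42</id><item>widget</item></order>"))  # [42, 'widget']
```

In this sketch each opcode combines a parsing step with its validation check, which loosely mirrors the abstract's point about composing parsing and validation while keeping instructions coarse enough that dispatch overhead stays small. Deploying a new schema only requires producing a new byte code list, not compiling and shipping a new parser.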



Published In

WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007
1382 pages
ISBN: 9781595936547
DOI: 10.1145/1242572


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. XML
  2. compiler
  3. interpreter
  4. parsing
  5. performance
  6. schema

Qualifiers

  • Article

Conference

WWW'07: 16th International World Wide Web Conference
May 8 - 12, 2007
Banff, Alberta, Canada

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Cited By

  • (2014) AMEDS-tool. Proceedings of the 2014 Summer Simulation Multiconference, pp. 1-8. DOI: 10.5555/2685617.2685637. Online publication date: 6-Jul-2014
  • (2012) Application of XML in the Remote Temperature Monitoring System. Advanced Materials Research, 433-440, pp. 6509-6513. DOI: 10.4028/www.scientific.net/AMR.433-440.6509. Online publication date: Jan-2012
  • (2011) A Generic Parser to parse and reconfigure XML files. 2011 IEEE Recent Advances in Intelligent Computational Systems, pp. 823-827. DOI: 10.1109/RAICS.2011.6069424. Online publication date: Sep-2011
  • (2008) An Adaptive XML Parser for Developing High-Performance Web Services. Proceedings of the 2008 Fourth IEEE International Conference on eScience, pp. 672-679. DOI: 10.1109/eScience.2008.87. Online publication date: 7-Dec-2008
  • (2008) High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers. Proceedings of the 2008 IEEE International Conference on Web Services, pp. 286-294. DOI: 10.1109/ICWS.2008.101. Online publication date: 23-Sep-2008
  • (2008) Exploiting Structure Recurrence in XML Processing. Proceedings of the 2008 Eighth International Conference on Web Engineering, pp. 311-324. DOI: 10.1109/ICWE.2008.46. Online publication date: 14-Jul-2008
