Compiler support for efficient processing of XML datasets
Source: International Conference on Supercomputing
Proceedings of the 17th Annual International Conference on Supercomputing
San Francisco, CA, USA
Session: Compilers I
Pages: 42-52
Year of Publication: 2003
ISBN: 1-58113-733-8
Authors
Xiaogang Li, Ohio State University, Columbus, OH
Renato Ferreira, Universidade Federal de Minas Gerais, Brazil
Gagan Agrawal, Ohio State University, Columbus, OH
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/782814.782823
ABSTRACT

Declarative, high-level, and/or application-class-specific languages are often successful in easing application development. In this paper, we report our experiences in compiling a recently developed XML query language, XQuery, for applications that process scientific datasets.

Though scientific data processing applications can be conveniently represented in XQuery, compiling them to achieve efficient execution involves a number of challenges. These are: 1) analysis of recursive functions to identify reduction computations involving only associative and commutative operations, 2) replacement of recursive functions with iterative constructs, 3) parallelization of generalized reduction functions, which particularly requires the synthesis of global reduction functions, 4) application of data-centric transformations on the structure of XQuery, and 5) translation of XQuery processing to an imperative language like C/C++, which is required for using a middleware that offers low-level functionality.

This paper describes our solutions to these problems. By implementing the techniques in a compiler and generating code for a runtime system called Active Data Repository (ADR), we are able to achieve efficient processing of disk-resident datasets and parallelization on a cluster of machines. Our experimental results show that 1) restructuring transformations, i.e., removing recursion and applying data-centric execution, result in a several-fold improvement in performance, and 2) parallel versions achieve good load balance and incur no significant overheads besides communication.
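The first three challenges named in the abstract can be illustrated with a small sketch. It is not the paper's compiler or the ADR API: the function names (`recursive_sum`, `iterative_sum`, `partition`, `parallel_style_sum`) and the choice of addition as the reduction operator are assumptions made purely for illustration, and Python stands in for the XQuery source and the generated C/C++ code. The sketch shows a reduction written recursively, the same reduction rewritten as a loop, and a global reduction that combines per-partition results, which is valid only because the operator is associative and commutative.

```python
from functools import reduce


def recursive_sum(values):
    """Reduction written recursively, as a functional query language would express it."""
    if not values:
        return 0
    return values[0] + recursive_sum(values[1:])


def iterative_sum(values):
    """The same reduction after replacing recursion with an iterative construct."""
    total = 0
    for v in values:
        total += v
    return total


def partition(values, n_parts):
    """Hypothetical helper: split the data round-robin, as if across cluster nodes."""
    return [values[i::n_parts] for i in range(n_parts)]


def parallel_style_sum(values, n_parts=4):
    """Local reductions per partition, followed by a global reduction.

    The partitions are processed sequentially here; a real system would run
    them on separate nodes. Combining the partial results in any order is
    safe only because addition is associative and commutative.
    """
    local_results = [iterative_sum(part) for part in partition(values, n_parts)]
    return reduce(lambda a, b: a + b, local_results, 0)


data = list(range(1, 101))
results = (recursive_sum(data), iterative_sum(data), parallel_style_sum(data))
```

All three entries of `results` agree, which is the property a compiler must establish before it can apply the recursion-removal and parallelization steps safely.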

