|
ABSTRACT
Declarative, high-level, and/or application-class specific languages are often successful in easing application development. In this paper, we report our experiences in compiling a recently developed XML Query Language, XQuery for applications that process scientific datasets.Though scientific data processing applications can be conveniently represented in XQuery, compiling them to achieve efficient execution involves a number of challenges. These are, 1) analysis of recursive functions to identify reduction computations involving only associative and commutative operations, 2) replacement of recursive functions with iterative constructs, 3) parallelization of generalized reduction functions, which particularly requires the synthesis of global reduction functions, 4) application of data-centric transformations on the structure of XQuery, and 5) translation of XQuery processing to an imperative language like C/C++, which is required for using a middleware that offers low-level functionality.This paper describes our solutions towards these problems. By implementing the techniques in a compiler and generating code for a runtime system called Active Data Repository (ADR), we are able to achieve efficient processing of disk-resident datasets and parallelization on a cluster of machines. Our experimental results show that: 1) restructuring transformations, i.e. removing recursion and applying data-centric execution, result in several-folds improvement in performance, and 2) parallel versions achieve good load-balance, and incur no significant overheads besides communication.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Asmara Afework, Michael D. Beynon, Fabian Bustamante, Angelo Demarzo, Renato Ferreira, Robert Miller, Mark Silberman, Joel Saltz, Alan Sussman, and Hubert Tsang. Digital dynamic telepathology - the Virtual Microscope. In Proceedings of the 1998 AMIA Annual Fall Symposium. American Medical Informatics Association, November 1998.
|
| |
2
|
Ole Agesen. Constrained- based type inference and parametric polymorphism. In Proceedings of Static Analysis Symposium (SAS), appears as the volume no. 864 of the Springer Lecture Notes in Computer Science Series, pages 78--100, September 1994.
|
| |
3
|
D. Beech, S. Lawrence, M. Maloney, N. Mendelsohn, and H. Thompson. XML Schema part 1: Structures, W3C working draft. Available at http://www.w3.org/TR/1999/xmlschema-1, May 1999.
|
| |
4
|
P. Biron and A. Malhotra. XML Schema part 2: Datatypes, W3C working draft. Available at http://www.w3.org/TR/1999/xmlschema-2, May 1999.
|
| |
5
|
S. Boag, D. Chamberlin, M. F. Fernandez, D. Florescu, J. Robie, and J. Simeon. XQuery 1.0: An XML Query Language. W3C Working Draft, available from http://www.w3.org/TR/xquery/, November 2002.
|
| |
6
|
D. Box, D. Ehnebuske, G. Kakivaya, A. Layman, N. Mendelsohn, H. F. Nielsen, S. Thatte, and D. Winer. Simple object access protocol (soap) 1.1. World Wide Web Consortium (W3C) Note, 08 May 2000.
|
| |
7
|
T. Bray, J. Paoli, and C. Sperberg-McQueen. Extensible Markup Language (XML) 1.0. Available at http://www.w3.org/TR/REC-xml, February 1998.
|
| |
8
|
|
| |
9
|
|
| |
10
|
Chialin Chang , Bongki Moon , Anurag Acharya , Carter Shock , Alan Sussman , Joel H. Saltz, Titan: A High-Performance Remote Sensing Database, Proceedings of the Thirteenth International Conference on Data Engineering, p.375-384, April 07-11, 1997
|
| |
11
|
Byron Choi, Mary Fernandez, and Jerome Simeon. The XQuery Formal Semantics: A Foundation for Implementation and Opitmization. May 2002.
|
 |
12
|
|
| |
13
|
D. Draper, P. Fankhauser, M. Fernandez, A. Malhotra, K. Rose, M. Rys, J. Simion, and P. Wadler. XQuery 1.0 and XPath 2.0 Formal Semantics. W3C Working Draft, available from http://www.w3.org/TR/query-semantics/, November 2002.
|
| |
14
|
Renato Ferreira , Bongki Moon , Jim Humphries , Alan Sussman , Joel Saltz , Robert Miller , Angelo Demarzo, The virtual microscope, University of Maryland at College Park, College Park, MD, 1997
|
 |
15
|
Renato Ferreira , Gagan Agrawal , Joel Saltz, Compiling object-oriented data intensive applications, Proceedings of the 14th international conference on Supercomputing, p.11-21, May 08-11, 2000, Santa Fe, New Mexico, United States
[doi> 10.1145/335231.335233]
|
 |
16
|
|
 |
17
|
|
| |
18
|
High Performance Fortran Forum. Hpf language specification, version 2.0. Available from http://www.crpc.rice.edu/HPFF/versions/hpf2/files/hpf-v20.ps.gz, January 1997.
|
 |
19
|
Jens Palsberg , Michael I. Schwartzbach, Object-oriented type inference, Conference proceedings on Object-oriented programming systems, languages, and applications, p.146-161, October 06-11, 1991, Phoenix, Arizona, United States
|
| |
20
|
Mahmut Kandemir , Alok Choudhary , J. Ramanujam , Meenakshi A. Kandaswamy, A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations, IEEE Transactions on Parallel and Distributed Systems, v.11 n.7, p.648-668, July 2000
[doi> 10.1109/71.877759
]
|
 |
21
|
Induprakas Kodukula , Nawaaz Ahmed , Keshav Pingali, Data-centric multi-level blocking, Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, p.346-357, June 16-18, 1997, Las Vegas, Nevada, United States
|
| |
22
|
Tahsin M. Kurc, Alan Sussman, and Joel Saltz. Coupling multiple simulations via a high performance customizable database system. In Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing. SIAM, March 1999.
|
 |
23
|
|
 |
24
|
|
 |
25
|
Todd C. Mowry , Angela K. Demke , Orran Krieger, Automatic compiler-inserted I/O prefetching for out-of-core applications, Proceedings of the second USENIX symposium on Operating systems design and implementation, p.3-17, October 29-November 01, 1996, Seattle, Washington, United States
|
| |
26
|
NASA Goddard Distributed Active Archive Center (DAAC). Advanced Very High Resolution Radiometer Global Area Coverage (AVHRR GAC) data. http://daac.gsfc.nasa.gov/CAMPAIGN_DOCS/LAND BIO/origins. html.
|
| |
27
|
Chang-Won Park, Jun-Ki Min, and Chin-Wan Chung. Structural Function Inlining Techniques for Structurally Recursive XML Queries. In Proceedings of Conference on Very Large Databases (VLDB), September 2002.
|
 |
28
|
|
 |
29
|
Klaus E. Schauser , David E. Culler , Seth C. Goldstein, Separation constraint partitioning: a new algorithm for partitioning non-strict programs into sequential threads, Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.259-271, January 23-25, 1995, San Francisco, California, United States
[doi> 10.1145/199448.199511]
|
| |
30
|
Ambuj Shatdal. Architectural considerations for parallel query evaluation algorithms. Technical Report CS-TR-1996-1321, University of Wisconsin, 1999.
|
| |
31
|
|
| |
32
|
Jennifer Widom. Data management for XML: Research directions. IEEE Data Engineering Bulletin, 22(3):44--52, 1999.
|
| |
33
|
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE conference on Design automation
Gwo-Dong Chen
, Daniel D. Gajski
|