ABSTRACT
In this paper, we present a mechanism for automatic management of the memory hierarchy, including secondary storage, in the context of a global address space parallel programming framework. The programmer specifies the parallelism and locality in the computation. The scheduling of the computation into stages, together with the movement of the associated data between secondary storage and global memory, and between global memory and local memory, is automatically managed. A novel formulation of hypergraph partitioning is used to model the optimization problem of minimizing disk I/O. Experimental evaluation of the proposed approach using a sub-computation from the quantum chemistry domain shows a reduction in the disk I/O cost by up to a factor of 11, and a reduction in turnaround time by up to 49%, as compared to alternative approaches used in state-of-the-art quantum chemistry codes.
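As a rough illustration of why hypergraph partitioning is a natural model for disk I/O, the sketch below treats tasks as vertices and data blocks as nets, and charges each block once for every stage that touches it (the standard connectivity-style cost). This is an assumption for illustration only, not the paper's novel formulation; the function name `stage_io_cost`, the task and block names, and the sizes are all hypothetical.

```python
from collections import defaultdict

def stage_io_cost(task_blocks, block_size, stage_of):
    """Disk traffic if every stage reads each block it needs exactly once.

    Assumed model: nothing is kept in memory across stages, so a block
    used by tasks in k different stages is read from disk k times.
    """
    stages_using = defaultdict(set)  # block -> set of stages that touch it
    for task, blocks in task_blocks.items():
        for block in blocks:
            stages_using[block].add(stage_of[task])
    return sum(block_size[b] * len(stages) for b, stages in stages_using.items())

# Hypothetical workload: which blocks each task reads, and block sizes.
task_blocks = {
    "t0": {"A", "B"},
    "t1": {"A", "B", "C"},
    "t2": {"C", "D"},
    "t3": {"C", "D"},
}
block_size = {"A": 4, "B": 4, "C": 8, "D": 2}

# Two candidate assignments of tasks to stages.
grouped = {"t0": 0, "t1": 0, "t2": 1, "t3": 1}    # tasks sharing data run together
scattered = {"t0": 0, "t1": 1, "t2": 0, "t3": 1}  # tasks sharing data are split up

cost_grouped = stage_io_cost(task_blocks, block_size, grouped)
cost_scattered = stage_io_cost(task_blocks, block_size, scattered)
print(cost_grouped, cost_scattered)
```

Under this simplified cost, placing tasks that share blocks in the same stage lowers the total read volume, which is the quantity a hypergraph partitioner would try to minimize subject to a per-stage memory limit.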