ABSTRACT
Demand for programming environments that exploit clusters of symmetric multiprocessors (SMPs) is increasing. In this paper, we present a new programming environment, called ParADE, that enables easy, portable, and high-performance programming on SMP clusters. ParADE is an OpenMP programming environment built on top of a multithreaded software distributed shared memory (SDSM) system running a variant of the home-based lazy release consistency protocol. To boost performance, the runtime system also exposes explicit message-passing primitives, making ParADE a hybrid programming environment. Collective communication primitives implement the synchronization and work-sharing directives associated with small data structures, reducing synchronization overhead and avoiding the implicit barriers of work-sharing directives. The OpenMP translator bridges the gap between the OpenMP abstraction and the hybrid programming interfaces of the runtime system. Experiments with several NAS benchmarks and applications on a Linux-based cluster show promising results: ParADE overcomes the performance problems of conventional SDSM-based OpenMP environments.