- 1. Laudon, J. and Lenoski, D., "The SGI Origin: A ccNUMA Highly Scalable Server", Proc. of the 24th Int. Symposium on Computer Architecture (ISCA), pp. 241-251, June 1997.
- 2. Charlesworth, A., "STARFIRE: Extending the SMP Envelope", IEEE Micro, Jan/Feb 1998.
- 3. OpenMP Organization, www.openmp.org.
- 4. Ayguadé, E., Martorell, X., Labarta, J., González, M. and Navarro, N., "Exploiting Parallelism Through Directives in the Nano-Threads Programming Model", 10th Workshop on Programming Languages and Compilers for Parallel Computing, Minneapolis (USA), August 1997.
- 5. MIPS Technologies Inc., "MIPS R10000 Microprocessor User's Manual", version 2.0, January 1997.
- 6. Cortesi, D., Raithel, J., Tuthill, B., "IRIX Device Driver Programmer's Guide", doc. 007-0911-120, SGI, 1998.
- 7. Hall, M.W., Anderson, J.M., Amarasinghe, S.P., Murphy, B.R., Liao, S.W., Bugnion, E. and Lam, M.S., "Maximizing Multiprocessor Performance with the SUIF Compiler", IEEE Computer, December 1996.
- 8. Girkar, M., Haghighat, M.R., Grey, P., Saito, H., Stavrakos, N., Polychronopoulos, C.D., "Illinois-Intel Multithreading Library: Multithreading Support for Intel Architecture Based Multiprocessor Systems", Intel Technology Journal, Q1 issue, February 1998.
- 9. Foster, I., Kohr, D.R., Krishnaiyer, R., Choudhary, A., "Double Standards: Bringing Task Parallelism to HPF Via the Message Passing Interface", Supercomputing'96, November 1996.
- 10. Gross, T., O'Hallaron, D. and Subhlok, J., "Task Parallelism in a High Performance Fortran Framework", IEEE Parallel and Distributed Technology, vol. 2, no. 3, Fall 1994.
- 11. Ramaswamy, S., "Simultaneous Exploitation of Task and Data Parallelism in Regular Scientific Computations", Ph.D. Thesis, Univ. of Illinois at Urbana-Champaign, 1996.
- 12. Martorell, X., Labarta, J., Navarro, N. and Ayguadé, E., "A Library Implementation of the Nano-Threads Programming Model", Euro-Par'96, August 1996.
- 13. Polychronopoulos, C.D., Girkar, M. and Kleiman, S., "Nano-threads: A User-Level Threads Architecture", technical report, CSRD, Univ. of Illinois at Urbana-Champaign, 1993.
- 14. Polychronopoulos, C.D., "Nano-threads: Compiler Driven Multithreading", 4th Int. Workshop on Compilers for Parallel Computing, Delft (The Netherlands), December 1993.
- 15. Polychronopoulos, C.D., Girkar, M., Haghighat, M., Lee, C., Leung, B. and Schouten, D., "Parafrase-2: An Environment for Parallelizing, Partitioning, Synchronizing and Scheduling Programs on Multiprocessors", Int. Conference on Parallel Processing (ICPP), St. Charles, Illinois, 1989.
- 16. Silicon Graphics Computer Systems SGI, "Origin 200 and Origin 2000 Technical Report", 1996.
- 17. Waheed, A., Yan, J., "Parallelization of NAS Benchmarks for Shared Memory Multiprocessors", Technical Report NAS-98-010, NASA, March 1998.
- 18. Bailey, D., Harris, T., Saphir, W., van der Wijngaart, R., Woo, A. and Yarrow, M., "The NAS Parallel Benchmarks 2.0", Technical Report NAS-95-020, NASA, December 1995.
- 19. SPEC Organization, "The Standard Performance Evaluation Corporation", www.spec.org.