ABSTRACT
Modern microprocessors are becoming increasingly parallel devices, and GPUs are at the leading edge of this trend. Designing parallel algorithms for manycore chips like the GPU can present interesting challenges, particularly for computations on sparse data structures. A common example is the collection of sparse matrix solvers and combinatorial graph algorithms that form the core of many physical simulation techniques. Although seemingly irregular, these operations can often be implemented with data-parallel operations that map very well to massively parallel processors.
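To make the claim concrete, the canonical example of such a "seemingly irregular" computation is sparse matrix-vector multiply over a compressed sparse row (CSR) matrix. The sketch below (an illustration, not code from the paper) expresses it so that every row is independent; on a GPU, each loop iteration could be assigned to its own thread.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """y = A @ x for A stored in CSR form.

    row_ptr[i]:row_ptr[i+1] delimits row i's nonzeros in
    the parallel arrays col_idx (column indices) and values.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for row in range(n):  # each iteration is independent: one thread per row on a GPU
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y

# A = [[10, 0, 2],
#      [ 0, 3, 0],
#      [ 1, 0, 4]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [10.0, 2.0, 3.0, 1.0, 4.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [12.0, 3.0, 5.0]
```

The irregularity lies only in the variable row lengths; the outer loop itself is embarrassingly parallel, which is what lets such kernels map well onto manycore hardware.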
Index Terms
Sparse matrix computations on manycore GPU's