ABSTRACT
Sparse Matrix Vector multiplication (SpMV) is an important kernel in both traditional high performance computing and emerging data-intensive applications. By far, SpMV libraries are optimized by either application-specific or architecture-specific approaches, making the libraries become too complicated to be used extensively in real applications. In this work we develop a Sparse Matrix-vector multiplication Auto-Tuning system (SMAT) to bridge the gap between specific optimizations and general-purpose usage. SMAT provides users with a unified programming interface in compressed sparse row (CSR) format and automatically determines the optimal format and implementation for any input sparse matrix at runtime. For this purpose, SMAT leverages a learning model, which is generated in an off-line stage by a machine learning method with a training set of more than 2000 matrices from the UF sparse matrix collection, to quickly predict the best combination of the matrix feature parameters. Our experiments show that SMAT achieves impressive performance of up to 51GFLOPS in single-precision and 37GFLOPS in double-precision on mainstream x86 multi-core processors, which are both more than 3 times faster than the Intel MKL library. We also demonstrate its adaptability in an algebraic multigrid solver from Hypre library with above 20% performance improvement reported.
- Intel Math Kernel Library. URL http://software.intel.com/en-us/intel-mkl.Google Scholar
- Data Mining Tools See5 and C5.0. URL http://www.rulequest.com/see5-info.html.Google Scholar
- M. D. Adam Hill. The international thermonuclear experimental reactor. Technical report, 2005.Google Scholar
- J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao, A. Edelman, and S. Amarasinghe. Petabricks: a language and compiler for algorithmic choice. In Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation, PLDI '09, pages 38--49. ACM, 2009. ISBN 978-1-60558-392-1. Google ScholarDigital Library
- W. Armstrong and A. Rendell. Reinforcement learning for automated performance tuning: Initial evaluation for sparse matrix format selection. In Cluster Computing, 2008 IEEE International Conference on, pages 411--420, 2008.Google ScholarCross Ref
- S. Balay, J. Brown, , K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 3.3, Argonne National Laboratory, 2012.Google Scholar
- N. Bell and M. Garland. Efficient sparse matrix-vector multiplication on CUDA. NVIDIA Technical Report NVR-2008-004, NVIDIA Corporation, Dec. 2008.Google Scholar
- G. Belter, E. R. Jessup, I. Karlin, and J. G. Siek. Automating the generation of composed linear algebra kernels. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pages 59:1--59:12, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-744-8. Google ScholarDigital Library
- S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the seventh international conference on World Wide Web 7, WWW7, pages 107--117. Elsevier Science Publishers B. V., 1998. Google ScholarDigital Library
- A. Buluc, S. Williams, L. Oliker, and J. Demmel. Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication. In Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, IPDPS '11, pages 721--733. IEEE Computer Society, 2011. Google ScholarDigital Library
- J. W. Choi, A. Singh, and R. W. Vuduc. Model-driven autotuning of sparse matrix-vector multiply on gpus. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '10, pages 115--126. ACM, 2010. Google ScholarDigital Library
- T. A. Davis. The university of florida sparse matrix collection. NA DIGEST, 92, 1994.Google Scholar
- R. Falgout. An introduction to algebraic multigrid computing. Computing in Science Engineering, 8(6):24--33, nov.-dec. 2006. ISSN 1521-9615. Google ScholarDigital Library
- R. D. Falgout and U. M. Yang. hypre: a library of high performance preconditioners. In Preconditioners, Lecture Notes in Computer Science, pages 632--641, 2002. Google ScholarDigital Library
- M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216--231, 2005. Special issue on "Program Generation, Optimization, and Platform Adaptation".Google ScholarCross Ref
- D. Grewe and A. Lokhmotov. Automatically generating and tuning gpu code for sparse matrix-vector multiplication from a high-level representation. In Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, pages 12:1--12:8. ACM, 2011. Google ScholarDigital Library
- J. Harris. poski: An extensible autotuning framework to perform optimized spmvs on multicore architectures. 2009.Google Scholar
- M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley. An overview of the trilinos project. ACM Trans. Math. Softw., 31(3):397--423, Sept. 2005. ISSN 0098-3500. Google ScholarDigital Library
- E. jin Im, K. Yelick, and R. Vuduc. Sparsity: Optimization framework for sparse matrix kernels. International Journal of High Performance Computing Applications, 18:2004, 2004. Google ScholarDigital Library
- M. F. Khairoutdinov and D. A. Randall. A cloud resolving model as a cloud parameterization in the ncar community climate system model: Preliminary results. Geophys. Res. Lett., 28(18):3617C3620, 2001.Google ScholarCross Ref
- K. Kourtis, V. Karakasis, G. Goumas, and N. Koziris. Csx: an extended compression format for spmv on shared memory systems. In Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, PPoPP '11, pages 247--256. ACM, 2011. Google ScholarDigital Library
- K. Nagar and J. Bakos. A sparse matrix personality for the convey hc-1. In Field-Programmable Custom Computing Machines (FCCM), 2011 IEEE 19th Annual International Symposium on, pages 1--8, may 2011. Google ScholarDigital Library
- CUDA CUSPARSE Library. NVIDIA, 2010.Google Scholar
- M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2):232--275, 2005.Google ScholarCross Ref
- Y. Saad. Sparskit : a basic tool kit for sparse matrix computations. Technical Report, 1994.Google Scholar
- N. Spirin and J. Han. Survey on web spam detection: principles and algorithms. SIGKDD Explor. Newsl., 13(2):50--64, May 2012. ISSN 1931-0145. Google ScholarDigital Library
- A. Srinivasa and M. Sosonkina. Nonuniform memory affinity strategy in multithreaded sparse matrix computations. In Proceedings of the 2012 Symposium on High Performance Computing, HPC '12, pages 9:1--9:8, 2012. Google ScholarDigital Library
- J. P. Stevenson, A. Firoozshahian, A. Solomatnikov, M. Horowitz, and D. Cheriton. Sparse matrix-vector multiply on the hicamp architecture. In Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, pages 195--204, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1316-2. Google ScholarDigital Library
- B.-Y. Su and K. Keutzer. clspmv: A cross-platform opencl spmv framework on gpus. In Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, pages 353--364. ACM, 2012. Google ScholarDigital Library
- X. Sun, Y. Zhang, T. Wang, G. Long, X. Zhang, and Y. Li. Crsd: application specific auto-tuning of spmv for diagonal sparse matrices. In Proceedings of the 17th international conference on Parallel processing - Volume Part II, Euro-Par'11, pages 316--327, 2011. Google ScholarDigital Library
- R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proc. SciDAC, J. Physics: Conf. Ser., volume 16, pages 521--530, 2005.Google ScholarCross Ref
- R. W. Vuduc and H.-J. Moon. Fast sparse matrix-vector multiplication by exploiting variable block structure. In Proceedings of the First international conference on High Performance Computing and Communications, HPCC'05, pages 807--816. Springer-Verlag, 2005. Google ScholarDigital Library
- R. C. Whaley and J. J. Dongarra. Automatically tuned linear algebra software. In Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '98, pages 1--27, Washington, DC, USA, 1998. IEEE Computer Society. ISBN0-89791-984-X. Google ScholarDigital Library
- S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proceedings of the 2007 ACM/IEEE conference on Supercomputing, SC '07, pages 38:1--38:12. ACM, 2007. Google ScholarDigital Library
- S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Comput., 35(3):178--194, Mar. 2009. Google ScholarDigital Library
- X. Yang, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus: implications for graph mining. Proc. VLDB Endow., 4(4):231--242, Jan. 2011. Google ScholarDigital Library
- K. Yotov, X. Li, G. Ren, M. Garzaran, D. Padua, K. Pingali, and P. Stodghill. Is search really necessary to generate high-performance blas?, 2005.Google Scholar
Index Terms
- SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication
Recommendations
SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication
PLDI '13Sparse Matrix Vector multiplication (SpMV) is an important kernel in both traditional high performance computing and emerging data-intensive applications. By far, SpMV libraries are optimized by either application-specific or architecture-specific ...
SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms
The Sparse Matrix-Vector Multiplication (SpMV) kernel ranks among the most important and thoroughly studied linear algebra operations, as it lies at the heart of many iterative methods for the solution of sparse linear systems, and often constitutes a ...
A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs
TG '11: Proceedings of the 2011 TeraGrid Conference: Extreme Digital DiscoverySparse Matrix-Vector Multiplication (SpMV) is very common to scientific computing. The Graphics Processing Unit (GPU) has recently emerged as a high-performance computing platform due to its massive processing capability. This paper presents an ...
Comments