DOI: 10.1145/2491956.2462181

research-article

SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication

Published: 16 June 2013

ABSTRACT

Sparse matrix-vector multiplication (SpMV) is an important kernel in both traditional high-performance computing and emerging data-intensive applications. To date, SpMV libraries have been optimized through either application-specific or architecture-specific approaches, making them too complicated to be used extensively in real applications. In this work we develop a Sparse Matrix-vector multiplication Auto-Tuning system (SMAT) to bridge the gap between specific optimizations and general-purpose usage. SMAT provides users with a unified programming interface in compressed sparse row (CSR) format and automatically determines the optimal format and implementation for any input sparse matrix at runtime. For this purpose, SMAT leverages a learning model, generated in an off-line stage by a machine-learning method from a training set of more than 2000 matrices in the UF sparse matrix collection, to quickly predict the best combination of matrix feature parameters. Our experiments show that SMAT achieves impressive performance of up to 51 GFLOPS in single precision and 37 GFLOPS in double precision on mainstream x86 multi-core processors, in both cases more than 3 times faster than the Intel MKL library. We also demonstrate its adaptability in an algebraic multigrid solver from the Hypre library, where it delivers above 20% performance improvement.
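The CSR format that the abstract names as SMAT's unified interface can be sketched as follows. This is an illustrative reference kernel, not SMAT's actual API: the function name, argument layout, and the toy feature computation at the end are all hypothetical (the paper's real feature set is not listed in this abstract).

```python
def csr_spmv(n_rows, row_ptr, col_idx, vals, x):
    """y = A @ x for a matrix A stored in compressed sparse row (CSR) form.

    row_ptr[i]..row_ptr[i+1] delimit the nonzeros of row i inside the
    parallel arrays col_idx (column indices) and vals (values).
    """
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A 3x3 example matrix:
#     [ 4  0  1 ]
#     [ 0  3  0 ]
#     [ 2  0  5 ]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

print(csr_spmv(3, row_ptr, col_idx, vals, x))  # [7.0, 6.0, 17.0]

# Cheap structural features of the kind a learned format selector might
# inspect (hypothetical here): nonzeros per row and their spread.
nnz_per_row = [row_ptr[i + 1] - row_ptr[i] for i in range(3)]
print(nnz_per_row)  # [2, 1, 2]
```

Because row_ptr, col_idx, and vals are plain contiguous arrays, CSR is a natural "lowest common denominator" input format from which an auto-tuner can convert to more specialized layouts when the matrix structure warrants it.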

References

  1. Intel Math Kernel Library. http://software.intel.com/en-us/intel-mkl.
  2. Data Mining Tools See5 and C5.0. http://www.rulequest.com/see5-info.html.
  3. M. D. Adam Hill. The International Thermonuclear Experimental Reactor. Technical report, 2005.
  4. J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao, A. Edelman, and S. Amarasinghe. PetaBricks: a language and compiler for algorithmic choice. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '09, pages 38--49. ACM, 2009.
  5. W. Armstrong and A. Rendell. Reinforcement learning for automated performance tuning: initial evaluation for sparse matrix format selection. In 2008 IEEE International Conference on Cluster Computing, pages 411--420, 2008.
  6. S. Balay, J. Brown, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang. PETSc users manual. Technical Report ANL-95/11 Revision 3.3, Argonne National Laboratory, 2012.
  7. N. Bell and M. Garland. Efficient sparse matrix-vector multiplication on CUDA. NVIDIA Technical Report NVR-2008-004, NVIDIA Corporation, Dec. 2008.
  8. G. Belter, E. R. Jessup, I. Karlin, and J. G. Siek. Automating the generation of composed linear algebra kernels. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pages 59:1--59:12. ACM, 2009.
  9. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International Conference on World Wide Web, WWW7, pages 107--117. Elsevier Science Publishers B. V., 1998.
  10. A. Buluc, S. Williams, L. Oliker, and J. Demmel. Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication. In Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, IPDPS '11, pages 721--733. IEEE Computer Society, 2011.
  11. J. W. Choi, A. Singh, and R. W. Vuduc. Model-driven autotuning of sparse matrix-vector multiply on GPUs. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '10, pages 115--126. ACM, 2010.
  12. T. A. Davis. The University of Florida sparse matrix collection. NA Digest, 92, 1994.
  13. R. Falgout. An introduction to algebraic multigrid computing. Computing in Science & Engineering, 8(6):24--33, Nov.--Dec. 2006.
  14. R. D. Falgout and U. M. Yang. hypre: a library of high performance preconditioners. In Lecture Notes in Computer Science, pages 632--641, 2002.
  15. M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216--231, 2005. Special issue on "Program Generation, Optimization, and Platform Adaptation".
  16. D. Grewe and A. Lokhmotov. Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation. In Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, pages 12:1--12:8. ACM, 2011.
  17. J. Harris. pOSKI: an extensible autotuning framework to perform optimized SpMVs on multicore architectures. 2009.
  18. M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley. An overview of the Trilinos project. ACM Trans. Math. Softw., 31(3):397--423, Sept. 2005.
  19. E.-J. Im, K. Yelick, and R. Vuduc. SPARSITY: optimization framework for sparse matrix kernels. International Journal of High Performance Computing Applications, 18, 2004.
  20. M. F. Khairoutdinov and D. A. Randall. A cloud resolving model as a cloud parameterization in the NCAR Community Climate System Model: preliminary results. Geophys. Res. Lett., 28(18):3617--3620, 2001.
  21. K. Kourtis, V. Karakasis, G. Goumas, and N. Koziris. CSX: an extended compression format for SpMV on shared memory systems. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '11, pages 247--256. ACM, 2011.
  22. K. Nagar and J. Bakos. A sparse matrix personality for the Convey HC-1. In 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 1--8, May 2011.
  23. CUDA CUSPARSE Library. NVIDIA, 2010.
  24. M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2):232--275, 2005.
  25. Y. Saad. SPARSKIT: a basic tool kit for sparse matrix computations. Technical report, 1994.
  26. N. Spirin and J. Han. Survey on web spam detection: principles and algorithms. SIGKDD Explor. Newsl., 13(2):50--64, May 2012.
  27. A. Srinivasa and M. Sosonkina. Nonuniform memory affinity strategy in multithreaded sparse matrix computations. In Proceedings of the 2012 Symposium on High Performance Computing, HPC '12, pages 9:1--9:8, 2012.
  28. J. P. Stevenson, A. Firoozshahian, A. Solomatnikov, M. Horowitz, and D. Cheriton. Sparse matrix-vector multiply on the HICAMP architecture. In Proceedings of the 26th ACM International Conference on Supercomputing, ICS '12, pages 195--204. ACM, 2012.
  29. B.-Y. Su and K. Keutzer. clSpMV: a cross-platform OpenCL SpMV framework on GPUs. In Proceedings of the 26th ACM International Conference on Supercomputing, ICS '12, pages 353--364. ACM, 2012.
  30. X. Sun, Y. Zhang, T. Wang, G. Long, X. Zhang, and Y. Li. CRSD: application specific auto-tuning of SpMV for diagonal sparse matrices. In Proceedings of the 17th International Conference on Parallel Processing, Part II, Euro-Par '11, pages 316--327, 2011.
  31. R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: a library of automatically tuned sparse matrix kernels. In Proc. SciDAC, J. Physics: Conf. Ser., volume 16, pages 521--530, 2005.
  32. R. W. Vuduc and H.-J. Moon. Fast sparse matrix-vector multiplication by exploiting variable block structure. In Proceedings of the First International Conference on High Performance Computing and Communications, HPCC '05, pages 807--816. Springer-Verlag, 2005.
  33. R. C. Whaley and J. J. Dongarra. Automatically tuned linear algebra software. In Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, Supercomputing '98, pages 1--27. IEEE Computer Society, 1998.
  34. S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC '07, pages 38:1--38:12. ACM, 2007.
  35. S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Comput., 35(3):178--194, Mar. 2009.
  36. X. Yang, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on GPUs: implications for graph mining. Proc. VLDB Endow., 4(4):231--242, Jan. 2011.
  37. K. Yotov, X. Li, G. Ren, M. Garzaran, D. Padua, K. Pingali, and P. Stodghill. Is search really necessary to generate high-performance BLAS? 2005.

Published in

PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2013, 546 pages. ISBN 978-1-4503-2014-6. DOI: 10.1145/2491956.

Also in ACM SIGPLAN Notices, Volume 48, Issue 6 (PLDI '13), June 2013, 515 pages. ISSN 0362-1340, EISSN 1558-1160. DOI: 10.1145/2499370.

        Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

PLDI '13 paper acceptance rate: 46 of 267 submissions (17%). Overall acceptance rate: 406 of 2,067 submissions (20%).
