Research article · Public Access

Loop and data transformations for sparse matrix code

Published: 03 June 2015

Abstract

This paper introduces three new compiler transformations for representing and transforming sparse matrix computations and their data representations. In cooperation with run-time inspection, our compiler derives transformed matrix representations and associated transformed code to implement a variety of representations targeting different architecture platforms. This systematic approach to combining code and data transformations on sparse computations, which extends a polyhedral transformation and code generation framework, permits the compiler to compose these transformations with other transformations to generate code that is on average within 5% of, and often exceeds, the performance of the manually-tuned, high-performance sparse matrix libraries CUSP and OSKI. Additionally, the compiler-generated inspector codes are on average 1.5× faster than OSKI's and perform comparably to CUSP's.
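The abstract refers to sparse matrix computations and their data representations. As context (this is a standard textbook kernel, not code from the paper), the baseline such transformations typically start from is sparse matrix-vector multiply over the compressed sparse row (CSR) format, whose indirect accesses through the column-index array are what make the loop bounds and subscripts non-affine:

```python
def spmv_csr(rowptr, col, val, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    rowptr[i]..rowptr[i+1] delimits the nonzeros of row i;
    col[k] and val[k] give the column index and value of nonzero k.
    """
    n = len(rowptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(rowptr[i], rowptr[i + 1]):  # non-affine loop bounds
            y[i] += val[k] * x[col[k]]             # indirect access via col[]
    return y

# 3x3 example matrix: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
rowptr = [0, 2, 3, 5]
col = [0, 2, 1, 0, 2]
val = [2.0, 1.0, 3.0, 4.0, 5.0]
y = spmv_csr(rowptr, col, val, [1.0, 1.0, 1.0])
```

Because the inner-loop bounds and the subscript `col[k]` are only known at run time, an inspector that examines `rowptr` and `col` can derive a transformed representation and specialized executor code, which is the general inspector/executor setting the abstract describes.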


Published in

ACM SIGPLAN Notices, Volume 50, Issue 6 (PLDI '15), June 2015, 630 pages. ISSN 0362-1340, EISSN 1558-1160. DOI: 10.1145/2813885. Editor: Andy Gill.

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015, 630 pages. ISBN 9781450334686. DOI: 10.1145/2737924.

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
