ABSTRACT
Important workloads, such as machine learning and graph analytics applications, heavily involve sparse linear algebra operations. These operations use sparse matrix compression as an effective means to avoid storing zeros and performing unnecessary computation on zero elements. However, compression techniques like Compressed Sparse Row (CSR) that are widely used today introduce significant instruction overhead and expensive pointer-chasing operations to discover the positions of the non-zero elements. In this paper, we identify the discovery of the positions (i.e., indexing) of non-zero elements as a key bottleneck in sparse matrix-based workloads, which greatly reduces the benefits of compression.
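To make the indexing overhead concrete, the following sketch (illustrative, not code from the paper) shows a textbook CSR SpMV kernel. The `row_ptr` and `col_idx` arrays must be traversed just to discover where each non-zero lives, which is the per-element overhead the paper identifies.

```python
# Illustrative CSR SpMV: CSR stores three arrays -- values, column
# indices, and row pointers. Every non-zero costs extra index loads
# (row_ptr and col_idx lookups) before the actual multiply-add.

def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        # The row_ptr reads and col_idx loads below are pure indexing
        # work: they locate non-zeros rather than compute on them.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# 3x3 example: A = [[5,0,0],[0,0,8],[0,3,0]]
row_ptr = [0, 1, 2, 3]
col_idx = [0, 2, 1]
vals    = [5.0, 8.0, 3.0]
print(csr_spmv(row_ptr, col_idx, vals, [1.0, 2.0, 3.0]))  # [5.0, 24.0, 6.0]
```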
We propose SMASH, a hardware-software cooperative mechanism that enables highly efficient indexing and storage of sparse matrices. The key idea of SMASH is to explicitly enable the hardware to recognize and exploit sparsity in data. To this end, we devise a novel software encoding based on a hierarchy of bitmaps. This encoding can be used to efficiently compress any sparse matrix, regardless of the extent and structure of sparsity. At the same time, the bitmap encoding can be directly interpreted by the hardware. We design a lightweight hardware unit, the Bitmap Management Unit (BMU), that buffers and scans the bitmap hierarchy to perform highly efficient indexing of sparse matrices. SMASH exposes an expressive and rich ISA to communicate with the BMU, which enables its use in accelerating any sparse matrix computation.
We demonstrate the benefits of SMASH on four use cases that include sparse matrix kernels and graph analytics applications. Our evaluations show that SMASH provides average performance improvements of 38% for Sparse Matrix Vector Multiplication and 44% for Sparse Matrix Matrix Multiplication, over a state-of-the-art CSR implementation, on a wide variety of matrices with different characteristics. SMASH incurs a very modest hardware area overhead of up to 0.076% of an out-of-order CPU core.