research-article

Accelerating recommender systems using GPUs

Authors: André Valente Rodrigues, Inês Dutra

SAC '15: Proceedings of the 30th Annual ACM Symposium on Applied Computing

Pages 879 - 884

https://doi.org/10.1145/2695664.2695850

Published: 13 April 2015

Abstract

We describe GPU implementations of two matrix factorization recommender algorithms, CCD++ and ALS. We compare the processing time and predictive ability of the GPU implementations with those of existing multi-core versions of the same algorithms. The GPU implementations outperform the multi-core versions, reaching a maximum speedup of 14.8.
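For readers unfamiliar with CCD++, the following is a minimal sequential sketch of the rank-one coordinate descent idea usually associated with that algorithm. It is not the authors' GPU implementation. The function names (`ccdpp`, `_update`, `rmse`), the toy ratings, the hyperparameters, and the plain (unweighted) regularization term `lam` are all assumptions made for illustration. Each entry of a factor vector is computed independently of the others in the same vector, which is the kind of structure a parallel version could exploit.

```python
import random


def ccdpp(ratings, n_users, n_items, k=2, lam=0.1, outer=5, inner=3, seed=0):
    """Rank-one coordinate descent factorization (CCD++-style sketch).

    ratings: list of (user, item, value) triples for the observed entries.
    Returns W (k x n_users) and H (k x n_items), stored one row per feature.
    """
    rng = random.Random(seed)
    W = [[0.0] * n_users for _ in range(k)]
    H = [[rng.uniform(0.0, 1.0) for _ in range(n_items)] for _ in range(k)]
    # W starts at zero, so the initial residual equals the ratings.
    res = [value for (_user, _item, value) in ratings]

    for _outer in range(outer):
        for t in range(k):
            u, v = W[t], H[t]
            # Add feature t back so the residual excludes only the other features.
            for n, (i, j, _value) in enumerate(ratings):
                res[n] += u[i] * v[j]
            # Alternate closed-form updates of the user and item vectors.
            for _inner in range(inner):
                _update(u, v, ratings, res, lam, side=0)
                _update(v, u, ratings, res, lam, side=1)
            # Subtract the refreshed rank-one term from the residual.
            for n, (i, j, _value) in enumerate(ratings):
                res[n] -= u[i] * v[j]
    return W, H


def _update(target, fixed, ratings, res, lam, side):
    """Regularized least-squares update of one vector with the other held fixed."""
    num = [0.0] * len(target)
    den = [lam] * len(target)  # assumes lam > 0, so no division by zero
    for n, entry in enumerate(ratings):
        a = entry[side]      # index into the vector being updated
        b = entry[1 - side]  # index into the fixed vector
        num[a] += res[n] * fixed[b]
        den[a] += fixed[b] * fixed[b]
    for a in range(len(target)):
        target[a] = num[a] / den[a]


def rmse(ratings, W, H):
    """Root-mean-square error of the factorization on the given entries."""
    err = 0.0
    for i, j, value in ratings:
        pred = sum(W[t][i] * H[t][j] for t in range(len(W)))
        err += (value - pred) ** 2
    return (err / len(ratings)) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy data: 3 users, 3 items, 6 observed ratings.
    data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
    W, H = ccdpp(data, n_users=3, n_items=3)
    print("training RMSE:", rmse(data, W, H))
```

The sketch keeps a residual only over observed entries, so each sweep costs time proportional to the number of ratings rather than to the full user-item matrix.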

Published In

SAC '15: Proceedings of the 30th Annual ACM Symposium on Applied Computing

April 2015, 2418 pages

ISBN: 9781450331968

DOI: 10.1145/2695664

Conference Chairs: Roger L. Wainwright (University of Tulsa), Juan Manuel Corchado (University of Salamanca, Spain)

Program Chairs: Alessio Bechini (University of Pisa, Italy), Jiman Hong (Soongsil University, South Korea)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SAC 2015: Symposium on Applied Computing

April 13 - 17, 2015

Salamanca, Spain

Sponsor: SIGAPP

Acceptance Rates

SAC '15 Paper Acceptance Rate 291 of 1,211 submissions, 24%;

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
