A Fast Parallel SGD for Matrix Factorization in Shared Memory Systems

ABSTRACT
Matrix factorization is known to be an effective method for recommender systems that observe only users' ratings of items. Currently, stochastic gradient descent (SGD) is one of the most popular algorithms for matrix factorization. However, SGD is inherently sequential and therefore difficult to parallelize for web-scale problems. In this paper, we develop a fast parallel SGD method, FPSGD, for shared memory systems. By dramatically reducing the cache-miss rate and carefully addressing the load balance of threads, FPSGD is more efficient than state-of-the-art parallel algorithms for matrix factorization.
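To make the setting concrete, the sequential SGD baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard SGD update for matrix factorization on explicit ratings, not the paper's FPSGD algorithm; the function name and hyperparameter values are illustrative assumptions.

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, k=8, lr=0.02, reg=0.01, epochs=300, seed=0):
    """Sequential SGD for matrix factorization on explicit ratings.

    ratings: iterable of (user, item, rating) triples with 0-based indices.
    Returns P (n_users x k) and Q (n_items x k) such that P[u] @ Q[i]
    approximates the rating r_ui.
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - P[u] @ Q[i]                   # prediction error for this rating
            pu = P[u].copy()                      # keep old user factors for Q's update
            P[u] += lr * (e * Q[i] - reg * P[u])  # step along negative gradient of
            Q[i] += lr * (e * pu - reg * Q[i])    # the regularized squared error
    return P, Q
```

Because each update reads and writes single rows of P and Q in an order dictated by the rating sequence, naive parallelization causes conflicting writes and poor cache locality, which is exactly the difficulty FPSGD addresses.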
Index Terms
- A fast parallel SGD for matrix factorization in shared memory systems