DOI: 10.1145/2726935.2726940
research-article

Mahout on heterogeneous clusters using HadoopCL

Published: 08 February 2015

ABSTRACT

MapReduce is a programming model capable of processing large amounts of data in parallel across hundreds of compute nodes in a cluster. Many applications have leveraged the power of this model, including a number of compute-hungry applications in machine learning. MapReduce can meet the demands of massive data analysis in applications such as web search, digital media selection, and online shopping analytics. The Mahout recommendation system is one of the most popular open-source recommendation systems that employ machine learning techniques. Mahout provides a parallel computing infrastructure that can be used to implement a range of applications that work with large datasets. A complementary trend in cluster computing has been the use of GPUs to increase the performance of data-intensive applications, and there have been several efforts to use GPUs to accelerate the MapReduce framework. HadoopCL is a framework that auto-generates OpenCL kernels from Hadoop tasks and can then run them across a heterogeneous cluster. In this paper, we analyze the performance of a Mahout recommendation system running on different cluster platforms, including CPU-only platforms and heterogeneous platforms with discrete GPUs and integrated APUs. We propose a cooperative HadoopCL model that improves both GPU/APU programming flexibility and performance.
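
As a rough illustration of the MapReduce model the abstract refers to, the following sketch counts item co-occurrences, a building block commonly used in item-based recommendation. It is a single-process Python toy, not Mahout, Hadoop, or HadoopCL code; the function names (`map_user`, `shuffle`, `reduce_pair`, `cooccurrence`) and the sample preference data are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import combinations


def map_user(user, items):
    """Map step: emit ((item_a, item_b), 1) for each pair of items one user rated."""
    for a, b in combinations(sorted(set(items)), 2):
        yield (a, b), 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_pair(key, values):
    """Reduce step: sum the partial counts for one item pair."""
    return key, sum(values)


def cooccurrence(user_items):
    """Run map, shuffle, and reduce in sequence over a dict of user -> items."""
    emitted = (
        kv
        for user, items in user_items.items()
        for kv in map_user(user, items)
    )
    return dict(reduce_pair(k, v) for k, v in shuffle(emitted).items())


# Hypothetical toy preferences: which items each user interacted with.
prefs = {"u1": ["A", "B", "C"], "u2": ["A", "B"], "u3": ["B", "C"]}
counts = cooccurrence(prefs)
```

In a real cluster the map and reduce calls would run on different nodes; each `map_user` call is independent of the others, which is the property that makes such tasks candidates for offloading to GPUs or APUs.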

        • Published in

          PPAA 2015: Proceedings of the 2nd Workshop on Parallel Programming for Analytics Applications
          February 2015
          47 pages
          ISBN: 9781450334051
          DOI: 10.1145/2726935

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 6 of 7 submissions, 86%
