DOI: 10.1145/2535753.2535757
research-article

Analysis of computing and energy performance of multicore, NUMA, and manycore platforms for an irregular application

Published: 17 November 2013

ABSTRACT

The exponential growth in processor performance seems to have reached a turning point. Nowadays, energy efficiency is as important as performance and has become a critical aspect of the development of scalable systems. These strict energy constraints paved the way for the development of multicore and manycore processors. Research on the performance and energy efficiency of numerical kernels on multicores is common, but studies in the context of manycores are sparse. Unlike these works, in this paper we analyze a well-known irregular NP-complete problem, the Traveling-Salesman Problem (TSP). This study investigates two aspects of the TSP on multicore, NUMA, and manycore processors. First, we concentrate on the nontrivial task of adapting this application to a manycore, specifically the novel MPPA-256 manycore processor. Then, we analyze its performance and energy consumption on different platforms that comprise general-purpose and low-power multicores, a NUMA machine, and the MPPA-256 manycore. Our results show that applications able to fully use the resources of a manycore can achieve better performance and may consume 9.8 and 13 times less energy than on low-power and general-purpose multicore processors, respectively.


Published in

IA3 '13: Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms
November 2013
92 pages
ISBN: 9781450325035
DOI: 10.1145/2535753

Copyright © 2013 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

IA3 '13 paper acceptance rate: 6 of 21 submissions, 29%
Overall acceptance rate: 18 of 67 submissions, 27%
