research-article

Analysis of computing and energy performance of multicore, NUMA, and manycore platforms for an irregular application

Authors:
Márcio Castro, Institute of Informatics, UFRGS, Brazil
Emilio Francesquini, University of São Paulo, Brazil and University of Grenoble, France
Thomas M. Nguélé, University of Yaoundé, Cameroon
Jean-François Méhaut, University of Grenoble, France

IA³ '13: Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms, November 2013, Article No.: 5, Pages 1–8, https://doi.org/10.1145/2535753.2535757

Published: 17 November 2013

ABSTRACT

The exponential growth in processor performance seems to have reached a turning point. Energy efficiency is now as important as performance and has become a critical aspect of the development of scalable systems. These strict energy constraints paved the way for the development of multicore and manycore processors. Research on the performance and energy efficiency of numerical kernels on multicores is common, but studies in the context of manycores are sparse. Unlike these works, in this paper we analyze a well-known irregular NP-complete problem, the Traveling-Salesman Problem (TSP). This study investigates two aspects of the TSP on multicore, NUMA, and manycore processors. First, we concentrate on the nontrivial task of adapting this application to a manycore, specifically the novel MPPA-256 manycore processor. Then, we analyze its performance and energy consumption on different platforms that comprise general-purpose and low-power multicores, a NUMA machine, and the MPPA-256 manycore. Our results show that applications able to fully use the resources of a manycore can achieve better performance and may consume 9.8 and 13 times less energy than low-power and general-purpose multicore processors, respectively.
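To illustrate why the TSP counts as an irregular workload, here is a minimal sketch of an exact solver based on depth-first branch and bound. The abstract does not state which algorithm the paper uses, so the technique, the function name `solve_tsp`, and the per-branch node counter are all assumptions made for illustration. Pruning depends on the best tour found so far, so the work under each first move is not known in advance, which is what makes static load balancing across cores difficult.

```python
import math

def solve_tsp(dist):
    """Exact TSP by depth-first branch and bound, starting from city 0.

    Hypothetical illustration, not the paper's implementation.
    Assumes non-negative distances (required for the pruning rule).
    Returns (best_cost, best_tour, nodes_per_branch), where
    nodes_per_branch[c] counts search nodes visited under the move 0 -> c.
    """
    n = len(dist)
    best = {"cost": math.inf, "tour": None}
    nodes_per_branch = {}

    def search(path, visited, cost, branch):
        nodes_per_branch[branch] = nodes_per_branch.get(branch, 0) + 1
        if cost >= best["cost"]:
            return  # prune: this partial tour cannot beat the incumbent
        if len(path) == n:
            total = cost + dist[path[-1]][0]  # close the tour back to city 0
            if total < best["cost"]:
                best["cost"] = total
                best["tour"] = path + [0]
            return
        for city in range(1, n):
            if city not in visited:
                search(path + [city], visited | {city},
                       cost + dist[path[-1]][city], branch)

    # Each first move roots an independent subtree; a parallel version
    # could hand these subtrees to different cores.
    for first in range(1, n):
        search([0, first], {0, first}, dist[0][first], first)
    return best["cost"], best["tour"], nodes_per_branch
```

Calling `solve_tsp` on a small symmetric distance matrix returns the optimal tour cost together with the node counts per first-level subtree. On larger instances these counts typically differ between subtrees, which is the imbalance a parallel implementation has to cope with.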

Published in
IA³ '13: Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms
November 2013
92 pages
ISBN: 9781450325035
DOI: 10.1145/2535753
Conference Chairs: Antonino Tumeo (PNNL), John Feo (PNNL), Oreste Villa (NVIDIA), Simone Secchi (Università di Cagliari, Italy)
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags: NUMA, TSP, energy, manycore, multicore, performance
Acceptance Rates
IA³ '13 Paper Acceptance Rate: 6 of 21 submissions, 29%. Overall Acceptance Rate: 18 of 67 submissions, 27%.