Bringing Parallel Patterns Out of the Corner: The P³ARSEC Benchmark Suite

Authors:

Daniele De Sensi,

Tiziano De Matteis,

Massimo Torquati,

Gabriele Mencagli,

Marco Danelutto

ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 4

Article No.: 33, Pages 1 - 26

https://doi.org/10.1145/3132710

Published: 24 October 2017

Abstract

High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that give the programmer high-level abstractions for developing complex parallel software with reduced time to solution. Pattern-based parallel programming is built on a set of composable and customizable parallel patterns used as basic building blocks in parallel applications. In recent years, considerable effort has gone into equipping this programming model with features that overcome the shortcomings of early approaches in terms of flexibility and performance. In this article, we demonstrate that the approach is flexible and efficient enough by applying it to 12 out of 13 PARSEC applications. Our analysis, conducted on three different multicore architectures, demonstrates that pattern-based parallel programming has reached a good level of maturity, providing performance comparable to both pragma-based parallel programming methodologies (i.e., OpenMP and OmpSs) and native implementations (i.e., Pthreads). Regarding programming effort, we also demonstrate a considerable reduction in lines of code and code churn compared to Pthreads, and comparable results with respect to other existing implementations.
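To make the idea of composable patterns concrete, the following is a minimal sketch in Python, not code from the article or from any of the frameworks it evaluates. The helper names `farm` and `pipeline` are hypothetical and chosen for illustration only; the sketch uses just the standard-library `concurrent.futures.ThreadPoolExecutor`. It illustrates how two patterns can be treated as building blocks and nested, and makes no claim about performance (CPython threads do not speed up CPU-bound work, and the stages here run one after another rather than overlapping as a real pipeline would).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def farm(worker: Callable, n_workers: int) -> Callable[[Iterable], List]:
    """Hypothetical farm (map) pattern: apply `worker` to every item
    using a pool of `n_workers` threads. Results keep the input order."""
    def run(items: Iterable) -> List:
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            return list(pool.map(worker, items))
    return run


def pipeline(*stages: Callable) -> Callable:
    """Hypothetical, simplified pipeline pattern: the output of each
    stage becomes the input of the next. Stages execute sequentially
    here; only the work inside each farm stage is spread over threads."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return items
    return run


def square(x: int) -> int:
    return x * x


def increment(x: int) -> int:
    return x + 1


# Compose the patterns: a pipeline whose two stages are both farms.
app = pipeline(farm(square, n_workers=4), farm(increment, n_workers=2))
result = app(range(5))
print(result)  # [1, 2, 5, 10, 17]
```

The application logic (`square`, `increment`) stays sequential, while the parallel structure is expressed entirely by how the patterns are nested. Changing the degree of parallelism or swapping a stage means editing one line of the composition rather than the thread-management code.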

Index Terms

Bringing Parallel Patterns Out of the Corner: The P³ARSEC Benchmark Suite
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages

Published In

ACM Transactions on Architecture and Code Optimization Volume 14, Issue 4

December 2017

600 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3154814

Editor:
Koen De Bosschere
Ghent University

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 October 2017

Accepted: 01 August 2017

Revised: 01 July 2017

Received: 01 June 2017

Published in TACO Volume 14, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

EU H2020-ICT-2014-1

Article Metrics

Total citations: 20
Total downloads: 766
Downloads (last 12 months): 97
Downloads (last 6 weeks): 16

Reflects downloads up to 18 Feb 2025

Cited By

Laguna I, Chapman P, Parasyris K, Georgakoudis G, Rubio-González C. (2024). Testing the Unknown: A Framework for OpenMP Testing via Random Program Generation. Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 577-587. DOI: 10.1109/SCW63240.2024.00080. Online publication date: 17-Nov-2024
https://dl.acm.org/doi/10.1109/SCW63240.2024.00080
Ernstsson A, Griebler D, Kessler C. (2022). Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems. International Journal of Parallel Programming 51:1, 61-82. DOI: 10.1007/s10766-022-00746-1. Online publication date: 6-Dec-2022
https://doi.org/10.1007/s10766-022-00746-1
Garcia A, Griebler D, Schepke C, Fernandes L. (2022). SPBench: a framework for creating benchmarks of stream processing applications. Computing 105:5, 1077-1099. DOI: 10.1007/s00607-021-01025-6. Online publication date: 10-Jan-2022
https://dl.acm.org/doi/10.1007/s00607-021-01025-6
Janjic V, Brown C, Barwell A. (2021). Restoration of Legacy Parallelism: Transforming Pthreads into Farm and Pipeline Patterns. International Journal of Parallel Programming 49:6, 886-910. DOI: 10.1007/s10766-021-00716-z. Online publication date: 1-Dec-2021
https://dl.acm.org/doi/10.1007/s10766-021-00716-z
Vogel A, Mencagli G, Griebler D, Danelutto M, Fernandes L. (2021). Online and transparent self-adaptation of stream parallel patterns. Computing 105:5, 1039-1057. DOI: 10.1007/s00607-021-00998-8. Online publication date: 23-Aug-2021
https://dl.acm.org/doi/10.1007/s00607-021-00998-8
Rinaldi L, Torquati M, Mencagli G, Danelutto M, Castegren E, De Koster J, Schmidt T. (2020). High-throughput stream processing with actors. Proceedings of the 10th ACM SIGPLAN International Workshop on Programming Based on Actors, Agents, and Decentralized Control, 1-10. DOI: 10.1145/3427760.3428338. Online publication date: 17-Nov-2020
https://dl.acm.org/doi/10.1145/3427760.3428338
Panagiotou S, Ernstsson A, Ahlqvist J, Papadopoulos L, Kessler C, Soudris D, Corporaal H. (2020). Portable exploitation of parallel and heterogeneous HPC architectures in neural simulation using SkePU. Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems, 74-77. DOI: 10.1145/3378678.3391889. Online publication date: 25-May-2020
https://dl.acm.org/doi/10.1145/3378678.3391889
Nair R, Field T, Amaral J, Koziolek A, Trubiani C, Iosup A. (2020). GAPP: A Fast Profiler for Detecting Serialization Bottlenecks in Parallel Linux Applications. Proceedings of the ACM/SPEC International Conference on Performance Engineering, 257-264. DOI: 10.1145/3358960.3379136. Online publication date: 20-Apr-2020
https://dl.acm.org/doi/10.1145/3358960.3379136
Metzger P, Cole M, Fensch C, Aldinucci M, Bini E. (2020). Enforcing Deadlines for Skeleton-based Parallel Programming. 2020 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 188-199. DOI: 10.1109/RTAS48715.2020.000-7. Online publication date: Apr-2020
https://doi.org/10.1109/RTAS48715.2020.000-7
Fang J, Huang C, Tang T, Wang Z. (2020). Parallel programming models for heterogeneous many-cores: a comprehensive survey. CCF Transactions on High Performance Computing 2:4, 382-400. DOI: 10.1007/s42514-020-00039-4. Online publication date: 31-Jul-2020
https://doi.org/10.1007/s42514-020-00039-4