
Bringing Parallel Patterns Out of the Corner: The P³ARSEC Benchmark Suite

Published: 24 October 2017

Abstract

High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions to develop complex parallel software with reduced time to solution. Pattern-based parallel programming is based on a set of composable and customizable parallel patterns used as basic building blocks in parallel applications. In recent years, considerable effort has been made to empower this programming model with features able to overcome the shortcomings of early approaches concerning flexibility and performance. In this article, we demonstrate that the approach is flexible and efficient enough by applying it to 12 out of 13 PARSEC applications. Our analysis, conducted on three different multicore architectures, demonstrates that pattern-based parallel programming has reached a good level of maturity, providing comparable results in terms of performance with respect to both other parallel programming methodologies based on pragma-based annotations (i.e., OpenMP and OmpSs) and native implementations (i.e., Pthreads). Regarding the programming effort, we also demonstrate a considerable reduction in lines of code and code churn compared to Pthreads and comparable results with respect to other existing implementations.



    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 4
    December 2017
    600 pages
    ISSN: 1544-3566
    EISSN: 1544-3973
    DOI: 10.1145/3154814
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 October 2017
    Accepted: 01 August 2017
    Revised: 01 July 2017
    Received: 01 June 2017
    Published in TACO Volume 14, Issue 4


    Author Tags

    1. Parallel patterns
    2. algorithmic skeletons
    3. benchmarking
    4. multicore programming
    5. parsec

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • EU H2020-ICT-2014-1
