skip to main content
10.1145/3002125.3002128acmconferencesArticle/Chapter ViewAbstractPublication PagessepsConference Proceedingsconference-collections
research-article

A divide-and-conquer parallel pattern implementation for multicores

Published: 21 October 2016 Publication History

Abstract

Divide-and-Conquer (DaC) is a sequential programming paradigm which models a large class of algorithms used in real-life applications. Although suitable to extract parallelism in a straightforward way, the parallel implementation of DaC algorithms still requires some expertise in parallel programming tools by the programmer.
In this paper we aim at providing to non-expert programmers a high-level solution for fast prototyping parallel DaC programs on multicores with minimal programming effort.
Following the rationale of parallel design pattern methodology, we design a C++11-compliant template interface for developing parallel DaC programs. The interface is implemented using different back-end frameworks (i.e. OpenMP, Intel TBB and FastFlow) supporting source code reuse and a certain amount of performance portability.
Experiments on a 24-core Intel server show the effectiveness of our approach: with a reduced programming effort the programmer easily prototypes parallel versions with performance comparable with hand-made parallelizations.

References

[1]
M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati. An efficient unbounded lock-free queue for multi-core systems. In Proc. of 18th Intl. Euro-Par 2012 Parallel Processing, volume 7484 of LNCS, pages 662–673, Rhodes Island, Greece, Aug. 2012. Springer.
[3]
E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin, F. Massaioli, E. Su, P. Unnikrishnan, and G. Zhang. A Proposal for Task Parallelism in OpenMP, pages 1–12. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008. ISBN 978-3- 540-69303-1. 1.
[4]
G. E. Blelloch, R. A. Chowdhury, P. B. Gibbons, V. Ramachandran, S. Chen, and M. Kozuch. Provably good multicore cache performance for divide-and-conquer algorithms. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’08, pages 501–510, Philadelphia, PA, USA, 2008. Society for Industrial and Applied Mathematics.
[5]
D. Buono, M. Danelutto, T. D. Matteis, G. Mencagli, and M. Torquati. A lightweight run-time support for fast dense linear algebra on multi-core. In Proc. of the 12th International Conference on Parallel and Distributed Computing and Networks (PDCN 2014). IASTED, ACTA press, Feb. 2014.
[6]
M. Cole. Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press, Cambridge, MA, USA, 1988. ISBN 0-262-53086-4.
[7]
M. Cole. Bringing skeletons out of the closet: A pragmatic manifesto for skeletal parallel programming. Parallel Comput., 30(3):389–406, Mar. 2004. ISSN 0167-8191.
[8]
T. H. Cormen, C. Stein, R. L. Rivest, and C. E. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001. ISBN 0070131511.
[9]
M. Danelutto and M. Torquati. Structured parallel programming with ”core” fastflow. In V. Zs´ok, Z. Horváth, and L. Csat´o, editors, Central European Functional Programming School, volume 8606 of LNCS, pages 29–75. Springer, 2015. ISBN 978-3-319-15939-3. 978-3-319-15940-9 2.
[10]
D. del Rio Astorga, M. F. Dolz, L. M. Sanchez, and J. D. Garc´ıa. Discovering Pipeline Parallel Patterns in Sequential Legacy C++ Codes. In Procs of the 7th Int’l Workshop on Progr. Models and Applications for Multicores and Manycores, PMAM’16, pages 11–19, NY, USA, 2016. ACM. ISBN 978-1-4503-4196-7.
[11]
I. Dorta, C. Leon, C. Rodriguez, and A. Rojas. Parallel skeletons for divide-and-conquer and branch-and-bound techniques. In Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, pages 292–298, Feb 2003.
[12]
[13]
A. Duran, J. Corbalán, and E. Ayguadé. An adaptive cut-off for task parallelism. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC ’08, pages 36:1–36:11, Piscataway, NJ, USA, 2008. IEEE Press. ISBN 978-1-4244- 2835-9.
[14]
A. Duran, J. Corbalán, and E. Ayguadé. Evaluation of openmp task scheduling strategies. In Proceedings of the 4th International Conference on OpenMP in a New Era of Parallelism, IWOMP’08, pages 100–110, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 3-540-79560-X, 978-3-540- 79560-5.
[15]
A. Fonseca and B. Cabral. Evaluation of runtime cut-off approaches for parallel programs. In Proceedings of the 12th International Meeting on High Performance Computing for Computational Science (VECPAR 2016). Springer, 2016. to appear.
[16]
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the cilk-5 multithreaded language. SIGPLAN Not., 33(5):212–223, May 1998. ISSN 0362-1340.
[17]
C. H. Gonzalez and B. B. Fraguela. A generic algorithm template for divide-and-conquer in multicore systems. In Proceedings of the 2010 IEEE 12th International Conference on High Performance Computing and Communications, HPCC ’10, pages 79–88, Washington, DC, USA, 2010.
[18]
IEEE Computer Society. ISBN 978-0-7695-4214-0.
[19]
C. A. Herrmann and C. Lengauer. A higher-order language for divide-and-conquer. Parallel Processing Letters, 10(02n03): 239–250, 2000.
[20]
Intel R ⃝Threading Building Blocks (Intel R ⃝TBB) Developer Guide. Intel R ⃝, 2016. Available at: https://www.threadingbuildingblocks.org/docs/ help/tbb_userguide/Design_Patterns/Divide_and_ Conquer.html.
[21]
M. Leyton and J. Piquer. Skandium: Multi-core programming with algorithmic skeletons. In Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference on, pages 289–296, Feb 2010.
[22]
T. Mattson, B. Sanders, and B. Massingill. Patterns for Parallel Programming. Addison-Wesley Professional, first edition, 2004. ISBN 0321228111.
[23]
G. Mencagli, M. Vanneschi, and E. Vespa. Control-theoretic adaptation strategies for autonomic reconfigurable parallel applications on cloud environments. In High Performance Computing and Simulation (HPCS), 2013 International Conference on, pages 11–18, July 2013.
[24]
[25]
G. Mencagli, M. Vanneschi, and E. Vespa. A cooperative predictive control approach to improve the reconfiguration stability of adaptive distributed parallel applications. ACM Trans. Auton. Adapt. Syst., 9(1):2:1–2:27, Mar. 2014. ISSN 1556-4665.
[26]
K. Molitorisz, T. Müller, and W. F. Tichy. Patty: A patternbased parallelization tool for the multicore age. In Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM ’15, pages 153–163, New York, NY, USA, 2015.
[27]
ACM. ISBN 978-1-4503-3404-4.
[28]
[29]
[30]
OpenMP Architecture Review Board. Openmp application program interface, 2011.
[31]
M. Poldner and H. Kuchen. Task parallel skeletons for divide and conquer. In Proceedings of the Workshop of the Working Group Programming Languages and Computing Concepts of the German Computer Science Association GI, Bad Honnef, 2008.
[32]
J. Reinders. Intel Threading Building Blocks. O’Reilly & Associates, Inc., Sebastopol, CA, USA, first edition, 2007. ISBN 9780596514808.

Cited By

View all
  • (2024)Extending parallel programming patterns with adaptability featuresCluster Computing10.1007/s10586-024-04622-027:9(12547-12568)Online publication date: 1-Dec-2024
  • (2022)On Generalizing Divide and Conquer Parallel Programming PatternMathematics10.3390/math1021392510:21(3925)Online publication date: 23-Oct-2022
  • (2022)A highly optimized skeleton for unbalanced and deep divide-and-conquer algorithms on multi-core clustersThe Journal of Supercomputing10.1007/s11227-021-04259-578:8(10434-10454)Online publication date: 24-Jan-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SEPS 2016: Proceedings of the 3rd International Workshop on Software Engineering for Parallel Systems
October 2016
34 pages
ISBN:9781450346412
DOI:10.1145/3002125
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 October 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Divide and Conquer
  2. High-level parallel patterns

Qualifiers

  • Research-article

Conference

SPLASH '16
Sponsor:

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)6
  • Downloads (Last 6 weeks)0
Reflects downloads up to 18 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Extending parallel programming patterns with adaptability featuresCluster Computing10.1007/s10586-024-04622-027:9(12547-12568)Online publication date: 1-Dec-2024
  • (2022)On Generalizing Divide and Conquer Parallel Programming PatternMathematics10.3390/math1021392510:21(3925)Online publication date: 23-Oct-2022
  • (2022)A highly optimized skeleton for unbalanced and deep divide-and-conquer algorithms on multi-core clustersThe Journal of Supercomputing10.1007/s11227-021-04259-578:8(10434-10454)Online publication date: 24-Jan-2022
  • (2022)ARM vs FPGA: Comparative Analysis of Sorting AlgorithmsAdvanced Information Networking and Applications10.1007/978-3-030-99619-2_27(275-287)Online publication date: 31-Mar-2022
  • (2021)A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep ProblemsInternational Journal of Parallel Programming10.1007/s10766-021-00709-yOnline publication date: 14-May-2021
  • (2019)Multi-way Divide and Conquer Parallel Programming based on PLists2019 International Conference on Software, Telecommunications and Computer Networks (SoftCOM)10.23919/SOFTCOM.2019.8903794(1-6)Online publication date: Sep-2019
  • (2019)Accelerating Spectral Graph Analysis Through Wavefronts of Linear Algebra Operations2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/EMPDP.2019.8671640(9-16)Online publication date: Feb-2019
  • (2017)An Efficient Hardware Implementation of TimSort and MergeSort Algorithms Using High Level Synthesis2017 International Conference on High Performance Computing & Simulation (HPCS)10.1109/HPCS.2017.92(580-587)Online publication date: Jul-2017
  • (2017)A general and efficient divide-and-conquer algorithm framework for multi-core clustersCluster Computing10.1007/s10586-017-0766-y20:3(2605-2626)Online publication date: 1-Sep-2017

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media