Article

A parallel dynamic programming algorithm on a multi-core architecture

Authors:

Guang R. GaoAuthors Info & Claims

SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures

Pages 135 - 144

https://doi.org/10.1145/1248377.1248399

Published: 09 June 2007 Publication History

Abstract

Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. There have been many parallel dynamic programming algorithms. The purpose of this paper is to study a family of dynamic programming algorithm where data dependence appear between non-consecutive stages, in other words, the data dependence is non-uniform. This kind of dynnamic programming is typically called nonserial polyadic dynamic programming. Owing to the non-uniform data dependence, it is harder to optimize this problem for parallelism and locality on parallel architectures. In this paper, we address the chanllenge of exploiting fine grain parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with memory hierarchy. In the framework of the new model, the parallelism and locality benifit from a data dependence transformation. We propose a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators. The new parallel algorithm tolerates the memory access latency using multi-thread and is easily improved with tile technique. We formulate and analytically solve the optimization problem determing the tile size that minimizes the total execution time. The experiments on a simulator give a validation of the proposed model and show that the fine grain parallel algorithm achieves sub-linear speedup and that a potential high scalability on multi-core arichitecture.

References

[1]

http://grape--dr.adm.s.u--tokyo.ac.jp/system-en.html.

[2]

http://www.cray.com/products/xmt/.

[3]

A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM}, 31(9):1116--1127, 1998.

Digital Library

[4]

F. Almeida, R. Andonov, and D. Gonzalez. Optimal tiling for rna base pairing problem. acm symposium on parallel architecture and algorithm. In ACM Symposium on Parallel Architecture and Algorithm, pages 173--182, 2002.

Digital Library

[5]

D. Bader and K. Madduri. Design and implementation of the hpcs graph analysis benchmark on symmetric multiprocessors. In The 12th International Conference on High Performance Computing (HiPC 2005), pages 465--476, 2005.

Digital Library

[6]

R. Bordawekar, A. Choudhary, K. Kennedy, C. Koelbel, and M. Paleczny. A model and compilation strategy for out--of--core data parallel programs. In Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming}, pages 1--10, 1995.

Digital Library

[7]

P. G. Bradford. Efficient parallel dynamic programming. in 30th annual allerton conference on communication. In Control and Computing, pages 185--194, 1992.

[8]

J. H. Chen, S. Y. Le, B. A. Shapiro, and J. V. Maizel. Optimization of an rna folding algorithm for parallel architectures. Parallel Computing, 24(1617--1634), 1998.

Digital Library

[9]

S. Coleman and K. S. McKinley. Tile size selection using cache organization and data layout. In proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, 1995.

Digital Library

[10]

J. Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. Fast: A functionally accurate simulation toolset for the cyclops-64 cellular architecture. In Workshop on Modeling, Benchmarking and Simulation (MoBS), held in conjunction with the 32nd Annual International Symposium on Computer Architecture (ISCA'05), 2005.

[11]

J. Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. Tiny threads: a thread virtual machine for the cyclops-64 cellular architecture. In Fifth Workshop on Massively Parallel Processing (WMPP), held in conjunction with the 19th International Parallel and Distributed Processing System, 2005.

Digital Library

[12]

J. Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. Towards a software infrastructure for the cyclops-64 cellular architecture. In The 20th International Symposium on High Performance Computing Systems and Applications (HPCS'06), 2006.

Digital Library

[13]

M. Denneau and H. S. Warren. 64-bit cyclops priciples of operation. Technical report, IBM Watson Research Center, 2005.

[14]

P. Edmonds, E. Chu, and A. George. Dynamic programming on a shared memory multiprocessor. Parallel Computing, 19(1):9--22, 1993.

[15]

I. H. M. Fekete and P. Stadler. Prediction of rna base pairing posibilities for rna secondary structure. Biopolymers, 9:1105--1119, 1990.

[16]

Z. Galil and K. Park. Parallel algorithm for dynamic programming recurrences with more than o(1) dependency. Journal of Parallel and Distributed Computing, 21:213--222, 1994.

Digital Library

[17]

A. Grama, A. Gupta, G. Karypis, and V. Kumar. Introduction to Parallel Computing. Addison Wesley, 2003.

Digital Library

[18]

M. Gschwind, P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. Synergistic processing in cell's multicore architecture. IEEE Micro, pages 10--24, March 2006.

Digital Library

[19]

L. Guibas, H. Kung, and C. Thomson. Direct vlsi implementation of combinatorial algorithms. In Caltech Conference on VLSI, pages 509--525, 1979.

[20]

J. Hong and H. Kong. I/o complexity: The red blue pebble game. In Proceedings of ACM Symposium on Theory of Computing, 1981.

Digital Library

[21]

F. Irigoin and R. Triolet. Supernode partitioning. In proceedings of the 15th ACM Symposium on Principles of Programming Languages, pages 319--329, 1988.

Digital Library

[22]

B. Louka and M. Tchuente. Dynamic programming on two-dimensional systolic arrays. Information Processing Letters, 29:97--104, 1988.

Digital Library

[23]

R. B. Lyngso and M. Zuker. Fast evaluation of internal loops in rna secondary structure prediction. Bioinformatics, 15(6), 440--445 1999.

[24]

J. Ramanujam and P. Sadayappan. Tiling multidimensional iteration spaces for non share memory machines. In Supercomputing, pages 111--120, 1991.

Digital Library

[25]

B. A. Shapiro, J. C. Wu, D. Bengali, and M. J. Potts. The massively parallel genetic algorithm for rna folding: Mimd implementation and population variation. Bioinformatics, 17(2):137--148, 2001.

[26]

T. Smith and M. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195--197, 1981.

[27]

G. Tan, S. Feng, and N. Sun. Locality and parallelism optimization for dynamic programming algorithm in bioinformatics. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing.

Digital Library

[28]

G. Tan, S. Feng, and N. Sun. Load balancing algorithm in cluster-based rna secondary structure prediction. In proceedings of the 4th International Symposium on Parallel and Distributed Computing, pages 91--96. IEEE Computer Society, 2005.

Digital Library

[29]

J. S. Vitter. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys, 33(2):209--271, 2001.

Digital Library

[30]

M. Wolfe. Iteration space tiling for memory hierarchies. Parallel Processing for Scientific Computing, pages 357--361, 1987.

Digital Library

[31]

M. Wolfe. Data dependence and program restructuring. The Journal of Supercomputing, 4:321--344, 1990.

Digital Library

[32]

J. Xue and C. Huang. Reuse driven tiling for improving data locality. In International Journal of Parallel Programming, volume 26, pages 671--696, 1998.

Digital Library

[33]

W. Zhou and D. K. Lowenthal. A parallel, out-of-core algorithm for rna secondary structure prediction. In Proceedings of the 2006 International Conference on Parallel Processing, pages 74--81, 2006.

Digital Library

[34]

W. Zhu, Z. Hu, and G. R. Gao. Efficient fine-grain synchronization on a multi-core chip architecture: A fresh look. Technical report, CAPSL, University of Delaware, Technique Report 57.

Cited By

Jung KNamkoong SUm HKim HKim YPark YShrivastava ASui Y(2024)Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPUProceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3652032.3657571(72-82)Online publication date: 20-Jun-2024
https://dl.acm.org/doi/10.1145/3652032.3657571
Thant Sin SPortnov EBain A(2023)Using the Branch and Bound Method to Solve the Optimal Packaging Problem2023 International Russian Automation Conference (RusAutoCon)10.1109/RusAutoCon58002.2023.10272881(786-790)Online publication date: 10-Sep-2023
https://doi.org/10.1109/RusAutoCon58002.2023.10272881
Saccon EBevilacqua PFontanelli DFrego MPalopoli LPasserone R(2021)Robot Motion Planning: can GPUs be a Game Changer?2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC)10.1109/COMPSAC51774.2021.00015(21-30)Online publication date: Jul-2021
https://doi.org/10.1109/COMPSAC51774.2021.00015
Show More Cited By

Index Terms

A parallel dynamic programming algorithm on a multi-core architecture
1. Theory of computation
  1. Computational complexity and cryptography
    1. Complexity classes

Recommendations

Parallel Shared-Memory Workloads Performance on Asymmetric Multi-core Architectures
PDP '10: Proceedings of the 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing

Putting performance asymmetric cores inside the same processor can be a good alternative to obtain high performance per area, throughput and single-threaded performance. However, the impact of running parallel applications on this type of machine is not ...
Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures

Dynamic programming (DP) is a popular technique which is used to solve combinatorial search and optimization problems. This paper focuses on one type of DP, which is called nonserial polyadic dynamic programming (NPDP). Owing to the nonuniform data ...
Coordinated control of multiple prefetchers in multi-core systems
MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

Aggressive prefetching is very beneficial for memory latency tolerance of many applications. However, it faces significant challenges in multi-core systems. Prefetchers of different cores on a chip multiprocessor (CMP) can cause significant interference ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures

June 2007

376 pages

ISBN:9781595936677

DOI:10.1145/1248377

General Chair:
Phillip B. Gibbons
Intel Research, USA
,
Program Chair:
Christian Scheideler
Technische Universität München, Germany

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2007

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

SPAA07

Sponsor:

SPAA07: 19th ACM Symposium on Parallelism in Algorithms and Architectures

June 9 - 11, 2007

California, San Diego, USA

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Upcoming Conference

SPAA '25

Sponsor:
sigact
sigact

37th ACM Symposium on Parallelism in Algorithms and Architectures

July 28 - August 1, 2025

Portland , OR , USA

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

34
Total Citations
View Citations
2,998
Total Downloads

Downloads (Last 12 months)20
Downloads (Last 6 weeks)3

Reflects downloads up to 19 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Jung KNamkoong SUm HKim HKim YPark YShrivastava ASui Y(2024)Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPUProceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3652032.3657571(72-82)Online publication date: 20-Jun-2024
https://dl.acm.org/doi/10.1145/3652032.3657571
Thant Sin SPortnov EBain A(2023)Using the Branch and Bound Method to Solve the Optimal Packaging Problem2023 International Russian Automation Conference (RusAutoCon)10.1109/RusAutoCon58002.2023.10272881(786-790)Online publication date: 10-Sep-2023
https://doi.org/10.1109/RusAutoCon58002.2023.10272881
Saccon EBevilacqua PFontanelli DFrego MPalopoli LPasserone R(2021)Robot Motion Planning: can GPUs be a Game Changer?2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC)10.1109/COMPSAC51774.2021.00015(21-30)Online publication date: Jul-2021
https://doi.org/10.1109/COMPSAC51774.2021.00015
Lacmou Zeutouo JKengne Tchendji VMyoupo J(2021)A fast sequential algorithm for the matrix chain ordering problemConcurrency and Computation: Practice and Experience10.1002/cpe.644533:24Online publication date: 21-Jun-2021
https://doi.org/10.1002/cpe.6445
Javanmard MAhmad ZKong MPouchet LChowdhury RHarrison RMars JTang LXue JWu P(2020)Deriving parametric multi-way recursive divide-and-conquer dynamic programming algorithms using polyhedral compilersProceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3368826.3377916(317-329)Online publication date: 22-Feb-2020
https://dl.acm.org/doi/10.1145/3368826.3377916
Wani MAhmad M(2019)Statically Optimal Binary Search Tree Computation Using Non-Serial Polyadic Dynamic Programming on GPU'sInternational Journal of Grid and High Performance Computing10.4018/IJGHPC.201901010411:1(49-70)Online publication date: 1-Jan-2019
https://doi.org/10.4018/IJGHPC.2019010104
Javanmard MGanapathi PDas RAhmad ZTschudi SChowdhury R(2019)Toward Efficient Architecture-Independent Algorithms for Dynamic ProgramsHigh Performance Computing10.1007/978-3-030-20656-7_8(143-164)Online publication date: 17-May-2019
https://doi.org/10.1007/978-3-030-20656-7_8
Fatmi AChentoufi ABekri MBenhlima SSabbane M(2017)A heuristic algorithm for RNA secondary structure based on genetic algorithm2017 Intelligent Systems and Computer Vision (ISCV)10.1109/ISACV.2017.8054964(1-7)Online publication date: Apr-2017
https://doi.org/10.1109/ISACV.2017.8054964
Zhao CSahni S(2017)Efficient RNA folding using Zuker's method2017 IEEE 7th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)10.1109/ICCABS.2017.8114309(1-6)Online publication date: Oct-2017
https://doi.org/10.1109/ICCABS.2017.8114309
Umrao LMahato DSingh R(2014)Recent Trends in Parallel ComputingEncyclopedia of Information Science and Technology, Third Edition10.4018/978-1-4666-5888-2.ch350(3580-3589)Online publication date: 31-Jul-2014
https://doi.org/10.4018/978-1-4666-5888-2.ch350
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten