skip to main content
10.1145/1248377.1248399acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
Article

A parallel dynamic programming algorithm on a multi-core architecture

Published: 09 June 2007 Publication History

Abstract

Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. There have been many parallel dynamic programming algorithms. The purpose of this paper is to study a family of dynamic programming algorithm where data dependence appear between non-consecutive stages, in other words, the data dependence is non-uniform. This kind of dynnamic programming is typically called nonserial polyadic dynamic programming. Owing to the non-uniform data dependence, it is harder to optimize this problem for parallelism and locality on parallel architectures. In this paper, we address the chanllenge of exploiting fine grain parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with memory hierarchy. In the framework of the new model, the parallelism and locality benifit from a data dependence transformation. We propose a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators. The new parallel algorithm tolerates the memory access latency using multi-thread and is easily improved with tile technique. We formulate and analytically solve the optimization problem determing the tile size that minimizes the total execution time. The experiments on a simulator give a validation of the proposed model and show that the fine grain parallel algorithm achieves sub-linear speedup and that a potential high scalability on multi-core arichitecture.

References

[1]
http://grape--dr.adm.s.u--tokyo.ac.jp/system-en.html.
[2]
http://www.cray.com/products/xmt/.
[3]
A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM}, 31(9):1116--1127, 1998.
[4]
F. Almeida, R. Andonov, and D. Gonzalez. Optimal tiling for rna base pairing problem. acm symposium on parallel architecture and algorithm. In ACM Symposium on Parallel Architecture and Algorithm, pages 173--182, 2002.
[5]
D. Bader and K. Madduri. Design and implementation of the hpcs graph analysis benchmark on symmetric multiprocessors. In The 12th International Conference on High Performance Computing (HiPC 2005), pages 465--476, 2005.
[6]
R. Bordawekar, A. Choudhary, K. Kennedy, C. Koelbel, and M. Paleczny. A model and compilation strategy for out--of--core data parallel programs. In Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming}, pages 1--10, 1995.
[7]
P. G. Bradford. Efficient parallel dynamic programming. in 30th annual allerton conference on communication. In Control and Computing, pages 185--194, 1992.
[8]
J. H. Chen, S. Y. Le, B. A. Shapiro, and J. V. Maizel. Optimization of an rna folding algorithm for parallel architectures. Parallel Computing, 24(1617--1634), 1998.
[9]
S. Coleman and K. S. McKinley. Tile size selection using cache organization and data layout. In proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, 1995.
[10]
J. Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. Fast: A functionally accurate simulation toolset for the cyclops-64 cellular architecture. In Workshop on Modeling, Benchmarking and Simulation (MoBS), held in conjunction with the 32nd Annual International Symposium on Computer Architecture (ISCA'05), 2005.
[11]
J. Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. Tiny threads: a thread virtual machine for the cyclops-64 cellular architecture. In Fifth Workshop on Massively Parallel Processing (WMPP), held in conjunction with the 19th International Parallel and Distributed Processing System, 2005.
[12]
J. Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. Towards a software infrastructure for the cyclops-64 cellular architecture. In The 20th International Symposium on High Performance Computing Systems and Applications (HPCS'06), 2006.
[13]
M. Denneau and H. S. Warren. 64-bit cyclops priciples of operation. Technical report, IBM Watson Research Center, 2005.
[14]
P. Edmonds, E. Chu, and A. George. Dynamic programming on a shared memory multiprocessor. Parallel Computing, 19(1):9--22, 1993.
[15]
I. H. M. Fekete and P. Stadler. Prediction of rna base pairing posibilities for rna secondary structure. Biopolymers, 9:1105--1119, 1990.
[16]
Z. Galil and K. Park. Parallel algorithm for dynamic programming recurrences with more than o(1) dependency. Journal of Parallel and Distributed Computing, 21:213--222, 1994.
[17]
A. Grama, A. Gupta, G. Karypis, and V. Kumar. Introduction to Parallel Computing. Addison Wesley, 2003.
[18]
M. Gschwind, P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. Synergistic processing in cell's multicore architecture. IEEE Micro, pages 10--24, March 2006.
[19]
L. Guibas, H. Kung, and C. Thomson. Direct vlsi implementation of combinatorial algorithms. In Caltech Conference on VLSI, pages 509--525, 1979.
[20]
J. Hong and H. Kong. I/o complexity: The red blue pebble game. In Proceedings of ACM Symposium on Theory of Computing, 1981.
[21]
F. Irigoin and R. Triolet. Supernode partitioning. In proceedings of the 15th ACM Symposium on Principles of Programming Languages, pages 319--329, 1988.
[22]
B. Louka and M. Tchuente. Dynamic programming on two-dimensional systolic arrays. Information Processing Letters, 29:97--104, 1988.
[23]
R. B. Lyngso and M. Zuker. Fast evaluation of internal loops in rna secondary structure prediction. Bioinformatics, 15(6), 440--445 1999.
[24]
J. Ramanujam and P. Sadayappan. Tiling multidimensional iteration spaces for non share memory machines. In Supercomputing, pages 111--120, 1991.
[25]
B. A. Shapiro, J. C. Wu, D. Bengali, and M. J. Potts. The massively parallel genetic algorithm for rna folding: Mimd implementation and population variation. Bioinformatics, 17(2):137--148, 2001.
[26]
T. Smith and M. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195--197, 1981.
[27]
G. Tan, S. Feng, and N. Sun. Locality and parallelism optimization for dynamic programming algorithm in bioinformatics. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing.
[28]
G. Tan, S. Feng, and N. Sun. Load balancing algorithm in cluster-based rna secondary structure prediction. In proceedings of the 4th International Symposium on Parallel and Distributed Computing, pages 91--96. IEEE Computer Society, 2005.
[29]
J. S. Vitter. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys, 33(2):209--271, 2001.
[30]
M. Wolfe. Iteration space tiling for memory hierarchies. Parallel Processing for Scientific Computing, pages 357--361, 1987.
[31]
M. Wolfe. Data dependence and program restructuring. The Journal of Supercomputing, 4:321--344, 1990.
[32]
J. Xue and C. Huang. Reuse driven tiling for improving data locality. In International Journal of Parallel Programming, volume 26, pages 671--696, 1998.
[33]
W. Zhou and D. K. Lowenthal. A parallel, out-of-core algorithm for rna secondary structure prediction. In Proceedings of the 2006 International Conference on Parallel Processing, pages 74--81, 2006.
[34]
W. Zhu, Z. Hu, and G. R. Gao. Efficient fine-grain synchronization on a multi-core chip architecture: A fresh look. Technical report, CAPSL, University of Delaware, Technique Report 57.

Cited By

View all
  • (2024)Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPUProceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3652032.3657571(72-82)Online publication date: 20-Jun-2024
  • (2023)Using the Branch and Bound Method to Solve the Optimal Packaging Problem2023 International Russian Automation Conference (RusAutoCon)10.1109/RusAutoCon58002.2023.10272881(786-790)Online publication date: 10-Sep-2023
  • (2021)Robot Motion Planning: can GPUs be a Game Changer?2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC)10.1109/COMPSAC51774.2021.00015(21-30)Online publication date: Jul-2021
  • Show More Cited By

Index Terms

  1. A parallel dynamic programming algorithm on a multi-core architecture

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
    June 2007
    376 pages
    ISBN:9781595936677
    DOI:10.1145/1248377
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 June 2007

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. data dependence
    2. dynamic programming
    3. memory hierarchy
    4. multi-core
    5. scalabilitiy

    Qualifiers

    • Article

    Conference

    SPAA07

    Acceptance Rates

    Overall Acceptance Rate 447 of 1,461 submissions, 31%

    Upcoming Conference

    SPAA '25
    37th ACM Symposium on Parallelism in Algorithms and Architectures
    July 28 - August 1, 2025
    Portland , OR , USA

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)20
    • Downloads (Last 6 weeks)3
    Reflects downloads up to 19 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPUProceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3652032.3657571(72-82)Online publication date: 20-Jun-2024
    • (2023)Using the Branch and Bound Method to Solve the Optimal Packaging Problem2023 International Russian Automation Conference (RusAutoCon)10.1109/RusAutoCon58002.2023.10272881(786-790)Online publication date: 10-Sep-2023
    • (2021)Robot Motion Planning: can GPUs be a Game Changer?2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC)10.1109/COMPSAC51774.2021.00015(21-30)Online publication date: Jul-2021
    • (2021)A fast sequential algorithm for the matrix chain ordering problemConcurrency and Computation: Practice and Experience10.1002/cpe.644533:24Online publication date: 21-Jun-2021
    • (2020)Deriving parametric multi-way recursive divide-and-conquer dynamic programming algorithms using polyhedral compilersProceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3368826.3377916(317-329)Online publication date: 22-Feb-2020
    • (2019)Statically Optimal Binary Search Tree Computation Using Non-Serial Polyadic Dynamic Programming on GPU'sInternational Journal of Grid and High Performance Computing10.4018/IJGHPC.201901010411:1(49-70)Online publication date: 1-Jan-2019
    • (2019)Toward Efficient Architecture-Independent Algorithms for Dynamic ProgramsHigh Performance Computing10.1007/978-3-030-20656-7_8(143-164)Online publication date: 17-May-2019
    • (2017)A heuristic algorithm for RNA secondary structure based on genetic algorithm2017 Intelligent Systems and Computer Vision (ISCV)10.1109/ISACV.2017.8054964(1-7)Online publication date: Apr-2017
    • (2017)Efficient RNA folding using Zuker's method2017 IEEE 7th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)10.1109/ICCABS.2017.8114309(1-6)Online publication date: Oct-2017
    • (2014)Recent Trends in Parallel ComputingEncyclopedia of Information Science and Technology, Third Edition10.4018/978-1-4666-5888-2.ch350(3580-3589)Online publication date: 31-Jul-2014
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media