ACM Home Page
Please provide us with feedback. Feedback
A parallel dynamic programming algorithm on a multi-core architecture
Full text PdfPdf (303 KB)
Source ACM Symposium on Parallel Algorithms and Architectures archive
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures table of contents
San Diego, California, USA
SESSION: Multicore architectures and algorithms table of contents
Pages: 135 - 144  
Year of Publication: 2007
ISBN:978-1-59593-667-7
Authors
Guangming Tan  Chinese Academy of Sciences, Beijing, China
Ninghui Sun  Chinese Academy of Sciences, Beijing, China
Guang R. Gao  University of Delaware, Newark, DE
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 66,   Downloads (12 Months): 595,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1248377.1248399
What is a DOI?

ABSTRACT

Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. There have been many parallel dynamic programming algorithms. The purpose of this paper is to study a family of dynamic programming algorithm where data dependence appear between non-consecutive stages, in other words, the data dependence is non-uniform. This kind of dynnamic programming is typically called nonserial polyadic dynamic programming. Owing to the non-uniform data dependence, it is harder to optimize this problem for parallelism and locality on parallel architectures. In this paper, we address the chanllenge of exploiting fine grain parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with memory hierarchy. In the framework of the new model, the parallelism and locality benifit from a data dependence transformation. We propose a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators. The new parallel algorithm tolerates the memory access latency using multi-thread and is easily improved with tile technique. We formulate and analytically solve the optimization problem determing the tile size that minimizes the total execution time. The experiments on a simulator give a validation of the proposed model and show that the fine grain parallel algorithm achieves sub-linear speedup and that a potential high scalability on multi-core arichitecture.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
4
 
5
D. Bader and K. Madduri. Design and implementation of the hpcs graph analysis benchmark on symmetric multiprocessors. In The 12th International Conference on High Performance Computing (HiPC 2005), pages 465--476, 2005.
6
 
7
P. G. Bradford. Efficient parallel dynamic programming. in 30th annual allerton conference on communication. In Control and Computing, pages 185--194, 1992.
 
8
9
 
10
J. Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. Fast: A functionally accurate simulation toolset for the cyclops-64 cellular architecture. In Workshop on Modeling, Benchmarking and Simulation (MoBS), held in conjunction with the 32nd Annual International Symposium on Computer Architecture (ISCA'05), 2005.
 
11
 
12
 
13
M. Denneau and H. S. Warren. 64-bit cyclops priciples of operation. Technical report, IBM Watson Research Center, 2005.
 
14
P. Edmonds, E. Chu, and A. George. Dynamic programming on a shared memory multiprocessor. Parallel Computing, 19(1):9--22, 1993.
 
15
I. H. M. Fekete and P. Stadler. Prediction of rna base pairing posibilities for rna secondary structure. Biopolymers, 9:1105--1119, 1990.
 
16
 
17
A. Grama, A. Gupta, G. Karypis, and V. Kumar. Introduction to Parallel Computing. Addison Wesley, 2003.
 
18
 
19
L. Guibas, H. Kung, and C. Thomson. Direct vlsi implementation of combinatorial algorithms. In Caltech Conference on VLSI, pages 509--525, 1979.
20
21
 
22
 
23
R. B. Lyngso and M. Zuker. Fast evaluation of internal loops in rna secondary structure prediction. Bioinformatics, 15(6), 440--445 1999.
24
 
25
B. A. Shapiro, J. C. Wu, D. Bengali, and M. J. Potts. The massively parallel genetic algorithm for rna folding: Mimd implementation and population variation. Bioinformatics, 17(2):137--148, 2001.
 
26
T. Smith and M. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195--197, 1981.
27
 
28
29
 
30
 
31
 
32
 
33
 
34
W. Zhu, Z. Hu, and G. R. Gao. Efficient fine-grain synchronization on a multi-core chip architecture: A fresh look. Technical report, CAPSL, University of Delaware, Technique Report 57.


Collaborative Colleagues:
Guangming Tan: colleagues
Ninghui Sun: colleagues
Guang R. Gao: colleagues