|
ABSTRACT
Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. There have been many parallel dynamic programming algorithms. The purpose of this paper is to study a family of dynamic programming algorithm where data dependence appear between non-consecutive stages, in other words, the data dependence is non-uniform. This kind of dynnamic programming is typically called nonserial polyadic dynamic programming. Owing to the non-uniform data dependence, it is harder to optimize this problem for parallelism and locality on parallel architectures. In this paper, we address the chanllenge of exploiting fine grain parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with memory hierarchy. In the framework of the new model, the parallelism and locality benifit from a data dependence transformation. We propose a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators. The new parallel algorithm tolerates the memory access latency using multi-thread and is easily improved with tile technique. We formulate and analytically solve the optimization problem determing the tile size that minimizes the total execution time. The experiments on a simulator give a validation of the proposed model and show that the fine grain parallel algorithm achieves sub-linear speedup and that a potential high scalability on multi-core arichitecture.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
|
 |
4
|
Francisco Almeida , Rumen Andonov , Daniel Gonzalez , Luz M. Moreno , Vincent Poirriez , Casiano Rodriguez, Optimal tiling for the RNA base pairing problem, Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, August 10-13, 2002, Winnipeg, Manitoba, Canada
[doi> 10.1145/564870.564901]
|
| |
5
|
D. Bader and K. Madduri. Design and implementation of the hpcs graph analysis benchmark on symmetric multiprocessors. In The 12th International Conference on High Performance Computing (HiPC 2005), pages 465--476, 2005.
|
 |
6
|
Rajesh Bordawekar , Alok Choudhary , Ken Kennedy , Charles Koelbel , Michael Paleczny, A model and compilation strategy for out-of-core data parallel programs, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.1-10, July 19-21, 1995, Santa Barbara, California, United States
|
| |
7
|
P. G. Bradford. Efficient parallel dynamic programming. in 30th annual allerton conference on communication. In Control and Computing, pages 185--194, 1992.
|
| |
8
|
|
 |
9
|
|
| |
10
|
J. Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. Fast: A functionally accurate simulation toolset for the cyclops-64 cellular architecture. In Workshop on Modeling, Benchmarking and Simulation (MoBS), held in conjunction with the 32nd Annual International Symposium on Computer Architecture (ISCA'05), 2005.
|
| |
11
|
|
| |
12
|
|
| |
13
|
M. Denneau and H. S. Warren. 64-bit cyclops priciples of operation. Technical report, IBM Watson Research Center, 2005.
|
| |
14
|
P. Edmonds, E. Chu, and A. George. Dynamic programming on a shared memory multiprocessor. Parallel Computing, 19(1):9--22, 1993.
|
| |
15
|
I. H. M. Fekete and P. Stadler. Prediction of rna base pairing posibilities for rna secondary structure. Biopolymers, 9:1105--1119, 1990.
|
| |
16
|
|
| |
17
|
A. Grama, A. Gupta, G. Karypis, and V. Kumar. Introduction to Parallel Computing. Addison Wesley, 2003.
|
| |
18
|
Michael Gschwind , H. Peter Hofstee , Brian Flachs , Martin Hopkins , Yukio Watanabe , Takeshi Yamazaki, Synergistic Processing in Cell's Multicore Architecture, IEEE Micro, v.26 n.2, p.10-24, March 2006
[doi> 10.1109/MM.2006.41]
|
| |
19
|
L. Guibas, H. Kung, and C. Thomson. Direct vlsi implementation of combinatorial algorithms. In Caltech Conference on VLSI, pages 509--525, 1979.
|
 |
20
|
|
 |
21
|
|
| |
22
|
|
| |
23
|
R. B. Lyngso and M. Zuker. Fast evaluation of internal loops in rna secondary structure prediction. Bioinformatics, 15(6), 440--445 1999.
|
 |
24
|
|
| |
25
|
B. A. Shapiro, J. C. Wu, D. Bengali, and M. J. Potts. The massively parallel genetic algorithm for rna folding: Mimd implementation and population variation. Bioinformatics, 17(2):137--148, 2001.
|
| |
26
|
T. Smith and M. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195--197, 1981.
|
 |
27
|
|
| |
28
|
|
 |
29
|
|
| |
30
|
|
| |
31
|
|
| |
32
|
|
| |
33
|
|
| |
34
|
W. Zhu, Z. Hu, and G. R. Gao. Efficient fine-grain synchronization on a multi-core chip architecture: A fresh look. Technical report, CAPSL, University of Delaware, Technique Report 57.
|
|