skip to main content
10.1145/1366230.1366235acmconferencesArticle/Chapter ViewAbstractPublication PagescfConference Proceedingsconference-collections
research-article

Cell-SWat: modeling and scheduling wavefront computations on the cell broadband engine

Published: 05 May 2008 Publication History

Abstract

This paper presents and evaluates a model and a methodology for implementing parallel wavefront algorithms on the Cell Broadband Engine. Wavefront algorithms are vital in several application areas such as computational biology, particle physics, and systems of linear equations.
The model uses blocked data decomposition with pipelined execution of blocks across the synergistic processing elements (SPEs) of the Cell. To evaluate the model, we implement the Smith-Waterman sequence alignment algorithm as a wavefront algorithm and present key optimization techniques that complement the vector processing capabilities of the SPE. Our results show perfect linear speedup for up to 16 SPEs on the QS20 dual-Cell blades, and our model shows that our implementation is highly scalable for more cores, if available. Furthermore, the accuracy of our model is within 3% of the measured values on average.
Lastly, we also test our model in a throughput-oriented experimental setting, where we couple the model with scheduling techniques that exploit parallelism across the simultaneous execution of multiple sequence alignments. Using our model, we improved the throughput of realistic multisequence alignment workloads by up to 8% compared to FCFS (first-come, first-serve), by trading off parallelism within alignments with parallelism across alignments.

References

[1]
S. Alam, J. Meredith, and J. Vetter. Balancing Productivity and Performance on the Cell Broadband Engine. In Proc. of the 2007 IEEE Annual International Conference on Cluster Computing, September 2007.
[2]
D. Bader and V. Agarwal. FFTC: Fastest Fourier Transform for the IBM Cell Broadband Engine. In Proc. of the 14 th IEEE International Conference on High Performance Computing (HiPC), Lecture Notes in Computer Science 4873, 2007.
[3]
D. Bader, V. Agarwal, and K. Madduri. On the Design and Analysis of Irregular Algorithms on the Cell Processor: A Case Study of List Ranking. In Proc. of the 21st International Parallel and Distributed Processing Symposium, pages 1--10, 2007.
[4]
P. Bellens, J. Pérez, R. Badia, and J. Labarta. Memory - Cellss: A Programming Model for the Cell BE Architecture. In Proc. of Supercomputing'2006, page 86, 2006.
[5]
F. Blagojevic, A. Stamatakis, C. Antonopoulos, and D. S. Nikolopoulos. RAxML-CELL: Parallel Phylogenetic Tree Construction on the Cell Broadband Engine. In Proc. of the 21st IEEE/ACM International Parallel and Distributed Processing Symposium, March 2007.
[6]
G. Buehrer and S. Parthasarathy. The Potential of the Cell Broadband Engine for Data Mining. Technical Report TR-2007-22, Department of Computer Science and Engineering, Ohio State University, 2007.
[7]
T. Chen, R. Raghavan, J. Dale, and E. Iwata. Cell Broadband Engine Architecture and its First Implementation. IBM developerWorks, Nov 2005.
[8]
A. Eichenberger et al. Using Advanced Compiler Technology to Exploit the Performance of the Cell Broadband Engine Architecture. IBM Systems Journal, 45(1):59--84, 2006.
[9]
J. Kahle et al. Introduction to the Cell Multiprocessor. IBM Journal of Research and Development, pages 589--604, Jul-Sep 2005.
[10]
K. Fatahalian, D. Reiter Horn, T. Knight, L. Leem, M. Houston, J.-Y. Park, M. Erez, M. Ren, A. Aiken, W. Dally, and P. Hanrahan. Memory - Sequoia: Programming the Memory Hierarchy. In Proc. of Supercomputing'2006, page 83, 2006.
[11]
M. Gardner, W. Feng, J. Archuleta, H. Lin, and X. Ma. Parallel Genomic Sequence-Searching on an Ad-hoc Grid: Experiences, Lessons Learned, and Implications. In Proc. of the ACM/IEEE SC 2006 Conference, pages 22--22, 11-17 Nov. 2006.
[12]
B. Gedik, R. Bordawekar, and P. Yu. Cellsort: High Performance Sorting on the Cell Processor. In Proc. of the 33rd Very Large Databases Conference, pages 1286--1207, 2007.
[13]
S. Heman, N. Nes, M. Zukowski, and P. Boncz. Vectorized Data Processing on the Cell Broadband Engine. In Proc. of the Third International Workshop on Data Management on New Hardware, June 2007.
[14]
A. Hoisie, O. Lubeck, and H. Wasserman. Performance and Scalability Analysis of Teraop-scale Parallel Architectures using Multidimensional Wavefront Applications. Technical Report LAUR-98-3316, Los Alamos National Laboratory, August 1998.
[15]
Y. Liu, W. Huang, J. Johnson, and S. Vaidya. GPU Accelerated Smith-Waterman. In Proc. of the 2006 International Conference on Computational Science, Lectures Notes in Computer Science Vol. 3994, pages 188--195, June 2006.
[16]
C. Mueller, B. Martin, and A. Lumsdaine. CorePy: High-Productivity Cell/B.E. Programming. In Proc. of the First STI/Georgia Tech Workshop on Software and Applications for the Cell/B.E. Processor, June 2007.
[17]
F. Petrini, G. Fossum, J. Fernández, A. Lucia Varbanescu, M. Kistler, and M. Perrone. Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell Broadband Engine. In Proc. of the 21st International Parallel and Distributed Processing Symposium, pages 1--10, 2007.
[18]
T. Rognes and E. Seeberg. Six-fold Speed-up of Smith-Waterman Sequence Database Searches using Parallel Processing on Common Microprocessors. Bioinformatics, 16(8):699--706, 2000.
[19]
V. Sachdeva, M. Kistler, E. Speight, and T.-H. Tzeng. Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications. In Proc. of the 6th IEEE International Workshop on High Performance Computational Biology, 2007.
[20]
T. Smith and M. Waterman. Identification of Common Molecular Subsequences. Journal of molecular biology, 147:195--197, 1981.
[21]
O. Storaasli and D. Strenski. Exploring Accelerating Science Applications with FPGAs. In Proc. of the Reconfigurable Systems Summer Institute, July 2007.
[22]
A. Varbanescu, H. Sips, K. Ross, Q. Liu, L.-K. Liu, A. Natsev, and J. Smith. An Effective Strategy for Porting C++ Applications on Cell. In Proc. of the 2007 International Conference on Parallel Processing, page 59, 2007.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CF '08: Proceedings of the 5th conference on Computing frontiers
May 2008
334 pages
ISBN:9781605580777
DOI:10.1145/1366230
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 May 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cell broadband engine
  2. smith-waterman
  3. wavefront algorithms

Qualifiers

  • Research-article

Conference

CF '08
Sponsor:
CF '08: Computing Frontiers Conference
May 5 - 7, 2008
Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%

Upcoming Conference

CF '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 14 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2016)Exploiting task and data parallelism for advanced video coding on hybrid CPU + GPU platformsJournal of Real-Time Image Processing10.1007/s11554-013-0357-y11:3(571-587)Online publication date: 1-Mar-2016
  • (2015)Accelerating Search of Protein Sequence Databases using CUDA-Enabled GPUDatabase Systems for Advanced Applications10.1007/978-3-319-18120-2_17(279-298)Online publication date: 9-Apr-2015
  • (2013)Fine-grained parallel implementations for SWAMP+ Smith-Waterman alignmentParallel Computing10.1016/j.parco.2013.08.00839:12(819-833)Online publication date: 1-Dec-2013
  • (2013)C2FPGA-A dependency-timing graph design methodologyJournal of Parallel and Distributed Computing10.1016/j.jpdc.2012.09.00173:11(1417-1429)Online publication date: 1-Nov-2013
  • (2012)Characterization of Smith-Waterman sequence database search in X10Proceedings of the 2012 ACM SIGPLAN X10 Workshop10.1145/2246056.2246058(1-7)Online publication date: 14-Jun-2012
  • (2012)Parallel Smith-Waterman Comparison on Multicore and Manycore Computing Platforms with BSP++International Journal of Parallel Programming10.1007/s10766-012-0209-641:1(111-136)Online publication date: 11-Aug-2012
  • (2011)Adapting wave-front algorithms to efficiently utilize systems with deep communication hierarchiesParallel Computing10.1016/j.parco.2011.02.00837:9(550-561)Online publication date: 1-Sep-2011
  • (2011)Modeling and Evaluating Non-shared Memory CELL/BE Type Multi-core Architectures for Local Image and Video ProcessingJournal of Signal Processing Systems10.1007/s11265-010-0463-z62:3(301-318)Online publication date: 1-Mar-2011
  • (2011)Scalable multicore architectures for long DNA sequence comparisonConcurrency and Computation: Practice & Experience10.1002/cpe.175323:17(2205-2219)Online publication date: 1-Dec-2011
  • (2010)Long DNA sequence comparison on multicore architecturesProceedings of the 16th international Euro-Par conference on Parallel processing: Part II10.5555/1885276.1885303(247-259)Online publication date: 31-Aug-2010
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media