ABSTRACT
Biological sequence comparison is a very important operation in Bioinformatics. Even though there do exist exact methods to compare biological sequences, these methods are often neglected due to their quadratic time and space complexity. In order to accelerate these methods, many GPU algorithms were proposed in the literature. Nevertheless, all of them restrict the size of the smallest sequence in such a way that Megabase genome comparison is prevented. In this paper, we propose and evaluate CUDAlign, a GPU algorithm that is able to compare Megabase biological sequences with an exact Smith-Waterman affine gap variant. CUDAlign was implemented in CUDA and tested in two GPU boards, separately. For real sequences whose size range from 1MBP (Megabase Pairs) to 47MBP, a close to uniform GCUPS (Giga Cells Updates per Second) was obtained, showing the potential scalability of our approach. Also, CUDAlign was able to compare the human chromosome 21 and the chimpanzee chromosome 22. This operation took 21 hours on GeForce GTX 280, resulting in a peak performance of 20.375 GCUPS. As far as we know, this is the first time such huge chromosomes are compared with an exact method.
- S. Batzoglou. The many faces of sequence alignment. Brief Bioinform, 6(1):6--22, March 2005.Google ScholarCross Ref
- Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1999.Google Scholar
- O. Gotoh. An improved algorithm for matching biological sequences. J Mol Biol, 162(3):705--708, December 1982.Google ScholarCross Ref
- T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. J Mol Biol, 147(1):195--197, March 1981.Google ScholarCross Ref
- S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. J Mol Biol, 215(3):403--410, October 1990.Google ScholarCross Ref
- John Johnson Yang Liu, Wayne Huang and Sheila Vaidya. Gpu accelerated smith-waterman. In Computational Science - ICCS 2006, volume 3994 of LNCS, pages 188--195. Springer, 2006. Google ScholarDigital Library
- Weiguo Liu, B. Schmidt, G. Voss, A. Schroder, and W. Muller-Wittig. Bio-sequence database scanning on a gpu. IPDPS, page 274, 2006.Google Scholar
- Svetlin Manavski and Giorgio Valle. Cuda compatible gpu cards as efficient hardware accelerators for smith-waterman sequence alignment. BMC Bioinformatics, 9(Suppl 2), 2008.Google Scholar
- Yongchao Liu, Douglas Maskell, and Bertil Schmidt. Cudasw++: optimizing smith-waterman sequence database searches for cuda-enabled graphics processing units. BMC Research Notes, 2(1):73, 2009.Google ScholarCross Ref
- Lukasz Ligowski and Witold Rudnicki. An efficient implementation of Smith-Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases. In IEEE International Workshop on High Performance Computational Biology (HiCOMB 2009), 2009. Google ScholarDigital Library
- NVIDIA. NVIDIA CUDA Programming Guide 2.1. 2008.Google Scholar
- Gregory F. Pfister. In search of clusters: the coming battle in lowly parallel computing. Prentice-Hall, Inc., 1995. Google ScholarDigital Library
- A. Wozniak. Using video-oriented instructions to speed up sequence comparison. Computer Applications in the Biosciences, 13(2):145--150, 1997.Google Scholar
- T. Rognes and E. Seeberg. Six-fold speed-up of smith-waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics, 16(8):699--706, August 2000.Google ScholarCross Ref
- M. Farrar. Striped smith-waterman speeds database searches six times over other simd implementations. Bioinformatics, 23(2):156--161, January 2007. Google ScholarDigital Library
- Adrianto Wirawan, Chee K. Kwoh, Tri H. Nim, and Bertil Schmidt. Cbesw: Sequence alignment on the playstation 3. BMC Bioinformatics, 9, September 2008.Google Scholar
- P. Green. Url: http://www.phrap.org/phredphrap/swat.html.Google Scholar
- Stjepan Rajko and Srinivas Aluru. Space and time optimal parallel sequence alignments. IEEE Trans. Parallel Distrib. Syst., 15(12):1070--1081, 2004. Google ScholarDigital Library
- Rodolfo Bezerra Batista, Azzedine Boukerche, and Alba Cristina Magalhaes Alves de Melo. A parallel strategy for biological sequence alignment in restricted memory space. J. Parallel Distrib. Comput., 68(4):548--561, 2008. Google ScholarDigital Library
- Chunxi Chen and Bertil Schmidt. Computing large-scale alignments on a multi-cluster. Cluster Computing, IEEE International Conference on, page 38, 2003.Google Scholar
- Azzedine Boukerche, Alba Cristina Melo, Edans Flavius Sandes, and Mauricio Ayala-Rincon. An exact parallel algorithm to compare very long biological sequences in clusters of workstations. Cluster Computing, 10(2):187--202, 2007. Google ScholarDigital Library
- Xiandong Meng and Vipin Chaudhary. Optimised fine and coarse parallelism for sequence homology search. Int. J. Bioinformatics Res. Appl., 2(4):430--441, 2006. Google ScholarDigital Library
- Peiheng Zhang, Guangming Tan, and Guang R. Gao. Implementation of the smith-waterman algorithm on a reconfigurable supercomputing platform. In HPRCTA '07: Proc. of the 1st int. workshop on High-performance reconfigurable computing technology and applications, pages 39--48, 2007. Google ScholarDigital Library
- L. Xu P. Zhang N. Sun X. Jiang, X. Liu. "a reconfigurable accelerator for smith-waterman algorithm". "IEEE Transactions on Circuits and Systems II", 54(12):1077--1081, December 2007.Google ScholarCross Ref
- Azzedine Boukerche, Jan Mendonca Correa, Alba Cristina Magalhaes Alves de Melo, Ricardo P. Jacobi, and Adson Ferreira Rocha. Reconfigurable architecture for biological sequence comparison in reduced memory space. In IPDPS, pages 1--8, 2007.Google ScholarCross Ref
- The international chimpanzee chromosome 22 consortium. Dna sequence and comparative analysis of chimpanzee chromosome 22. Nature, 429(6990):382--388, May 2004.Google ScholarCross Ref
Index Terms
- CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences
Recommendations
CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences
PPoPP '10Biological sequence comparison is a very important operation in Bioinformatics. Even though there do exist exact methods to compare biological sequences, these methods are often neglected due to their quadratic time and space complexity. In order to ...
Fine-grain parallel megabase sequence comparison with multiple heterogeneous GPUs
PPoPP '14: Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programmingThis paper proposes and evaluates a parallel strategy to execute the exact Smith-Waterman (SW) algorithm for megabase DNA sequences in heterogeneous multi-GPU platforms. In our strategy, the computation of a single huge SW matrix is spread over multiple ...
GPU Acceleration of Pyrosequencing Noise Removal
SAAHPC '12: Proceedings of the 2012 Symposium on Application Accelerators in High Performance ComputingAmplicon Noise [1], an updated version of Py-ronoise [2], is a tool for removing noise from metagenomic data recorded by a 454 pyrosequencer. Amplicon Noise has shown to be effective in reducing overestimation of operational taxonomic units (OTUs) and ...
Comments