skip to main content
10.1145/1693453.1693473acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences

Published:09 January 2010Publication History

ABSTRACT

Biological sequence comparison is a very important operation in Bioinformatics. Even though there do exist exact methods to compare biological sequences, these methods are often neglected due to their quadratic time and space complexity. In order to accelerate these methods, many GPU algorithms were proposed in the literature. Nevertheless, all of them restrict the size of the smallest sequence in such a way that Megabase genome comparison is prevented. In this paper, we propose and evaluate CUDAlign, a GPU algorithm that is able to compare Megabase biological sequences with an exact Smith-Waterman affine gap variant. CUDAlign was implemented in CUDA and tested in two GPU boards, separately. For real sequences whose size range from 1MBP (Megabase Pairs) to 47MBP, a close to uniform GCUPS (Giga Cells Updates per Second) was obtained, showing the potential scalability of our approach. Also, CUDAlign was able to compare the human chromosome 21 and the chimpanzee chromosome 22. This operation took 21 hours on GeForce GTX 280, resulting in a peak performance of 20.375 GCUPS. As far as we know, this is the first time such huge chromosomes are compared with an exact method.

References

  1. S. Batzoglou. The many faces of sequence alignment. Brief Bioinform, 6(1):6--22, March 2005.Google ScholarGoogle ScholarCross RefCross Ref
  2. Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1999.Google ScholarGoogle Scholar
  3. O. Gotoh. An improved algorithm for matching biological sequences. J Mol Biol, 162(3):705--708, December 1982.Google ScholarGoogle ScholarCross RefCross Ref
  4. T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. J Mol Biol, 147(1):195--197, March 1981.Google ScholarGoogle ScholarCross RefCross Ref
  5. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. J Mol Biol, 215(3):403--410, October 1990.Google ScholarGoogle ScholarCross RefCross Ref
  6. John Johnson Yang Liu, Wayne Huang and Sheila Vaidya. Gpu accelerated smith-waterman. In Computational Science - ICCS 2006, volume 3994 of LNCS, pages 188--195. Springer, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Weiguo Liu, B. Schmidt, G. Voss, A. Schroder, and W. Muller-Wittig. Bio-sequence database scanning on a gpu. IPDPS, page 274, 2006.Google ScholarGoogle Scholar
  8. Svetlin Manavski and Giorgio Valle. Cuda compatible gpu cards as efficient hardware accelerators for smith-waterman sequence alignment. BMC Bioinformatics, 9(Suppl 2), 2008.Google ScholarGoogle Scholar
  9. Yongchao Liu, Douglas Maskell, and Bertil Schmidt. Cudasw++: optimizing smith-waterman sequence database searches for cuda-enabled graphics processing units. BMC Research Notes, 2(1):73, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  10. Lukasz Ligowski and Witold Rudnicki. An efficient implementation of Smith-Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases. In IEEE International Workshop on High Performance Computational Biology (HiCOMB 2009), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. NVIDIA. NVIDIA CUDA Programming Guide 2.1. 2008.Google ScholarGoogle Scholar
  12. Gregory F. Pfister. In search of clusters: the coming battle in lowly parallel computing. Prentice-Hall, Inc., 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Wozniak. Using video-oriented instructions to speed up sequence comparison. Computer Applications in the Biosciences, 13(2):145--150, 1997.Google ScholarGoogle Scholar
  14. T. Rognes and E. Seeberg. Six-fold speed-up of smith-waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics, 16(8):699--706, August 2000.Google ScholarGoogle ScholarCross RefCross Ref
  15. M. Farrar. Striped smith-waterman speeds database searches six times over other simd implementations. Bioinformatics, 23(2):156--161, January 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Adrianto Wirawan, Chee K. Kwoh, Tri H. Nim, and Bertil Schmidt. Cbesw: Sequence alignment on the playstation 3. BMC Bioinformatics, 9, September 2008.Google ScholarGoogle Scholar
  17. P. Green. Url: http://www.phrap.org/phredphrap/swat.html.Google ScholarGoogle Scholar
  18. Stjepan Rajko and Srinivas Aluru. Space and time optimal parallel sequence alignments. IEEE Trans. Parallel Distrib. Syst., 15(12):1070--1081, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Rodolfo Bezerra Batista, Azzedine Boukerche, and Alba Cristina Magalhaes Alves de Melo. A parallel strategy for biological sequence alignment in restricted memory space. J. Parallel Distrib. Comput., 68(4):548--561, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Chunxi Chen and Bertil Schmidt. Computing large-scale alignments on a multi-cluster. Cluster Computing, IEEE International Conference on, page 38, 2003.Google ScholarGoogle Scholar
  21. Azzedine Boukerche, Alba Cristina Melo, Edans Flavius Sandes, and Mauricio Ayala-Rincon. An exact parallel algorithm to compare very long biological sequences in clusters of workstations. Cluster Computing, 10(2):187--202, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Xiandong Meng and Vipin Chaudhary. Optimised fine and coarse parallelism for sequence homology search. Int. J. Bioinformatics Res. Appl., 2(4):430--441, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Peiheng Zhang, Guangming Tan, and Guang R. Gao. Implementation of the smith-waterman algorithm on a reconfigurable supercomputing platform. In HPRCTA '07: Proc. of the 1st int. workshop on High-performance reconfigurable computing technology and applications, pages 39--48, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. L. Xu P. Zhang N. Sun X. Jiang, X. Liu. "a reconfigurable accelerator for smith-waterman algorithm". "IEEE Transactions on Circuits and Systems II", 54(12):1077--1081, December 2007.Google ScholarGoogle ScholarCross RefCross Ref
  25. Azzedine Boukerche, Jan Mendonca Correa, Alba Cristina Magalhaes Alves de Melo, Ricardo P. Jacobi, and Adson Ferreira Rocha. Reconfigurable architecture for biological sequence comparison in reduced memory space. In IPDPS, pages 1--8, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  26. The international chimpanzee chromosome 22 consortium. Dna sequence and comparative analysis of chimpanzee chromosome 22. Nature, 429(6990):382--388, May 2004.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
              January 2010
              372 pages
              ISBN:9781605588773
              DOI:10.1145/1693453
              • cover image ACM SIGPLAN Notices
                ACM SIGPLAN Notices  Volume 45, Issue 5
                PPoPP '10
                May 2010
                346 pages
                ISSN:0362-1340
                EISSN:1558-1160
                DOI:10.1145/1837853
                Issue’s Table of Contents

              Copyright © 2010 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 9 January 2010

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article

              Acceptance Rates

              Overall Acceptance Rate230of1,014submissions,23%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader