Mercury BLASTP: Accelerating Protein Sequence Alignment

Published: 01 June 2008

Abstract

Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this article, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11 to 15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results.

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 1, Issue 2
June 2008
143 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/1371579

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2008
Accepted: 01 March 2008
Revised: 01 January 2008
Received: 01 August 2007
Published in TRETS Volume 1, Issue 2

Author Tags

  1. Bioinformatics
  2. biological sequence alignment

Qualifiers

  • Research-article
  • Research
  • Refereed
