ABSTRACT
The genome sequence alignment, whereby ultra scale of sequence reads should be compared to an enormous long reference, has been one central challenge to the biologists for a long period. For recent years, new sequencing technology makes it possible to generate longer reads (sequences of genome fragments) which seem more valuable for the life science research. It has been foreseen that long genome reads (length longer than 200 base pairs) will dominate the field in the near future. Unfortunately, most of the state-of-art aligners nowadays are optimized and only applicable for the short read mapping while present long read aligners are still not satisfying at the aspect of speed. In this paper, we propose a novel PC-FPGA hybrid system to improve the performance of the long read mapping. The BWA-SW algorithm is chosen as the alignment approach and by accelerating the bottleneck of the algorithm, our solution could archive a significant improvement in term of speed. Experiments demonstrate that the described system is as accurate as the BWA-SW aligner and about 1.41-2.73 times faster than it for reads with lengths ranging from 500bp to 2000bp.
- Wu, T.D. and C.K. Watanabe, GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics, 2005. 21(9): p. 1859--1875. Google ScholarDigital Library
- Lin, H., Z. Zhang, M.Q. Zhang, et al., ZOOM! Zillions of oligos mapped. Bioinformatics, 2008. 24(21): p. 2431--7. Google ScholarDigital Library
- Li, R., Y. Li, K. Kristiansen, et al., SOAP: short oligonucleotide alignment program. Bioinformatics, 2008. 24(5): p. 713--4. Google ScholarDigital Library
- Li, H. and R. Durbin, Fast and accurate short read alingment with Burrows-Wheeler transform. Bioinformatics, 2009. 25(14): p. 1754--1760. Google ScholarDigital Library
- Li, H. and N. Homer, A survey of sequence alignment algorithms for next-generation sequencing. Bioinformatics, 2010. 11(5): p. 473--483.Google Scholar
- Misra, S., A. Agrawal, W.K. Liao, et al., Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing. Bioinformatics, 2011. 27(2): p. 189--195. Google ScholarDigital Library
- Olson, C.B., M. Kim, C. Clauson, et al. Hardware Acceleration of Short Read Mapping. in 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines. 2012. Google ScholarDigital Library
- Altschul, S.F., W. Gish, W. Miller, et al., Basic Local Alignment Search Tool. Molecular Biology, 1990. 215(3): p. 403 -- 410.Google Scholar
- CoX, A., ELAND: Efficient Local Alignment of Nucleotide Data. (Unpublished), 2007.Google Scholar
- Li, H. and R. Durbin. MAQ. Available from: http://maq.sourceforge.net/.Google Scholar
- Homer, N., B. Merriman, and S.F. Nelson, BFAST: An Alignment Tool for Large Scale Genome Resequencing. PLoS ONE, 2009. 4(11): p. e7767.Google ScholarCross Ref
- Kurtz, S., A. Phillippy, A.L. Delcher, et al., Versatile and open software for comparing large genomes. Genome Biology, 2004. 5(2): p. R12.Google Scholar
- Langmead, B., C. Trapnell, M. Pop, et al., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 2009. 10(3): p. 1--10.Google Scholar
- Kent, W.J., BLAT--The BLAST-like Alignment Tool. Genome Research, 2002. 12(4): p. 656--664.Google Scholar
- Smith, A.D., Z. Xuan, and M.Q. Zhang, Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC bioinformatics, 2008. 9(128): p. 1471--2105.Google Scholar
- Rumble, S.M., P. Lacroute, A.V. Dalca, et al., SHRiMP: Accurate Mapping of Short Color-space Reads. PLoS Comput Biol, 2009. 5(5): p. e1000386.Google ScholarCross Ref
- Chen, Y., T. Souaiaia, and T. Chen, PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds. Bioinformatics, 2009. 25(19): p. 2514--2521. Google ScholarDigital Library
- Mosaik. Available from: http://bioinformatics.bc.edu/marthlab/Mosaik.Google Scholar
- Li, H. and R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 2010. 26(5): p. 589--595. Google ScholarDigital Library
- Ning, Z., A.J. Cox, and J.C. Mullikin, SSAHA: a fast search method for large DNA databases. Genome Research, 2001. 11(10): p. 1725--9.Google Scholar
- Kehr, B., D. Weese, and K. Reinert, STELLAR: fast and exact local alignments. BMC bioinformatics, 2011. 12(Suppl 9): p. S15.Google Scholar
- Schatz, M.C., C. Trapnell, A.L. Delcher, et al., High-throughput sequence alignment using Graphics Processing Units. BMC Bioinformatics, 2007(8): p. 474.Google Scholar
- Trapnell, C. and M.C. Schatz, Optimizing data intensive GPGPU computations for DNA sequence alignment. Parallel Computing 2009. 35(8--9): p. 429--440. Google ScholarDigital Library
- Liu, Y., B. Schmidt, and D. Maskell, CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions. BMC Res Notes, 2010(3): p. 93.Google Scholar
- Schatz, M.C., CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics, 2009. 25(11): p. 1363--1369 Google ScholarDigital Library
- Pireddu, L., S. Leo, and G. Zanetti, SEAL: a distributed short read mapping and duplicate removal tool. Bioinformatics, 2011. 27(15): p. 2159--2160. Google ScholarDigital Library
- Luethy, R. and C. Hoover, Hardware and software systems for accelerating common bioinformatics sequence analysis algorithms. BIOSILICO, 2004. 2(1): p. 12--17.Google ScholarCross Ref
- Knodel, O., T.B. Preusser, and R.G. Spallek. Next-generation massively parallel short-read mapping on FPGAs. in 2011 IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP). 2011. Google ScholarDigital Library
- Preusser, T.B., O. Knodel, and R.G. Spallek. Short-Read Mapping by a Systolic Custom FPGA Computation. in 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines. 2012. Google ScholarDigital Library
- Tang, W., W. Wang, B. Duan, et al. Accelerating Millions of Short Reads Mapping on a Heterogeneous Architecture with FPGA Accelerator. in 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines. 2012. Google ScholarDigital Library
- Chen, Y., B. Schmidt, and D.L. Maskell. Accelerating short read mapping on an FPGA (abstract only). in Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays. 2012. Monterey, California, USA: ACM. Google ScholarDigital Library
Index Terms
- Acceleration of the long read mapping on a PC-FPGA architecture (abstract only)
Recommendations
Accelerating the next generation long read mapping with the FPGA-based system
To compare the newly determined sequences against the subject sequences stored in the databases is a critical job in the bioinformatics. Fortunately, recent survey reports that the state-of-the-art aligners are already fast enough to handle the ultra ...
mapAlign: An Efficient Approach for Mapping and Aligning Long Reads to Reference Genomes
Bioinformatics Research and ApplicationsAbstractLong reads play an important role for the identification of structural variants, sequencing repetitive regions, phasing of alleles, etc. In this paper, we propose a new approach for mapping long reads to reference genomes. We also propose a new ...
Accelerating short read mapping on an FPGA (abstract only)
FPGA '12: Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate ArraysThe explosive growth of short read datasets produced by high throughput DNA sequencing technologies poses a challenge to the mapping of short reads to a reference genome in terms of sensitivity and execution speed. Existing methods often use a ...
Comments