SpeedHap: An Accurate Heuristic for the Single Individual SNP Haplotyping Problem with Many Gaps, High Reading Error Rate and Low Coverage

Published: 01 October 2008

Abstract

Single nucleotide polymorphism (SNP) is the most frequent form of DNA variation. The set of SNPs present in a chromosome (called the haplotype) is of interest in a wide range of applications in molecular biology and biomedicine, including diagnostics and medical therapy. In this paper we propose a new heuristic method for the problem of haplotype reconstruction for (portions of) a pair of homologous human chromosomes from a single individual (SIH). The problem is well known in the literature, and exact algorithms have been proposed for the case when no (or few) gaps are allowed in the input fragments. These algorithms, though exact and of polynomial complexity, are slow in practice. When gaps are allowed, no exact method of polynomial complexity is known, and the problem is also hard to approximate with guarantees. Therefore, fast heuristics have been proposed. In this paper we describe SpeedHap, a new heuristic method that is able to tackle the case of many gapped fragments and retains its effectiveness even when the input fragments have a high rate of reading errors (up to 20%) and low coverage (as low as 3). We test SpeedHap on real data from the HapMap Project.
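The abstract does not spell out SpeedHap's procedure, so the sketch below only illustrates the problem setting it targets. It is not SpeedHap. It assumes fragments written over {0, 1, -}, where '-' marks a gap (an uncovered SNP site). It scores a haplotype pair with Minimum Error Correction (MEC), a standard objective in the SIH literature, and reconstructs the pair with a simple alternating 2-clustering heuristic. All function names, the all-heterozygous assumption (h2 is the bitwise complement of h1), the seeding rule and the toy reads are hypothetical choices made for illustration.

```python
from typing import List, Tuple

GAP = "-"  # hypothetical encoding: '-' marks a SNP site the fragment does not cover


def hamming(fragment: str, hap: str) -> int:
    """Count mismatches between a fragment and a haplotype, ignoring gap sites."""
    return sum(1 for f, h in zip(fragment, hap) if f != GAP and f != h)


def flip(seq: str) -> str:
    """Swap alleles 0 <-> 1, leaving gaps untouched."""
    return "".join({"0": "1", "1": "0"}.get(c, c) for c in seq)


def mec_score(fragments: List[str], h1: str, h2: str) -> int:
    """Minimum Error Correction: each fragment pays its distance to the closer haplotype."""
    return sum(min(hamming(f, h1), hamming(f, h2)) for f in fragments)


def majority(fragments: List[str], default: str) -> str:
    """Per-site majority allele; ties and uncovered sites keep the allele from `default`."""
    out = []
    for j, d in enumerate(default):
        ones = sum(1 for f in fragments if f[j] == "1")
        zeros = sum(1 for f in fragments if f[j] == "0")
        out.append("1" if ones > zeros else "0" if zeros > ones else d)
    return "".join(out)


def reconstruct(fragments: List[str], max_iter: int = 50) -> Tuple[str, str]:
    """Alternating 2-clustering under the assumption h2 == flip(h1) (not SpeedHap)."""
    h1 = fragments[0].replace(GAP, "0")  # arbitrary seed: first fragment, gaps set to '0'
    for _ in range(max_iter):
        h2 = flip(h1)
        # Assignment step: orient every fragment toward h1, flipping those closer to h2.
        oriented = [f if hamming(f, h1) <= hamming(f, h2) else flip(f) for f in fragments]
        # Update step: the per-site majority minimises mismatches against the oriented fragments.
        new_h1 = majority(oriented, default=h1)
        if new_h1 == h1:
            break
        h1 = new_h1
    return h1, flip(h1)


if __name__ == "__main__":
    # Toy gapped reads from h1 = 010110 / h2 = 101001; the last read carries one reading error.
    reads = ["0101--", "--0110", "1010--", "-0100-", "01-11-", "---001", "0111--"]
    h1, h2 = reconstruct(reads)
    print(h1, h2, mec_score(reads, h1, h2))  # 010110 101001 1
```

On the toy reads, the erroneous fragment "0111--" is absorbed by the majority vote and costs exactly one correction. Because the assignment step picks the closer haplotype and the majority step cannot increase the mismatch count, the MEC score never increases across iterations. Like k-means, however, this loop can stop at a local optimum that depends on the seed.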

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 5, Issue 4
October 2008
158 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published in TCBB Volume 5, Issue 4

Author Tags

  1. Algorithms
  2. Biology and genetics

Qualifiers

  • Article

Cited By

  • (2016) "Decoding genetic variations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 3, pp. 518-530. DOI: 10.1109/TCBB.2015.2462367. Online publication date: 1 May 2016.
  • (2015) "LGH," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 12, no. 6, pp. 1255-1266. DOI: 10.1109/TCBB.2015.2430352. Online publication date: 1 Nov. 2015.
  • (2012) "Using genetic algorithm in reconstructing single individual haplotype with minimum error correction," Journal of Biomedical Informatics, vol. 45, no. 5, pp. 922-930. DOI: 10.1016/j.jbi.2012.03.004. Online publication date: 1 Oct. 2012.
  • (2010) "ReFHap," Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology, pp. 160-169. DOI: 10.1145/1854776.1854802. Online publication date: 2 Aug. 2010.
