ABSTRACT
In the context of genome assembly, the contig orientation problem is described as the problem of removing sufficient edges from the scaffold graph so that the remaining subgraph assigns a consistent orientation to all sequence nodes in the graph. This problem can also be phrased as a weighted MAX-CUT problem. The performance of MAX-CUT heuristics in this application is untested. We present a greedy heuristic solution to the contig orientation problem and compare its performance to a weighted MAX-CUT semi-definite programming heuristic solution on several graphs. We note that the contig orientation problem can be used to identify inverted repeats and inverted haplotypes, as these represent sequences whose orientation appears ambiguous in the conventional genome assembly framework.
- S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic acids research, 25(17):3389--3402, 1997.Google Scholar
- F. Antonacci, J. M. Kidd, T. Marques-Bonet, M. Ventura, P. Siswara, Z. Jiang, and E. E. Eichler. Characterization of six human disease-associated inversion polymorphisms. Human molecular genetics, 18(14):2555--2566, 2009.Google Scholar
- V. Bansal, A. Bashir, and V. Bafna. Evidence for large inversion polymorphisms in the human genome from hapmap data. Genome research, 17(2):219--230, 2007.Google ScholarCross Ref
- S. Batzoglou, D. B. Jaffe, K. Stanley, J. Butler, S. Gnerre, E. Mauceli, B. Berger, J. P. Mesirov, and E. S. Lander. Arachne: a whole-genome shotgun assembler. Genome research, 12(1):177--189, 2002.Google ScholarCross Ref
- P. Bodily, J. Price, M. Clement, and Q. Snell. Scaffoldscaffolder: An aggressive scaffold finishing algorithm. In Proceedings of the 2012 International Conference on Bioinformatics & Computational Biology, pages 385--390. CSREA Press, 2012.Google Scholar
- J. Butler, I. MacCallum, M. Kleber, I. A. Shlyakhter, M. K. Belmonte, E. S. Lander, C. Nusbaum, and D. B. Jaffe. Allpaths: de novo assembly of whole-genome shotgun microreads. Genome research, 18(5):810--820, 2008.Google ScholarCross Ref
- A. Dayarian, T. P. Michael, and A. M. Sengupta. Sopra: Scaffolding algorithm for paired reads via statistical optimization. BMC bioinformatics, 11(1):345, 2010.Google ScholarCross Ref
- C. H. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pages 107--114. IEEE, 2001. Google ScholarDigital Library
- N. Donmez and M. Brudno. Scarpa: scaffolding reads with practical algorithms. Bioinformatics, 29(4):428--434, 2013. Google ScholarDigital Library
- M. X. Goemans and D. P. Williamson. .879-approximation algorithms for max cut and max 2sat. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 422--431. ACM, 1994. Google ScholarDigital Library
- M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115--1145, 1995. Google ScholarDigital Library
- W. Huang, L. Li, J. R. Myers, and G. T. Marth. Art: a next-generation sequencing read simulator. Bioinformatics, 28(4):593--594, 2012. Google ScholarDigital Library
- R. M. Karp. Reducibility among combinatorial problems. Springer, 1972.Google ScholarCross Ref
- S. Khot, G. Kindler, E. Mossel, and R. O'Donnell. Optimal inapproximability results for max-cut and other 2-variable csps? SIAM Journal on Computing, 37(1):319--357, 2007. Google ScholarDigital Library
- J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical society, 7(1):48--50, 1956.Google ScholarCross Ref
- B. Langmead and S. L. Salzberg. Fast gapped-read alignment with bowtie 2. Nature Methods, 9(4):357--359, 2012.Google ScholarCross Ref
- B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, et al. Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome Biol, 10(3):R25, 2009.Google ScholarCross Ref
- R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen, et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome research, 20(2):265--272, 2010.Google ScholarCross Ref
- M. W. M. Muskens, A. P. A. Vissers, J. N. M. Mol, and J. M. Kooter. Role of inverted dna repeats in transcriptional and post-transcriptional gene silencing. Plant Molecular Biology, 43(2-3):243--260, 2000.Google ScholarCross Ref
- J. Nijkamp, W. Winterbach, M. van den Broek, J.-M. Daran, M. Reinders, and D. de Ridder. Integrating genome assemblies with maia. Bioinformatics, 26(18):i433--i439, 2010. Google ScholarDigital Library
- N. Okuda. Hapmaker: Synthetic haplotype generator. preprint, 2013.Google Scholar
- M. Pop, D. S. Kosack, and S. L. Salzberg. Hierarchical scaffolding with bambus. Genome research, 14(1):149--159, 2004.Google ScholarCross Ref
- F. Rendl, G. Rinaldi, and A. Wiegele. Solving max-cut to optimality by intersecting semidefinite and polyhedral relaxations. Mathematical Programming, 121(2):307--335, 2010. Google ScholarDigital Library
- D. C. Rio and G. M. Rubin. Identification and purification of a drosophila protein that binds to the terminal 31-base-pair inverted repeats of the p transposable element. Proceedings of the National Academy of Sciences, 85(23):8929--8933, 1988.Google ScholarCross Ref
- S. Sahni and T. Gonzalez. P-complete approximation problems. Journal of the ACM (JACM), 23(3):555--565, 1976. Google ScholarDigital Library
- D. R. Zerbino and E. Birney. Velvet: algorithms for de novo short read assembly using de bruijn graphs. Genome research, 18(5):821--829, 2008.Google ScholarCross Ref
- M. C. Zody, Z. Jiang, H.-C. Fung, F. Antonacci, L. W. Hillier, M. F. Cardone, T. A. Graves, J. M. Kidd, Z. Cheng, A. Abouelleil, et al. Evolutionary toggling of the mapt 17q21. 31 inversion region. Nature genetics, 40(9):1076--1083, 2008.Google ScholarCross Ref
- Application of a MAX-CUT Heuristic to the Contig Orientation Problem in Genome Assembly
Recommendations
Genome Sequence Assembly: Algorithms and Issues
Ultimately, genome sequencing seeks to provide an organism's complete DNA sequence. Automation of DNA sequencing allowed scientists to decode entire genomes and gave birth to genomics, the analytic and comparative study of genomes. Although genomes can ...
Comments