skip to main content
10.1145/2506583.2506606acmconferencesArticle/Chapter ViewAbstractPublication PagesbcbConference Proceedingsconference-collections
tutorial

Application of a MAX-CUT Heuristic to the Contig Orientation Problem in Genome Assembly

Authors Info & Claims
Published:22 September 2013Publication History

ABSTRACT

In the context of genome assembly, the contig orientation problem is described as the problem of removing sufficient edges from the scaffold graph so that the remaining subgraph assigns a consistent orientation to all sequence nodes in the graph. This problem can also be phrased as a weighted MAX-CUT problem. The performance of MAX-CUT heuristics in this application is untested. We present a greedy heuristic solution to the contig orientation problem and compare its performance to a weighted MAX-CUT semi-definite programming heuristic solution on several graphs. We note that the contig orientation problem can be used to identify inverted repeats and inverted haplotypes, as these represent sequences whose orientation appears ambiguous in the conventional genome assembly framework.

References

  1. S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic acids research, 25(17):3389--3402, 1997.Google ScholarGoogle Scholar
  2. F. Antonacci, J. M. Kidd, T. Marques-Bonet, M. Ventura, P. Siswara, Z. Jiang, and E. E. Eichler. Characterization of six human disease-associated inversion polymorphisms. Human molecular genetics, 18(14):2555--2566, 2009.Google ScholarGoogle Scholar
  3. V. Bansal, A. Bashir, and V. Bafna. Evidence for large inversion polymorphisms in the human genome from hapmap data. Genome research, 17(2):219--230, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  4. S. Batzoglou, D. B. Jaffe, K. Stanley, J. Butler, S. Gnerre, E. Mauceli, B. Berger, J. P. Mesirov, and E. S. Lander. Arachne: a whole-genome shotgun assembler. Genome research, 12(1):177--189, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  5. P. Bodily, J. Price, M. Clement, and Q. Snell. Scaffoldscaffolder: An aggressive scaffold finishing algorithm. In Proceedings of the 2012 International Conference on Bioinformatics & Computational Biology, pages 385--390. CSREA Press, 2012.Google ScholarGoogle Scholar
  6. J. Butler, I. MacCallum, M. Kleber, I. A. Shlyakhter, M. K. Belmonte, E. S. Lander, C. Nusbaum, and D. B. Jaffe. Allpaths: de novo assembly of whole-genome shotgun microreads. Genome research, 18(5):810--820, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  7. A. Dayarian, T. P. Michael, and A. M. Sengupta. Sopra: Scaffolding algorithm for paired reads via statistical optimization. BMC bioinformatics, 11(1):345, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  8. C. H. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pages 107--114. IEEE, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. N. Donmez and M. Brudno. Scarpa: scaffolding reads with practical algorithms. Bioinformatics, 29(4):428--434, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. X. Goemans and D. P. Williamson. .879-approximation algorithms for max cut and max 2sat. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 422--431. ACM, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115--1145, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. W. Huang, L. Li, J. R. Myers, and G. T. Marth. Art: a next-generation sequencing read simulator. Bioinformatics, 28(4):593--594, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. M. Karp. Reducibility among combinatorial problems. Springer, 1972.Google ScholarGoogle ScholarCross RefCross Ref
  14. S. Khot, G. Kindler, E. Mossel, and R. O'Donnell. Optimal inapproximability results for max-cut and other 2-variable csps? SIAM Journal on Computing, 37(1):319--357, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical society, 7(1):48--50, 1956.Google ScholarGoogle ScholarCross RefCross Ref
  16. B. Langmead and S. L. Salzberg. Fast gapped-read alignment with bowtie 2. Nature Methods, 9(4):357--359, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  17. B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, et al. Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome Biol, 10(3):R25, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  18. R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen, et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome research, 20(2):265--272, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  19. M. W. M. Muskens, A. P. A. Vissers, J. N. M. Mol, and J. M. Kooter. Role of inverted dna repeats in transcriptional and post-transcriptional gene silencing. Plant Molecular Biology, 43(2-3):243--260, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  20. J. Nijkamp, W. Winterbach, M. van den Broek, J.-M. Daran, M. Reinders, and D. de Ridder. Integrating genome assemblies with maia. Bioinformatics, 26(18):i433--i439, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. N. Okuda. Hapmaker: Synthetic haplotype generator. preprint, 2013.Google ScholarGoogle Scholar
  22. M. Pop, D. S. Kosack, and S. L. Salzberg. Hierarchical scaffolding with bambus. Genome research, 14(1):149--159, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  23. F. Rendl, G. Rinaldi, and A. Wiegele. Solving max-cut to optimality by intersecting semidefinite and polyhedral relaxations. Mathematical Programming, 121(2):307--335, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. C. Rio and G. M. Rubin. Identification and purification of a drosophila protein that binds to the terminal 31-base-pair inverted repeats of the p transposable element. Proceedings of the National Academy of Sciences, 85(23):8929--8933, 1988.Google ScholarGoogle ScholarCross RefCross Ref
  25. S. Sahni and T. Gonzalez. P-complete approximation problems. Journal of the ACM (JACM), 23(3):555--565, 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. D. R. Zerbino and E. Birney. Velvet: algorithms for de novo short read assembly using de bruijn graphs. Genome research, 18(5):821--829, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  27. M. C. Zody, Z. Jiang, H.-C. Fung, F. Antonacci, L. W. Hillier, M. F. Cardone, T. A. Graves, J. M. Kidd, Z. Cheng, A. Abouelleil, et al. Evolutionary toggling of the mapt 17q21. 31 inversion region. Nature genetics, 40(9):1076--1083, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  1. Application of a MAX-CUT Heuristic to the Contig Orientation Problem in Genome Assembly

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
        September 2013
        987 pages
        ISBN:9781450324342
        DOI:10.1145/2506583

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 22 September 2013

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • tutorial
        • Research
        • Refereed limited

        Acceptance Rates

        BCB'13 Paper Acceptance Rate43of148submissions,29%Overall Acceptance Rate254of885submissions,29%
      • Article Metrics

        • Downloads (Last 12 months)2
        • Downloads (Last 6 weeks)0

        Other Metrics

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader