Abstract
With the number of sequenced genomes growing ever larger, it is now common practice to concatenate sequence alignments from several genomic loci as a first step to phylogenetic tree inference. However, as different loci may support different trees due to processes such as gene duplication and lineage sorting, it is important to better understand how commonly used phylogenetic inference methods behave on such "phylogenetic mixtures". Here we shall focus on how parsimony, one of the most popular methods for reconstructing phylogenetic trees, behaves for mixtures of two trees. In particular, we show that (i) the parsimony problem is NP-complete for mixtures of two trees, (ii) there are mixtures of two trees that have exponentially many (in the number of leaves) most parsimonious trees, and (iii) give an explicit description of the most parsimonious tree(s) and scores corresponding to the mixture of a pair of trees related by a single TBR operation.
- E. Allman and J. Rhodes, "The Identifiability of Tree Topology for Phylogenetic Models, Including Covarion and Mixture Models," J. Computational Biology, vol. 13, pp. 1101-1113, 2006.Google ScholarCross Ref
- H.-J. Bandelt, P. Forster, B.C. Sykes, and M.B. Richards, "Mitochondrial Portraits of Human Populations Using Median Networks," Genetics, vol. 141, pp. 743-753, 1995.Google Scholar
- H.-J. Bandelt, Median Hulls as Steiner Hulls in Rectilinear and Molecular Sequence Spaces, A. Brandstädt and V.B. Le, eds., WG 2001, pp. 1-7, 2001. Google ScholarDigital Library
- M. Bordewich and C. Semple, "Computing the Hybridization Number of Two Phylogenetic Trees is Fixed-Parameter Tractable," IEEE/ACM Trans. Computational Biology and Bioinformatics, to appear. Google ScholarDigital Library
- D. Cieslik, Steiner Minimal Trees. Kluwer Academic, 1998.Google Scholar
- W. Day, "Computationally Difficult Parsimony Problems in Phylogenetics Systematics," J. Theoretical Biology, vol. 103, pp. 429-438, 1983.Google ScholarCross Ref
- F. Delsuc, H. Brinkmann, and H. Philippe, "Phylogenomics and the Reconstruction of the Tree of Life," Nature Rev. Genetics, vol. 6, pp. 361-375, 2005.Google ScholarCross Ref
- J. Felsenstein, Inferring Phylogenies. Sinauer Assoc., 2004.Google Scholar
- L. Foulds and R. Graham, "The Steiner Problem in Phylogeny is NP-Complete," Advances in Applied Math., vol. 3, pp. 43-49, 1982.Google ScholarDigital Library
- M.R. Garey and D.S. Johnson, "The Rectilinear Steiner Tree Problem is NP-Complete," SIAM J. Applied Math., vol. 32, pp. 826-834, 1977.Google ScholarCross Ref
- J. Hein, "Reconstructing Evolution of Sequences Subject to Recombination Using Parsimony," Math. Biosciences, vol. 98, pp. 185-200, 1990.Google ScholarCross Ref
- J. Hein, T. Jiang, L. Wang, and K. Zhang, "On the Complexity of Comparing Evolutionary Trees," Discrete Applied Math., vol. 71, pp. 153-169, 1996. Google ScholarDigital Library
- M. Hanan, "On Steiner's Problem with Rectilinear Distance," SIAM J. Applied Math., vol. 14, pp. 255-265, 1966.Google ScholarCross Ref
- S. Klav¿ar, H.M. Mulder, and R. ¿krekovski, "An Euler-Type Formula for Median Graphs," Discrete Math., vol. 187, pp. 255-258, 1998. Google ScholarDigital Library
- F. Matsen, E. Mossel, and M. Steel, "Mixed-Up Trees: The Structure of Phylogenetic Mixtures," Bull. of Math. Biology, in press.Google Scholar
- C. Semple and M. Steel, Phylogenetics. Oxford Univ. Press, 2003.Google Scholar
- D. ¿tefankovi¿ and E. Vigoda, "Phylogeny of Mixture Models: Robustness of Maximum Likelihood and Non-Identifiable Distributions," J. Computational Biology, vol. 14, pp. 156-189, 2007.Google ScholarCross Ref
- R. Tamassia, "On Embedding a Graph in the Grid with the Minimum Number of Bends," SIAM J. Computing, vol. 16, pp. 421-444, 1987. Google ScholarDigital Library
- F.Y. Wu, "Number of Spanning Trees on a Lattice," J. Physics A: Math. and General, vol. 10, pp. 113-115, 1977.Google ScholarCross Ref
Index Terms
- Maximum Parsimony for Tree Mixtures
Recommendations
Islands of Tractability for Parsimony Haplotyping
We study the parsimony approach to haplotype inference, which calls for finding a set of haplotypes of minimum cardinality that explains an input set of genotypes. We prove that the problem is APX-hard even in very restricted cases. On the positive side,...
A Preprocessing Procedure for Haplotype Inference by Pure Parsimony
Haplotype data are especially important in the study of complex diseases since it contains more information than genotype data. However, obtaining haplotype data is technically difficult and costly. Computational methods have proved to be an effective ...
Comments