|
ABSTRACT
Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees (Felsenstein, 1981), but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques such as hill climbing or expectation maximization (EM), are used in order to find optimal parameters for a given tree. So far, analytic solutions were derived only for the simplest model - three taxa, two state characters, under a molecular clock (MC). Quoting Ziheng Yang (2000), who initiated the analytic approach, "this seems to be the simplest case, but has many of the conceptual and statistical complexities involved in phylogenetic estimation".In this work, we give analytic solutions for four taxa, two state characters under a molecular clock. The change from three to four taxa incurs a major increase in the complexity of the underlying algebraic system, and requires novel techniques and approaches. We start by presenting the general maximum likelihood problem on phylogenetic trees as a constrained optimization problem, and the resulting system of polynomial equations. In full generality, it is infeasible to solve this system, therefore specialized tools for the MC case are developed.Four taxa rooted trees have two topologies -- the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). We combine the ultrametric properties of MC trees with the Hadamard conjugation (Hendy and Penny, 1993) to derive a number of topology dependent identities. Employing these identities, we substantially simplify the system of polynomial equations. We finally use tools from algebraic geometry (e.g. Grobner bases, ideal saturation, resultants) and employ symbolic algebra software to obtain closed form analytic solutions (expressed parametrically in the input data) for the fork topology, and analytic solutions for the comb. We show that in contrast to the fork, the comb has no closed form solutions (expressed by radicals in the input data). In general, four taxa trees can have multiple ML points (Steel, 1994, Chor et. al., 2001). In contrast, we can now prove that under the MC assumption, both the fork and the comb topologies have a unique (local and global) ML point.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
A. Aho, Y. Sagiv, T. Szymanski, and J. Ullman. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM Journal of Computing, 10(3):405--421, 1981.
|
| |
2
|
|
| |
3
|
A. Ben-Dor, B. Chor, D. Graur, R. Ophir, and D. Pelleg. Constructing phylogenies from quarbcgoptets: Elucidation of eutherian superordinal relationships. Jour. of Comput. Biology, 5(3):377--390, 1998.
|
| |
4
|
B. Chor, M. Hendy, B. Holland, and D. Penny. Multiple maxima of likelihood in phylogenetic trees: An analytic approach. MBE, 17(10):1529--1541, 2000. Earlier version appeared in RECOMB 2000.
|
| |
5
|
|
| |
6
|
J. Felsenstein. PHYLIP - phylogenetic inference package, (version 3.2). Cladistics, 5:164--166, 1989.
|
| |
7
|
K. Fukami and Y. Tateno. On the uniqueness of the maximum likelihood method for estimating molecular trees: Uniqueness of the likelihood point. J. Mol. Evol., 28:460--464, 1989.
|
| |
8
|
I. Gelfand, M. Kapranov, and A. Zelevinsky. Discriminants, Resultants and Multidimensional Determinants. Birkhauser, Boston, 1994.
|
| |
9
|
D. Graur and W. Li. Fundamentals of Molecular Evolution. Sinauer Assoc., 2000.
|
| |
10
|
M. D. Hendy and D. Penny. Spectral analysis of phylogenetic data. J. Classif., 10:5--24, 1993.
|
| |
11
|
M. D. Hendy, D. Penny, and M. Steel. Discrete fourier analysis for evolutionary trees. Proc. Natl. Acad. Sci. USA., 91:3339--3343, 1994.
|
| |
12
|
|
| |
13
|
Tandy Warnow , Bernard M. E. Moret , Katherine St. John, Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining, Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, p.196-205, January 07-09, 2001, Washington, D.C., United States
|
| |
14
|
T. Jukes and C. Cantor. Evolution of protein molecules. In H. Munro, editor, Mammalian Protein Metabolism, pages 21--132. Academic Press, New York, 1969.
|
 |
15
|
|
| |
16
|
M. Kimura. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, 1983.
|
| |
17
|
J. Neymann. Molecular studies of evolution: A source of novel statistical problems. In S. Gupta and Y. Jackel, editors, Statistical Decision Theory and Related Topics, pages 1--27. Academic Press, New York, 1971.
|
| |
18
|
|
| |
19
|
V. Ranwez and O. Gascuel. Quartet-based phylogenetic inference: Improvements and limits. Molecular Biology and Evolution, 18:1103--1116, 2001.
|
| |
20
|
M. Ruvolo. Molecular phylogeny of the hominoids: Inferences from multiple independent dna sequence data sets. Molecular Biology and Evolution, 14(7):248--265, 1997.
|
| |
21
|
|
| |
22
|
M. Steel. The complexity of reconstructing trees from qualitative characters and subtress. Journal of Classification, 9(1):91--116, 1992.
|
| |
23
|
M. Steel. The maximum likelihood point for a phylogenetic tree is not unique. Syst. Biology, 43(4):560--564, 1994.
|
| |
24
|
M. Steel, A. Dress, and S. Boker. Simple but fundamental limitations on supertree and consensus tree methods. Systematic Biology, 49:363--368, 2000.
|
| |
25
|
K. Strimmer and A. von Haeseler. Quartet puzzling: A quartet maximum-likelihood method for reconstructing tree topologies. Molecular Biology and Evolution, 13(7):964--969, 1996. Software available at ftp://ftp.ebi.ac.uk/pub/software/unix/puzzle/.
|
| |
26
|
J. Thompson, D. Higgins, and T. Gibson. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalty and weight matrix choice. Nucleic Acids Research, 22:4673--4780, 1994.
|
| |
27
|
E. Tillier. Maximum likelihood with multiparameter models of substitution. J. Mol. Evol., 39:409--417, 1994.
|
| |
28
|
Z. Yang. Complexity of the simplest phylogenetic estimation problem. Proc. R. Soc. Lond. B, 267:109--119, 2000.
|
REVIEW
"Angele M. Hamel : Reviewer"
Maximum likelihood is a well-known technique for determining phylogenetic trees. In this paper, the authors consider maximum likelihood as a constrained optimization problem, and explore techniques for solving it on four taxa, two state characters
more...
|