Module 6
Phylogenetics & Molecular Evolution
Phylogenetic inference reconstructs evolutionary trees from molecular sequences. The four principal approaches trade off speed, accuracy, and model sophistication: distance methods (e.g. neighbor joining, NJ), parsimony, maximum likelihood (IQ-TREE, RAxML), and Bayesian inference (BEAST2, MrBayes). This module covers distance corrections, substitution models, model selection, and molecular-clock dating.
1. Distance Corrections
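The raw p-distance (the proportion p of sites that differ between two aligned sequences) underestimates true divergence because multiple substitutions can occur at the same site. The Jukes-Cantor 1969 (JC69) correction assumes equal base frequencies and equal rates for all substitutions: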
\[ d_{JC69} \;=\; -\tfrac{3}{4}\ln\!\Bigl(1 - \tfrac{4}{3}p\Bigr) \]
Kimura 2-parameter (K80, also written K2P) distinguishes transitions from transversions. HKY85 additionally allows unequal base frequencies, and GTR (general time-reversible) further allows a separate exchangeability rate for each of the six nucleotide pairs. For proteins, empirical matrices such as JTT, WAG, and LG serve the same role. ModelTest and ModelFinder select the best-fit model via AIC or BIC.
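To make the AIC/BIC comparison concrete, the sketch below ranks four nucleotide models. The log-likelihood values and the alignment length are hypothetical numbers chosen for illustration, not results from any real data set, and the parameter counts include only the free substitution-model parameters (branch lengths are shared by all candidates and are left out).

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return k * math.log(n_sites) - 2 * log_lik

# HYPOTHETICAL maximised log-likelihoods, for illustration only.
# k = free substitution-model parameters:
#   JC69: 0, K80: 1 (kappa), HKY85: 4 (kappa + 3 frequencies),
#   GTR: 8 (5 exchangeabilities + 3 frequencies).
candidates = {
    "JC69": (-2450.0, 0),
    "K80": (-2410.0, 1),
    "HKY85": (-2395.0, 4),
    "GTR": (-2393.5, 8),
}
n_sites = 1000  # assumed alignment length

aic_scores = {m: aic(lnl, k) for m, (lnl, k) in candidates.items()}
bic_scores = {m: bic(lnl, k, n_sites) for m, (lnl, k) in candidates.items()}

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)

for model in candidates:
    print(f"{model:6s} AIC={aic_scores[model]:9.2f} BIC={bic_scores[model]:9.2f}")
print("best by AIC:", best_aic, "| best by BIC:", best_bic)
```

BIC penalises each extra parameter by ln(n) rather than 2, so on long alignments it tends to prefer simpler models than AIC does.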
2. Maximum Likelihood & Bayesian
ML phylogenetics (Felsenstein 1981) scores each candidate tree by the likelihood of the observed alignment under the substitution model, then searches tree space heuristically using topological rearrangements (NNI, SPR, TBR), often combined with stochastic perturbation to escape local optima. Bootstrap resampling of alignment columns provides a support value for each bipartition. Bayesian MCMC (MrBayes, BEAST2) samples from the joint posterior over trees and parameters, returning credible intervals and clade posterior probabilities.
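The likelihood score can be illustrated with a minimal sketch of Felsenstein's pruning recursion under JC69. The nested-tuple tree encoding, the taxon names `a` to `d`, the branch lengths, and the toy alignment are all assumptions made for this example; production tools such as IQ-TREE or RAxML also optimise branch lengths and model parameters, which is omitted here.

```python
import math

BASES = "ACGT"

def jc69_p(i, j, t):
    """JC69 probability of base i changing to base j over branch length t
    (t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(node, site):
    """Conditional likelihood vector at `node` for one alignment column.
    A leaf is a taxon name (str); an internal node is a tuple of
    (child, branch_length) pairs. This encoding is hypothetical."""
    if isinstance(node, str):
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    vec = [1.0] * 4
    for child, t in node:
        child_vec = partial(child, site)
        for xi, x in enumerate(BASES):
            vec[xi] *= sum(jc69_p(x, y, t) * child_vec[yi]
                           for yi, y in enumerate(BASES))
    return vec

def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods with uniform root frequencies."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(n_sites):
        site = {name: seq[k] for name, seq in alignment.items()}
        total += math.log(sum(0.25 * v for v in partial(tree, site)))
    return total

# Toy alignment: a and b are identical, c and d are identical.
alignment = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTAC",
    "c": "ACGAACTTAC",
    "d": "ACGAACTTAC",
}

def four_taxon_tree(w, x, y, z, t=0.1):
    """Rooted tree ((w,x),(y,z)) with every branch of length t."""
    return ((((w, t), (x, t)), t), (((y, t), (z, t)), t))

lnl_ab_cd = log_likelihood(four_taxon_tree("a", "b", "c", "d"), alignment)
lnl_ac_bd = log_likelihood(four_taxon_tree("a", "c", "b", "d"), alignment)
print(f"lnL((a,b),(c,d)) = {lnl_ab_cd:.3f}")
print(f"lnL((a,c),(b,d)) = {lnl_ac_bd:.3f}")
```

The topology that groups the identical sequences explains the two variable columns with a single change on the internal branch, so it receives the higher log-likelihood. A tree search repeats this scoring for each rearranged topology.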
Simulation: Jukes-Cantor & K2P Corrections
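The page's original interactive code is not preserved in this text, so the following is a stand-in sketch rather than that code. It simulates a pair of sequences separated by a known K80 distance, then compares the raw p-distance with the JC69 correction and with the K2P correction \( d_{K2P} = -\tfrac{1}{2}\ln(1 - 2P - Q) - \tfrac{1}{4}\ln(1 - 2Q) \), where P and Q are the observed proportions of transitions and transversions. The true distance, the transition/transversion rate ratio `kappa`, the sequence length, and the seed are arbitrary assumptions.

```python
import math
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}

def k80_probs(d, kappa):
    """Expected proportions (P, Q) of transitions and transversions
    between two sequences at K80 distance d, with kappa = alpha / beta."""
    beta_t = d / (kappa + 2.0)
    alpha_t = kappa * beta_t
    e1 = math.exp(-4.0 * beta_t)
    e2 = math.exp(-2.0 * (alpha_t + beta_t))
    return 0.25 + 0.25 * e1 - 0.5 * e2, 0.5 - 0.5 * e1

def simulate_pair(n_sites, d, kappa, rng):
    """Draw two aligned sequences whose per-site differences follow K80."""
    P, Q = k80_probs(d, kappa)
    seq1, seq2 = [], []
    for _ in range(n_sites):
        base = rng.choice("ACGT")
        u = rng.random()
        if u < P:
            other = TRANSITION[base]
        elif u < P + Q:
            other = rng.choice(TRANSVERSIONS[base])
        else:
            other = base
        seq1.append(base)
        seq2.append(other)
    return "".join(seq1), "".join(seq2)

def observed_props(seq1, seq2):
    """Observed transition and transversion proportions."""
    n = len(seq1)
    ts = sum(1 for a, b in zip(seq1, seq2) if a != b and TRANSITION[a] == b)
    tv = sum(1 for a, b in zip(seq1, seq2) if a != b and TRANSITION[a] != b)
    return ts / n, tv / n

def jc69(p):
    """JC69 distance from the raw proportion of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def k2p(P, Q):
    """Kimura 2-parameter distance from transition/transversion proportions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)

rng = random.Random(42)        # fixed seed so the run is repeatable
true_d, kappa = 0.5, 4.0       # assumed simulation settings
s1, s2 = simulate_pair(50_000, true_d, kappa, rng)
P_hat, Q_hat = observed_props(s1, s2)
p_hat = P_hat + Q_hat
jc_hat, k2p_hat = jc69(p_hat), k2p(P_hat, Q_hat)

print(f"true distance : {true_d:.3f}")
print(f"raw p-distance: {p_hat:.3f}")
print(f"JC69 estimate : {jc_hat:.3f}")
print(f"K2P estimate  : {k2p_hat:.3f}")
```

Because the data were generated with a transition bias, JC69 recovers only part of the hidden substitutions, while K2P, which matches the generating model, should land close to the true distance.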
3. Molecular Clocks & Coalescent
A strict molecular clock assumes a constant substitution rate along all branches; relaxed clocks (Thorne et al. 1998, Drummond et al. 2006) allow rates to vary among branches. Either kind can be calibrated with fossil or tip dates. Coalescent theory (Kingman 1982) links demographic history to genealogy: the effective population size \(N_e\) determines the timescale of lineage coalescences. BEAST2 and similar programs integrate molecular clocks, coalescent priors, and substitution models into a single Bayesian inference.
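The dependence of coalescence times on \(N_e\) can be sketched with a small simulation of the standard Kingman coalescent for a constant-size population. Time is measured in coalescent units, assumed here to be \(2N_e\) generations (the diploid convention); the sample size, replicate count, seed, and the value of `Ne` are arbitrary choices for illustration.

```python
import random

def simulate_tmrca(n, rng):
    """Time to the most recent common ancestor of n sampled lineages,
    in coalescent units. While k lineages remain, the waiting time to
    the next coalescence is exponential with rate k(k-1)/2."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2.0)
    return t

def expected_tmrca(n):
    """Analytical expectation: 2 * (1 - 1/n) coalescent units."""
    return 2.0 * (1.0 - 1.0 / n)

rng = random.Random(1)   # fixed seed so the run is repeatable
n_samples = 10           # assumed sample size
reps = 20_000            # assumed number of replicates

mean_tmrca = sum(simulate_tmrca(n_samples, rng) for _ in range(reps)) / reps

Ne = 10_000              # hypothetical effective population size
print(f"simulated mean TMRCA: {mean_tmrca:.3f} coalescent units")
print(f"expected mean TMRCA : {expected_tmrca(n_samples):.3f} coalescent units")
print(f"in generations      : {mean_tmrca * 2 * Ne:.0f} (assuming 2*Ne scaling)")
```

Doubling `Ne` doubles the expected depth of the genealogy in generations, which is why dated trees carry information about past population size.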
4. Ancestral Sequence Reconstruction
Given an alignment and a phylogenetic tree, ancestral-sequence reconstruction (Yang 1995; implemented in PAML and HyPhy) infers the most probable nucleotide or amino-acid states at internal nodes from their likelihoods. Resurrected ancestral proteins have been experimentally validated, a spectacular closed-loop test of phylogenetic modelling. Liberles 2007 and Harms & Thornton 2014 review the approach.
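As a minimal sketch of the idea, the code below computes the marginal posterior over the ancestral base at a single node whose descendants are all observed tips, under JC69 with uniform base frequencies. The star-shaped topology, the branch lengths, and the observed bases are assumptions for illustration; PAML and HyPhy handle arbitrary trees and richer models, and this is not their implementation.

```python
import math

BASES = "ACGT"

def jc69_p(i, j, t):
    """JC69 probability of base i changing to base j over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def ancestral_posterior(tips):
    """Marginal posterior over the ancestral base at one site.
    `tips` is a list of (observed_base, branch_length) pairs for the
    descendants attached directly to the ancestor (hypothetical setup)."""
    joint = {}
    for x in BASES:
        like = 0.25  # uniform prior on the ancestral state under JC69
        for base, t in tips:
            like *= jc69_p(x, base, t)
        joint[x] = like
    total = sum(joint.values())
    return {x: v / total for x, v in joint.items()}

# Two descendants carry A and one carries G, all on branches of length 0.1.
posterior = ancestral_posterior([("A", 0.1), ("A", 0.1), ("G", 0.1)])
best_state = max(posterior, key=posterior.get)

for base in BASES:
    print(f"P(ancestor = {base} | data) = {posterior[base]:.4f}")
print("most probable ancestral state:", best_state)
```

Repeating this calculation site by site, and keeping the posterior probabilities rather than only the best state, shows which ancestral positions are confidently inferred and which are ambiguous.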
Key References
• Felsenstein, J. (1981). "Evolutionary trees from DNA sequences: a maximum likelihood approach." J. Mol. Evol., 17, 368–376.
• Nguyen, L. T. et al. (2015). "IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies." Mol. Biol. Evol., 32, 268–274.
• Bouckaert, R. et al. (2019). "BEAST 2.5." PLOS Comput. Biol., 15, e1006650.
• Yang, Z. (2007). "PAML 4: phylogenetic analysis by maximum likelihood." Mol. Biol. Evol., 24, 1586–1591.