Phylogenetic tree reconstruction

Distance methods

Distance methods reconstruct phylogenetic trees by iteratively grouping the species separated by small distances. They involve two steps. In the first step, the pairwise distances between sequences are calculated. In the second step, a phylogenetic tree is built from the pairwise distances. The most popular distance method is neighbor-joining (NJ).
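The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular program: the Jukes-Cantor (JC69) correction is assumed as the distance measure, the function names are hypothetical, and the neighbor-joining routine returns only the topology (as nested tuples) without branch lengths. Note that NJ joins the pair minimizing the Q criterion, which corrects the raw distance for each taxon's average distance to all others.

```python
import math
from itertools import combinations

def p_distance(a, b):
    """Proportion of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jc69_distance(a, b):
    """Jukes-Cantor corrected distance (assumed model); infinite if p >= 0.75."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(seqs, distance=jc69_distance):
    """Step 1: pairwise distances, keyed by frozenset({name_a, name_b})."""
    return {frozenset((a, b)): distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}

def neighbor_joining(names, dist):
    """Step 2: build an (unrooted) topology from the distances.

    Returns nested tuples; branch lengths are omitted in this sketch.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda pair: (n - 2) * d[frozenset(pair)] - r[pair[0]] - r[pair[1]])
        new = (i, j)
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k != i and k != j:
                d[frozenset((new, k))] = 0.5 * (d[frozenset((i, k))]
                                                + d[frozenset((j, k))]
                                                - d[frozenset((i, j))])
        nodes = [k for k in nodes if k != i and k != j] + [new]
    return tuple(nodes)
```

A call such as `neighbor_joining(list(seqs), distance_matrix(seqs))`, where `seqs` maps taxon names to aligned sequences, chains the two steps.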

Parsimony methods

The parsimony score of a phylogenetic tree is the minimum number of mutations required to explain the nucleotide variation observed among the sequences. Parsimony methods reconstruct phylogenetic trees by finding the tree with the minimum parsimony score.
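For a fixed tree, the parsimony score can be computed site by site with Fitch's algorithm. The sketch below assumes a rooted binary tree written as nested 2-tuples of leaf names and equally weighted, unordered character changes; the function names are chosen for illustration only. Searching over trees for the minimum score is a separate step that is not shown.

```python
def fitch_site_score(tree, states):
    """Minimum number of changes at one site.

    tree:   nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states: dict mapping each leaf name to its nucleotide at this site
    """
    def visit(node):
        if not isinstance(node, tuple):
            return {states[node]}, 0          # leaf: its observed state, no cost
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                             # children can agree: no new change
            return common, left_cost + right_cost
        # children disagree: one additional change is required
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]

def parsimony_score(tree, alignment):
    """Sum of per-site Fitch scores over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )

if __name__ == "__main__":
    alignment = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAT"}
    print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 2
    print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 4
```

In this toy alignment the first tree needs fewer changes, so a parsimony method would prefer it.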

Maximum likelihood

The phylogenetic tree is estimated by maximizing the likelihood function $L(T,\theta|D)$ with respect to the tree $T$ and the parameters $\theta$ of the substitution model, given the sequence data $D$. Since the analytic solution is intractable, phylogenetic programs implement different optimization algorithms to search for maximum likelihood trees. Popular maximum likelihood phylogenetic programs include RAxML, PhyML, PHYLIP, and PAML.
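The idea of numerical likelihood maximization can be illustrated on the smallest possible case: two aligned sequences under the Jukes-Cantor (JC69) model, where the only unknown is the distance $d$ between them. This is a deliberately reduced sketch, not what the programs above do: the topology is fixed, JC69 is an assumed model, and golden-section search is just one simple optimizer. The two-sequence case happens to have a closed-form maximum, which is used here only to check the numerical result.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d (substitutions per site, d > 0) under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # P(same nucleotide at both ends of the path)
    p_diff = 0.25 - 0.25 * e   # P(one specific different nucleotide)
    # Each site also carries the stationary frequency 1/4 of the first nucleotide.
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))

def golden_section_max(f, lo, hi, tol=1e-7):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d              # maximum lies in [a, d]
        else:
            a = c              # maximum lies in [c, b]
    return 0.5 * (a + b)

def ml_distance(n_sites, n_diff):
    """Numerical ML estimate of the JC69 distance (requires 0 < n_diff < 0.75 * n_sites)."""
    return golden_section_max(
        lambda d: jc69_log_likelihood(d, n_sites, n_diff), 1e-6, 5.0)

def analytic_distance(n_sites, n_diff):
    """Closed-form JC69 estimate, available only in this two-sequence case."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For a full tree the same principle applies, but the search runs jointly over topologies, branch lengths, and model parameters, which is why heuristic optimization is needed.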

Uncertainty of tree estimates

The uncertainty of a tree estimate is defined as the variation of the tree estimate across samples. It is assessed by 1) using bootstrap techniques to generate bootstrap samples, 2) calculating the tree estimate for each bootstrap sample, and 3) calculating the variation of the tree estimates across samples. There are two bootstrap techniques, and the difference lies in how the samples are generated. In nonparametric bootstrapping, samples are generated by resampling the original data (the alignment sites) with replacement. In contrast, parametric bootstrapping generates samples from the fitted model. For each bootstrap sample, we calculate the tree estimate. We use the consensus method to summarize the bootstrap trees: we find the groups that are supported by a majority of the bootstrap trees and build the consensus tree from those groups. The support values of the groups in the consensus tree are called bootstrap support values.
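The nonparametric bootstrap and the majority-rule summary can be sketched as follows. Several simplifications are assumed here: trees are rooted and written as nested tuples, groups are compared as sets of leaf names, and `estimate_tree` is a hypothetical placeholder for whatever tree-building method is being assessed. Assembling the retained groups into a consensus tree is not shown.

```python
import random
from collections import Counter

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (same length as the original)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def bootstrap_trees(alignment, estimate_tree, n_replicates=100, seed=0):
    """Estimate one tree per bootstrap sample; estimate_tree is supplied by the caller."""
    rng = random.Random(seed)
    return [estimate_tree(bootstrap_alignment(alignment, rng))
            for _ in range(n_replicates)]

def clades(tree):
    """Leaf sets of all internal nodes of a nested-tuple tree, excluding the root."""
    found = set()
    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves
    root = visit(tree)
    found.discard(root)        # the full leaf set is present in every tree
    return found

def bootstrap_support(trees, threshold=0.5):
    """Groups found in more than `threshold` of the trees, with their support values."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c: n / len(trees) for c, n in counts.items() if n / len(trees) > threshold}
```

With `threshold=0.5` the retained groups are mutually compatible, which is what allows them to be combined into a single majority-rule consensus tree.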

Bayesian methods

In the Bayesian phylogenetic model, the likelihood function is the same as that used in the ML methods. In addition, the model assumes a uniform prior for the tree topology (i.e., all topologies are equally likely), independent exponential priors for branch lengths, and uniform priors for substitution model parameters. The posterior distribution of the phylogenetic tree is approximated by a Markov chain Monte Carlo (MCMC) algorithm. The burn-in samples are discarded, and the remaining chain is subsampled by keeping every 1000th sample. Bayesian phylogenetic programs include MrBayes and BEAST.
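The MCMC workflow (sample, discard burn-in, subsample) can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in for a real phylogenetic sampler: it samples a single branch length between two sequences under the Jukes-Cantor (JC69) likelihood with an exponential prior, using a plain Metropolis sampler with a uniform sliding-window proposal. The prior rate, step size, chain length, and function names are all illustrative choices, and no tree topology is sampled.

```python
import math
import random

def log_posterior(d, n_sites, n_diff, prior_rate=10.0):
    """Unnormalized log-posterior of branch length d: JC69 likelihood x exponential prior."""
    if d <= 0.0:
        return -math.inf
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e
    p_diff = 0.25 - 0.25 * e
    if p_diff <= 0.0:                      # guard against floating-point underflow
        return -math.inf
    log_lik = (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)
    return log_lik - prior_rate * d        # exponential prior, up to a constant

def metropolis(n_iter, n_sites, n_diff, step=0.05, start=0.1, seed=1):
    """Metropolis sampler with a symmetric uniform proposal of half-width `step`."""
    rng = random.Random(seed)
    d = start
    current = log_posterior(d, n_sites, n_diff)
    chain = []
    for _ in range(n_iter):
        proposal = d + rng.uniform(-step, step)
        candidate = log_posterior(proposal, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            d, current = proposal, candidate
        chain.append(d)
    return chain

def burn_and_thin(chain, burn_in, thin):
    """Discard the first `burn_in` samples, then keep every `thin`-th sample."""
    return chain[burn_in::thin]

if __name__ == "__main__":
    chain = metropolis(110_000, n_sites=100, n_diff=10)
    samples = burn_and_thin(chain, burn_in=10_000, thin=1000)
    print(len(samples), sum(samples) / len(samples))
```

Thinning reduces the autocorrelation between retained samples; the burn-in length and thinning interval in the usage example are arbitrary and would normally be chosen by inspecting convergence diagnostics.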

Phylogenomics