A patristic distance is the sum of all branch lengths connecting two leaves of a phylogenetic tree.
Definition
Patristic distance: the sum of the lengths of the branches that connect two nodes in a phylogenetic tree, where those nodes are typically terminal nodes representing extant taxa. Because it is inferred from the tree and takes multiple substitutions into account, it is typically greater than the uncorrected distance computed directly from the number of differences observed between the two corresponding sequences in the alignment.[1]
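Written as a formula, the definition reads as follows. The notation is an assumption introduced here for illustration and does not come from the cited source: $T$ is the tree, $P_T(i, j)$ is the set of branches on the unique path between nodes $i$ and $j$, and $\ell(e)$ is the length of branch $e$.

```latex
d_T(i, j) = \sum_{e \in P_T(i, j)} \ell(e)
```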
Etymology
The name derives from the Ancient Greek adjective πατρίς, -ίδος (patris, -idos), meaning "from the ancestors". It refers to the fact that, when computing a patristic distance, one leaf is reached from the other by moving along the branches that connect each of them to their most recent common ancestral node.
Misc
- Path-length distance.
- Connection of internal nodes (not only leaves).
- Link with patrocladogram.
Working example
Let us assume that we have five elements and the following matrix of pairwise distances between them:
| | a | b | c | d | e |
|---|---|---|---|---|---|
| a | — | 17 | 21 | 31 | 23 |
| b | 17 | — | 30 | 34 | 21 |
| c | 35 | 35 | — | 28 | 39 |
| d | 35 | 35 | 28 | — | 43 |
| e | 22 | 22 | 35 | 35 | — |
Upper right half-matrix: observed distances. Lower left half-matrix: patristic distances.
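The article does not show the tree behind the patristic half of the matrix. As a minimal sketch, the Python code below assumes one rooted tree whose branch lengths reproduce those values, and computes each patristic distance by summing branch lengths up to the most recent common ancestor. The internal node names (`u`, `v`, `w`, `root`), the branch lengths, and the helper functions are all hypothetical and chosen only to be consistent with the table.

```python
from itertools import combinations

# Hypothetical rooted tree, chosen to match the patristic half of the table.
# Each node maps to (parent, length of the branch leading to that parent).
PARENT = {
    "a": ("u", 8.5),
    "b": ("u", 8.5),
    "u": ("v", 2.5),
    "e": ("v", 11.0),
    "c": ("w", 14.0),
    "d": ("w", 14.0),
    "v": ("root", 6.5),
    "w": ("root", 3.5),
}


def path_to_root(node):
    """Map each ancestor of `node` (and `node` itself) to its distance from `node`."""
    dist = {node: 0.0}
    total = 0.0
    while node in PARENT:
        parent, length = PARENT[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic(x, y):
    """Sum of branch lengths on the path between x and y."""
    up_x, up_y = path_to_root(x), path_to_root(y)
    # With non-negative branch lengths, the shared ancestor giving the
    # shortest combined path is the most recent common ancestor.
    return min(up_x[n] + up_y[n] for n in up_x if n in up_y)


for x, y in combinations("abcde", 2):
    print(x, y, patristic(x, y))
```

Under these assumed branch lengths, the pairs (a, b), (a, e) and (c, d) come out as 17.0, 22.0 and 28.0, and every pair whose path crosses the root comes out as 35.0, matching the lower left half of the table.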
Use to build trees
The Fitch-Margoliash least-squares method.
Software
- Computation in R.
- PAUP*
Select uncorrected distances under the unweighted least-squares criterion: dset distance=p objective=lsfit power=0

The dset command sets various options for the distance-based methods. The option "distance=p" specifies the use of uncorrected sequence distances, i.e., the observed distances are not corrected for multiple substitutions. These distances are reported as "substitutions per site": the number of differences divided by the length of the sequence, or equivalently the fraction of sites that differ between two sequences.

The option "objective=lsfit" specifies that trees are reconstructed using the least-squares optimality criterion. Under least squares, the goal is to find the tree with the smallest possible deviation between the observed pairwise distances and the pairwise distances measured along the tree. (The distance between two taxa measured along the tree is called the "patristic" distance.) The overall fit of the tree is found by (1) computing the difference between each observed distance and the corresponding patristic distance, (2) squaring this difference, so that the result is never negative regardless of whether the observed or the patristic distance is larger, and (3) adding all the squared differences. The option "power=0" specifies that the squared differences are not weighted according to branch lengths when computing this fit.

For each possible tree topology, PAUP* finds the best set of branch lengths and then computes the "sum of squared errors" as a measure of how well the patristic distances fit the observed pairwise sequence distances. The best tree is the one with the smallest sum of squared errors. At the end of the run, PAUP* outputs a histogram giving the distribution of the sum of squares for all trees (only three in this case).
- PATRISTIC
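The unweighted fit described for PAUP* (difference, square, sum) can be sketched in a few lines of Python. This is an illustration only, not PAUP*'s implementation: the function name is hypothetical, and the two dictionaries simply copy the observed and patristic halves of the matrix from the working example. The tree behind that table is not claimed to be the least-squares optimum.

```python
# Observed distances: upper right half of the working-example table.
observed = {
    ("a", "b"): 17, ("a", "c"): 21, ("a", "d"): 31, ("a", "e"): 23,
    ("b", "c"): 30, ("b", "d"): 34, ("b", "e"): 21,
    ("c", "d"): 28, ("c", "e"): 39,
    ("d", "e"): 43,
}

# Patristic distances: lower left half of the same table.
patristic = {
    ("a", "b"): 17, ("a", "c"): 35, ("a", "d"): 35, ("a", "e"): 22,
    ("b", "c"): 35, ("b", "d"): 35, ("b", "e"): 22,
    ("c", "d"): 28, ("c", "e"): 35,
    ("d", "e"): 35,
}


def sum_of_squared_errors(observed, patristic):
    """Unweighted least-squares fit between observed and patristic distances."""
    total = 0.0
    for pair, d_obs in observed.items():
        residual = d_obs - patristic[pair]  # (1) difference for this pair
        total += residual ** 2              # (2) square it, (3) accumulate
    return total


print(sum_of_squared_errors(observed, patristic))  # prints 320.0
```

Most of the total comes from the pair (a, c), where the observed distance is 21 and the patristic distance is 35, contributing 196 of the 320.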
See also
References
1. Philippe, H.; Brinkmann, H.; Lavrov, D. V.; Littlewood, D. T. J.; Manuel, M.; Wörheide, G.; Baurain, D. (2011). Penny, David (ed.). "Resolving Difficult Phylogenetic Questions: Why More Sequences Are Not Enough". PLoS Biology. 9 (3): e1000602. doi:10.1371/journal.pbio.1000602. PMC 3057953. PMID 21423652.
2. Fourment, M.; Gibbs, M. J. (2006). "PATRISTIC: a program for calculating patristic distances and graphically comparing the components of genetic change". BioMed Central Evolutionary Biology. 6: 1–5. doi:10.1186/1471-2148-6-1.