Skip to main content
Statistics LibreTexts

20.13: How to get the distances from a distance tree

  • Page ID
    45292
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

    \( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

    ( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

    \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

    \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

    \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

    \( \newcommand{\Span}{\mathrm{span}}\)

    \( \newcommand{\id}{\mathrm{id}}\)

    \( \newcommand{\Span}{\mathrm{span}}\)

    \( \newcommand{\kernel}{\mathrm{null}\,}\)

    \( \newcommand{\range}{\mathrm{range}\,}\)

    \( \newcommand{\RealPart}{\mathrm{Re}}\)

    \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

    \( \newcommand{\Argument}{\mathrm{Arg}}\)

    \( \newcommand{\norm}[1]{\| #1 \|}\)

    \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

    \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    \( \newcommand{\vectorA}[1]{\vec{#1}}      % arrow\)

    \( \newcommand{\vectorAt}[1]{\vec{\text{#1}}}      % arrow\)

    \( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vectorC}[1]{\textbf{#1}} \)

    \( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

    \( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

    \( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

    \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

    \(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)

    Introduction

    Extract the patristic distance, the sum of the branch lengths that link two nodes in a tree, for each pair of species.

    This distance — see our Chapter 16.6 – Similarity and Distance— is the proportion \((p)\) of amino acid (or nucleotide for DNA or RNA) sites at which the two sequences to be compared are different. It is obtained by dividing the number of amino acid differences by the total number of sites compared. It does not make any correction for multiple substitutions at the same site or differences in evolutionary rates among sites. On a gene tree (Fig. \(\PageIndex{1}\)), distances are the lengths of the branches connecting the taxa. We want to know, how different are two species for the given protein? That’s the distance between them in proportion of amino acid sites that are different by total number compared.

    Example

    A gene tree of the protein HBA1 with 5 species.
    Figure \(\PageIndex{1}\): A gene tree of the product (protein HBA1) with five species.

    Here’s the Newick format for the tree (HBA1.nwk)

    (Mouse:0.0474516,Human:0.104063,((Chicken:0.127652,Alligator:0.202421):0.0616593,Xenopus:0.422801):0.143939);
    

    R code to extract distances and output sorted, pairwise comparisons to a text file:

    library(ape)
    
    # Create a function
    getDis <- function(tree, tips) {
         myTree <- cophenetic(tree)
         myTree <- myTree[,tips]
         xy <- t(combn(colnames(myTree), 2))
         xy <- xy[order(xy[,1], xy[,2]),]
         myOut <- data.frame(xy, myTree[xy])
         colnames(myOut) <- c("Spp1", "Spp2", "Distance")
       return(myOut)
    }
    
    # Read a tree file, Newick format
    tree5 <-read.tree(text="(Mouse:0.0474516,Human:0.104063,((Chicken:0.127652,Alligator:0.202421):0.0616593,Xenopus:0.422801):0.143939);")
    
    # get taxa names from the tree file
    all.tips <- tree5$tip.label; all.tips
    
    # Run the function
    myDis <- getDis(tree5, all.tips)
    
    # Check the output
    head(myDis)
    
    # Create the results file
    write.csv(myDis, file = "my_out.txt")

    Example output from head(myDist)

           Spp1    Spp2  Distance
    1 Alligator Xenopus 0.6868813
    2 Chicken Alligator 0.3300730
    3 Chicken   Xenopus 0.6121123
    4 Human   Alligator 0.5120823
    5 Human     Chicken 0.4373133
    6 Human     Xenopus 0.6708030
    

    The function sorts first by Spp1, then by Spp2.

    Molecular clock plot

    Collect divergence times from timetree.org

    Spp1 Spp2 Time (median MYA)
    Alligator Xenopus 352
    Chicken Alligator 245
    Chicken Xenopus 352
    Human Alligator 319
    Human Chicken 319
    Human Xenopus 352

    A scatterplot of distance HBA protein sequence by log10-transformed millions of years ago divergence time is shown in Figure \(\PageIndex{2}\). Note that, although tempting, calculating the slope from a linear regression to estimate the rate of evolution would not be appropriate without accounting for the lack of independence of the data (see Phylogenetically independent contrasts). Better methods exist, including calculating rate of change after fitting a model that assumes a strict clock vs relaxed clock.

    Scatterplot of HBA distance vs. log base 10 of divergence time in millions of years ago.
    Figure \(\PageIndex{2}\): Scatterplot of HBA distance by log10(MYA) divergence time

    Questions

    [pending]

    Suggested readings

    Bevan, R. B., Lang, B. F., & Bryant, D. (2005). Calculating the evolutionary rates of different genes: a fast, accurate estimator with applications to maximum likelihood phylogenetic analysis. Systematic biology, 54(6), 900-915.


    This page titled 20.13: How to get the distances from a distance tree is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Michael R Dohm via source content that was edited to the style and standards of the LibreTexts platform.