20.11: Plot a Newick tree

    The phrase “paradigm shift”, attributed to Kuhn (1962, see Wikipedia), may be well-worn and even abused today (Naughton 2012), but the shift in thinking from essential types and group thinking (essentialism) to viewing species as varying individuals in populations (population thinking) revolutionized biology (O’Hara 1997, Sandvik 2008). Tree thinking is the manifestation of Charles Darwin’s “descent with modification” metaphor (Gregory 2008). Thus, every biology student should be able to work with, and interpret, phylogenetic trees (tree thinking). Creating and working with phylogenetic trees is a complex subject with an extensive literature. Holder and Lewis (2003) provide a good review, and readers should know Felsenstein’s book (2004).

    Here, I include a modest, incomplete primer on working with trees in R.

    • Loading the tree file
    • Change tip names
    • Write tip names to a text file
    • Plot the tree as phylogram or cladogram
    • Get node labels
    • Re-root the tree
    • Write a tree to a file

    I assume that the student already has a set of species or other taxa; has gathered sequences (DNA or protein), aligned the sequences, and estimated a gene tree or species phylogeny; and wishes to view and manipulate the tree in R. While these analyses can be done with R and R packages (see Task view: Phylogenetics), other software may be a better choice for the student just beginning with phylogenetic tree building (see Unipro UGENE and MEGA, for example). If the goal is just to view a tree file or add annotations, then I recommend the iTOL tools.

    Data formats

    Phylogeny and gene trees are special cases of network graphs. Newick format (Wikipedia) is a common but limited representation of a tree that uses parentheses (groupings), commas (branching), and colons (branch lengths). Other formats permit additional information; examples include the Nexus format (Wikipedia), its extension to XML, NeXML (Wikipedia), and phyloXML (Wikipedia). Our example uses Newick format.
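    To see how parentheses, commas, and colons encode a tree, here is a minimal sketch that reads a made-up three-taxon Newick string with ape's read.tree(). The taxa A, B, C and their branch lengths are hypothetical, chosen only to keep the example small.

```r
library(ape)

# Hypothetical Newick string: A and B are sister taxa (inner parentheses),
# C branches from the root; numbers after ":" are branch lengths; ";" ends the tree
newick <- "((A:1,B:1):2,C:3);"
tr <- read.tree(text = newick)

Ntip(tr)        # number of tips: 3
Nnode(tr)       # internal nodes: 2 (the root and the ancestor of A and B)
tr$tip.label    # "A" "B" "C"
write.tree(tr)  # converts the phylo object back to a Newick string
```

    Because write.tree() returns the same string that was read, the round trip is a quick check that a pasted Newick tree was parsed as intended.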

    Data set

    I’ll use a “time tree” as an example. The tree was built from the following list of species (copy and paste the list to a text file, load the text file with Load list of Species, then save the tree as a Newick file).

    Alligator mississippiensis
    Felis catus
    Bos taurus
    Gallus gallus
    Pan troglodytes
    Canis lupus
    Homo sapiens
    Anolis carolinensis
    Macaca mulatta
    Mus musculus
    Didelphis virginiana
    Sus scrofa
    Oryctolagus cuniculus
    Rattus norvegicus

    R code

    Requires the ape package. The phylotools and phytools packages provide additional handy functions. References for these packages are listed at the end of this page.

    library(ape)
    # If the tree was saved to a file, select the file interactively
    tree14 <- read.tree(file.choose())
    # If no tree file was saved, paste the Newick data into text = "";
    # replace the example tree (truncated here) with your complete Newick tree, which must end with ";"
    tree14 <- read.tree(text = "((Anolis_carolinensis:279.65697667,(Gallus_gallus:236.50266286,Alligator_mississippiensis:236.50266286)'14':43.15431381)'13':32.24694470, … ;")
    # Return information about the object
    tree14

    Output returned by R:

    Phylogenetic tree with 14 tips and 13 internal nodes.
    Tip labels:
    Anolis_carolinensis, Gallus_gallus, Alligator_mississippiensis, Didelphis_virginiana, Felis_catus, Canis_lupus, ...
    Node labels:
    , 13, 14, 27, 29, 19, ...
    Rooted; includes branch lengths.
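    The printed summary draws on components stored inside the phylo object. As a sketch, the three reptile tips from the example Newick string can be read on their own (root edge and node labels dropped, so this is a sub-tree, not the full 14-taxon tree) to show where tip labels, edges, and branch lengths live. In a time tree, branch lengths are divergence times, so every tip should sit the same distance from the root.

```r
library(ape)

# Reptile sub-tree copied from the example Newick string (root edge and node labels removed)
reptiles <- read.tree(text = "(Anolis_carolinensis:279.65697667,(Gallus_gallus:236.50266286,Alligator_mississippiensis:236.50266286):43.15431381);")

reptiles$tip.label         # tip names, in the order they appear in the Newick string
reptiles$edge              # two-column matrix: parent node -> child node
reptiles$edge.length       # branch lengths, one per row of $edge
sum(reptiles$edge.length)  # total tree length

# Distance from the root to every node; the first Ntip() values are the tips
depths <- node.depth.edgelength(reptiles)
depths[1:Ntip(reptiles)]   # equal values: all tips are equidistant from the root
is.ultrametric(reptiles)   # TRUE, as expected for a time tree
```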

    Change the tip names. Create a data frame with the tip labels and new tip names.

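    library(phylotools)
    # Current (scientific) tip labels, listed in the same order as the new names below
    timeTreeTips <- c("Alligator_mississippiensis", "Felis_catus", "Gallus_gallus", "Pan_troglodytes", "Bos_taurus", "Canis_lupus", "Homo_sapiens",
    "Anolis_carolinensis", "Macaca_mulatta", "Mus_musculus", "Didelphis_virginiana", "Sus_scrofa", "Oryctolagus_cuniculus", "Rattus_norvegicus")
    replaceTips <- c("Alligator", "Cat", "Chicken", "Chimpanzee", "Cow", "Dog", "Human", 
    "Lizard", "Macaque", "Mouse", "Opossum", "Pig", "Rabbit", "Rat")
    # sub.taxa.label() expects old labels in column 1 and new labels in column 2
    myDat <- data.frame(timeTreeTips, replaceTips)
    ntree14 <- sub.taxa.label(tree14, myDat)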
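    If you would rather not depend on phylotools, the same relabeling can be sketched in base R by looking up each current tip label in a named vector. The three-taxon tree and its branch lengths below are hypothetical; because the lookup is by name, the order of tip.label does not matter.

```r
library(ape)

# Hypothetical three-taxon tree with scientific names as tip labels
tr <- read.tree(text = "((Homo_sapiens:1,Pan_troglodytes:1):2,Mus_musculus:3);")

# Named lookup vector: names are the old labels, values are the new labels
lookup <- c(Pan_troglodytes = "Chimpanzee",
            Homo_sapiens    = "Human",
            Mus_musculus    = "Mouse")

# Index the lookup by the current labels, so the order of lookup is irrelevant
newLabels <- lookup[tr$tip.label]
if (anyNA(newLabels)) stop("some tip labels have no replacement")
tr$tip.label <- unname(newLabels)
tr$tip.label   # "Human" "Chimpanzee" "Mouse"
```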

    Collect and write the tip names to a text file

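    # Extract tip labels from the tree, sort them alphabetically, and write them to a text file
    library(ape)
    tips <- sort(tree14$tip.label)
    # Option 1: write.table(), one name per line, without quotes or row/column names
    write.table(tips, file = "outfile.txt", quote = FALSE, row.names = FALSE, col.names = FALSE)
    # Option 2: open a file connection, write the lines, then close the connection
    my_conn <- file("outfile.txt")
    writeLines(tips, my_conn)
    close(my_conn)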

    Next, make the plot.

    plot(tree14)

    The result is a simple phylogram, i.e., a tree diagram with branching patterns and branch lengths proportional to the amount of character change.

    Phylogram plot of 14 taxa, with rectangular branching pattern and branch lengths proportional to amount of character change.
    Figure \(\PageIndex{1}\): Phylogram plot of 14 taxa.

    Or, change from default “phylogram” to “cladogram” view.

    plot(tree14, type="cladogram")
    Cladogram plot of the same 14 taxa, with triangular branching pattern and branch lengths proportional to amount of character change.
    Figure \(\PageIndex{2}\): Cladogram view of the same 14 taxa.

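    Note that while the tree is rooted, the root was set by the software that produced the Newick file (e.g., a midpoint rooting), not by an outgroup; the Newick format simply records whatever root the tree already has. For a true root based on outgroup(s), identify the nodes, then select the root.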
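    Whether a tree object is rooted can be checked directly. This sketch uses a hypothetical five-taxon tree: unroot() removes the root, and root() then re-roots it on an outgroup given by tip name, with resolve.root = TRUE asking ape to keep the new root bifurcating. A single outgroup tip (E) is used here only to keep the example simple.

```r
library(ape)

# Hypothetical five-taxon tree
tr <- read.tree(text = "((A:1,(B:1,C:1):1):1,(D:1,E:1):1);")
is.rooted(tr)    # TRUE: the root has two descendants

ur <- unroot(tr) # basal trichotomy, no root
is.rooted(ur)    # FALSE

# Root on the outgroup named by its tip label
rr <- root(ur, outgroup = "E", resolve.root = TRUE)
is.rooted(rr)    # TRUE

# The root is node Ntip + 1; its children should include tip E
rootChildren <- rr$edge[rr$edge[, 1] == Ntip(rr) + 1, 2]
rr$tip.label[rootChildren[rootChildren <= Ntip(rr)]]
```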

    Add node labels; plot() must be run first.

    nodelabels()
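    Because nodelabels() draws onto the plot that is already open, the order of calls matters. A minimal sketch on a hypothetical five-taxon tree, drawn to a temporary PDF so it runs without a screen; with 5 tips, ape numbers the internal nodes 6 through 9.

```r
library(ape)

# Hypothetical five-taxon tree
tr <- read.tree(text = "((A:1,(B:1,C:1):1):1,(D:1,E:1):1);")

# Draw to a temporary PDF file instead of the screen
outfile <- tempfile(fileext = ".pdf")
pdf(outfile)
plot(tr, type = "cladogram")  # plot first...
nodelabels()                  # ...then add internal node numbers to that plot
dev.off()
```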

    The cladogram from Figure 2 with labeled nodes.
    Figure \(\PageIndex{3}\): Plot of tree with labeled nodes.

    The outgroup(s) were the reptiles (Alligator, Chicken, Lizard), so reroot at node 16.

    rtree14 <- root(tree14, node = 16)
    plot(rtree14)
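    Reading node numbers off a plot works, but the node can also be found from tip names with ape's getMRCA(), which returns the most recent common ancestor of the listed tips. In this hypothetical tree, A, B, and C stand in for the outgroup. As with the reptiles at node 16 (root node 15 in a 14-tip tree, then the first clade in the Newick string), the first clade here is numbered Ntip + 2.

```r
library(ape)

# Hypothetical five-taxon tree; A, B, C play the role of the outgroup
tr <- read.tree(text = "((A:1,(B:1,C:1):1):1,(D:1,E:1):1);")

# Node number of the most recent common ancestor of the outgroup tips
outNode <- getMRCA(tr, c("A", "B", "C"))
outNode               # 7: the root is node 6 (Ntip + 1), the first clade is node 7

# Re-root at that node, the same kind of call the text makes with node 16
rtree <- root(tr, node = outNode)
Ntip(rtree)           # still 5 tips
```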

    Tree rerooted at the node that in Figure 3 was labeled node 16.
    Figure \(\PageIndex{4}\): Re-rooted tree.

    To write the tree to a file in Newick format:

    write.tree(tree14, file = "filename.nwk")

    For Nexus format:

    write.nexus(tree14, file = "filename.nex")
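    A quick way to confirm that a written file is usable is to read it back. This sketch writes a hypothetical three-taxon tree to temporary Newick and Nexus files, then reads each with the matching ape reader (read.tree() for Newick, read.nexus() for Nexus).

```r
library(ape)

# Hypothetical three-taxon tree
tr <- read.tree(text = "((A:1,B:1):2,C:3);")

# Write Newick and Nexus versions to temporary files
nwk <- tempfile(fileext = ".nwk")
nex <- tempfile(fileext = ".nex")
write.tree(tr, file = nwk)
write.nexus(tr, file = nex)

# Read both back with the matching reader
fromNwk <- read.tree(nwk)
fromNex <- read.nexus(nex)
all.equal(tr, fromNwk)           # TRUE if topology, labels, and branch lengths match
sort(fromNex$tip.label)          # "A" "B" "C"
sum(fromNex$edge.length)         # 7, the same total length as the original
```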

    Star phylogeny

    Collapse the tree to a star phylogeny, an unlikely evolutionary model in which the species resulted from “… a single explosive adaptive radiation” (Felsenstein 1985). Star phylogeny is an extreme tree shape, or multifurcation (polytomy), where all tips derive from the same node (Colijn and Plazzotta 2018). This type of phylogeny can be viewed as a null model for inference (but see Bayesian “star phylogeny paradox,” cf. Kolaczkowski and Thornton 2006).
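    One way to sketch a star phylogeny in ape is stree(), which builds a tree of a given shape; type = "star" puts every tip on a single polytomy. The tip labels below reuse the common names from replaceTips; the resulting tree has no branch lengths.

```r
library(ape)

# Common names used earlier for replaceTips
taxa <- c("Alligator", "Cat", "Chicken", "Chimpanzee", "Cow", "Dog", "Human",
          "Lizard", "Macaque", "Mouse", "Opossum", "Pig", "Rabbit", "Rat")

# One internal node (the root) with all 14 tips attached directly to it
star <- stree(length(taxa), type = "star", tip.label = taxa)
Nnode(star)      # 1: every tip descends from the same node
star$edge        # each row is root (node 15) -> tip
```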

    Star phylogeny of the 14 taxa.
    Figure \(\PageIndex{5}\): Star phylogeny.

    Under a star phylogeny model, all taxa are assumed independent of each other, in contrast to the nested hierarchical model of evolution (e.g., Fig. \(\PageIndex{4}\)), which implies a lack of independence among the taxa. More succinctly, statistical comparisons among taxa that ignore phylogeny may violate the assumption that errors are independent and identically distributed. Phylogenetically correct methods attempt to address the lack of independence among taxa in comparative analysis (Felsenstein 1985, Uyeda et al. 2016). Biologists should know about Felsenstein’s 1985 paper: it created a paradigm shift in how comparative datasets are analyzed and had been cited more than ten thousand times as of 1 August 2023 (Google Scholar). To put that number in context, the 1986 paper by Kary Mullis et al., which announced the invention of PCR with a thermally stable polymerase and revolutionized molecular biology, had been cited 6721 times as of the same date.
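    The difference between the two models can be made concrete with ape's vcv(), which returns the expected covariance among tip values under Brownian motion: branch length shared from the root becomes covariance. A minimal sketch on hypothetical three-taxon trees and trait values; pic() then computes Felsenstein’s (1985) phylogenetically independent contrasts, one per internal node of a fully bifurcating tree.

```r
library(ape)

# Nested tree: A and B share 2 units of branch length before they split
nested <- read.tree(text = "((A:1,B:1):2,C:3);")
# Star tree: same root-to-tip lengths, but no shared branches
starTree <- read.tree(text = "(A:3,B:3,C:3);")

vNested <- vcv(nested)   # A-B covariance equals their shared path length (2)
vStar <- vcv(starTree)   # all off-diagonal entries are 0: tips are independent
vNested
vStar

# Hypothetical trait values for the three tips
x <- c(A = 1.0, B = 3.0, C = 5.0)
contrasts <- pic(x, nested)  # independent contrasts, one per internal node
length(contrasts)            # 2 contrasts for 3 tips
```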

    Additional packages of note

    The R package tanggle works with the package ggtree and takes advantage of the ggplot2 environment. It contains many functions for working with phylogenetic graphs, including re-rooting and swapping nodes. The package is available from Bioconductor:

    if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    BiocManager::install("tanggle")

    ggtree is also a Bioconductor package; it is not available on CRAN.

    Online viewers

    Many browser-based tree viewers are available online, including the iTOL tools. Additional tree viewers are listed at Wikipedia.

