1  Introduction

1.1 Working with phylogenetic trees

Whether you build a phylogenetic tree with a software tool or obtain one from another source, the tree data will be stored in a plain-text file. This file will be in one of several more or less cryptic formats, such as:

  • Newick, file extensions .new, .nwk, .newick
  • NEXUS, file extensions .nex, .nexus
  • PHYLIP, file extensions .phy, .phylip

These files can be hard for a human to read. For example, the first Newick format file we will work with has these contents:

(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);

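In Newick format, each pair of parentheses groups the descendants of one internal node, and the number after each colon is a branch length. Even knowing this, it is difficult to look at a file like this and understand the structure of the tree it describes, so we almost always use computational tools to open these files and visualise the trees they represent.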
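To make the nesting in the string above more concrete, here is a minimal sketch in Python (used purely for illustration; it is not one of the workshop tools). The helper names `parse_newick`, `leaf_names`, and `outline` are hypothetical, and the parser assumes only the simple subset of Newick shown above: unquoted labels, optional branch lengths, and no comments.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (illustrative subset only)."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        node = {"name": "", "length": None, "children": []}
        if text[pos] == "(":
            pos += 1  # step over "("
            while True:
                node["children"].append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                break
            pos += 1  # step over ")"
        # Optional label: runs until a structural character
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node["name"] = text[start:pos]
        # Optional branch length after ":"
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node["length"] = float(text[start:pos])
        return node

    return parse_node()


def leaf_names(node):
    """Return tip labels in the order they appear in the file."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


def outline(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    label = node["name"] or "(internal node)"
    length = "" if node["length"] is None else f" [branch length {node['length']:g}]"
    lines = ["  " * depth + label + length]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines


NEWICK = (
    "(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,H:8):13):13,"
    "((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);"
)

tree = parse_newick(NEWICK)
print("\n".join(outline(tree)))
print(leaf_names(tree))
```

Running this prints one line per node, indented by depth, followed by the thirteen tip labels A to M in file order. An indented outline is still far less readable than a drawn tree, which is why dedicated visualisation tools are used in practice.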

Figure 1.1: A genAI representation of “phylogenetic tree as a real tree.”

1.1.1 Phylogenetic tree visualisation software

A number of software packages are available to visualise, edit, and refine phylogenetic tree files for reports and publications. Some of these can be downloaded and run on your own computer, but the Interactive Tree of Life (iTOL) service allows you to visualise and edit trees directly in the browser. If you register and sign in, you can save your trees for future use.

Caution

In this workshop we will be using the online iTOL service to visualise and interpret trees.

Traditionally, phylogenetic trees have been visualised using independent software tools like these. However, the R software ecosystem is a very powerful platform for computational biology and bioinformatics work. It covers sequence analysis, genomics, transcriptomics, and evolutionary analyses, and has excellent visualisation capabilities, including for phylogenetic trees.

As R is also a programming language, large parts of the analysis can be automated and made reproducible, which is an advantage over independent tools, and a factor in the growing popularity of this approach. Basic competence and skills in R are in high demand in academia and industry.

Note

You can produce trees equivalent to those in these exercises using standalone tools like FigTree and Dendroscope. Some tree visualisations can only be achieved in specialised packages, or with tools like ggtree.

ggtree is an R package that extends the ggplot2 data visualisation package to visualise and annotate phylogenetic trees (or any other tree-like structure!).