1 Microbial Ecology
This section introduces key concepts in how molecular techniques are used to identify and quantify the members of microbial communities, in order to study microbial ecology.
This is accompanying material to give additional background and explanation for the workshop. You don’t need to read this for the workshop itself as we will be covering the material in person, but we hope you find it helpful.
1.1 What is microbial ecology?
Microbial ecology is the scientific study of natural microbial communities, which play critical roles in many environmental settings and processes, including:
- airborne microbial communities (Šantl-Temkiv et al. 2022)
- environmental bioremediation of pollutants (Xu et al. 2023)
- carbon sequestration in the ocean (Castillo et al. 2022)
- seasonal variation of carbon cycles in soil (Poppeliers et al. 2022)
- decomposition of animals (Mason, Taylor, and DeBruyn 2023)
as well as clinical contexts, such as:
- pathogenesis and nutrient processing in the vertebrate gut (Foster-Nyarko and Pallen 2022)
- cancer immunotherapy responses (Simpson et al. 2022)
- oral disease (Zhang, Whiteley, and Lewin 2022)
- infant health (Qi et al. 2022)
It is a hugely important but arguably underdeveloped area of biology. For many biological systems, important questions have yet to be answered in detail:
- what microbes are found in a community?
- what proportion of each microbe species (or other group) is present in a community?
- what are the physical limits of the community?
- is the community stable (or is it growing, shrinking, or changing composition)?
- do community members share or compete for resources, and which resources?
- do the community members effectively operate as a larger-scale system?
- does the gut microbial community influence the host animal’s nutrition, health, or disease status?
- does the rhizosphere microbial community influence a plant’s ability to extract nutrition or energy from the environment, and contribute to protection against pathogens?
- when two communities come together, what does the resulting community look like?
- are there two or more separate (or interacting) communities, or do they somehow combine?
- can we predict which microbes can combine into a productive community?
- can we choose these microbes to engineer a community that achieves a particular goal (e.g. plant protection or gut health)?
- can we perturb existing communities to make them beneficial to health or achieve some other goal?
This array of questions can be summarised more simply as (Prosser 2020):
- Who is there?
- What is meant by “there”?
- Who is doing what in there?
- What is the effect of doing [SOMETHING…] to “there”?
1.2 Answering questions about microbial ecology using molecular techniques
Molecular techniques enabled by modern sequencing approaches, such as amplicon marker sequencing (e.g. 16S, ITS1 methods) - also known as metabarcoding - and sequencing the entire genomic DNA of a complex sample - also known as metagenomics - have become the dominant approach to understanding microbial community composition (Boughner and Singh 2016). These tools give rapid insights into the complex composition of microbial communities that are unattainable using laborious, time-consuming culture-based techniques.
The two words are quite similar, but it is important to distinguish between:
- metabarcoding: sequencing one or more marker sequences from a population
- metagenomics: deep sequencing of the entire genomic DNA material in a sample, often attempting to assemble representative genomes from the data
In addition to their speed and the scale of data they can produce, these molecular techniques have other advantages:
- The (sequence) data collected can be preserved and shared, alongside metadata describing the experiment, in public databases. This enables reanalysis and integration into larger-scale studies, adding value to the study.
- Fastidious organisms, or those that might be outcompeted in a laboratory culture, can be sequenced and identified.
- Organisms present in low numbers can be sequenced and identified.
These molecular techniques answer the question of “Who is there?”, in two ways:
- They tell us what kinds of organism are present (often to species or genus level)
- They give us a relative count of how much of each kind of organism is present
Taken together, these answers can quantify the diversity of the community.
We will introduce measures of community diversity in a later section.
1.2.1 Which organisms are present?
Both metabarcoding and metagenomics can tell us which microorganisms are present in a biological or environmental sample.
1.2.1.1 Metabarcoding
In metabarcoding, we use the polymerase chain reaction (PCR) and primers that specifically target one or more marker sequences for amplification. A marker here is a gene fragment or other stretch of DNA that we believe to be present in all organisms of interest. For bacteria, this is usually a fragment of the 16S rRNA gene. For fungi and other eukaryotic microorganisms it may be a fragment of the ITS1 (Internal Transcribed Spacer 1) region.
These regions of the genome acquire sequence changes that are specific to lineages as organisms evolve over time, such that variants of a marker sequence can often be associated with particular groups of microorganisms, usually at the species or genus level. By amplifying and then sequencing the many variants of the marker sequence there may be in the sample, we can compare these to databases of known sequences, to identify which organisms may be present.
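As a rough illustration of the comparison step described above, the following Python sketch assigns toy amplicon sequences to the most similar entry in a small reference set. The sequences, the taxon labels, the `assign_taxon` helper and the 90% identity threshold are all invented for this example. Real workflows use curated reference databases and proper alignment or classification tools rather than position-by-position comparison of equal-length strings.

```python
# Toy reference database: marker sequence variant -> taxon label.
# Sequences and labels are invented for illustration only.
REFERENCE = {
    "ACGTACGTAC": "Genus A",
    "ACGTTCGTAC": "Genus B",
    "TTGTACGGAC": "Genus C",
}


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def assign_taxon(amplicon: str, reference: dict, min_identity: float = 0.9):
    """Return the taxon of the most similar reference sequence,
    or None if no reference sequence is similar enough."""
    best_seq = max(reference, key=lambda ref: identity(amplicon, ref))
    if identity(amplicon, best_seq) >= min_identity:
        return reference[best_seq]
    return None


# One exact match, one near match (a single mismatch), one unknown sequence.
amplicons = ["ACGTACGTAC", "ACGTTCGTAA", "GGGGGGGGGG"]
assignments = [assign_taxon(seq, REFERENCE) for seq in amplicons]
print(assignments)
```

The third sequence is left unassigned because nothing in the toy database resembles it closely. In the same way, real amplicons from organisms that are missing from the reference database may go unidentified.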
1.2.1.2 Metagenomics
A similar principle applies with metagenomics: we sequence the sample and then compare the resulting sequences to a database containing sequences of known organisms, and use this to identify which of them are present in our sample. This differs from metabarcoding mainly because metagenomics is untargeted - we do not know in advance what sequences we will recover - and generates very large amounts of data. Three of the main consequences of this are:
- We can compare individual sequencing reads from many points on a genome to the database of known sequences. However, because of the large amount of sequence data (in comparison to metabarcoding), we have to do this using specialised techniques like \(k\)-mer alignment, rather than direct sequence comparison (Wood and Salzberg 2014).
- We can assemble near-complete and complete genome sequences from the data: Metagenome-Assembled Genomes (MAGs) (Setubal 2021).
- We can obtain much more information than simple organism identity, for example the presence or absence of individual genes or gene functions such as antimicrobial resistance genes (Abreu, Perdigão, and Almeida 2020).
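To give a feel for the \(k\)-mer idea mentioned in the first point, here is a deliberately simplified Python sketch. It builds an index of every length-\(k\) substring in two invented "genomes", then assigns each read to the organism with which it shares the most exact \(k\)-mers. This is not how Kraken itself is implemented: the genomes, the reads, the value of `K` and the `classify` helper are assumptions made for illustration, and production tools use far more compact data structures and taxonomy-aware scoring.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Invented toy "genomes"; real reference genomes are millions of bases long.
GENOMES = {
    "Organism A": "ATGGCGTACGTTAGC",
    "Organism B": "TTGACCGGATCCAAT",
}
K = 5

# Index: k-mer -> set of organisms whose genome contains that k-mer.
index = {}
for name, genome in GENOMES.items():
    for kmer in kmers(genome, K):
        index.setdefault(kmer, set()).add(name)


def classify(read: str, k: int = K):
    """Assign a read to the organism sharing the most exact k-mers with it,
    or return None if no k-mer in the read is found in the index."""
    hits = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]


reads = ["GCGTACGTT", "CCGGATCC", "AAAAAAAAA"]
labels = [classify(read) for read in reads]
print(labels)
```

Looking up short exact substrings in a prebuilt index avoids aligning every read against every reference genome, which is what makes this style of approach practical for very large metagenomic datasets.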
1.2.2 How much of each organism is present?
Whichever method we use to sequence a biological sample, we can obtain a list of organisms for which there is evidence of presence. This tells us something about the composition of the sample, but not everything. There is clearly a difference between one community that is 50% Escherichia coli and 50% Staphylococcus aureus, and another community that is 99% S. aureus and 1% E. coli. But how can we distinguish between the two (and does it matter)?
In both metabarcoding and metagenomics we can count the number of sequences that correspond to each of the organism types we have identified. For metabarcoding, this might be the count of each amplicon sequence corresponding to a microbial genus. For metagenomics, this might be the number of reads assigned to a microbial species by a tool like Kraken.
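The counting step can be sketched in a few lines of Python. The example below assumes we already have one taxon label per sequence (the lists here are invented to mirror the two communities described above) and converts the counts into proportions. The `relative_abundance` helper is a hypothetical name used only for this illustration.

```python
from collections import Counter


def relative_abundance(assignments):
    """Convert per-sequence taxon labels into proportions that sum to 1."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical per-sequence assignments for two communities
# that contain the same two organisms in different proportions.
community_1 = ["E. coli"] * 50 + ["S. aureus"] * 50
community_2 = ["E. coli"] * 1 + ["S. aureus"] * 99

abundance_1 = relative_abundance(community_1)
abundance_2 = relative_abundance(community_2)
print(abundance_1)
print(abundance_2)
```

Both communities give the same list of organisms, so a presence/absence summary cannot tell them apart. Only the counts reveal the difference.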
All counts of organisms, by both methods, are strictly estimates of the representation of each organism in the sample. There are many factors that influence the way this estimate may vary from the actual amount or proportion of any organism in the sample, including:
- DNA quality (Manzari et al. 2020)
- laboratory technique (sample spillover or other handling problems)
- PCR artefacts (especially for low-biomass samples)
- sequence database integrity
- variable accuracy of associating a sequence with an organism or group (taxonomic misassignment)
With metagenomic analyses that result in assembled MAGs, an additional way of estimating the amount of each organism present is possible: we can count the number of reads that contribute to each MAG assembly. These reads are then assigned the same identity as that of the MAG itself.
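A minimal Python sketch of this MAG-based counting is shown below. It assumes, purely for illustration, that we have a table recording which assembled contig each read maps to, a table recording which MAG each contig belongs to, and a taxonomic label for each MAG. All read, contig, MAG and genus names are invented, and real pipelines derive these tables from read-mapping and binning tools.

```python
from collections import Counter

# Hypothetical inputs (all names invented for illustration):
# which contig each read maps to, which MAG each contig belongs to,
# and the identity assigned to each MAG.
read_to_contig = {
    "read1": "contig_1",
    "read2": "contig_1",
    "read3": "contig_2",
    "read4": "contig_3",
    "read5": "contig_9",  # contig that is not part of any MAG
}
contig_to_mag = {"contig_1": "MAG_A", "contig_2": "MAG_A", "contig_3": "MAG_B"}
mag_identity = {"MAG_A": "Genus X", "MAG_B": "Genus Y"}

# Count the reads that contribute to each MAG, skipping reads whose
# contig was not placed in any MAG.
reads_per_mag = Counter()
for read, contig in read_to_contig.items():
    mag = contig_to_mag.get(contig)
    if mag is not None:
        reads_per_mag[mag] += 1

# Each read inherits the identity of the MAG it contributes to.
reads_per_taxon = {mag_identity[mag]: n for mag, n in reads_per_mag.items()}
print(reads_per_taxon)
```

In this sketch, reads that do not contribute to any MAG are simply left uncounted, so the resulting counts describe only the part of the community that was successfully assembled.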
We refer to the count of each distinct group of microbes as its abundance.
1.3 Describing community composition
The list of organisms identified as present in our sample and the (relative) count of how much of the community is composed of each organism, taken together, describe our measured microbial community composition.