11 Assemble the ERR531380 Genome
The next step in the investigation would normally be to assemble the ERR531380 genome and assess the assembly quality. However, assembling a genome in Galaxy would take too long for this workshop. In this scenario, a colleague has kindly assembled the genome for you using the SPAdes assembler (Prjibelski et al. (2020)) and provided the result for you to download as the two files below, the assembled genome (.fasta file) and the assembly graph (.gfa file):
- Download the assembled genome and assembly graph to the local folder containing your workshop files
- Upload both files to Galaxy
Figure: ERR531380 assembly files
- Click on the uploaded ERR531380.contigs.fasta file
- Has the genome been assembled into a single sequence?
- How many sequences (called contigs) was the genome assembled into?
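If you also have a copy of the assembly on your own computer, you could check these answers outside Galaxy. The short Python sketch below is not part of the Galaxy workflow. It assumes the file is plain (uncompressed) FASTA, in which every sequence (contig) begins with a header line starting with '>', and the function names count_contigs() and contig_lengths() are hypothetical helpers written for illustration.

```python
def count_contigs(lines):
    """Count FASTA records: each record starts with a header line beginning '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def contig_lengths(lines):
    """Map each FASTA header (without the '>') to the length of its sequence."""
    lengths = {}
    header = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            lengths[header] = 0
        elif header is not None:
            # Sequence lines may be wrapped over several lines; add them all up
            lengths[header] += len(line)
    return lengths


# Example usage with the downloaded file (the local path is an assumption):
# with open("ERR531380.contigs.fasta") as handle:
#     lines = handle.readlines()
# print(count_contigs(lines), "contigs")
# print(contig_lengths(lines))
```

A genome assembled into a single sequence would give a count of 1; any larger number means the assembly is split across several contigs.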
In this section you will investigate the quality of this genome assembly visually using the Bandage tool. You will also use the CheckM tool to obtain a measure of genome completeness.
11.1 Visualising the Assembly
The Bandage software package (Wick et al. (2015)) can take assembly output from tools such as SPAdes and Shovill and visualise it as a graph. This is a useful step in assessing the quality of an assembly. It can help identify poorly-assembled regions, and cases where a co-culture of related strains may have been sequenced unintentionally, rather than an axenic isolate.
To use Bandage to visualise your genome assembly:
- Navigate to the Bandage Image tool using the Tools sidebar in Galaxy
- Select the Bandage Image tool
- Make sure that you have selected the .gfa file as the Graphical Fragment Assembly input
- Click Run Tool
Figure: ERR531380 assembly using Bandage Image
To view the graph, click on the eye icon for the Assembly Graph Image output.
Each separate subgraph in the Bandage graph image shows a section of the genome that the assembler considers may be linked together in some way. The branched and looped parts of these graphs show where the assembler was unable to choose between two or more ways of joining the reads together.
Each differently-coloured section of the subgraph is a part of the genome sequence that the assembler was confident about stitching together into a contiguous sequence.
This kind of genome assembly, comprising many fragments, some of which can’t be confidently linked together, is called a draft genome assembly.
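To see how separate subgraphs arise from the .gfa file, the Python sketch below groups the graph's segments into connected components using a simple union-find. It assumes the file follows the GFA 1 layout, where 'S' lines define segments (the second column is the segment name) and 'L' lines link two segments (the second and fourth columns). Link orientations are ignored, so the number of components should roughly correspond to the number of separate subgraphs in the Bandage image. This is an illustration only, not how Bandage itself is implemented, and gfa_components() is a hypothetical helper.

```python
def gfa_components(lines):
    """Group GFA segments into connected components (separate subgraphs).

    Uses only 'S' (segment) and 'L' (link) records. Link orientations are
    ignored, so each segment is treated as a single undirected node.
    """
    parent = {}

    def find(node):
        # Follow parent pointers to the component's root, halving the path
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    links = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "S" and len(fields) > 1:
            parent.setdefault(fields[1], fields[1])
        elif fields[0] == "L" and len(fields) > 3:
            links.append((fields[1], fields[3]))

    # Merge the components of every pair of linked segments
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect segment names by the root of their component
    components = {}
    for node in parent:
        components.setdefault(find(node), []).append(node)
    return list(components.values())


# Example usage (the filename is hypothetical):
# with open("ERR531380.assembly_graph.gfa") as handle:
#     subgraphs = gfa_components(handle)
# print(len(subgraphs), "subgraphs")
```

A fully resolved, single-chromosome assembly would typically collapse to one component, whereas a draft assembly like this one usually yields several.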
11.2 Estimating Assembly Quality
Bandage gives a visual account of the "quality" of an assembly in terms of its contiguity. By contrast, the CheckM software (Parks et al. (2015)) is a suite of tools that assesses the quality of a bacterial assembly by comparing its gene content with what is expected from known, related genome sequences.
CheckM estimates the completeness of the genome by looking for the presence of single-copy marker genes (SCMGs) for a stated phylogenetic lineage. The more of these marker genes that are found in the assembly, the more complete it is assumed to be.
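As a simplified illustration of this idea (not CheckM's actual implementation), completeness can be expressed as the percentage of expected single-copy marker genes that are found in the assembly. The marker names and the estimate_completeness() helper below are hypothetical. CheckM's own calculation is more involved: Parks et al. (2015) describe grouping markers that tend to occur together into collocated sets, and that weighting is not attempted here.

```python
def estimate_completeness(expected_markers, found_markers):
    """Return the percentage of expected marker genes found in an assembly.

    Simplified sketch: each marker gene counts equally and extra copies are
    ignored. CheckM's collocated marker-set weighting is NOT reproduced here.
    """
    expected = set(expected_markers)
    if not expected:
        raise ValueError("at least one expected marker gene is required")
    # Markers that were both expected for the lineage and detected in the assembly
    present = expected & set(found_markers)
    return 100.0 * len(present) / len(expected)


# Hypothetical example: 3 of 4 expected markers detected (m4 found twice)
print(estimate_completeness(["m1", "m2", "m3", "m4"], ["m1", "m2", "m4", "m4"]))
# prints 75.0
```

Because the score only counts markers expected for the chosen lineage, choosing the right taxon (as in the steps that follow) matters for the result.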
To use CheckM to estimate the quality of your assembly:
- Navigate to the CheckM taxonomy_wf tool using the Tools sidebar in Galaxy
- Select the Taxonomic rank of Species and set the Taxon of interest to be Pseudomonas aeruginosa (the identity of the isolate is known, or at least strongly suspected, on the basis of polyphasic tests)
- Under Data structure for bins, select In individual datasets and choose the ERR531380.contigs.fasta file
- Click on Run Tool
11.3 Next steps
With the draft genome assembly you now have for ERR531380, you can use Galaxy and the pubMLST server to classify your isolate in more detail.