11 Assemble the ERR531380 Genome
The next step in the investigation would normally be to assemble the ERR531380 genome and assess the assembly quality.
Assembling the genome in Galaxy would take too long for this workshop.
In this scenario, a colleague has kindly assembled the genome for you using the SPAdes assembler (Prjibelski et al. (2020)) and provided the result as two files for you to download, below: the assembled genome (.fasta file) and the assembly graph (.gfa file).
- Download the assembled genome and assembly graph to the local folder containing your workshop files
- Upload both files to Galaxy
ERR531380 assembly files
- Click on the uploaded ERR531380.contigs.fasta file
- Has the genome been assembled into a single sequence?
- How many sequences (called contigs) was the genome assembled into?
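If you would like to check your answer outside Galaxy, the sketch below counts the sequences in a FASTA file by counting header lines (those starting with `>`). It is a minimal illustration, not part of the workshop workflow: the function names are made up for this example, and the toy input stands in for the real `ERR531380.contigs.fasta` file.

```python
from pathlib import Path


def count_fasta_records(text: str) -> int:
    """Count sequences in FASTA text: each sequence has one '>' header line."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def contig_lengths(text: str) -> dict:
    """Return {contig name: sequence length} for FASTA text."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig name is the first word after '>'
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy input with two contigs (illustrative only, not the real assembly)
example = ">contig_1\nACGTACGT\nACGT\n>contig_2\nGGCC\n"
print(count_fasta_records(example))  # 2
print(contig_lengths(example))       # {'contig_1': 12, 'contig_2': 4}

# For the downloaded file, assuming it is in the current folder:
# text = Path("ERR531380.contigs.fasta").read_text()
# print(count_fasta_records(text))
```

A count of 1 would mean the genome was assembled into a single sequence; anything larger is the number of contigs.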
In this section you will investigate the quality of this genome assembly visually using the Bandage tool. You will also use the CheckM tool to obtain a measure of genome completeness.
11.1 Visualising the Assembly
The Bandage software package (Wick et al. (2015)) can take the assembly output from tools such as SPAdes and Shovill and visualise it as a graph. This is a useful step in assessing the quality of an assembly: it can help identify poorly-assembled regions, and can indicate when a co-culture of related strains may have been sequenced unintentionally, rather than an axenic isolate.
To use Bandage to visualise your genome assembly:
- Navigate to the Bandage Image tool using the Tools sidebar in Galaxy
- Select the Bandage Image tool
- Make sure that you have selected the .gfa file as the Graphical Fragment Assembly input
- Click Run Tool
ERR531380 assembly using Bandage Image
- Click the eye icon for the Assembly Graph Image output.
Each separate subgraph in the Bandage graph image shows a section of the genome that the assembly software thinks might be linked together in some way. The branched and looped nature of these graphs shows where the assembler has been unable to make a decision between two or more ways the reads could be joined together.
Each differently-coloured section of the subgraph is a part of the genome sequence that the assembler was confident about stitching together into a contiguous sequence.
This kind of genome assembly, comprising many fragments, some of which can’t be confidently linked together, is called a draft genome assembly.
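To make the idea of separate subgraphs more concrete, the sketch below reads a graph in GFA version 1 format, where `S` lines describe sequence segments and `L` lines describe links between them, and counts how many connected groups of segments it contains. This is only an illustration under simplifying assumptions: the function name and toy graph are invented for the example, and Bandage itself does far more than this when drawing an image.

```python
def gfa_summary(gfa_text: str) -> tuple:
    """Return (segments, links, connected components) for GFA1 text."""
    parent = {}  # union-find structure: segment name -> parent segment

    def find(x):
        # Follow parents to the representative, shortening the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    n_links = 0
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            # Segment line: S <name> <sequence> ...
            parent.setdefault(fields[1], fields[1])
        elif fields[0] == "L" and len(fields) >= 4:
            # Link line: L <from> <orient> <to> <orient> <overlap>
            a, b = fields[1], fields[3]
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            union(a, b)
            n_links += 1

    n_components = len({find(segment) for segment in parent})
    return len(parent), n_links, n_components


# Toy graph: segments 1, 2 and 3 are linked; segment 4 stands alone
toy_gfa = (
    "S\t1\tACGT\n"
    "S\t2\tGGCC\n"
    "S\t3\tTTAA\n"
    "S\t4\tCCGG\n"
    "L\t1\t+\t2\t+\t0M\n"
    "L\t1\t+\t3\t+\t0M\n"
)
print(gfa_summary(toy_gfa))  # (4, 2, 2)
```

In the toy graph, the branch from segment 1 to both segments 2 and 3 is the kind of unresolved choice that appears as a fork in the Bandage image, and the unlinked segment 4 would be drawn as its own subgraph.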
11.2 Estimating Assembly Quality
Bandage gives a visual account of the “quality” of an assembly in terms of its contiguity. By contrast, the CheckM software (Parks et al. (2015)) is a suite of tools used to assess the quality of a bacterial assembly in terms of whether it resembles known similar genome sequences.
CheckM estimates the completeness of the genome by looking for single-copy marker genes (SCMGs) that are expected to be present in a stated phylogenetic lineage. The more of these marker genes that can be found in the genome, the more complete the assembly is assumed to be.
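The reasoning behind the completeness estimate can be sketched as a simple proportion: the number of expected marker genes found, divided by the number expected. The example below is a deliberately simplified illustration with made-up gene names. CheckM itself groups markers into collocated sets before calculating its estimate, so its reported values will not match this plain ratio exactly.

```python
def estimate_completeness(expected_markers, observed_genes) -> float:
    """Percentage of expected marker genes seen at least once.

    Simplified: every marker is weighted equally, which is an assumption
    of this sketch and not how CheckM weights its marker sets.
    """
    expected = set(expected_markers)
    if not expected:
        raise ValueError("at least one expected marker gene is required")
    found = expected & set(observed_genes)
    return 100.0 * len(found) / len(expected)


# Hypothetical marker set for a lineage, and genes detected in an assembly
expected = ["geneA", "geneB", "geneC", "geneD"]
observed = ["geneA", "geneC", "geneC", "geneX"]

print(estimate_completeness(expected, observed))  # 50.0
```

Here two of the four expected markers are found, so the toy assembly is estimated to be 50% complete. Genes outside the marker set (`geneX`) do not affect the estimate.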
To use CheckM to estimate the quality of your assembly:
- Navigate to the CheckM taxonomy_wf tool using the Tools sidebar in Galaxy
- Select the Taxonomic rank of Species and set the Taxon of interest to be Pseudomonas aeruginosa (the identity of the isolate is known, or at least strongly suspected, on the basis of polyphasic tests)
- Under Data structure for bins select In individual datasets and choose the ERR531380.contigs.fasta file
- Click on Run Tool
CheckM
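For reference, the Galaxy tool wraps the `checkm taxonomy_wf` command. The sketch below builds a roughly equivalent command line for a local CheckM installation. It rests on several assumptions: that CheckM (version 1) is installed, that the argument order is rank, taxon, input folder, output folder, and that the folder names `bins` and `checkm_out` (hypothetical, chosen for this example) suit your setup. Check `checkm taxonomy_wf -h` before relying on it.

```python
import shlex


def checkm_taxonomy_wf_command(rank, taxon, bin_dir, out_dir, extension="fasta"):
    """Build the argument list for a 'checkm taxonomy_wf' run.

    bin_dir is a folder holding the assembly file(s); extension tells
    CheckM which file suffix to look for in that folder (-x option).
    """
    return ["checkm", "taxonomy_wf", "-x", extension, rank, taxon, bin_dir, out_dir]


# 'bins' would contain ERR531380.contigs.fasta; both folder names are assumptions
cmd = checkm_taxonomy_wf_command("species", "Pseudomonas aeruginosa", "bins", "checkm_out")

# Show the command as it could be typed in a shell (the taxon name gets quoted)
print(shlex.join(cmd))

# To actually run it, if CheckM is installed:
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list, rather than a single string, keeps the two-word taxon name together as one argument.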
11.3 Next steps
With the draft genome assembly you have for ERR531380 you can use pubMLST to classify your isolate in more detail, using Galaxy and the pubMLST server.